This is a proof-of-concept (POC) for automated car damage detection using instance segmentation and classification. The models are trained on an annotated dataset of 20,000+ car images, and inference runs as a two-stage pipeline: YOLOv12 segments individual car parts, then a fine-tuned ResNet34 classifier determines whether each part is damaged. A Streamlit web app ties everything together and exports a PDF report.
All images were sourced and annotated on Roboflow using polygon (instance segmentation) labels for the YOLO task and bounding-box + class labels exported separately for ResNet classification. Each image was labelled with up to 21 car part classes:
| # | Part Class | # | Part Class |
|---|---|---|---|
| 1 | Front Bumper | 12 | Rear Bumper |
| 2 | Front-Left Fender | 13 | Rear-Right Fender |
| 3 | Front-Right Fender | 14 | Rear-Left Fender |
| 4 | Front-Left Door | 15 | Rear-Right Door |
| 5 | Front-Right Door | 16 | Rear-Left Door |
| 6 | Hood | 17 | Trunk / Boot Lid |
| 7 | Roof | 18 | Windshield (Front) |
| 8 | Front-Left Headlight | 19 | Windshield (Rear) |
| 9 | Front-Right Headlight | 20 | Side Mirror (Left) |
| 10 | Tail Light (Left) | 21 | Side Mirror (Right) |
| 11 | Tail Light (Right) | | |
Augmentation was applied inside Roboflow before export and again on-the-fly during training via Albumentations. The goals were to (a) increase dataset diversity, (b) improve robustness to real-world lighting and capture conditions, and (c) balance under-represented part classes.
| Augmentation | Parameters | Reason |
|---|---|---|
| Horizontal Flip | p = 0.5 | Mirrors left/right part asymmetry |
| Random Rotation | ±15° | Non-frontal camera angles |
| Random Brightness & Contrast | ±30% | Different lighting conditions |
| HSV Hue/Sat Shift | hue ±10, sat ±30% | Car color variety |
| Mosaic (YOLO) | 4-image mosaic | Context-rich scene composition |
| Random Crop & Resize | Scale 0.5–1.5 | Part size variation |
| Gaussian Blur | kernel 3–7, p = 0.2 | Camera focus simulation |
| Cutout / CoarseDropout | max 8 holes, 32×32 px | Occlusion robustness |
| JPEG Compression | quality 60–100, p = 0.3 | Compression artefact tolerance |
Roboflow export settings used for YOLO:
```yaml
# Roboflow export config
format: yolov12-seg
resize: 640x640
augment: true
flip_horizontal: true
rotation: 15
brightness: 30
saturation: 30
blur: 1.5px
cutout: 5%
generations: 3x        # each image → 3 augmented copies
output_split: 80/10/10
```
We used YOLOv12-m-seg (medium variant) as the base model, pre-trained on COCO. The segmentation head outputs polygon masks per detection, which we use to crop each car part precisely before passing it to ResNet34.
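The mask-to-crop step can be sketched as follows. This is a simplification: the polygon is assumed to be already rasterized into a boolean mask the size of the image (e.g. from `results[0].masks` in the Ultralytics API), and the function name is ours.

```python
import numpy as np

def crop_part(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the tight bounding box of a binary part mask, zeroing out
    background pixels so the classifier only sees the segmented part."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    out = image.copy()
    out[~mask] = 0          # suppress everything outside the polygon
    return out[y0:y1, x0:x1]
```

Cropping to the tight bounding box (rather than a fixed window) keeps small parts such as mirrors from being dominated by background before the 224×224 resize.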
Training command:
```bash
yolo segment train \
  model=yolov12m-seg.pt \
  data=najm_parts.yaml \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  lr0=0.01 \
  lrf=0.001 \
  momentum=0.937 \
  weight_decay=0.0005 \
  warmup_epochs=3 \
  cos_lr=True \
  mosaic=1.0 \
  degrees=15 \
  hsv_h=0.015 \
  hsv_s=0.7 \
  hsv_v=0.4 \
  flipud=0.0 \
  fliplr=0.5 \
  project=najm_poc \
  name=yolov12_seg_v1 \
  device=0
```
Dataset YAML (najm_parts.yaml):
```yaml
path: /data/najm_parts
train: images/train
val: images/val
test: images/test

nc: 21
names:
  - front_bumper
  - front_left_fender
  - front_right_fender
  - front_left_door
  - front_right_door
  - hood
  - roof
  - front_left_headlight
  - front_right_headlight
  - tail_light_left
  - tail_light_right
  - rear_bumper
  - rear_right_fender
  - rear_left_fender
  - rear_right_door
  - rear_left_door
  - trunk
  - windshield_front
  - windshield_rear
  - mirror_left
  - mirror_right
```
| Metric | Value |
|---|---|
| mAP@50 (seg) | 0.82 |
| mAP@50-95 (seg) | 0.61 |
| Precision | 0.84 |
| Recall | 0.79 |
| Best epoch | 87 / 100 |
| Training time | ~6 hrs on RTX 3090 |
Each polygon mask produced by YOLOv12 is used to crop the corresponding part from the original image. These crops, resized to 224×224, are fed into a ResNet34 that was pre-trained on ImageNet and fine-tuned for this task. We replace the final fully connected layer with a 2-class head (damaged / undamaged) and fine-tune the last two residual blocks along with the new head using discriminative per-group learning rates.
```python
import torch
import torch.nn as nn
from torchvision import models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ── Model ────────────────────────────────────────────────────
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Freeze the stem and first two residual blocks; fine-tune layer3 onward
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

# Replace head for binary classification
model.fc = nn.Sequential(
    nn.Dropout(0.4),
    nn.Linear(model.fc.in_features, 2),  # damaged / undamaged
)
model = model.to(device)

# ── Transforms ───────────────────────────────────────────────
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# ── Optimizer & Scheduler ────────────────────────────────────
optimizer = torch.optim.AdamW([
    {"params": model.layer3.parameters(), "lr": 1e-4},
    {"params": model.layer4.parameters(), "lr": 2e-4},
    {"params": model.fc.parameters(),     "lr": 5e-4},
], weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=30, eta_min=1e-6
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# ── Training loop (sketch; train_loader serves the part crops) ──
for epoch in range(30):
    model.train()
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(imgs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine step per epoch (T_max = 30 epochs)
```
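At inference time, each crop goes through the same normalization and a softmax over the two logits. A small helper along these lines works; the name and the 0.5 threshold are our choices, not fixed by the training code, and class index 1 is assumed to mean "damaged".

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_damage(model: nn.Module, crop: torch.Tensor,
                   threshold: float = 0.5):
    """Score one normalized crop. Returns (p_damaged, is_damaged),
    assuming class index 1 = damaged."""
    model.eval()
    logits = model(crop.unsqueeze(0))                     # (1, 2)
    p_damaged = torch.softmax(logits, dim=1)[0, 1].item()
    return p_damaged, p_damaged >= threshold
```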
| Metric / Setting | Value |
|---|---|
| Val Accuracy | 91.4% |
| Val F1-Score | 0.90 |
| Precision (damaged) | 0.89 |
| Recall (damaged) | 0.92 |
| Epochs | 30 |
| Batch size | 64 |
| Loss function | CrossEntropy + label smoothing 0.1 |
| Optimizer | AdamW + cosine annealing |
| Component | Model | Input | Output |
|---|---|---|---|
| Segmentation | YOLOv12-m-seg | 640×640 image | 21-class polygon masks |
| Classification | ResNet34 (fine-tuned) | 224×224 part crop | damaged / undamaged |
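End to end, the Streamlit app chains the two stages roughly as follows. This is a model-agnostic sketch: `segment_fn` (yielding `(part_name, binary_mask)` pairs) and `classify_fn` (returning a damage probability for a crop) are hypothetical callables behind which the real YOLO and ResNet calls plug in.

```python
import numpy as np

def assess_damage(image, segment_fn, classify_fn, threshold=0.5):
    """Segment parts, crop each one by its mask's bounding box,
    classify each crop, and collect rows for the PDF report."""
    report = []
    for part_name, mask in segment_fn(image):
        ys, xs = np.nonzero(mask)
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        p_damaged = classify_fn(crop)
        report.append({
            "part": part_name,
            "damaged": p_damaged >= threshold,
            "confidence": round(p_damaged, 3),
        })
    return report
```

Keeping the two stages behind plain callables also makes the pipeline easy to unit-test with stub models before wiring in the trained weights.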