Najm POC – Car Damage Detection

Posted on December 9, 2025  |  Computer Vision  |  Deep Learning

Tags: YOLOv12 · ResNet34 · Segmentation · Streamlit · Computer Vision

This is a proof-of-concept (POC) for automated car damage detection using instance segmentation and classification. A dataset of 20,000+ car images was annotated and used for training; at inference time, a two-stage pipeline runs: YOLOv12 segments individual car parts, and a fine-tuned ResNet34 classifier determines whether each part is damaged. A Streamlit web app ties everything together and exports a PDF report.

1 — Dataset & Annotation (Roboflow)

- 20k+ raw car images
- 21 part classes
- ~180k polygon masks (after augmentation)
- 80/10/10 train / val / test split

All images were sourced and annotated on Roboflow using polygon (instance segmentation) labels for the YOLO task and bounding-box + class labels exported separately for ResNet classification. Each image was labelled with up to 21 car part classes:

 #  Part Class              #  Part Class
 1  Front Bumper           12  Rear Bumper
 2  Front-Left Fender      13  Rear-Right Fender
 3  Front-Right Fender     14  Rear-Left Fender
 4  Front-Left Door        15  Rear-Right Door
 5  Front-Right Door       16  Rear-Left Door
 6  Hood                   17  Trunk / Boot Lid
 7  Roof                   18  Windshield (Front)
 8  Front-Left Headlight   19  Windshield (Rear)
 9  Front-Right Headlight  20  Side Mirror (Left)
10  Tail Light (Left)      21  Side Mirror (Right)
11  Tail Light (Right)
📌 Annotations were done collaboratively on Roboflow. Each annotator drew polygon masks per part; a review pass was done to remove low-quality or ambiguous labels before export.

2 — Data Augmentation

Augmentation was applied inside Roboflow before export and again on-the-fly during training via Albumentations. The goals were to (a) increase dataset diversity, (b) improve robustness to real-world lighting and capture conditions, and (c) balance under-represented part classes.

Augmentation                  Parameters               Reason
Horizontal Flip               p = 0.5                  Mirrors left/right part asymmetry
Random Rotation               ±15°                     Non-frontal camera angles
Random Brightness & Contrast  ±30%                     Different lighting conditions
HSV Hue/Sat Shift             hue ±10, sat ±30%        Car color variety
Mosaic (YOLO)                 4-image mosaic           Context-rich scene composition
Random Crop & Resize          scale 0.5–1.5            Part size variation
Gaussian Blur                 kernel 3–7, p = 0.2      Camera focus simulation
Cutout / CoarseDropout        max 8 holes, 32×32 px    Occlusion robustness
JPEG Compression              quality 60–100, p = 0.3  Compression artefact tolerance
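In training we apply these on the fly via Albumentations; as a dependency-free illustration of what two rows of the table amount to (horizontal flip at p = 0.5 and a ±30% brightness shift), here is a plain NumPy sketch. The `augment` helper is our own name, not a library API.

```python
import numpy as np

def augment(img, rng):
    """Apply a horizontal flip (p=0.5) and a random ±30% brightness shift."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                  # mirror left/right
    factor = 1.0 + rng.uniform(-0.3, 0.3)      # brightness ±30%
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy grey image
out = augment(img, rng)
print(out.shape, out.dtype)  # (4, 4, 3) uint8
```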

Roboflow export settings used for YOLO:

# Roboflow export config
format        : yolov12-seg
resize        : 640 × 640
augment       : True
  flip_horizontal: True
  rotation     : 15
  brightness   : 30
  saturation   : 30
  blur         : 1.5px
  cutout       : 5%
generations   : 3x   # each image → 3 augmented copies
output_split  : 80/10/10

3 — YOLOv12 Segmentation Training

Architecture

We used YOLOv12-m-seg (medium variant) as the base model, pre-trained on COCO. The segmentation head outputs polygon masks per detection, which we use to crop each car part precisely before passing it to ResNet34.
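The mask-based cropping step can be sketched as follows, assuming a binary mask rasterised from the predicted polygon; the `crop_part` helper is our own illustration, not part of the YOLO API.

```python
import numpy as np

def crop_part(image, mask):
    """Crop the tight bounding box around a binary part mask,
    zeroing out background pixels so only the part remains."""
    ys, xs = np.where(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].copy()
    crop[~mask[y0:y1, x0:x1]] = 0   # suppress background inside the box
    return crop

img = np.full((100, 100, 3), 200, dtype=np.uint8)  # dummy image
mask = np.zeros((100, 100), dtype=bool)
mask[20:60, 30:80] = True                          # dummy part region
part = crop_part(img, mask)
print(part.shape)  # (40, 50, 3)
```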

Training command:

yolo segment train \
  model=yolov12m-seg.pt \
  data=najm_parts.yaml \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  lr0=0.01 \
  lrf=0.001 \
  momentum=0.937 \
  weight_decay=0.0005 \
  warmup_epochs=3 \
  cos_lr=True \
  mosaic=1.0 \
  degrees=15 \
  hsv_h=0.015 \
  hsv_s=0.7 \
  hsv_v=0.4 \
  flipud=0.0 \
  fliplr=0.5 \
  project=najm_poc \
  name=yolov12_seg_v1 \
  device=0

Dataset YAML (najm_parts.yaml):

path: /data/najm_parts
train: images/train
val:   images/val
test:  images/test

nc: 21
names:
  - front_bumper
  - front_left_fender
  - front_right_fender
  - front_left_door
  - front_right_door
  - hood
  - roof
  - front_left_headlight
  - front_right_headlight
  - tail_light_left
  - tail_light_right
  - rear_bumper
  - rear_right_fender
  - rear_left_fender
  - rear_right_door
  - rear_left_door
  - trunk
  - windshield_front
  - windshield_rear
  - mirror_left
  - mirror_right
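A quick sanity check that the `names` list actually matches `nc` (a throwaway snippet with the class list inlined rather than parsed from the YAML):

```python
names = [
    "front_bumper", "front_left_fender", "front_right_fender",
    "front_left_door", "front_right_door", "hood", "roof",
    "front_left_headlight", "front_right_headlight",
    "tail_light_left", "tail_light_right", "rear_bumper",
    "rear_right_fender", "rear_left_fender", "rear_right_door",
    "rear_left_door", "trunk", "windshield_front", "windshield_rear",
    "mirror_left", "mirror_right",
]
assert len(names) == 21, "nc must equal the number of class names"
assert len(set(names)) == len(names), "class names must be unique"
print("OK:", len(names), "classes")
```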

Metric           Value
mAP@50 (seg)     0.82
mAP@50-95 (seg)  0.61
Precision        0.84
Recall           0.79
Best epoch       87 / 100
Training time    ~6 hrs on RTX 3090

4 — ResNet34 Classification Training

Training Strategy

Each polygon mask produced by YOLOv12 is used to crop the corresponding part from the original image. These crops (resized to 224×224) are fed into a fine-tuned ResNet34 pre-trained on ImageNet. We replace the final fully-connected layer with a 2-class head (damaged / undamaged) and fine-tune the last two residual blocks along with the new head using a discriminative learning rate schedule.

import torch
import torch.nn as nn
from torchvision import models, transforms

# ── Model ────────────────────────────────────────────────────
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Freeze early layers, fine-tune layer3 onward
for name, param in model.named_parameters():
    if "layer1" in name or "layer2" in name:
        param.requires_grad = False

# Replace head for binary classification
model.fc = nn.Sequential(
    nn.Dropout(0.4),
    nn.Linear(model.fc.in_features, 2)   # damaged / undamaged
)
model = model.to(device)

# ── Transforms ───────────────────────────────────────────────
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# ── Optimizer & Scheduler ────────────────────────────────────
optimizer = torch.optim.AdamW([
    {"params": model.layer3.parameters(), "lr": 1e-4},
    {"params": model.layer4.parameters(), "lr": 2e-4},
    {"params": model.fc.parameters(),     "lr": 5e-4},
], weight_decay=1e-4)

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=30, eta_min=1e-6
)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# ── Training loop (sketch) ────────────────────────────────────
for epoch in range(30):
    model.train()
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(imgs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

Metric               Value
Val Accuracy         91.4%
Val F1-score         0.90
Precision (damaged)  0.89
Recall (damaged)     0.92
Epochs               30
Batch size           64
Loss function        CrossEntropy + label smoothing 0.1
Optimizer            AdamW + cosine annealing
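As a consistency check, the reported F1 follows directly from the stated precision and recall for the damaged class:

```python
# F1 = harmonic mean of precision and recall
precision, recall = 0.89, 0.92
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # → 0.9
```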

5 — Inference Pipeline

1. Upload car image via Streamlit
2. YOLOv12 produces polygon masks per part
3. Each mask is cropped → ResNet34 classifies damage
4. Damaged parts overlaid on the image + PDF report exported
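The four steps above can be sketched end-to-end as follows. This is a structural sketch only: `segment_parts` and `classify_crop` stand in for the YOLOv12 and ResNet34 calls and are not real APIs.

```python
def run_pipeline(image, segment_parts, classify_crop):
    """Two-stage inference: segment parts, then classify each crop.

    segment_parts(image) -> list of (part_name, mask)
    classify_crop(image, mask) -> "damaged" or "undamaged"
    """
    report = []
    for part_name, mask in segment_parts(image):
        verdict = classify_crop(image, mask)
        report.append({"part": part_name, "status": verdict})
    return report

# Toy stand-ins to exercise the control flow:
fake_segment = lambda img: [("hood", "mask_a"), ("front_bumper", "mask_b")]
fake_classify = lambda img, m: "damaged" if m == "mask_a" else "undamaged"
print(run_pipeline("img", fake_segment, fake_classify))
```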

6 — Model Summary

Component       Model                  Input              Output
Segmentation    YOLOv12-m-seg          640×640 image      21-class polygon masks
Classification  ResNet34 (fine-tuned)  224×224 part crop  damaged / undamaged

Weights & Resources

Model weights are published on Hugging Face and the project code on GitHub.

