This is a proof-of-concept (POC) for automated car damage detection using instance segmentation and classification. The models are trained on an annotated dataset of 20,000+ car images, and inference runs as a two-stage pipeline: YOLOv12 segments individual car parts, then a fine-tuned ResNet34 classifier determines whether each part is damaged. A Streamlit web app ties everything together and exports a PDF report.
All images were sourced and annotated on Roboflow using polygon (instance segmentation) labels for the YOLO task and bounding-box + class labels exported separately for ResNet classification. Each image was labelled with up to 21 car part classes:
| # | Part Class | # | Part Class |
|---|---|---|---|
| 1 | Front Bumper | 12 | Rear Bumper |
| 2 | Front-Left Fender | 13 | Rear-Right Fender |
| 3 | Front-Right Fender | 14 | Rear-Left Fender |
| 4 | Front-Left Door | 15 | Rear-Right Door |
| 5 | Front-Right Door | 16 | Rear-Left Door |
| 6 | Hood | 17 | Trunk / Boot Lid |
| 7 | Roof | 18 | Windshield (Front) |
| 8 | Front-Left Headlight | 19 | Windshield (Rear) |
| 9 | Front-Right Headlight | 20 | Side Mirror (Left) |
| 10 | Tail Light (Left) | 21 | Side Mirror (Right) |
| 11 | Tail Light (Right) | | |
Augmentation was applied inside Roboflow before export and again on-the-fly during training via Albumentations. The goals were to (a) increase dataset diversity, (b) improve robustness to real-world lighting and capture conditions, and (c) balance under-represented part classes.
| Augmentation | Parameters | Reason |
|---|---|---|
| Horizontal Flip | p = 0.5 | Mirrors left/right part asymmetry |
| Random Rotation | ±15° | Non-frontal camera angles |
| Random Brightness & Contrast | ±30% | Different lighting conditions |
| HSV Hue/Sat Shift | hue ±10, sat ±30% | Car color variety |
| Mosaic (YOLO) | 4-image mosaic | Context-rich scene composition |
| Random Crop & Resize | Scale 0.5–1.5 | Part size variation |
| Gaussian Blur | kernel 3–7, p = 0.2 | Camera focus simulation |
| Cutout / CoarseDropout | max 8 holes, 32×32 px | Occlusion robustness |
| JPEG Compression | quality 60–100, p = 0.3 | Compression artefact tolerance |
Roboflow export settings used for YOLO:
```yaml
# Roboflow export config
format: yolov12-seg
resize: 640x640
augment: true
flip_horizontal: true
rotation: 15
brightness: 30
saturation: 30
blur: 1.5px
cutout: 5%
generations: 3x        # each image → 3 augmented copies
output_split: 80/10/10
```
We used YOLOv12-m-seg (medium variant) as the base model, pre-trained on COCO. The segmentation head outputs polygon masks per detection, which we use to crop each car part precisely before passing it to ResNet34.
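The mask-to-crop step can be sketched as follows. This is a simplification: the polygon is assumed to be already rasterized into a boolean mask the size of the image (e.g. from `results[0].masks` in the Ultralytics API), and the function name is ours.

```python
import numpy as np

def crop_part(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the tight bounding box of a binary part mask, zeroing out
    background pixels so the classifier only sees the segmented part."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    out = image.copy()
    out[~mask] = 0          # suppress everything outside the polygon
    return out[y0:y1, x0:x1]
```

Cropping to the tight bounding box (rather than a fixed window) keeps small parts such as mirrors from being dominated by background before the 224×224 resize.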
Training command:
```bash
yolo segment train \
  model=yolov12m-seg.pt \
  data=najm_parts.yaml \
  epochs=100 \
  imgsz=640 \
  batch=16 \
  lr0=0.01 \
  lrf=0.001 \
  momentum=0.937 \
  weight_decay=0.0005 \
  warmup_epochs=3 \
  cos_lr=True \
  mosaic=1.0 \
  degrees=15 \
  hsv_h=0.015 \
  hsv_s=0.7 \
  hsv_v=0.4 \
  flipud=0.0 \
  fliplr=0.5 \
  project=najm_poc \
  name=yolov12_seg_v1 \
  device=0
```
Dataset YAML (najm_parts.yaml):
```yaml
path: /data/najm_parts
train: images/train
val: images/val
test: images/test

nc: 21
names:
  - front_bumper
  - front_left_fender
  - front_right_fender
  - front_left_door
  - front_right_door
  - hood
  - roof
  - front_left_headlight
  - front_right_headlight
  - tail_light_left
  - tail_light_right
  - rear_bumper
  - rear_right_fender
  - rear_left_fender
  - rear_right_door
  - rear_left_door
  - trunk
  - windshield_front
  - windshield_rear
  - mirror_left
  - mirror_right
```
| Metric | Value |
|---|---|
| mAP@50 (seg) | 0.82 |
| mAP@50-95 (seg) | 0.61 |
| Precision | 0.84 |
| Recall | 0.79 |
| Best epoch | 87 / 100 |
| Training time | ~6 hrs on RTX 3090 |
Each polygon mask produced by YOLOv12 is used to crop the corresponding part from the original image. These crops, resized to 224×224, are fed into a ResNet34 that was pre-trained on ImageNet and fine-tuned for this task. We replace the final fully connected layer with a 2-class head (damaged / undamaged) and fine-tune the last two residual blocks along with the new head using discriminative per-group learning rates.
```python
import torch
import torch.nn as nn
from torchvision import models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ── Model ────────────────────────────────────────────────────
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)

# Freeze the stem and first two residual blocks; fine-tune layer3 onward
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

# Replace head for binary classification
model.fc = nn.Sequential(
    nn.Dropout(0.4),
    nn.Linear(model.fc.in_features, 2),  # damaged / undamaged
)
model = model.to(device)

# ── Transforms ───────────────────────────────────────────────
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# ── Optimizer & Scheduler ────────────────────────────────────
optimizer = torch.optim.AdamW([
    {"params": model.layer3.parameters(), "lr": 1e-4},
    {"params": model.layer4.parameters(), "lr": 2e-4},
    {"params": model.fc.parameters(),     "lr": 5e-4},
], weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=30, eta_min=1e-6
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# ── Training loop (sketch; train_loader serves the part crops) ──
for epoch in range(30):
    model.train()
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(imgs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine step per epoch (T_max = 30 epochs)
```
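At inference time, each crop goes through the same normalization and a softmax over the two logits. A small helper along these lines works; the name and the 0.5 threshold are our choices, not fixed by the training code, and class index 1 is assumed to mean "damaged".

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_damage(model: nn.Module, crop: torch.Tensor,
                   threshold: float = 0.5):
    """Score one normalized crop. Returns (p_damaged, is_damaged),
    assuming class index 1 = damaged."""
    model.eval()
    logits = model(crop.unsqueeze(0))                     # (1, 2)
    p_damaged = torch.softmax(logits, dim=1)[0, 1].item()
    return p_damaged, p_damaged >= threshold
```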
| Metric / Setting | Value |
|---|---|
| Val Accuracy | 91.4% |
| Val F1-Score | 0.90 |
| Precision (damaged) | 0.89 |
| Recall (damaged) | 0.92 |
| Epochs | 30 |
| Batch size | 64 |
| Loss function | CrossEntropy + label smoothing 0.1 |
| Optimizer | AdamW + cosine annealing |
| Component | Model | Input | Output |
|---|---|---|---|
| Segmentation | YOLOv12-m-seg | 640×640 image | 21-class polygon masks |
| Classification | ResNet34 (fine-tuned) | 224×224 part crop | damaged / undamaged |
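End to end, the Streamlit app chains the two stages roughly as follows. This is a model-agnostic sketch: `segment_fn` (yielding `(part_name, binary_mask)` pairs) and `classify_fn` (returning a damage probability for a crop) are hypothetical callables behind which the real YOLO and ResNet calls plug in.

```python
import numpy as np

def assess_damage(image, segment_fn, classify_fn, threshold=0.5):
    """Segment parts, crop each one by its mask's bounding box,
    classify each crop, and collect rows for the PDF report."""
    report = []
    for part_name, mask in segment_fn(image):
        ys, xs = np.nonzero(mask)
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        p_damaged = classify_fn(crop)
        report.append({
            "part": part_name,
            "damaged": p_damaged >= threshold,
            "confidence": round(p_damaged, 3),
        })
    return report
```

Keeping the two stages behind plain callables also makes the pipeline easy to unit-test with stub models before wiring in the trained weights.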