Accurate 3D Object Detection with Minimal Labels: A CAM-Guided Weak Supervision Strategy

A CAM-guided weak supervision framework for object detection using Grad-CAM and YOLOv8, enabling accurate object localization and detection with minimal labeled data.

About This Project

Accurate 3D object detection is essential in applications such as augmented reality, virtual reality, robotics, and human-computer interaction. Traditional object detection methods depend heavily on large-scale datasets with precise bounding box annotations, a process that is both costly and labor-intensive. This project explores a weakly supervised learning strategy that eliminates the need for manual bounding box annotations by using only image-level labels to train a high-performance object detector.

The pipeline combines EfficientNet-B0 for multi-label classification, Grad-CAM for spatial localization, and YOLOv8 for object detection. The detector is trained entirely on pseudo-labels generated without any bounding box supervision.

This project uses the PASCAL VOC 2012 Dataset, a standard benchmark for object detection and segmentation containing 20 object categories with image-level labels, bounding boxes, and segmentation masks. A total of 11,540 images were annotated with valid multi-label vectors for this work.
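
As an illustration of what a multi-label vector looks like for this dataset, here is a minimal sketch in Python. The 20 class names are the standard VOC categories; the helper names `encode_labels` and `is_valid` are hypothetical, and treating "valid" as "at least one class present" is an assumption, not a rule stated by the project.

```python
# The 20 PASCAL VOC object categories, in their usual alphabetical order.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}


def encode_labels(present):
    """Return a 20-element multi-hot vector for the class names in one image."""
    vector = [0] * len(VOC_CLASSES)
    for name in present:
        vector[CLASS_INDEX[name]] = 1  # raises KeyError on an unknown class name
    return vector


def is_valid(vector):
    """Assumed validity rule: correct length and at least one positive class."""
    return len(vector) == len(VOC_CLASSES) and sum(vector) >= 1


labels = encode_labels(["person", "dog"])
print(labels, is_valid(labels))
```

A vector like this is the only supervision the classifier sees; no box coordinates are involved at this stage.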

Key Features

Classifies 20 object categories from images using only image-level labels (no bounding boxes)
Generates Grad-CAM heatmaps to localize discriminative object regions automatically
Converts heatmaps into pseudo bounding box annotations in YOLO format
Trains YOLOv8 on these pseudo-labels to perform accurate object detection
Achieves competitive precision, recall, and mAP (0.727 mAP@0.5) without any manual spatial annotations
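
To make the Grad-CAM step concrete, the sketch below shows the core computation on plain Python lists: each feature-map channel is weighted by the spatial average of its gradient, the weighted channels are summed, and negative values are clipped. This is the generic Grad-CAM formulation, not the project's code. The function name `grad_cam` is hypothetical, the inputs are assumed to have already been extracted from the classifier's last convolutional layer, and the final scaling to [0, 1] is a common convention rather than part of the original definition.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one class heatmap.

    activations, gradients: lists of K channels, each an H x W nested list.
    gradients[k] holds d(class score) / d(activations[k]).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]

    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for y in range(height):
            for x in range(width):
                cam[y][x] += alpha * act[y][x]

    # ReLU keeps only regions that push the class score up.
    cam = [[max(value, 0.0) for value in row] for row in cam]

    # Assumed convention: scale to [0, 1] so one threshold works for every image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2x2 channels, the second with a negative gradient.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))
```

In practice the heatmap is computed at feature-map resolution and upsampled to the image size before boxes are extracted.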

Pipeline Overview

Stage	Method	Purpose
1. Classification	EfficientNet-B0	Multi-label image classification (20 VOC classes)
2. Localization	Grad-CAM	Generate class-specific heatmaps
3. Pseudo-labeling	Contour detection + thresholding	Convert heatmaps to YOLO bounding boxes
4. Detection	YOLOv8 (small variant)	Train object detector on pseudo-labels
5. Evaluation	Precision, Recall, mAP@0.5, mAP@0.5:0.95	Assess detection performance
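
Stage 3 can be sketched as follows. This is a simplified, dependency-free stand-in: it thresholds the heatmap and groups the surviving cells with a 4-connected flood fill instead of the contour detection named in the table, then writes one YOLO-format line per region. The function name `heatmap_to_yolo_boxes`, the 0.5 threshold, and the `min_cells` filter are illustrative assumptions, not values taken from the project.

```python
from collections import deque


def heatmap_to_yolo_boxes(heatmap, class_id, threshold=0.5, min_cells=2):
    """Turn a [0, 1] heatmap (H x W nested list) into YOLO label lines.

    Each line is "class x_center y_center width height", with all four
    box values normalized by the heatmap size.
    """
    height, width = len(heatmap), len(heatmap[0])
    seen = [[False] * width for _ in range(height)]
    lines = []

    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or heatmap[y0][x0] < threshold:
                continue
            # Flood fill one connected region of above-threshold cells.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ys, xs = [y0], [x0]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and heatmap[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
                        ys.append(ny)
                        xs.append(nx)
            if len(ys) < min_cells:
                continue  # drop tiny, likely spurious activations

            # Tight box around the region, in cell-edge coordinates.
            x_min, x_max = min(xs), max(xs) + 1
            y_min, y_max = min(ys), max(ys) + 1
            x_center = (x_min + x_max) / 2 / width
            y_center = (y_min + y_max) / 2 / height
            box_w = (x_max - x_min) / width
            box_h = (y_max - y_min) / height
            lines.append(
                f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"
            )
    return lines


heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(heatmap_to_yolo_boxes(heatmap, class_id=11))
```

The threshold controls how tight the pseudo-boxes are: a higher value keeps only the most discriminative part of the object, while a lower value produces larger boxes that may include background.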

Final Results

Metric	Value	Description
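Precision (all classes)	0.99 at confidence 1.0	Peak of the precision-confidence curve over all 20 VOC categories, reached at the highest confidence threshold
Recall (all classes)	0.89 at confidence 0.0	Maximum recall over all 20 VOC categories, reached when no confidence threshold is applied
F1 Score	0.71 at confidence 0.439	Best precision-recall balance; 0.439 is the confidence threshold that maximizes F1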
mAP@0.5	0.727	Mean Average Precision at IoU ≥ 0.5
mAP@0.5:0.95	~0.45–0.50	Averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05
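
The quantities behind these metrics follow standard definitions, sketched below. IoU decides whether a predicted box counts as a match at a given threshold, and F1 is the harmonic mean of precision and recall. The function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration. The sketch does not reproduce the reported values, which come from the full evaluation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# The ten IoU thresholds averaged by mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)
# A prediction counts as a true positive at mAP@0.5 only if its IoU is >= 0.5.
print(overlap, overlap >= 0.5)
```

Because the detector is trained on Grad-CAM boxes that tend to cover only part of an object, the stricter thresholds in `IOU_THRESHOLDS` are where pseudo-label quality matters most.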

Publication


Authors	E. M. P. J. De Saram, R. G. N. Meegama
Conference	International Conference on Advanced Research in Computing (ICARC) 2026
