Associated with University of Sri Jayewardenepura

Published

Accurate 3D Object Detection with Minimal Labels: A CAM-Guided Weak Supervision Strategy

MAY 2024 - MARCH 2025

A CAM-guided weak supervision framework for object detection using Grad-CAM and YOLOv8, enabling accurate object localization and detection with minimal labeled data.

About This Project

Accurate 3D object detection is essential in applications such as augmented reality, virtual reality, robotics, and human-computer interaction. Traditional object detection methods depend heavily on large-scale datasets with precise bounding box annotations, a process that is both costly and labor-intensive. This project explores a weakly supervised learning strategy that eliminates the need for manual bounding box annotations by using only image-level labels to train a high-performance object detector.

The pipeline combines EfficientNet-B0 for multi-label classification, Grad-CAM for spatial localization, and YOLOv8 for accurate object detection, trained entirely on pseudo-labels generated without any bounding box supervision.
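The localization stage can be illustrated with the standard Grad-CAM computation: each feature-map channel is weighted by the spatial average of the class-score gradient, the weighted channels are summed, and a ReLU keeps only positive evidence. The sketch below is a dependency-free illustration on nested lists, not the project's implementation; the function name `grad_cam` and the final scaling to [0, 1] are assumptions made for this example, and in practice the activations and gradients would come from a convolutional layer of the classifier.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from one convolutional layer.

    activations, gradients: lists of K channels, each an H x W list of floats.
    gradients holds d(class score)/d(activation) for the class of interest.
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all positions.
        weight = sum(sum(row) for row in grad) / (height * width)
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * act[y][x]
    # ReLU keeps only regions that push the class score up.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Assumption: scale to [0, 1] so one threshold fits every image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: two 2x2 channels, one with positive and one with negative gradient.
toy_activations = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
toy_cam = grad_cam(toy_activations, toy_gradients)
```

In the toy example only the top-left position survives, because the second channel has a negative weight and its contribution is removed by the ReLU.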

This project uses the PASCAL VOC 2012 Dataset, a standard benchmark for object detection and segmentation containing 20 object categories with image-level labels, bounding boxes, and segmentation masks. A total of 11,540 images were annotated with valid multi-label vectors for this work.
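As an illustration of what a multi-label vector looks like for this dataset, the sketch below encodes the set of classes present in an image as a 20-element multi-hot vector. The class names are the 20 PASCAL VOC categories; the index order, the helper name `encode_labels`, and the use of plain lists are assumptions for this example rather than details taken from the project.

```python
# The 20 PASCAL VOC object categories. The index order is an assumption.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(VOC_CLASSES)}


def encode_labels(present_classes):
    """Return a 20-element multi-hot vector for the classes in one image."""
    vector = [0.0] * len(VOC_CLASSES)
    for name in present_classes:
        vector[CLASS_TO_INDEX[name]] = 1.0  # raises KeyError on unknown names
    return vector


# An image containing a dog and a person sets exactly two entries to 1.0.
labels = encode_labels(["dog", "person"])
```

A vector of this shape is what a multi-label classifier is typically trained against, with one independent sigmoid output per class.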

Key Features

  • Classifies 20 object categories from images using only image-level labels (no bounding boxes)
  • Generates Grad-CAM heatmaps to localize discriminative object regions automatically
  • Converts heatmaps into pseudo bounding box annotations in YOLO format
  • Trains YOLOv8 on these pseudo-labels to perform accurate object detection
  • Reaches an mAP@0.5 of 0.727 and a peak F1 score of 0.71 without any manual spatial annotations
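The heatmap-to-label step listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's code: it swaps OpenCV contour detection for a pure-Python flood fill over 4-connected regions so that it runs without dependencies (with OpenCV, `cv2.findContours` and `cv2.boundingRect` would play the same role). The function name `heatmap_to_yolo`, the 0.5 threshold, and the `min_pixels` noise filter are all hypothetical choices.

```python
from collections import deque


def heatmap_to_yolo(heatmap, class_id, threshold=0.5, min_pixels=2):
    """Turn a [0, 1] heatmap into YOLO label lines, one per hot region."""
    height, width = len(heatmap), len(heatmap[0])
    seen = [[False] * width for _ in range(height)]
    lines = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or heatmap[y0][x0] < threshold:
                continue
            # Flood-fill one 4-connected region at or above the threshold.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and heatmap[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(ys) < min_pixels:
                continue  # drop tiny activations as noise
            # Pixel-inclusive box edges, then normalise by the map size.
            x_min, x_max = min(xs), max(xs) + 1
            y_min, y_max = min(ys), max(ys) + 1
            cx = (x_min + x_max) / 2 / width
            cy = (y_min + y_max) / 2 / height
            w = (x_max - x_min) / width
            h = (y_max - y_min) / height
            # YOLO label format: class x_center y_center width height.
            lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Toy 4x4 heatmap: one 2x2 hot region plus one isolated hot pixel.
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_lines = heatmap_to_yolo(toy_heatmap, class_id=11)
```

Because YOLO coordinates are normalised to the image size, the same lines are valid whether the heatmap is kept at feature-map resolution or upsampled to the input image first. The isolated pixel in the toy heatmap is discarded by the `min_pixels` filter.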

Pipeline Overview

Stage | Method | Purpose
1. Classification | EfficientNet-B0 | Multi-label image classification (20 VOC classes)
2. Localization | Grad-CAM | Generate class-specific heatmaps
3. Pseudo-labeling | Contour detection + thresholding | Convert heatmaps to YOLO bounding boxes
4. Detection | YOLOv8 (small variant) | Train object detector on pseudo-labels
5. Evaluation | Precision, Recall, mAP@0.5, mAP@0.5:0.95 | Assess detection performance

Final Results

Metric | Value | Description
Precision (all classes) | 0.99 at confidence 1.0 | Precision over all 20 VOC categories at the highest confidence threshold
Recall (all classes) | 0.89 at confidence 0.0 | Recall over all 20 VOC categories at the lowest confidence threshold
F1 Score | 0.71 at confidence 0.439 | Best precision-recall balance over the confidence sweep
mAP@0.5 | 0.727 | Mean Average Precision at IoU ≥ 0.5
mAP@0.5:0.95 | ~0.45–0.50 | Mean Average Precision averaged across IoU thresholds 0.5–0.95
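For readers less familiar with these metrics, the two quantities they rest on can be written in a few lines. The sketch below shows the standard definitions of intersection over union (the overlap criterion behind "IoU ≥ 0.5") and of the F1 score as the harmonic mean of precision and recall. The function names and the corner-based box format `(x1, y1, x2, y2)` are assumptions for illustration; this is not the evaluation code used in the project.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# Two 2x2 boxes offset by one unit overlap in a 1x1 square: IoU = 1 / 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
# A prediction counts as correct at mAP@0.5 only if its IoU is at least 0.5.
is_match = overlap >= 0.5
```

In the example the boxes overlap visibly but would still not count as a match at the 0.5 threshold, which is why loosely fitting pseudo-labels tend to cost more at mAP@0.5:0.95 than at mAP@0.5.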

Publication

Authors: E. M. P. J. De Saram, R. G. N. Meegama
Conference: International Conference on Advanced Research in Computing (ICARC) 2026

Technologies Used

Python, PyTorch, EfficientNet-B0, YOLOv8, Grad-CAM, DeepLabV3, OpenCV, NumPy, Pandas, Matplotlib