Enhanced Weakly Supervised Learning for 3D Hand Pose Estimation


An enhanced weakly supervised learning framework using EfficientNet-B0 with a regression head for accurate 3D hand joint prediction from RGB images, effectively leveraging limited annotated data.

About This Project

Accurate 3D hand pose estimation plays a vital role in applications such as augmented reality, virtual reality, robotics, and human-computer interaction. This project explores weakly supervised learning as an alternative to traditional fully supervised approaches, which require large amounts of expensive and time-consuming 3D annotations.

The hybrid deep learning pipeline estimates 3D hand joint positions from standard RGB images using minimal labeled data. It combines EfficientNet-B0 for spatial feature extraction, pseudo-labeling for weak supervision, and an LSTM network for temporal motion refinement across video frames.
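The pseudo-labeling step can be illustrated with a small sketch. This page does not say how pseudo-labels are selected, so the rule below (keep an unlabeled image only when predictions from two augmented views of it agree) is an assumption rather than the project's method. The names `select_pseudo_labels` and `max_disagreement` are hypothetical, and the data is synthetic.

```python
import numpy as np


def select_pseudo_labels(pred_a, pred_b, max_disagreement=0.02):
    """Keep unlabeled samples whose two predictions agree closely.

    pred_a, pred_b: arrays of shape (N, 21, 3) with 3D keypoints predicted
    for the same N unlabeled images under two different augmentations.
    Returns (indices, pseudo_labels); the pseudo-label of a kept sample is
    the average of its two predictions. This selection rule is an assumption.
    """
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    # Per-joint Euclidean distance between the two views, averaged per sample.
    disagreement = np.linalg.norm(pred_a - pred_b, axis=-1).mean(axis=-1)
    keep = np.flatnonzero(disagreement <= max_disagreement)
    pseudo_labels = 0.5 * (pred_a[keep] + pred_b[keep])
    return keep, pseudo_labels


# Synthetic predictions: samples 0 and 2 agree closely, samples 1 and 3 do not.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(4, 21, 3))
offsets = np.array([0.001, 0.5, 0.002, 0.3]).reshape(4, 1, 1)
view_b = view_a + offsets  # constant shift per sample on every coordinate

kept, labels = select_pseudo_labels(view_a, view_b)
print(kept, labels.shape)
```

The kept samples and their averaged predictions would then be added to the labeled set for the next training round; the threshold trades pseudo-label quantity against quality.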

This project uses the FreiHAND Dataset, a benchmark for 3D hand pose estimation from single color images, containing 3,960 evaluation samples with RGB images, hand scale, and camera intrinsics.
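Camera intrinsics are what link 3D joint positions in camera space to pixel locations in the image. A minimal sketch of the standard pinhole projection follows; the function name `project_points` and the matrix values are illustrative and not taken from the dataset.

```python
import numpy as np


def project_points(xyz, K):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates."""
    xyz = np.asarray(xyz, dtype=float)
    K = np.asarray(K, dtype=float)
    uvw = xyz @ K.T                  # homogeneous image coordinates, (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth


# Illustrative intrinsics: focal length 500 px, principal point (112, 112).
K = np.array([[500.0, 0.0, 112.0],
              [0.0, 500.0, 112.0],
              [0.0, 0.0, 1.0]])

# Two hypothetical joints half a metre in front of the camera.
joints = np.array([[0.0, 0.0, 0.5],
                   [0.05, -0.02, 0.5]])

uv = project_points(joints, K)
print(uv)
```

A point on the optical axis lands on the principal point, and lateral offsets are scaled by focal length divided by depth.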

Key Features

Predicts the 3D coordinates of 21 hand keypoints from a single RGB image.
Combines an EfficientNet-B0 backbone with a regression head for joint prediction.
Uses LSTM-based temporal modeling to capture sequential hand motion dynamics.
Uses pseudo-labeling to improve performance when annotated data is limited.
Offers a scalable pipeline suited to real-world applications with minimal supervision.

Models Used

Model	Role
ResNet18	Baseline regression model
EfficientNet-B0	Enhanced backbone with pseudo-labeling
DeepLabV3 (ResNet-101)	Segmentation mask generation
LSTM (2-layer)	Temporal motion modeling across video frames

Final Results

Metric	Value	Description
MPJPE	0.1126	Mean per-joint position error
PCK@0.05	14.39%	Percentage of keypoints within 5% of hand size of the ground truth
Inference Speed	27.19 ms/frame	Real-time capable (about 36.8 frames per second)
LSTM Loss	0.0006879	Final temporal model evaluation loss
LSTM MAE	0.0189	Mean absolute error on keypoint predictions
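For reference, the first three metrics can be computed as in the sketch below. The exact evaluation protocol is not given here, so the hand-size normalization (one reference length per sample, passed in as `hand_size`) is an assumption, and the sample values are synthetic.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and true joints."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def pck(pred, gt, hand_size, alpha=0.05):
    """Fraction of keypoints within alpha * hand_size of the ground truth.

    hand_size: shape (N,), one reference length per sample (assumed here).
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)                   # (N, J)
    thresh = alpha * np.asarray(hand_size, dtype=float)[:, None]
    return float((dist <= thresh).mean())


def frames_per_second(ms_per_frame):
    """Convert per-frame latency in milliseconds to throughput."""
    return 1000.0 / ms_per_frame


# One synthetic sample with four joints and errors of 0.01, 0.04, 0.06, 0.10.
gt = np.zeros((1, 4, 3))
pred = np.array([[[0.01, 0.0, 0.0],
                  [0.0, 0.04, 0.0],
                  [0.0, 0.0, 0.06],
                  [0.06, 0.08, 0.0]]])

print(mpjpe(pred, gt))                   # average of the four errors
print(pck(pred, gt, hand_size=[1.0]))    # two of four joints within 0.05
print(round(frames_per_second(27.19), 1))
```

Multiplying the `pck` result by 100 gives a percentage in the form reported in the table.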

Publication


Authors	E. M. P. J. De Saram, R. G. N. Meegama
Conference	6th International Conference on Advanced Research in Computing (ICARC) 2026
