Journal articles

Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos

Abstract : Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet real-world applications require depth estimation and the ability to determine the distances between people in a scene, which makes it necessary to recover the 3D absolute poses of several people. This remains a challenge when only a single camera viewpoint is available. Furthermore, previously proposed systems typically required significant computational resources and memory. To overcome these restrictions, we propose a real-time framework for multi-person 3D absolute pose estimation from a monocular camera, which integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner. The proposed system, called Root-GAST-Net, is based on modified versions of the GAST-Net and RootNet networks. Its efficiency is demonstrated through quantitative and qualitative evaluations on two benchmark datasets, Human3.6M and MuPoTS-3D. On the MuPoTS-3D dataset, the system outperforms the current state of the art by a significant margin on all evaluated metrics, and it runs in real time at 15 fps on an Nvidia GeForce GTX 1080.
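The abstract describes combining a root-relative 3D pose with an estimated root depth to obtain an absolute pose, from which distances between people can be measured. The following is only a minimal sketch of that final composition step, assuming a standard pinhole camera model; the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the millimetre units are illustrative assumptions and are not taken from the paper or from the Root-GAST-Net code.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_root(u: float, v: float, depth: float,
                     fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift the root joint's pixel (u, v) to camera space using its depth.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all hypothetical inputs for this sketch.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_absolute_pose(rel_joints: List[Point3D], root_abs: Point3D) -> List[Point3D]:
    """Shift root-relative joints by the absolute root position."""
    rx, ry, rz = root_abs
    return [(rx + x, ry + y, rz + z) for x, y, z in rel_joints]


def root_distance(root_a: Point3D, root_b: Point3D) -> float:
    """Euclidean distance between two people's absolute root joints."""
    return math.dist(root_a, root_b)


# Hypothetical values: two people seen by a 1920x1080 camera, depths in mm.
root_1 = backproject_root(960, 540, 3000, fx=1000, fy=1000, cx=960, cy=540)
root_2 = backproject_root(1060, 540, 3400, fx=1000, fy=1000, cx=960, cy=540)
pose_1 = to_absolute_pose([(0.0, 0.0, 0.0), (0.0, -500.0, 20.0)], root_1)
print(root_distance(root_1, root_2))
```

In this sketch the per-person stages (detection, 2D pose, 3D root-relative reconstruction, root depth) are treated as already computed; only their outputs are combined here.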
Contributor : Vincent BARRA
Submitted on : Wednesday, June 1, 2022 - 4:12:29 PM
Last modification on : Thursday, June 2, 2022 - 3:37:58 AM




Amal El Kaid, Denis Brazey, Vincent Barra, Karim Baïna. Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos. Sensors, MDPI, 2022, 22 (11), pp.4109. ⟨10.3390/s22114109⟩. ⟨hal-03684802⟩


