Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos

Amal El Kaid; Denis Brazey; Vincent Barra; Karim Baïna

doi:10.3390/s22114109

Journal article, Sensors, Year: 2022

Amal El Kaid

Role: Author
PersonId: 827929
ORCID: 0000-0003-0605-8919

Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes

Alqualsadi Research Team

Denis Brazey

Role: Author
PersonId: 802098
ORCID: 0000-0001-8880-7826

Société Prynel

Vincent Barra

Role: Author
PersonId: 171668
IdHAL: vincent-barra
ORCID: 0000-0002-8975-222X
IdRef: 150156243

Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes

Karim Baïna

Role: Author
PersonId: 755744
ORCID: 0000-0002-4736-1079

Alqualsadi Research Team

Abstract

Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet real-world applications require depth estimates and the ability to determine the distances between people in a scene, so the 3D absolute poses of several people must be recovered. This remains a challenge when the scene is captured from a single viewpoint. Furthermore, previously proposed systems typically require a significant amount of computing resources and memory. To overcome these restrictions, we propose a real-time framework for multi-person 3D absolute pose estimation from a monocular camera, which integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner. The proposed system, called Root-GAST-Net, is based on modified versions of the GAST-Net and RootNet networks. Its efficiency is demonstrated through quantitative and qualitative evaluations on two benchmark datasets, Human3.6M and MuPoTS-3D. On all evaluated metrics, the system outperforms the current state of the art on the MuPoTS-3D dataset by a significant margin, and it runs in real time at 15 fps on an Nvidia GeForce GTX 1080.
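The last stage of the top-down pipeline described in the abstract, combining a root-relative 3D pose with an estimated root depth to obtain a camera-centric (absolute) pose, can be illustrated with a standard pinhole back-projection. The following is a minimal sketch under stated assumptions and is not the authors' implementation: the names `Intrinsics`, `back_project`, `to_absolute_pose`, and `root_distance` are hypothetical, the intrinsic values are invented for illustration, coordinates are assumed to be in millimetres, and the root joint is assumed to sit at index 0.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (values used below are made up)."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, depth: float, cam: Intrinsics) -> Point3D:
    """Map a pixel (u, v) with a known depth to camera-centric 3D coordinates."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def to_absolute_pose(
    root_relative: List[Point3D],
    root_pixel: Tuple[float, float],
    root_depth: float,
    cam: Intrinsics,
) -> List[Point3D]:
    """Translate a root-relative pose by the back-projected root position."""
    rx, ry, rz = back_project(root_pixel[0], root_pixel[1], root_depth, cam)
    return [(jx + rx, jy + ry, jz + rz) for jx, jy, jz in root_relative]


def root_distance(pose_a: List[Point3D], pose_b: List[Point3D]) -> float:
    """Euclidean distance between the root joints (assumed index 0) of two poses."""
    return math.dist(pose_a[0], pose_b[0])


# Hypothetical two-joint skeleton: root at the origin, head 600 mm above it
# (image y axis assumed to point downwards).
skeleton = [(0.0, 0.0, 0.0), (0.0, -600.0, 0.0)]
cam = Intrinsics(fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)

person_a = to_absolute_pose(skeleton, (500.0, 500.0), 3000.0, cam)
person_b = to_absolute_pose(skeleton, (800.0, 500.0), 4000.0, cam)
print(person_a[0], person_b[0], root_distance(person_a, person_b))
```

Once every detected person is expressed in the same camera-centric frame, inter-person distances follow directly from the root positions, which is the capability the abstract identifies as missing from root-relative methods.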

Keywords

3D multi-person pose estimation; absolute poses; camera-centric coordinates; computer vision; artificial intelligence; deep learning

Domains

Machine Learning [cs.LG]; Signal and Image Processing [eess.SP]

Main file

sensors-22-04109-v3.pdf (3.13 MB)

Origin: Publication funded by an institution
License: CC BY - Attribution

https://uca.hal.science/hal-03684802

Submitted on: Thursday, February 8, 2024, 08:45:00

Last modified on: Monday, February 12, 2024, 11:58:21

Dates and versions

hal-03684802, version 1 (08-02-2024)

License

Attribution

Identifiers

HAL Id: hal-03684802, version 1
DOI: 10.3390/s22114109

Cite

Amal El Kaid, Denis Brazey, Vincent Barra, Karim Baïna. Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos. Sensors, 2022, 22 (11), pp.4109. ⟨10.3390/s22114109⟩. ⟨hal-03684802⟩

Collections

PRES_CLERMONT CNRS LIMOS CLERMONT-AUVERGNE-INP
