Optimizing Hand Gesture Recognition With an Accurate Graph Model and Mixture of Experts for Joint Relationships and Temporal Dynamics / Youssef Memmi, Mohamed; Slama, Rim; Berretti, Stefano. - In: IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE. - ISSN 2637-6407. - PRINT. - 7:(2025), pp. 780-793. [10.1109/tbiom.2025.3596014]

Optimizing Hand Gesture Recognition With an Accurate Graph Model and Mixture of Experts for Joint Relationships and Temporal Dynamics

Berretti, Stefano
2025

Abstract

Hand gesture recognition approaches have recently achieved promising results by extracting spatial and temporal features. However, existing methods often miss the relationships between distant joints and are hindered by high computational costs. In this paper, we introduce a novel graph-based model for hand gesture recognition that overcomes these limitations by efficiently capturing dependencies between distant joints within a unified semantic space while requiring less computation time. Our model consists of three main components: the Mixture of Experts (MoE) framework, which dynamically selects specialized sub-models focused on joint, bone, and motion features to optimize performance and efficiency; the Graph Hierarchical Edge Representation (GHER), which captures multi-scale relationships between joints and models both local and global interactions within the hand skeleton; and the Light Graph Temporal Fusion Transformer (LGTFT), which integrates an attention-based graph transformer with a lightweight Gated Recurrent Unit (GRU) to capture temporal dynamics efficiently while reducing computational costs. We validate the proposed model through extensive experiments on three benchmark hand gesture recognition datasets: IPN, SHREC’17, and Briareo. Our approach achieves superior results compared to state-of-the-art methods while reducing computational time.
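
To make the interplay of the three components concrete, below is a minimal PyTorch sketch of the three-stream Mixture-of-Experts routing the abstract describes. It is an assumption-laden illustration, not the paper's implementation: the bone and motion streams are derived from joints in the common way for skeleton-based recognition (parent-relative differences and frame-to-frame differences), the GHER and LGTFT modules are approximated by a generic per-frame attention block followed by a lightweight GRU, and all class names, dimensions, and the gating design are hypothetical.

```python
# Hypothetical sketch of the MoE-over-three-streams idea from the abstract.
# The real GHER and LGTFT modules are stood in for by a generic per-frame
# self-attention + GRU expert; every name and dimension here is assumed.

import torch
import torch.nn as nn
import torch.nn.functional as F


def bones_from_joints(joints, parents):
    """Bone vectors as child-minus-parent joint differences.
    joints: (B, T, J, C); parents: list with the parent index of each joint."""
    parent_idx = torch.tensor(parents, device=joints.device)
    return joints - joints[:, :, parent_idx, :]


def motion_from_joints(joints):
    """First-order temporal differences, zero-padded at t = 0."""
    motion = joints[:, 1:] - joints[:, :-1]
    return F.pad(motion, (0, 0, 0, 0, 1, 0))  # pad the time dimension


class LiteTemporalExpert(nn.Module):
    """One expert: per-frame self-attention over joints (a stand-in for the
    graph transformer), followed by a lightweight GRU over time."""

    def __init__(self, in_dim, hid_dim, num_joints, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)
        self.attn = nn.MultiheadAttention(hid_dim, num_heads, batch_first=True)
        self.gru = nn.GRU(hid_dim * num_joints, hid_dim, batch_first=True)

    def forward(self, x):                      # x: (B, T, J, C)
        B, T, J, _ = x.shape
        h = self.embed(x).reshape(B * T, J, -1)
        h, _ = self.attn(h, h, h)              # spatial attention per frame
        h = h.reshape(B, T, -1)                # flatten joints per frame
        _, last = self.gru(h)                  # temporal aggregation
        return last.squeeze(0)                 # (B, hid_dim)


class GestureMoE(nn.Module):
    """Soft gating over joint, bone, and motion experts, then classification."""

    def __init__(self, num_joints, parents, coord_dim=3, hid_dim=128,
                 num_classes=14):
        super().__init__()
        self.parents = parents
        self.experts = nn.ModuleDict({
            name: LiteTemporalExpert(coord_dim, hid_dim, num_joints)
            for name in ("joint", "bone", "motion")
        })
        self.gate = nn.Linear(num_joints * coord_dim, 3)
        self.classifier = nn.Linear(hid_dim, num_classes)

    def forward(self, joints):                 # joints: (B, T, J, C)
        streams = {
            "joint": joints,
            "bone": bones_from_joints(joints, self.parents),
            "motion": motion_from_joints(joints),
        }
        # Gate on a cheap sequence summary (mean pose over time).
        summary = joints.mean(dim=1).flatten(1)
        weights = F.softmax(self.gate(summary), dim=-1)      # (B, 3)
        feats = torch.stack(
            [self.experts[k](v) for k, v in streams.items()], dim=1)
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)   # weighted fusion
        return self.classifier(fused)
```

For a 21-joint hand skeleton, `GestureMoE(num_joints=21, parents=parents)` applied to a tensor of shape `(batch, frames, 21, 3)` returns class logits; the parent list encodes the hand's kinematic tree and is dataset-specific. The soft gate here weights all three experts per sample, whereas a sparser top-k routing would more aggressively reduce computation, in the spirit of the efficiency claim.
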
Year: 2025
Volume: 7
Pages: 780-793
Goal 9: Industry, Innovation, and Infrastructure
Youssef Memmi, Mohamed; Slama, Rim; Berretti, Stefano
Files in this record:
tbiom_hand_2025.pdf

Closed access

Type: Publisher's PDF (Version of record)
License: All rights reserved
Size: 6.14 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1436381
Citations
  • Scopus 0