Youssef Memmi, Mohamed; Slama, Rim; Berretti, Stefano. "Optimizing Hand Gesture Recognition With an Accurate Graph Model and Mixture of Experts for Joint Relationships and Temporal Dynamics." IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, 2025, pp. 780-793. ISSN 2637-6407. DOI: 10.1109/tbiom.2025.3596014.
Optimizing Hand Gesture Recognition With an Accurate Graph Model and Mixture of Experts for Joint Relationships and Temporal Dynamics
Berretti, Stefano
2025
Abstract
Hand gesture recognition approaches have recently achieved promising results by extracting spatial and temporal features. However, existing methods often miss the relationships between distant joints and are hindered by high computational costs. In this paper, we introduce a novel graph-based model for hand gesture recognition that overcomes these limitations by efficiently capturing dependencies between distant joints within a unified semantic space, while requiring less computation time. Our model consists of three main components: the Mixture of Experts (MoE) framework, which dynamically selects specialized sub-models focused on joint, bone, and motion features to optimize performance and efficiency; the Graph Hierarchical Edge Representation (GHER), which captures multi-scale relationships between joints and models both local and global interactions within the hand skeleton; and the Light Graph Temporal Fusion Transformer (LGTFT), which integrates an attention-based graph transformer with a lightweight Gated Recurrent Unit (GRU) to capture temporal dynamics efficiently while reducing computational costs. We validate the proposed model through extensive experiments on three benchmark hand gesture recognition datasets: IPN, SHREC’17, and Briareo. Our approach achieves superior results compared to state-of-the-art methods, while reducing computation time.
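To make the abstract's gated-expert idea concrete, below is a minimal sketch, assuming per-frame joint, bone, and motion feature vectors as the three input streams. All module names, layer sizes, and the soft gating scheme are illustrative assumptions, not the paper's implementation; the GHER and graph-transformer components described in the abstract are omitted here.

```python
# Minimal sketch of the high-level pipeline: three feature "experts"
# (joints, bones, motion) combined by a learned gate (MoE), followed by
# a lightweight GRU for temporal dynamics. Hypothetical layer sizes and
# gating; not the authors' code.
import torch
import torch.nn as nn


class StreamExpert(nn.Module):
    """One expert: embeds per-frame features of a single input stream."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, hid_dim))

    def forward(self, x):                  # x: (batch, frames, in_dim)
        return self.net(x)                 # (batch, frames, hid_dim)


class MoEGestureNet(nn.Module):
    """Gated mixture of joint/bone/motion experts + GRU + classifier."""

    def __init__(self, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        self.experts = nn.ModuleList(StreamExpert(in_dim, hid_dim) for _ in range(3))
        self.gate = nn.Linear(3 * in_dim, 3)        # one weight per expert
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, num_classes)

    def forward(self, joints, bones, motion):       # each: (B, T, in_dim)
        streams = [joints, bones, motion]
        # Per-frame soft gate over the three experts: (B, T, 3).
        w = torch.softmax(self.gate(torch.cat(streams, dim=-1)), dim=-1)
        feats = torch.stack([e(s) for e, s in zip(self.experts, streams)], dim=-1)
        fused = (feats * w.unsqueeze(-2)).sum(-1)   # weighted sum, (B, T, H)
        _, h = self.gru(fused)                      # final hidden state, (1, B, H)
        return self.head(h.squeeze(0))              # class logits, (B, num_classes)


# Toy usage: 22 joints in 3D per frame, flattened to 66-dim vectors.
B, T, D = 2, 32, 66
model = MoEGestureNet(in_dim=D, hid_dim=128, num_classes=14)
logits = model(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
print(logits.shape)  # torch.Size([2, 14])
```

In skeleton-based pipelines of this kind, the bone and motion streams are typically derived from the joint coordinates (bone vectors as differences between connected joints, motion as frame-to-frame differences), so only the raw joint sequence needs to be provided.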
| File | Size | Format |
|---|---|---|
| tbiom_hand_2025.pdf (closed access; publisher's PDF, version of record; license: all rights reserved) | 6.14 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.