Bibliographic information:
Pengcheng Han (Waseda Univ), Xin He, Takafumi Matsumaru, Vibekananda Dutta (Warsaw Univ of Tech): "Spatio-temporal Transformer with Kolmogorov-Arnold Network for Skeleton-Based Hand Gesture Recognition", Sensors (MDPI) (ISSN 1424-8220), Vol.25, Issue 3, 702 (23 pages), (2025.01.24).
https://doi.org/10.3390/s25030702
https://www.mdpi.com/1424-8220/25/3/702
Abstract:
Manually crafted features often suffer from subjectivity, inadequate accuracy, or a lack of robustness in recognition. Meanwhile, existing deep learning methods often overlook the structural and dynamic characteristics of the human hand, failing to fully exploit the contextual information of joints in both the spatial and temporal domains. To effectively capture dependencies between hand joints that are not adjacent but may have potential connections, it is essential to learn long-term relationships. This study proposes a skeleton-based hand gesture recognition framework, ST-KT, which combines a spatio-temporal graph convolution network and a transformer with the Kolmogorov–Arnold Network (KAN). It incorporates spatio-temporal graph convolution network (ST-GCN) modules and a spatio-temporal transformer module with KAN (KAN–Transformer). The ST-GCN modules, each comprising a spatial graph convolution network (SGCN) and a temporal convolution network (TCN), extract primary features from skeleton sequences by leveraging the strength of graph convolutional networks in the spatio-temporal domain. A spatio-temporal position embedding method integrates node features, enriching representations with node identities and temporal information. The transformer layer includes a spatial KAN–Transformer (S-KT) and a temporal KAN–Transformer (T-KT), which further extract joint features by learning edge weights and node embeddings, providing richer feature representations and the capability for nonlinear modeling. We evaluated our method on two challenging skeleton-based dynamic gesture datasets: it achieved an accuracy of 97.5% on the SHREC’17 track dataset and 94.3% on the DHG-14/28 dataset. These results demonstrate that the proposed ST-KT effectively captures dynamic skeleton changes and complex joint relationships.
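The KAN component mentioned in the abstract replaces fixed activations with learnable univariate functions on each input–output edge. The following is a minimal NumPy sketch of that idea only, not the paper's implementation: the per-edge functions are approximated here by a linear combination of three fixed basis functions (a coarse stand-in for the B-spline bases used in typical KAN implementations), and all shapes and names are illustrative assumptions.

```python
import numpy as np

def kan_layer(x, coeffs):
    """KAN-style layer: each input-output edge applies its own learned
    univariate function; outputs sum the edge functions per node.

    x:      (batch, d_in) input features
    coeffs: (d_in, d_out, n_basis) per-edge basis coefficients
    Returns (batch, d_out).
    """
    # Fixed basis [x, x^2, sin(x)] evaluated per input dimension
    # (illustrative stand-in for learnable spline bases).
    basis = np.stack([x, x**2, np.sin(x)], axis=-1)  # (batch, d_in, n_basis)
    # phi[b, i, o] = sum_k coeffs[i, o, k] * basis[b, i, k]
    phi = np.einsum('bik,iok->bio', basis, coeffs)
    # Each output node aggregates its incoming edge functions.
    return phi.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))          # e.g., 4 frames x 6 joint features
coeffs = rng.normal(size=(6, 2, 3))  # 6 inputs -> 2 outputs, 3 basis fns
out = kan_layer(x, coeffs)
print(out.shape)  # (4, 2)
```

In the paper's KAN–Transformer modules these layers would sit inside the spatial (S-KT) and temporal (T-KT) branches, giving the nonlinear modeling capacity the abstract refers to.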
Keywords:
hand gesture recognition, human–computer interaction (HCI), skeleton-based, deep learning, graph convolutional networks, transformer, attention mechanism, feature extraction, continuous hand gesture recognition, Takafumi Matsumaru, Waseda University, Graduate School of Information Production and Systems (IPS), Bio-Robotics and Human-Mechatronics laboratory (BRHM lab)
Others:
((Date)) 2025/02/10
((Copyright)) Bio-Robotics and Human-Mechatronics Laboratory (BRHM lab) (Takafumi Matsumaru Laboratory), Graduate School of Information, Production and Systems (IPS), Waseda University.
https://sem-matsumaru.w.waseda.jp/
https://matsumaru.w.waseda.jp/