CNN-LSTM Model for Recognizing Video-Recorded Actions Performed in a Traditional Chinese Exercise

Author: Jing Chen1, Jiping Wang2, Qun Yuan3, Zhao Yang3
1 School of Electronic and Information EngineeringSuzhou University of Science and TechnologySuzhou 215009 China.
2 Suzhou Institute of Biomedical Engineering and TechnologySuzhou 215000 China.
3 Department of Respiratory MedicineSuzhou Hospital, Affiliated Hospital of Medical School, Nanjing UniversitySuzhou 215163 China.
Conference/Journal: IEEE J Transl Eng Health Med
Date published: 2023 Jun 2
Other: Volume ID: 11 , Pages: 351-359 , Special Notes: doi: 10.1109/JTEHM.2023.3282245. , Word Count: 284

Identifying human actions from video data is an important problem in the fields of intelligent rehabilitation assessment. Motion feature extraction and pattern recognition are the two key procedures to achieve such goals. Traditional action recognition models are usually based on the geometric features manually extracted from video frames, which are however difficult to adapt to complex scenarios and cannot achieve high-precision recognition and robustness. We investigate a motion recognition model and apply it to recognize the sequence of complicated actions of a traditional Chinese exercise (ie, Baduanjin). We first developed a combined convolutional neural network (CNN) and long short-term memory (LSTM) model for recognizing the sequence of actions captured in video frames, and applied it to recognize the actions of Baduanjin. Moreover, this method has been compared with the traditional action recognition model based on geometric motion features in which Openpose is used to identify the joint positions in the skeletons. Its performance of high recognition accuracy has been verified on the testing video dataset, containing the video clips from 18 different practicers. The CNN-LSTM recognition model achieved 96.43% accuracy on the testing set; while those manually extracted features in the traditional action recognition model were only able to achieve 66.07% classification accuracy on the testing video dataset. The abstract image features extracted by the CNN module are more effective on improving the classification accuracy of the LSTM model. The proposed CNN-LSTM based method can be a useful tool in recognizing the complicated actions.

Keywords: Action recognition; CNN; Clinical and Translational Impact Statement-The proposed algorithm can recognize the complicated actions in rehabilitation training and thus has the potential to realize intelligent rehabilitation assessment for home applications; LSTM; geometric feature extraction; video processing.

PMID: 37435544 PMCID: PMC10332470 DOI: 10.1109/JTEHM.2023.3282245