Hand action recognition in rehabilitation exercise method using R(2+1)D deep learning network and interactive object information

Authors

  • Nguyen Sinh Huy (Corresponding Author), Military Information Technology Institute, Academy of Military Science and Technology
  • Le Thi Thu Hong, Military Information Technology Institute, Academy of Military Science and Technology
  • Nguyen Hoang Bach, Military Information Technology Institute, Academy of Military Science and Technology
  • Nguyen Chi Thanh, Military Information Technology Institute, Academy of Military Science and Technology
  • Doan Quang Tu, Military Information Technology Institute, Academy of Military Science and Technology
  • Truong Van Minh, School of Electrical and Electronics Engineering, Hanoi University of Science and Technology
  • Vu Hai, School of Electrical and Electronics Engineering, Hanoi University of Science and Technology

DOI:

https://doi.org/10.54939/1859-1043.j.mst.CSCE6.2022.77-91

Keywords:

Hand action recognition; Rehabilitation exercises; Object detection and tracking; R(2+1)D

Abstract

Hand action recognition in rehabilitation exercises aims to automatically recognize which exercise a patient has performed. This is an important step in an AI system that assists doctors in monitoring and assessing the patient's rehabilitation. The intended system recognizes hand actions automatically from videos captured by a camera worn on the patient's body. In this paper, we propose a model for recognizing the patient's hand actions in rehabilitation exercises that combines the results of R(2+1)D, a deep learning network recognizing actions on RGB video, with an algorithm that detects the main interactive object of each exercise. The proposed model is implemented, trained, and tested on a dataset of rehabilitation exercises collected from patients' wearable cameras. The experimental results show that the exercise recognition accuracy is practicable, averaging 88.43% on test data independent of the training data. The action recognition results of the proposed method outperform those of a single R(2+1)D network, with a reduced rate of confusion between exercises that involve similar hand gestures. These results demonstrate that combining interactive object information with action recognition significantly improves accuracy.
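For illustration only, the sketch below shows one plausible way to combine clip-level scores from an R(2+1)D backbone (here torchvision's r2plus1d_18) with the label of the main interactive object reported by a detector, using a simple late-fusion prior. The number of exercise classes, the object-to-exercise prior, and the multiplicative fusion rule are assumptions made for this example and are not taken from the paper.

```python
# Minimal late-fusion sketch (not the authors' exact formulation): clip-level
# R(2+1)D scores are re-weighted by an assumed prior over exercises given the
# main interactive object detected in the clip.
import torch
import torch.nn.functional as F
from torchvision.models.video import r2plus1d_18

NUM_EXERCISES = 5  # assumed number of exercise classes

# Hypothetical prior P(exercise | main interactive object); each row sums to 1.
OBJECT_PRIOR = {
    "cup":      torch.tensor([0.70, 0.10, 0.10, 0.05, 0.05]),
    "ball":     torch.tensor([0.05, 0.75, 0.10, 0.05, 0.05]),
    "pegboard": torch.tensor([0.05, 0.05, 0.70, 0.10, 0.10]),
}
UNIFORM = torch.full((NUM_EXERCISES,), 1.0 / NUM_EXERCISES)

# Randomly initialized here; a trained checkpoint would be loaded in practice.
model = r2plus1d_18(num_classes=NUM_EXERCISES)
model.eval()

def recognize(clip, main_object):
    """clip: (1, 3, T, H, W) RGB tensor; main_object: detector label or None."""
    with torch.no_grad():
        action_scores = F.softmax(model(clip), dim=1).squeeze(0)  # P(exercise | video)
    prior = OBJECT_PRIOR.get(main_object, UNIFORM)                # P(exercise | object)
    fused = action_scores * prior                                 # element-wise late fusion
    fused = fused / fused.sum()                                   # renormalize
    return int(fused.argmax())

# Example: a random 16-frame 112x112 clip where the detector reported a "cup".
dummy_clip = torch.randn(1, 3, 16, 112, 112)
print(recognize(dummy_clip, "cup"))
```

When the detected object is absent or unreliable, the uniform prior leaves the R(2+1)D prediction unchanged, so the fusion can only refine, not replace, the video-based score in this sketch.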

Published

30-12-2022

How to Cite

Nguyen Sinh Huy, Le Thi Thu Hong, Nguyen Hoang Bach, Nguyen Chi Thanh, Doan Quang Tu, Truong Van Minh, and Vu Hai. “Hand Action Recognition in Rehabilitation Exercise Method Using R(2+1)D Deep Learning Network and Interactive Object Information”. Journal of Military Science and Technology, no. CSCE6, Dec. 2022, pp. 77-91, doi:10.54939/1859-1043.j.mst.CSCE6.2022.77-91.

Issue

No. CSCE6 (2022)

Section

Research Articles
