Artificial Intelligence (AI) Powered Precise Classification of Recuperation Exercises for Musculoskeletal Disorders

Dilliraj Ekambaram Vijayakumar Ponnusamy* Suresh Thevarayan Natarajan Mariyam Farzana Subhan Firos Khan

Department of ECE, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur 603203, Chengalpattu, Chennai, Tamil Nadu, India

SRM College of Physiotherapy, Faculty of Medicine Health and Science, SRM Nagar, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur 603203, Chengalpattu, Chennai, Tamil Nadu, India

6 October 2022
2 December 2022
10 December 2022
Available online: 30 April 2023



Musculoskeletal pain is one of the significant health issues faced by Information Technology (IT) and health-care professionals. The current IT sector requires people to work seated in one place for long hours (~3-4 hours), which causes severe hip, neck, and shoulder pain and may lead to paralysis. In this work, three-dimensional (3D) images are converged into plane-based projections to precisely classify trunk extension and flexion and wrist extension and flexion exercise posture images. Because predictions in different planes can be detected incorrectly during the convergence process, a deep learning algorithm is used to improve recognition accuracy and processing speed. A dataset of 200 images of wrist and trunk extension and flexion exercise postures is created at various planes. The performance of the proposed deep learning algorithm is compared with CNN with accelerometer sensing image data, DNN with RGB images, CNN-GRU with Kinect depth images, Deep Hybrid CNN with body portion keyframe images, and Spatial Transform Networks (STN) with attention-based multi-scale CNN with Grad-CAM images. The results demonstrate the suitability of the proposed deep learning-based system for musculoskeletal rehabilitation therapy: it successfully identifies the completion of rehabilitation activities with a training accuracy of 98.12% and a validation accuracy of 95%. Our approach can track and enhance the efficiency of patients' rehabilitation training with greater precision than several other state-of-the-art conventional CNN-based baseline architectures.


Keywords: Artificial Intelligence (AI), deep neural network, work-related musculoskeletal disorder, home-based recuperation training, upper limb exercise

1. Introduction

In a developing country like India, all sectors are expected to grow rapidly, and the number of people employed in IT industries will increase tremendously. IT workers are among the most marginalized sections of society and are vulnerable to exploitation; even in developed countries, the safety and health of these workers are at stake. In rural areas, rehabilitation training facilities are still out of reach, which contributes to the wide prevalence of musculoskeletal disorders (MSD) across the globe [1]. The most significant health issue faced by IT sector personnel is work-related musculoskeletal disorder (WMSD), which affects the muscles, ligaments, and nerves. IT personnel typically work in static body postures with insufficient movement of smaller body parts such as the wrists and fingers. This working style can cause symptoms such as joint stiffness, redness and swelling in the affected body parts, changes in skin color, and reduced sweating [2]. Psychosocial risk factors, such as work-related stress, lack of support from colleagues or managers, excessive mental workload, and lack of recognition for work done, further trigger the development of WMSDs in this field. The majority of these disorders may lead to Parkinson's disease, a neurological disorder that is increasing rapidly among IT sector personnel [3].

The overall average incidence of work-related musculoskeletal disorders during 2018/19 was 1.13% of cases per 0.1 million workers in industry, 1.83% in construction, and 1.5% among personnel in human health and social work activities [4]. In a recent survey of 296 health-care professionals conducted by Tamil Nadu Physical Education and Sports University, 156 of the 296 personnel reported uncomfortable working postures, a high risk factor for WMSD; 205 reported low back pain; and 137 needed to consult surgeons for further treatment [5]. Another study of 100 multinational corporation employees in the Surat region estimated scapular asymmetry and found that 58 subjects were affected by it, which may lead to MSD [6].

Past studies have identified and analyzed exercise posture using Inertial Measurement Unit (IMU) sensor-based systems, accelerometers, Kinect sensors, and RGB camera images. Intelligent sensing tools for upper limb rehabilitation still have some limitations [7]. Existing image-based methods classify exercise poses using deep neural networks, GRUs, and attention-based multi-scale CNNs [8-10]. Compared with multi-sensor systems, single-sensor data suffers from significant error, noise, and misidentification, which lowers accuracy. Moreover, self-monitored patient rehabilitation training remains a complex task [11, 12]. Surface EMG-based approaches to upper limb rehabilitation action recognition extract time-domain, frequency-domain, time-frequency, and entropy features from physiological signals [13, 14]. Figure 1 shows the various deep learning techniques used in image-based exercise pose analysis.

Figure 1. Various existing deep learning methods used for Exercise pose classification

The contributions of this article are as follows:

  • This article uses modern AI technology to make intelligent rehabilitation more convenient for MSD patients in the current COVID-19 situation.

  • Existing RGB image analysis with the Grad-CAM approach provides lower accuracy, and most existing image-based methods require external hardware to classify the exercise pose. In the proposed system, RGB images taken from any camera are analyzed directly to produce the classification result.

  • Classifying exercise pose images captured with different plane projections, that is, at different projection angles, is a new approach proposed in this work.

  • The model can classify and evaluate different rehabilitation activities in minimal time.

New options such as connected care, artificial intelligence, 3D printing, and prosthetics that mimic human movement are encouraging. For patients with musculoskeletal problems, the availability of data at both the micro and macro levels can serve as a catalyst for care tailored to behavioral, cultural, genetic, and psychological needs [15]. Care providers would benefit significantly from practical algorithms paired with biomarkers that can pinpoint precisely when and how to intervene. Exoskeletons, 3D printing, and virtual rehabilitation all promise to provide patients with precise, low-cost interventions, but they are still in the early stages of development. Physical rehabilitation technologies now include modern simulators, tools, and systems, such as microprocessor-based and computer-assisted equipment with real-time monitoring [16]. Robotic rehabilitation devices are gaining popularity for treating limb injuries, and an important area of study concerns the characteristics that make these devices useful in cases of extreme trauma [17]. Mechanotherapy devices are effective when used correctly and consistently.

The rest of the paper is organized as follows. Section 2 discusses the literature on AI-based systems for exercise therapy. Section 3 describes the proposed methodology, including the data augmentation and pre-processing stages, hyperparameter tuning, and the hybrid deep learning approach used to train the model. Section 4 presents and discusses the results for the various rehabilitative activities. Section 5 concludes the paper and makes recommendations for future investigations.

2. Related Works

In [1], an accelerometer worn by the user tracked mobility and recorded upper limb data; twelve complete and twelve incomplete exercise positions were employed to evaluate 24 models. In [2], the Kinect software development kit was used for 3D automated joint angle assessment, with standard deviations of the mean absolute error ranging from ±5.6° and ±5.1° to ±8.5° and ±8.1°. An automated RGB-Depth camera-based system was developed to detect and identify activities involving the upper limbs [6]. Exercise therapy, prescribed by most physiotherapists for WMSD, is currently a very cost-effective and convenient rehabilitation technique. An intelligent fitness system that tracks activity levels, repetitions, and overall workout time was built using a self-collected dataset of 420 samples [8].

Recent research achieves excellent accuracy in assessing exercise pose, but most works rely on sensor-based systems. A passive sensor device can detect and measure acceleration, tilt, vibration, rotation, and the degrees of freedom of movement [10]. Table 1 summarizes the approaches used in existing methods to recognize exercise poses.

Table 1. Various existing approaches in image-based exercise pose analysis

| Purpose/Objective of work | Method | Parameters measured | Achieved outcome |
|---|---|---|---|
| To help patients verify that each of twelve hand and arm rehabilitation exercises has been completed properly | CNN with 15 layers on accelerometer sensing data | Wrist extension, wrist flexion, pronation, supination, radial deviation, ulnar deviation, arm flexion, arm extension, arm movement toward the chest, arm movement away from the trunk, shoulder abduction, shoulder adduction | Accuracy of 98.61% on data collected from the accelerometer sensor |
| Monitor the type, number, and duration of actions | | Four types of dumbbell movements and three types of leg exercises: dumbbell curling, side lifting, dumbbell shoulder press, dumbbell flying, seated calf raising, standing calf raising, and heel raising | The deep neural network learns well on small datasets, achieving 97.61% action recognition accuracy; SVM achieves over 96% recognition accuracy |
| An RGB-Depth camera captures images of the exercise performed by the patient and provides guidance/feedback without human intervention | | Upper limb exercises of the shoulder and elbow: abduction, flexion, or extension | CNN-GRU model with an accuracy of 100% |
| Fitness motions are evaluated using real-time pictures and a DL algorithm; the system's performance is assessed through simulation | Deep Hybrid Convolutional Network | Fitness motions and possible injury diagnostics from the gathered data | Accuracy of 97.80% |
| Compare the exercise pose with a standard pose image and provide feedback and guidance to patients | Spatial Transform Networks (STN) with attention-based multi-scale CNN (ST-AMCNN) | Pose assessment of the eight-section brocade exercise for the left and right upper arm, forearm, thigh, and calf, and the trunk | Average pose-matching accuracy of 70.02% across body parts |

3. Methodology

Given the current state and complexities of MSD upper limb function rehabilitation, this study processes collected exercise pose data using a deep learning approach to classify upper limb rehabilitation activities for MSD patients and to support clinicians in developing the most appropriate rehabilitation plan and assistance. The following subsections describe the functionality of each stage of the proposed system. Figure 2 shows the structure of our proposed approach.

The suggested system's process flow is as follows.

  1. Utilizing the image data generator function, exercise pose image datasets from volunteers and internet resources are imported into the local directory.

  2. In the pre-processing stage, input images are resized to 200x200 and split into 80% training and 20% validation datasets.

  3. A CNN DenseNet architecture is used to train the model to classify the exercise pose in the input image.

  4. Results are generated by importing test sample image data from the appropriate directory.

The following subsection deliberates each phase in the proposed system.

3.1 Data augmentation and pre-processing

In this phase, data were collected from 15 volunteers (12 male and 3 female) aged 25 to 40 years, and some RGB images of people performing upper limb exercises were taken from openly available online databases. Sample images are shown in Figure 3. The images are separated by class. Our work comprises four categories, namely, wrist flexion (WF), wrist extension (WE), trunk extension (TE), and trunk flexion (TF).

Figure 2. Flow diagram of proposed model

Figure 3. Sample images

Figure 4. Distribution of different classes in the training and validation dataset

These images are then resized to 200x200 so that the CNN architecture can process all of them. A total of 200 samples were used in this work, split into 80% training data and 20% validation (test) data, as shown in Figure 4.

The 80/20 split of training and test data, following the Pareto principle, is used to avoid overtraining the neural network. It has been suggested that the proportion of patterns set aside for validation should be proportional to the square root of the total number of independent variables. This split helps our findings generalize and allows our outcomes to be validated.
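The 80/20 split described above can be sketched as a stratified split in plain Python, so that every class keeps the same share in both subsets. This is a minimal illustration, not the authors' code: the helper name `stratified_split`, the fixed seed, and the balanced 50-images-per-class dataset are assumptions (the paper reports 200 images in total and ten validation images per class, but not how the split was drawn).

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) so that each class keeps its share in both subsets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed: the split is reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx


# Hypothetical balanced dataset: 50 images per class (per-class counts are assumed).
classes = ["TE", "TF", "WE", "WF"]
labels = [c for c in classes for _ in range(50)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 160 40
```

Stratifying per class, rather than shuffling all 200 images at once, guarantees that a small class cannot end up under-represented in the validation set by chance.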

3.2 Model training using CNN

Figure 5 portrays the CNN design with DenseNet architecture flow to classify the exercise pose.

Figure 5. Dense CNN architecture

This section describes in further detail the steps used to prepare the best-trained CNN model.

3.2.1 Convolution 2D layer

The 2D convolution layer filters the image. The first convolution stage produces a 100x100 output with eight filters; a later stage produces a 25x25 output with 16 filters. To create a feature map from the pre-processed data, the 2D convolution uses the following equation:

$f[x, y]=(g * k)[x, y]=\sum_m \sum_n k[m, n]\, g[x-m, y-n]$           (1)

where g is the input image and k is the convolution kernel; m and n are the row and column indices of the kernel, and x and y are the row and column indices of the output matrix.
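Eq. (1) can be checked with a direct, loop-based implementation. The sketch below is illustrative only: `conv2d_valid` is a hypothetical helper that uses 0-based indices and evaluates the sum only where every index of g falls inside the image (no padding). Deep learning libraries such as Keras actually compute cross-correlation (the kernel is not flipped); because the kernel weights are learned, this does not change what the layer can represent.

```python
def conv2d_valid(g, k):
    """Discrete 2D convolution per Eq. (1), restricted to the 'valid' region.

    f[x, y] = sum_m sum_n k[m][n] * g[x - m][y - n]; x and y start at
    (kernel height - 1, kernel width - 1) so that x - m and y - n never
    leave the image. g and k are lists of rows.
    """
    H, W = len(g), len(g[0])
    kh, kw = len(k), len(k[0])
    out = []
    for x in range(kh - 1, H):
        row = []
        for y in range(kw - 1, W):
            s = 0
            for m in range(kh):
                for n in range(kw):
                    s += k[m][n] * g[x - m][y - n]  # note the flipped index
            row.append(s)
        out.append(row)
    return out


g = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
k = [[1, 2],
     [3, 4]]
print(conv2d_valid(g, k))  # [[23, 33], [53, 63]]
```

For the top-left output, the flipped kernel gives 1*5 + 2*4 + 3*2 + 4*1 = 23, whereas an unflipped cross-correlation would give 1*1 + 2*2 + 3*4 + 4*5 = 37.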

Downsampling/Max_Pooling: This function reduces the spatial dimensions of the feature maps. The design has two pooling layers: the first shrinks the feature map from 100x100 (Conv2D) to 50x50, and the second shrinks it from 25x25 (Conv2D_1) to 12x12.
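The reported sizes (100x100 to 50x50 and 25x25 to 12x12) are consistent with 2x2 max pooling with stride 2 and no padding. The sketch below assumes that configuration, which is inferred from the shapes rather than stated in the text; `max_pool2d` is a hypothetical helper.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over a 2D feature map (list of rows) with no padding.

    Each output cell is the maximum of a size x size window; windows that
    would extend past the edge are dropped, so an odd-sized map loses its
    last row and column (e.g. 25 x 25 -> 12 x 12).
    """
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
print(max_pool2d(fmap))  # [[6, 8], [9, 6]]
```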

Flatten layer: Typically used in the transition from the convolution layers to the fully connected layers, the flatten layer reshapes the multidimensional input into one dimension.

Dense layer: Based on the output of the convolutional layers, the image is classified using dense layers. Each neuron in a dense layer computes a weighted sum of its inputs plus a bias and passes the result through a non-linear function known as the "activation function."
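The dense-layer computation can be illustrated with a minimal forward pass: a weighted sum plus bias per neuron, followed by an activation. This is a sketch with a hypothetical helper `dense_forward`, ReLU as the activation, and toy weights; in the paper's model the first dense layer receives the 2304 flattened features (Section 3.2.3).

```python
def relu(values):
    """Rectified linear unit applied element-wise: max(0, z)."""
    return [max(0.0, z) for z in values]


def dense_forward(inputs, weights, biases, activation=relu):
    """One fully connected layer.

    weights[j] holds the weights of output neuron j; each neuron computes
    sum_i weights[j][i] * inputs[i] + biases[j] and then applies `activation`.
    """
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row must match the input length")
    z = [sum(w * x for w, x in zip(row, inputs)) + b
         for row, b in zip(weights, biases)]
    return activation(z)


# Toy example with 2 inputs and 2 neurons (values are illustrative only)
out = dense_forward([1.0, -2.0], [[0.5, 0.25], [-1.0, 1.0]], [0.1, 0.0])
print(out)  # [0.1, 0.0]
```

The second neuron's pre-activation is -3.0, which ReLU clips to 0.0; this non-linearity is what lets stacked dense layers represent more than a single linear map.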

Table 2 summarizes the network with two convolutional blocks and its parameters.

Table 2. Summary of CNN sequential model

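| Layer (type) | Output Shape | Parameters |
|---|---|---|
| Conv2D | (100, 100, 8) | 224 |
| Max pooling | (50, 50, 8) | 0 |
| Conv2D_1 | (25, 25, 16) | 1168 |
| Max pooling | (12, 12, 16) | 0 |
| Flatten | (2304) | 0 |
| Dense | (512) | 1180160 |
| Dense 1 | (512) | 262656 |
| Dense 2 | (4) | 2052 |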
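The output shapes in Table 2 can be reproduced by tracking the spatial size through the layers. The strides and padding are not stated in the text, so the sketch assumes 3x3 convolutions with stride 2 and 'same' padding (which maps 200 to 100 and 50 to 25) and 2x2 max pooling with stride 2 and no padding (which maps 25 to 12). These are inferred assumptions, not the authors' reported configuration, and the helper names are hypothetical.

```python
import math


def conv_same_out(size, stride=2):
    """'same' padding: output size = ceil(input size / stride)."""
    return math.ceil(size / stride)


def pool_valid_out(size, pool=2, stride=2):
    """No padding: a window that would run past the edge is dropped."""
    return (size - pool) // stride + 1


# (layer name, size function, output channels): assumed configuration
LAYERS = [
    ("Conv2D", conv_same_out, 8),
    ("Max pooling", pool_valid_out, 8),
    ("Conv2D_1", conv_same_out, 16),
    ("Max pooling", pool_valid_out, 16),
]


def trace_shapes(input_size=200):
    """Return (layer name, output shape) pairs, ending with the flattened size."""
    size, shapes = input_size, []
    for name, out_fn, channels in LAYERS:
        size = out_fn(size)
        shapes.append((name, (size, size, channels)))
    h, w, c = shapes[-1][1]
    shapes.append(("Flatten", (h * w * c,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions, a 'valid' (unpadded) stride-2 convolution would instead give 99x99 from a 200x200 input, which is why 'same' padding is the configuration that matches the table.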



The number of parameters in each layer of the CNN architecture is evaluated using the following expressions:

$\text{Conv2D}=i \times k(h \times w) \times N+b$                    (2)

where i – Number of filters used; k(h*w) – Kernel size; N – Number of input channels; b – Number of biases.

$\text{Dense}=P \times O+b$            (3)

where P – Number of outputs from the previous layer; O – Output Channels; b – Number of biases.

3.2.2 Details of convolution layer

Layer 1: Each filter in the network is 3x3, and layer 1 has 8 filters, giving 8*3*3 = 72 weights per input channel. Multiplying by the 3 input channels gives 72*3 = 216. The number of biases equals the number of filters, so the total number of parameters in layer 1 is 216+8 = 224.

Layer 2: Similarly, for the Conv2D_1 layer, 16*3*3 = 144; this value is multiplied by the number of filters in the previous layer (i.e., 8, which is the number of input channels of this layer), giving 144*8 = 1152. Adding the 16 biases gives 1152+16 = 1168 parameters in the second layer.

3.2.3 Dense layer

The number of parameters is the number of inputs multiplied by the number of outputs of the layer, plus the biases. For our scenario, Dense: 2304*512+512 = 1180160; Dense 1: 512*512+512 = 262656; and Dense 2: 512*4+4 = 2052.

The downsampling and flattening layers perform fixed operations that involve no backpropagation learning, so they have no learnable parameters; the parameter count of the pooling layers is therefore 0.
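The arithmetic of Eqs. (2) and (3) and the layer-by-layer counts above can be reproduced with two small helpers. The function names are hypothetical; the layer sizes (3x3 kernels, 8 and 16 filters, 3 input channels, 2304 flattened features, 512/512/4 dense units) are taken from this section. The total is simply the sum of the per-layer counts, with pooling and flatten layers contributing 0.

```python
def conv2d_params(filters, kernel_h, kernel_w, in_channels):
    """Eq. (2): i * k(h*w) * N weights plus one bias per filter (b = i)."""
    return filters * kernel_h * kernel_w * in_channels + filters


def dense_params(prev_outputs, outputs):
    """Eq. (3): P * O weights plus one bias per output unit (b = O)."""
    return prev_outputs * outputs + outputs


counts = {
    "Conv2D": conv2d_params(8, 3, 3, 3),      # RGB input: 3 channels
    "Conv2D_1": conv2d_params(16, 3, 3, 8),   # input channels = 8 filters of Conv2D
    "Dense": dense_params(2304, 512),         # 12 * 12 * 16 flattened features
    "Dense 1": dense_params(512, 512),
    "Dense 2": dense_params(512, 4),          # four exercise classes
}
for name, n in counts.items():
    print(f"{name}: {n}")
print("Total:", sum(counts.values()))  # Total: 1446260
```

Almost 82% of the trainable parameters sit in the first dense layer, because every one of the 2304 flattened features connects to each of its 512 units.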

4. Result and Discussion

In this section, the performance outcomes are presented. Each system described in this study was designed in Jupyter Notebook. Models are trained offline and can then be used to detect the class of an exercise from RGB images. The proposed DenseNet CNN is trained with the Adaptive Moment Estimation (Adam) optimizer for a maximum of 200 epochs. The ReLU activation function is used in the hidden layers of the sequential model, and the softmax activation function in the output layer produces the multiclass classification output used to predict the class of the input data. Figure 6(a) shows the accuracy achieved over the iterations, Figure 6(b) shows the loss of the overall system against the iterations, and Figure 6(c) compares loss and accuracy.
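The role of the softmax output layer can be shown with a small sketch: it converts the four output scores into probabilities that sum to 1, and the predicted class is the one with the highest probability. The scores below are hypothetical, `softmax` is a hand-written helper rather than a library call, and subtracting the maximum score is a standard numerical-stability choice, not something stated in the paper.

```python
import math


def softmax(logits):
    """Convert output-layer scores into class probabilities that sum to 1.

    The maximum logit is subtracted first so that math.exp cannot overflow;
    this shift does not change the result.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ["TE", "TF", "WE", "WF"]  # the four exercise classes
logits = [2.0, 1.0, 0.1, -1.0]      # hypothetical final-layer scores, not from the paper
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
print(predicted)  # TE
```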

Training and validation accuracy improve gradually over the first 25 epochs and then level out; afterwards, the training and validation losses change only slightly and accuracy improves marginally. Likewise, the training and validation losses drop sharply within the first 25 epochs, with some oscillation during this period, and the errors decrease over time. Because the training and validation curves remain closely matched, the model can be expected to perform well when classifying images from the test dataset, indicating that it does not overfit.

Figure 6(a). Accuracy vs. Epoch

Figure 6(b). Loss vs. Epoch

Figure 6(c). The comparison results of Loss vs. Accuracy

4.1 Evaluation of performance metrics

The classification results for the test data are described by a confusion matrix, shown in Figure 7.

Figure 7. Classification of various classes of validation images

Ten images from each class are used to generate the confusion matrix. The following terms and equations are used to determine the performance metrics from this confusion matrix.

  1. True Positive (TP): Correct positive prediction

  2. True Negative (TN): Correct negative prediction

  3. False Positive (FP): Incorrect positive prediction

  4. False Negative (FN): Incorrect negative prediction

The Performance metrics of the system are evaluated using the following equations.

$\text{Accuracy}(\%)=\frac{TP+TN}{TP+TN+FP+FN} \times 100$               (4)

$\text{Precision}(\%)=\frac{TP}{TP+FP} \times 100$                 (5)

$\text{Recall}(\%)=\frac{TP}{TP+FN} \times 100$         (6)

$\text{F1-Score}(\%)=\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}$          (7)
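The per-class metrics can be computed from a confusion matrix by treating each class one-vs-rest. In this sketch, `per_class_metrics` is a hypothetical helper, rows are true classes and columns are predicted classes (a convention assumed here), recall is TP/(TP+FN), and the 3x3 matrix is a toy example, not the paper's confusion matrix.

```python
def per_class_metrics(cm):
    """One-vs-rest accuracy, precision, recall, and F1 (all in %) per class.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                        # actually c, predicted another class
        tn = total - tp - fp - fn
        accuracy = (tp + tn) / total * 100                      # Eq. (4)
        precision = tp / (tp + fp) * 100 if tp + fp else 0.0    # Eq. (5)
        recall = tp / (tp + fn) * 100 if tp + fn else 0.0       # Eq. (6)
        f1 = (2 * precision * recall / (precision + recall)     # Eq. (7)
              if precision + recall else 0.0)
        results.append((accuracy, precision, recall, f1))
    return results


# Toy 3-class matrix for illustration only (not the paper's results).
cm = [[5, 1, 0],
      [0, 4, 2],
      [1, 0, 7]]
for acc, p, r, f in per_class_metrics(cm):
    print(f"accuracy={acc:.1f} precision={p:.1f} recall={r:.1f} f1={f:.1f}")
```

For class 1 of the toy matrix, TP = 4, FP = 1, and FN = 2, so precision is 80.0%, recall is 66.7%, and F1 is 72.7%.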

Table 3 presents the classification report of the proposed method, with Precision, Recall, and F1-score calculated using the formulas above.

Table 3. Categorization report

Class name

Trunk Extension (TE)

Trunk Flexion (TF)

Wrist Extension (WE)

Wrist Flexion (WF)

4.2 Performance comparison

This subsection compares the recognition accuracy of the proposed system with other existing approaches. Figure 8 shows the performance comparison of the proposed system with different techniques.

Figure 8. Performance comparison of the Proposed system with existing approaches

One existing method applies a CNN to time-series image data from an accelerometer sensor to recognize the exercise pose and achieves 98.61% accuracy [1]. The DNN with RGB images approach requires optimization to improve accuracy [8]. The CNN-GRU-based approach provides high accuracy (100%) but requires additional hardware to complete the process [9]. The deep hybrid convolutional network approach tracks only single-target limb motion and provides an accuracy of 97.80% [10]. ST-AMCNN achieves an average pose-matching accuracy of 70.02% across different body parts but needs more computation time [11]. The proposed system uses real-time data and achieves a training recognition accuracy of 98.12%, with a validation accuracy of 95%. Our method is superior to other state-of-the-art conventional CNN-based baseline designs in its ability to track and improve the efficacy of patients' rehabilitation training and provides a higher level of user satisfaction.

5. Conclusion

Physiotherapists aim to advance the use of rehabilitation equipment that can handle a wide range of problems. Modern approaches, including deep learning, reduce the need for detailed hand-crafted knowledge as the amount of available data grows. Most existing techniques classify the exercise pose using sensor-based waveform images [1], Kinect sensor depth images, RGB images of fitness exercises, body keyframe images [8-10], or spatial transform with gradient feature images [11]. In this work, RGB images of various exercises are captured with a smartphone camera to classify complex recuperation exercise poses. Our approach can monitor and enhance the effectiveness of the patient's rehabilitation therapy with more satisfactory accuracy across different image plane projections than several other state-of-the-art conventional CNN-based baseline architectures. Despite the small sample set, the proposed method achieved acceptable recognition accuracy, and the system's accuracy could be increased further by training on a larger dataset.

Although the proposed approach efficiently recognizes and classifies exercise poses from RGB images, some limitations remain to be addressed in future work. Different body parts may overlap in some exercise poses, so the system may fail to locate minute deviations made by patients; the algorithm could be improved for real-time video monitoring using depth image analysis. Further development is also needed to recognize emotions such as fear and contempt expressed by patients. Because the frameworks are aimed at home-based rehabilitation, secure and confidential data sharing among patients around the globe is essential. In this context, we plan to implement our approach using blockchain with federated learning in the future.



Nomenclature

g – Input image
m, n – Rows and columns indices of kernel
x, y – Rows and columns indices of output
i – Number of filters
k(h*w) – Kernel size
N – Number of input channels
b – Number of biases
P – Number of outputs from the previous layer
O – Output channels
TP – True positive
TN – True negative
FP – False positive
FN – False negative


[1] Nair, B.B., Sakthivel, N.R. (2022). A deep learning-based upper limb rehabilitation exercise status identification system. Arabian Journal for Science and Engineering.

[2] Rodrigues, P.B., Xiao, Y., Fukumura, Y.E., Awada, M., Aryal, A. (2022). Ergonomic assessment of office worker postures using 3D automated joint angle assessment. Advanced Engineering Informatics, 52: 101596.

[3] Adem, H.M., Tessema, A.W., Simegn, G.L. (2022). Classification of Parkinson's disease using EMG signals from different upper limb movements based on multiclass support vector machine. International Journal Bioautomation, 26(1): 109-125.

[4] Govaerts, R., Tassignon, B., Ghillebert, J., Serrien, B., De Bock, S., Ampe, T., El Makrini, I., Vanderborght, B., Meeusen, R., De Pauw, K. (2021). Prevalence and incidence of work-related musculoskeletal disorders in secondary industries of 21st century Europe: A systematic review and meta-analysis. BMC Musculoskeletal Disorders, 22(1): 1-30.

[5] Shankar, C.M., Venkatesan, R. (2021). Work-related musculoskeletal disorders among male health-care professionals in a private health care organization, India: Prevalence and associated risk factors. Indian Journal of Physical Therapy and Research, 3(1): 3-7.

[6] Parikh, S.M., Mehta, J.N., Thakkar, M., Thakkar, N., Gauswami, M. (2022). Prevalence of musculoskeletal disorders and its risk factors among Class 4 workers of rural tertiary health-care hospitals in Western India: A cross-sectional study. National Journal of Physiology, Pharmacy and Pharmacology, 12(12): 2131-2131.

[7] Shen, C., Ning, X., Zhu, Q., Miao, S., Lv, H. (2022). Application and comparison of deep learning approaches for upper limb functionality evaluation based on multi-modal inertial data. Sustainable Computing: Informatics and Systems, 33: 100624.

[8] Chen, C. (2022). Research on intelligent bodybuilding systems based on machine learning. Journal of Sensors, e6293856-e6293856.

[9] Bijalwan, V., Semwal, V.B., Singh, G., Mandal, T.K. (2022). HDL-PSR: Modelling Spatio-Temporal Features Using Hybrid Deep Learning Approach for Post-Stroke Rehabilitation. Neural Processing Letters, 1-20.

[10] Cai, H. (2022). Application of intelligent real-time image processing in fitness motion detection under internet of things. The Journal of Supercomputing, 78(6): 7788-7804.

[11] Qiu, Y., Wang, J., Jin, Z., Chen, H., Zhang, M., Guo, L. (2022). Pose-guided matching based on deep learning for assessing quality of action on rehabilitation training. Biomedical Signal Processing and Control, 72: 103323.

[12] Ponnusamy, V., Coumaran, A., Shunmugam, A.S., Rajaram, K., Senthilvelavan, S. (2020). Smart glass: Real-time leaf disease detection using YOLO transfer learning. In 2020 International Conference on Communication and Signal Processing (ICCSP), 1150-1154.

[13] Zhang, C., Zou, J., Ma, Z., Wu, Q., Sheng, Z., Yan, Z. (2021). Upper limb action identification based on physiological signals and its application in limb rehabilitation training. Traitement du Signal, 38(6): 1887-1894.

[14] Zhang, C., Zou, J., Ma, Z. (2021). Identification and analysis of limb rehabilitation signal based on wavelet transform. Traitement du Signal, 38(3): 689-697.

[15] Ponnusamy, V., Marur, D.R., Dhanaskodi, D., Palaniappan, T. (2021). Deep learning-based x-ray baggage hazardous object detection-an FPGA implementation. Revue d'Intelligence Artificielle, 35(5): 431-435.

[16] Kataria, S., Ravindran, V. (2022). Musculoskeletal care–at the confluence of data science, sensors, engineering, and computation. BMC Musculoskeletal Disorders, 23(1): 1-11.

[17] Zhang, Y., Li, W., Yang, J., Liu, Z., Wu, L. (2022). Cutting-edge approaches and innovations in sports rehabilitation training: Effectiveness of new technology. Education and Information Technologies, 1-18.