Precise age estimation is pivotal for identity verification at critical security checkpoints, including seaports, land borders, and airports. This study introduces a gait-based methodology for age estimation that strengthens security measures and provides reliable identity confirmation. Using a novel preprocessing technique on gait datasets, the approach combines three components: the Accumulated Frame Difference Energy Image (AFDEI), the Gait Energy Image (GEI), and the invariant moments of the image, which together enable efficient extraction and analysis of the salient gait information. The model, based on a Convolutional Neural Network (CNN), was evaluated on the publicly available OU-ISIR dataset with age labels and achieved an average accuracy of 90.40% across 14 distinct view angles under 5-fold cross-validation, outperforming existing state-of-the-art techniques. The findings highlight the efficacy and potential of the proposed method for age-group estimation through human gait analysis. Despite these advancements, the study has a notable limitation: it depends on silhouette images, so preprocessing is essential before the proposed methodology can be applied. The outcomes of this study reinforce age estimation as a key factor in bolstering identity verification at essential security junctures.
age group estimation, age verification, accumulated frame difference energy image (AFDEI), convolutional neural network (CNN), gait energy image (GEI), human gait, invariant moments, view angle variability
Technology and computers have transformed the field of security, which remains one of the most significant challenges in contemporary society, because people constantly seek ways to circumvent protective measures. In the past, passwords, personal identification numbers (PINs), and similar mechanisms were commonly used to protect sensitive information. Given the growing number of threats, researchers continually devise new methods for recognizing individuals and verifying their legitimacy [1-3]. Because any password or PIN can be shared, systems were devised that instead rely on the distinct characteristics of the individual authorized to access them.
Over the past decade, extensive research has been conducted on gait recognition [4, 5], which aims to identify individuals by their walking patterns. Compared with other biometrics such as fingerprints and irises [6], gait offers the advantage of being non-intrusive and easily captured from a distance, which makes it a very attractive identification method. The advantages are substantial for video-based systems, particularly intelligent security surveillance systems, where gait detection is the dominant method. A gait recognition system does not require the subject to assume a specific posture or stand in front of biometric equipment; instead, it covertly analyzes human gait motions to determine whether an individual poses a security risk or is behaving abnormally [7, 8].

Gait identification can follow either a model-based or a model-free approach, and these two forms of gait analysis are currently the most widely used. Model-based methods leverage physiological aspects to recognize human movements using explicit gait models [9]; for example, a stick-figure model can represent the dynamics of gait and human kinematics by explicitly describing their properties. Model-free gait analysis refers to methods that analyze gait sequences without any predefined model, typically by examining the movement characteristics and spatio-temporal structure of silhouettes.

Walking is an innate and efficient form of human locomotion that entails a repetitive sequence of bodily movements to maintain equilibrium. This pattern of movement is known as the gait, and the interval between identical events in the process is called the gait cycle. The cycle begins when a foot contacts the ground and ends when the same foot contacts the ground again. The stance phase accounts for approximately 60% of the gait cycle, while the swing phase makes up the remaining 40% [10-12].
A standard model-free gait identification approach comprises four components: a gait cycle detection technique, a collection of training data, a feature extractor, and a classifier [13].
Human gait age estimation is an exciting new area of study that attempts to determine a person's age solely from observations of their gait. Several factors, including physiological, biomechanical, and neuromuscular ones, contribute to the inevitable alterations in gait that occur with advancing years [14-17]. By examining and quantifying these age-related changes in gait, researchers can develop methods that determine a person's age precisely and non-invasively [18-22].
Human gait recognition has been studied for age estimation problems. Various machine learning and deep learning models [19, 23-27] have been used to infer age from human gait, with evaluation metrics such as mean absolute error (MAE) and correct classification rate (CCR) used to assess performance. Different gait recognition datasets are adopted for evaluation, and each model has its own trade-offs between accuracy and computational complexity.
Image invariant moments, often referred to simply as "moments," are a set of numerical descriptors used in image processing and computer vision to capture important characteristics of an image [4, 28]. These moments are designed to be invariant, or at least robust, to geometric transformations such as translation, rotation, and scaling. They are computed from the pixel values of an image and are sensitive to the spatial distribution of those values, so they represent an image's content in a form that changes little under such transformations. This makes them useful for pattern recognition, object detection, and image analysis, and particularly valuable for tasks such as object recognition and image matching, where the same object can appear in different positions and orientations within an image [28, 29].
In recent years, researchers have delved into gait-based age estimation, but the persistent challenge of accuracy hampers its broad adoption. The complex interplay of physiological, biomechanical, and neuromuscular factors influencing gait requires meticulous understanding. Diverse machine learning models yield varied accuracy, complicating the trade-off between precision and computational complexity. Evaluation metrics like MAE and CCR underscore this complexity, while variations in datasets add another layer of challenge. As efforts to group people by age based on gait intensify, more sophisticated studies are needed that focus on improving accuracy and unraveling the subtleties of the age-gait relationship.
Motivated by the persistent challenges in gait-based age estimation, this paper contributes a novel approach to enhance age-group recognition. Leveraging a preprocessing method applied to the gait images, our proposed technique extracts image features through a fusion of energy images and Hu moments. This fusion yields a single distinctive image from a sequence of gait images, a critical step in refining age estimation accuracy. Using convolutional neural networks, our approach outperforms existing techniques, achieving notable gains in accuracy. This study not only addresses existing gaps in gait-based age estimation but also introduces an innovative methodology with practical implications for robust identity verification systems.
1.1 Gait recognition processes
The recognition algorithm makes or breaks a human gait recognition system. Data may come from multiple capture devices, such as video cameras, sensors, and wearables; how efficiently the system can exploit this raw data depends on the accuracy of the recognition algorithm. Overall, a gait recognition system [12, 30-32] carries out a series of processing steps, as shown in Figure 1.
Figure 1. Gait recognition phases [30]
1.2 Applications of gait recognition
Gait recognition is a non-invasive biometric [33] identifier used in various settings such as surveillance [34], security, law enforcement [35], access control and authentication [36], healthcare, and rehabilitation monitoring. It can monitor therapy efficacy [37], identify illness signs, guide treatment plans, and help police identify criminals. Gait age estimation has potential benefits in security systems, targeted healthcare interventions, and understanding aging effects on mobility [37, 38]. Accurate gait age determination is essential in fields like criminology, medicine, biometrics, and geriatrics [17, 21]. Collecting gait data from people of varying ages is crucial for accurate gait age estimation.
1.3 Limitations of gait recognition
Gait recognition systems must account for age-related physical changes [39], accidents, medical conditions [40], long-term habits [41], muscle memory, and adaptability during their design, implementation, and assessment, supported by system calibration, updates, and diverse datasets [39, 41-43]. Because people's gait patterns change over time, it is crucial to monitor the system, update its data, and fine-tune its algorithms to account for these variations, ensuring continuous performance evaluation and adaptation to changing gait patterns.
Age group estimations using gait recognition have been widely investigated over the past few years using different approaches and datasets. With the CASIA-B gait dataset, the authors [44] suggest a hybrid GEI and AFDEI approach to gait recognition. Compared to other relevant studies, their technique fared better in experiments.
Pinčić et al. [45] developed a method that uses self-supervised learning (SSL) for the gait identification task, adopting the vision transformer (ViT) architecture from the self-supervised DINO approach to image classification [46]. Unannotated training samples are used to learn useful gait features, which are then extracted and fed to a fully connected neural network (FCNN) classifier. Chao et al. [47] evaluated their technique on the CASIA-B and OU-MVLP gait datasets under "normal walking," "walking with a bag," and "walking with a coat" scenarios, with encouraging results on both datasets. They introduced a novel network, GaitSet, that learns and aggregates identity information from a set of silhouettes, achieving good accuracy in both normal and complex situations compared with state-of-the-art techniques.
SelfGait is a self-supervised gait identification system proposed in the literature [48] to improve spatiotemporal backbone representation skills. It employs horizontal pyramid mapping (HPM) and micro-motion template building (MTB) as spatiotemporal backbones to capture multi-scale spatial and temporal representations. SelfGait has been demonstrated to outperform existing state-of-the-art gait recognition approaches in experiments on the CASIA-B and OU-MVLP benchmark gait datasets. In the literature [49], an angle center loss (ACL) is proposed to learn discriminative gait features, together with a simplified spatial transformer network for locating the relevant horizontal body parts and long short-term memory (LSTM) units serving as a temporal attention model that learns an attention score for each frame. Combined, the three components achieve the best results on a number of cross-view gait identification benchmarks on the CASIA-B and OU-MVLP gait datasets.
In the literature [50], a multi-view gait generative adversarial network (MvGGAN) is used to artificially produce gait samples and supplement existing datasets. To mitigate the negative effects of distribution divergence, domain alignment based on the predicted maximum mean discrepancy is also utilized. The proposed MvGGAN approach makes existing top-of-the-line cross-view gait recognition systems much more effective, as shown by tests on the CASIA-B and OU-MVLP datasets. An approach to gait identification using graph convolutional networks (GCNs) is presented by the authors in the literature [51].
In this paper, we propose to combine GEI and AFDEI using a weighted average calculation to generate a combined energy image (CEI), which is fed to the CNN to classify and estimate the age group. A block diagram of the proposed method, representing the steps from reading the dataset to feeding it to the CNN and classifying the image, is shown in Figure 2.
Figure 2. Block diagram of the proposed method
Referring to the diagram of the proposed method shown in Figure 2, the dataset is selected and processed so that there is a gait cycle image sequence for each person. From the gait cycle, we generate the gait energy images using the different approaches shown in the block diagram. The final images are then grouped into age classes for classification with the CNN.
Figure 3. Dataset images classes
3.1 Dataset
The OU-ISIR Gait Database, Large Population Dataset with Age [52], is the dataset used in this paper. It is a preprocessed, video-based human gait dataset gathered in conjunction with a video-based gait analysis exhibit at a museum dedicated to science and technology. Every participant gave signed, official, informed consent. The dataset contains 10,307 individuals in total, with 5,114 males and 5,193 females aged 2 to 87 years, observed from 14 distinct view angles (0°-90° and 180°-270°). Seven network cameras (Cam1-7) are arranged along a quarter circle centered on the walking track, at azimuth intervals of 15 degrees, with a diameter of 8 meters and a height of 5 meters. Each camera captures gait data at 25 frames per second with a resolution of 1,280 × 980 pixels.
3.2 Preprocessing operation
Normal gait recognition typically involves three stages: detecting moving objects, extracting features, and identifying the gait. The dataset authors [52] recommend a specific sequence of silhouette images for generating the gait energy images, and this paper uses the same recommended image list.
3.2.1 Preparing OUMVLP with age image dataset
The OU-MVLP with age image dataset has been divided into 18 classes (0-17), each spanning five years of age, to meet our aim and to enable comparison with state-of-the-art methods, as shown in Figure 3.
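To make the grouping concrete, the following minimal Python sketch (our illustration, not code from the paper) maps an integer age to its 5-year class index:

```python
def age_to_class(age: int) -> int:
    """Map an age in years to one of the 18 five-year classes (0-17).

    Ages 0-4 -> class 0, 5-9 -> class 1, ..., so the dataset's age
    range of 2-87 years falls entirely within classes 0-17.
    """
    return min(age // 5, 17)

assert age_to_class(2) == 0 and age_to_class(87) == 17
```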
3.2.2 Gait energy image
One of the most commonly used representations is the GEI, because it strikes a good balance between computational cost and recognition power. The GEI, a spatio-temporal description of the gait, is produced by averaging the silhouettes over a gait cycle and can then be used to analyze gait patterns. Although it is one of the most widely used gait representations, clothing and carrying conditions have been found to affect detection accuracy [53].
The weighted average approach is used to create the gait energy image, which represents the gait sequence of one cycle in a single energy image. The gait sequences within a cycle are processed to align the binary silhouettes. Given the gait cycle image sequence $B(x, y, t)$, the gait energy image is obtained with Eq. (1):
$G(x, y)=\frac{1}{N} \sum_{t=1}^N B(x, y, t)$ (1)
where $B(x, y, t)$ denotes the gait cycle images, N denotes the number of frames in the cycle's gait sequence, and t denotes the current frame of the gait cycle. Figure 4(a) shows the silhouette gait cycle frame sequence, and Figure 4(b) shows the GEI extracted from it.
Figure 4. (a) Silhouette gait cycle frame sequences, (b) GEI image extracted from gait cycle
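As a minimal illustration (our sketch, assuming the cycle is stacked as an (N, H, W) array of aligned binary silhouettes), Eq. (1) reduces to a per-pixel mean:

```python
import numpy as np

def gait_energy_image(frames: np.ndarray) -> np.ndarray:
    """Eq. (1): per-pixel average of the N aligned binary silhouettes
    B(x, y, t), stacked as an array of shape (N, H, W)."""
    return frames.astype(np.float64).mean(axis=0)
```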
3.2.3 Accumulated frame difference energy image
To create the AFDEI, preprocessed silhouette sequences are used. The frame difference energy image (FDEI) method was proposed to lessen the effect of silhouette incompleteness while preserving most of the shape's properties and reducing its temporal volatility; recovering shape information from an incomplete frame requires compensating it with neighboring frames [54]. The frame difference energy image is produced by combining the forward frame difference image and the backward frame difference image. The forward computation is given by Eq. (2), the backward computation by Eq. (3), and the frame difference by Eq. (4):
$F_a(x, y, t)= \begin{cases}0, & \text{if } B(x, y, t) \leq B(x, y, t-1) \\ B(x, y, t)-B(x, y, t-1), & \text{otherwise}\end{cases}$ (2)

$F_b(x, y, t)= \begin{cases}0, & \text{if } B(x, y, t) \geq B(x, y, t-1) \\ B(x, y, t-1)-B(x, y, t), & \text{otherwise}\end{cases}$ (3)
where $F_a(x, y, t)$ is the forward frame difference image and $F_b(x, y, t)$ is the backward frame difference image.
$F(x, y, t)=F_a(x, y, t)+F_b(x, y, t)$ (4)
where, $F(x, y, t)$ is the frame difference image.
The accumulated frame difference energy image is created using the weighted average approach [54] and depicts the temporal characteristics of the gait. It is calculated as shown in Eq. (5):
$A(x, y)=\frac{1}{N} \sum_{t=1}^N F(x, y, t)$ (5)
where $F(x, y, t)$ is the frame difference image, N is the total number of frame differences generated by Eq. (4), and t is the current frame difference.
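The following sketch implements Eqs. (2)-(5) under the same (N, H, W) array assumption as above; the two np.clip calls realize the case distinctions of Eqs. (2) and (3):

```python
import numpy as np

def afdei(frames: np.ndarray) -> np.ndarray:
    """Accumulated frame difference energy image, Eqs. (2)-(5)."""
    b = frames.astype(np.int32)
    prev, curr = b[:-1], b[1:]
    f_a = np.clip(curr - prev, 0, None)  # Eq. (2): positive part of B(t) - B(t-1)
    f_b = np.clip(prev - curr, 0, None)  # Eq. (3): positive part of B(t-1) - B(t)
    f = f_a + f_b                        # Eq. (4): frame difference image
    return f.mean(axis=0)                # Eq. (5): average over all differences
```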
The gait sequence frames are shown in Figure 4(a), the frame difference energy image in Figure 5(a), and the accumulated frame difference energy image in Figure 5(b).
Figure 5. (a) Accumulated frame difference sequence, (b) AFDEI
3.2.4 Gait energy image using max pixel (GEI-Max)
The GEI is the average over all images of the gait cycle sequence. In the proposed GEI-Max, by contrast, we take at each pixel position the highest value across the whole cycle of gait frames. Eq. (6) gives this per-position maximum. Figure 6(a) shows the GEI-Max generated from the silhouette frame sequence in Figure 4(a).
$A(x, y)=\max \left[F_1(x, y), F_2(x, y), F_3(x, y), \ldots, F_n(x, y)\right]$ (6)
where $F_n(x, y)$ denotes the n-th frame in the silhouette gait sequence.
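Under the same array convention as the sketches above, Eq. (6) reduces to a single per-pixel maximum:

```python
import numpy as np

def gei_max(frames: np.ndarray) -> np.ndarray:
    """Eq. (6): per-pixel maximum over all frames of the gait cycle."""
    return frames.max(axis=0)
```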
3.2.5 Edge detection
Edge detection identifies boundaries between regions of an image with different brightness levels [55, 56]. It reduces data size and facilitates various image-related operations. Several operators, such as Prewitt, Canny, Laplacian, and Sobel, can be employed for this purpose [57]. Sobel edge detection was chosen for this study because it is easy to use, requires little processing power, localizes edges reliably, differentiates in two directions, suppresses noise, handles images of different sizes and formats, and copes with both gradual and sudden changes in brightness. The Sobel operator was applied to the GEI-Max image in Figure 6(a), and the result is presented in Figure 6(b).
Figure 6. (a) GEI-Max, (b) Edge of GEI-Max
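A minimal OpenCV sketch of the Sobel step (the 3×3 kernel size and the magnitude-based combination are our assumptions; the paper does not state them):

```python
import cv2
import numpy as np

def sobel_edges(image: np.ndarray) -> np.ndarray:
    """Sobel edge map: combine horizontal and vertical gradients by
    magnitude and rescale to the 0-255 pixel range."""
    gx = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    return (255 * magnitude / max(magnitude.max(), 1e-12)).astype(np.uint8)
```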
3.2.6 Combined energy image (CEI)
First, the edge of GEI-Max (Figure 6(b)) is merged with the GEI (Figure 4(b)) using the weighted average calculation of Eq. (1); the result is shown in Figure 7.
Figure 7. Combined CEI-average image
Figure 8. Combined CEI-Max
The resulting image (Figure 7) is then merged with the AFDEI (Figure 5) by taking the higher of the two pixel values at each position, as in Eq. (6), producing the image shown in Figure 8.
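Put together, the two merges can be sketched as follows (equal weights in the averaging step are our assumption):

```python
import numpy as np

def combined_energy_image(gei, edge_gei_max, afdei_img):
    """CEI: average the GEI with the edge of GEI-Max (Figure 7), then
    take the per-pixel maximum with the AFDEI (Figure 8)."""
    cei_avg = (gei.astype(np.float64) + edge_gei_max.astype(np.float64)) / 2.0
    return np.maximum(cei_avg, afdei_img.astype(np.float64))
```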
3.2.7 Invariant moment of image
Moment invariants are used for feature extraction in shape recognition and identification. Because they generate feature vectors that faithfully characterize an image, they are widely used in a variety of contexts, particularly recognition. The shape attributes of binary images can be extracted with the moment-invariant method; moments computed from binary images are referred to here as "silhouette moments" [58]. For any moment of order p and q, an intensity image function f(x, y) with N × M pixels can be computed in the general form of Eq. (7):
$m_{p q}=N F \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} \Pi_{p q}(x, y) f(x, y)$ (7)
where NF is a normalization factor and $\Pi_{pq}$ is the moment kernel, formed by the product of the specified polynomials of order p and q that serve as the orthogonal basis. The various polynomial types in the kernel give rise to different families of moments.
3.2.8 Hu invariant moments
The first two-dimensional moment invariants were introduced by Hu in 1962, who proposed the 2-D geometric moments of an image's distribution function [59]. The Hu moment invariants are a set of seven numbers computed from central moments that remain unchanged under transformations of the underlying image. Invariance under translation, scaling, rotation, and reflection has been demonstrated for the first six moments, while the seventh moment changes sign under image reflection. The Hu invariant moments are calculated as in Eq. (8):
$\begin{aligned} M_0 &= m_{20}+m_{02} \\ M_1 &= \left(m_{20}-m_{02}\right)^2+4 m_{11}^2 \\ M_2 &= \left(m_{30}-3 m_{12}\right)^2+\left(3 m_{21}-m_{03}\right)^2 \\ M_3 &= \left(m_{30}+m_{12}\right)^2+\left(m_{21}+m_{03}\right)^2 \\ M_4 &= \left(m_{30}-3 m_{12}\right)\left(m_{30}+m_{12}\right)\left[\left(m_{30}+m_{12}\right)^2-3\left(m_{21}+m_{03}\right)^2\right] \\ &\quad +\left(3 m_{21}-m_{03}\right)\left(m_{21}+m_{03}\right)\left[3\left(m_{30}+m_{12}\right)^2-\left(m_{21}+m_{03}\right)^2\right] \\ M_5 &= \left(m_{20}-m_{02}\right)\left[\left(m_{30}+m_{12}\right)^2-\left(m_{21}+m_{03}\right)^2\right]+4 m_{11}\left(m_{30}+m_{12}\right)\left(m_{21}+m_{03}\right) \\ M_6 &= \left(3 m_{21}-m_{03}\right)\left(m_{30}+m_{12}\right)\left[\left(m_{30}+m_{12}\right)^2-3\left(m_{21}+m_{03}\right)^2\right] \\ &\quad -\left(m_{30}-3 m_{12}\right)\left(m_{21}+m_{03}\right)\left[3\left(m_{30}+m_{12}\right)^2-\left(m_{21}+m_{03}\right)^2\right] \end{aligned}$ (8)
Hu's approach yields seven values ($M_i$) for a given image; Table 1 shows the computed results for five random images from the proposed dataset.
Table 1. Sample of invariant moment results
| CEI Image | M0 | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|---|
| 1 | 3.176056 | 7.516807 | 12.32739 | 13.01199 | -25.7417 | -17.2446 | 25.99022 |
| 2 | 3.153797 | 7.597082 | 10.45985 | 12.06968 | 23.6814 | -16.463 | 23.38354 |
| 3 | 3.178445 | 8.46446 | 10.98882 | 12.87856 | -24.9104 | -17.1599 | 25.03184 |
| 4 | 3.164626 | 7.574627 | 10.68473 | 13.04071 | -25.1183 | -17.1062 | -25.0044 |
| 5 | 3.174391 | 7.810398 | 11.3712 | 14.33523 | 28.11615 | -18.3237 | 27.19149 |
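In practice the seven moments can be obtained with OpenCV; the signed log scaling below is our assumption, since raw Hu values span many orders of magnitude while Table 1 reports values of comparable size:

```python
import cv2
import numpy as np

def hu_moments(image: np.ndarray) -> np.ndarray:
    """Seven Hu invariant moments of a single-channel image (Eq. (8)),
    log-scaled for comparability (an assumed convention, chosen to
    match the value ranges seen in Table 1)."""
    hu = cv2.HuMoments(cv2.moments(image)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```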
3.2.9 Proposed combined technique using CEI with invariant moment
The Hu invariant moments listed in Table 1 are combined with the CEI in Figure 8 to produce the final preprocessed image, which is then fed to the neural network. To implement this combination, we employ a novel technique: first, the sum of the seven moment values is averaged and weighted by the age class of the image being processed; then an activation function keeps the value within the pixel range (0-255); finally, each new pixel is generated by averaging the current pixel with the value produced in the first step. Eqs. (9)-(11) implement this combination:
$A=\left|\frac{\left(C_i+1\right) \sum_{m=1}^7 H_m}{7}\right|$ (9)
where A represents the class number combined with the Hu invariant moments, $\sum_{m=1}^7 H_m$ is the sum of the seven invariant moments, and $C_i$ is the class index of the given image. The activation formula of Eq. (10) is then applied, and finally the new pixel is generated with Eq. (11):
$A^{\prime}= \begin{cases}255, & \text{for } A>255 \\ A, & \text{otherwise}\end{cases}$ (10)
$p i(x, y)=\frac{p i(x, y)+A^{\prime}}{2}$ (11)
where $A^{\prime}$ is the activation of A and $pi(x, y)$ is the current pixel of the image. Figure 9 shows the result of applying these equations.
Figure 9. Final CEI
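A sketch of Eqs. (9)-(11), assuming `cei` is the combined energy image of Figure 8 and `hu` the seven-moment vector of Table 1:

```python
import numpy as np

def fuse_with_moments(cei: np.ndarray, hu: np.ndarray, class_idx: int) -> np.ndarray:
    """Fuse the CEI with the Hu moment vector per Eqs. (9)-(11)."""
    a = abs((class_idx + 1) * hu.sum() / 7.0)        # Eq. (9)
    a_prime = min(a, 255.0)                          # Eq. (10): cap at 255
    return (cei.astype(np.float64) + a_prime) / 2.0  # Eq. (11): per-pixel average
```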
3.2.10 Class imbalance
In machine learning, one of the most common difficulties is class imbalance, which occurs when some classes contain far fewer samples than others. There are several ways to address it, such as class weighting, undersampling, oversampling, hybrid methods, and resampling the data. Undersampling reduces the sample size of the majority classes so that they are comparable to the minority classes, while oversampling raises the sample size of the minority classes [60, 61]. Both class weighting and hybrid methods give the minority classes more influence during training. As Figure 10 shows, the classes are far from equal, with severe differences between them. To overcome this issue, we used the class-weighting approach to train the model, together with resampling: removing some images from majority classes and augmenting minority classes by flipping, zooming, or rotating images that would otherwise not fit our aims. Figure 10 shows the class weights for the 75° view.
Figure 10. Class weighted imbalance ratio on 75 degrees
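A sketch of the class-weighting step using scikit-learn's balanced heuristic (our choice of implementation; the file name is hypothetical):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = np.load("train_labels.npy")  # hypothetical: one class index (0-17) per image
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = dict(zip(classes, weights))  # passed to model.fit(..., class_weight=...)
```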
3.2.11 CNN network architecture
In this research, we implement a CNN model to extract features from the preprocessed OU-MVLP image dataset. The input layer takes images of size 100×100, followed by four hidden layers with 16, 32, 64, and 256 feature-extracting filters, respectively. The activation function is LeakyReLU, the pooling layers use the default 2×2 size, the dense layer has 512 units, and the classification layer uses the softmax activation function. The full architecture is shown in Figure 11.
Figure 11. CNN model architecture
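A Keras sketch matching this description; the 3×3 kernel size, the single grayscale input channel, and the activation of the dense layer are our assumptions, as the paper does not specify them:

```python
from tensorflow.keras import layers, models

def build_model(num_classes: int = 18) -> models.Sequential:
    """CNN of Figure 11: four conv blocks with 16/32/64/256 filters,
    LeakyReLU activations, default 2x2 max pooling, a 512-unit dense
    layer, and a softmax classification layer."""
    model = models.Sequential([layers.Input(shape=(100, 100, 1))])
    for filters in (16, 32, 64, 256):
        model.add(layers.Conv2D(filters, 3))
        model.add(layers.LeakyReLU())
        model.add(layers.MaxPooling2D())  # default pool size (2, 2)
    model.add(layers.Flatten())
    model.add(layers.Dense(512))
    model.add(layers.LeakyReLU())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```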
The selection of CNN for feature extraction and classification stems from its inherent strengths in processing visual data, especially in image-related tasks. Unlike traditional machine learning algorithms, CNNs excel at learning hierarchical representations of features directly from raw data. In gait recognition, where the spatial and temporal patterns of movement are crucial, CNNs can automatically learn discriminative features through convolutional layers. The ability to capture complex patterns in gait dynamics makes CNNs well-suited for this task. Additionally, the translation invariance property of CNNs aligns with the variability in gait across different individuals and conditions, enhancing the model's generalization. The proven success of CNNs in image-related tasks and their capacity to automatically learn relevant features make them a rational and effective choice for gait-based age estimation in this study.
The authors conducted many experiments with the proposed CNN model. The model was trained on 9,777 of the 10,307 subjects, divided into 18 classes. The data were normalized, and all images containing no information were removed before training. The dataset was split randomly into 80 percent training and 20 percent testing. The model was run for 50 epochs of training and testing in each of 5 K-fold iterations, using the generated images preprocessed both with and without invariant moments. Table 2 shows the results for 75º using images without invariant moments.
Table 2. Accuracy score of 5 k-fold for 75º without invariant moment
| View Degree | Worst Score | Best Score | Average Score |
|---|---|---|---|
| 75º | 75.20 | 82.30 | 80.0 |
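A sketch of this evaluation protocol using stratified 5-fold cross-validation (our reading of the setup; `X`, `y`, and the file names are hypothetical, and `build_model` and `class_weight` refer to the sketches above):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.load("cei_images.npy")    # hypothetical: preprocessed images, (n, 100, 100, 1)
y = np.load("train_labels.npy")  # hypothetical: class indices 0-17
scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
    model = build_model()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X[train_idx], y[train_idx], epochs=50,
              class_weight=class_weight, verbose=0)
    scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0)[1])
print(f"worst={min(scores):.2%}, best={max(scores):.2%}, average={np.mean(scores):.2%}")
```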
The authors observed that these experimental results show non-competitive accuracy. The preprocessing of the image dataset was therefore enhanced by combining the CEI-Max images with the invariant moments, and the experimental results showed that this preprocessing successfully improved the model's accuracy. The model was also trained separately on every view degree available in the OU-MVLP dataset; the worst, best, and average results for each view degree are recorded in Table 3.
Table 3. Accuracy score of 5 K-Fold for all view degrees
| View Degree | Worst Score | Best Score | Average Score |
|---|---|---|---|
| 0º | 74.49 | 98.19 | 91.26 |
| 15º | 72.21 | 98.34 | 90.40 |
| 30º | 91.14 | 98.82 | 94.68 |
| 45º | 79.74 | 96.15 | 90.03 |
| 60º | 13.75 | 98.00 | 76.64 |
| 75º | 90.49 | 98.87 | 94.54 |
| 90º | 83.02 | 99.23 | 93.27 |
| 180º | 88.84 | 97.35 | 93.31 |
| 195º | 77.17 | 97.52 | 89.59 |
| 210º | 81.17 | 97.40 | 87.99 |
| 225º | 64.93 | 96.50 | 89.24 |
| 240º | 81.29 | 97.22 | 92.66 |
| 255º | 73.96 | 98.32 | 88.97 |
| 270º | 87.79 | 97.00 | 92.30 |
Figure 12 shows the accuracy progress of the best K-fold during training and testing over 50 epochs at 75 degrees, Figure 13 shows the cross-validation accuracy performance of the other folds at 75 degrees, and Figure 14 shows the evaluation accuracy across the 5 K-folds at 75 degrees.
Figure 12. Best crossover accuracy performance of 5 k-Folds on 75º
Figure 13. Crossover accuracy performance of 5 k-Folds on 75º
Figure 14. Evaluation accuracy of 5 k-Folds on 75º
Moreover, the model is compared with other previous studies' approaches related to the same field (age estimation) using the same dataset for all degrees. The proposed technique achieved higher accuracy compared to the state-of-the-art methods. Table 4 shows the best accuracy score results for the proposed approach compared with other studies.
During the experiments, the importance of sample size was highly evident in the results, and we implemented class balancing to resolve this issue, using class-weight balancing to give greater importance to classes with fewer samples, as described in Section 3.2.10. The accuracy indicator shows a strong result compared with other approaches. The results were compared with four different approaches on the same dataset, focusing on accuracy for both testing and evaluation, and indicate that the proposed preprocessed-image approach is effective for age estimation using gait recognition and categorical age classification. Using GEI and AFDEI with the Hu moment vector helps improve accuracy, and employing edge detection to generate the CEI compensation image increases the texture detail of the images. The proposed approach focuses on estimating the age category to which a person belongs.
Table 4. Comparison of best accuracy scores for all view degrees

| View Degrees | ViTs16 [45] | GaitSet [47] | SelfGait [48] | ACL [49] | Our Approach |
|---|---|---|---|---|---|
| 0º | 78.0 | 79.5 | 85.1 | 74.0 | 91.3 |
| 15º | 88.1 | 87.9 | 89.3 | 88.3 | 90.4 |
| 30º | 91.1 | 89.9 | 92.0 | 94.6 | 94.7 |
| 45º | 90.7 | 90.2 | 94.3 | 95.4 | 90.0 |
| 60º | 86.8 | 88.1 | 89.1 | 88.0 | 76.6 |
| 75º | 87.7 | 88.7 | 90.2 | 91.3 | 94.5 |
| 90º | 85.9 | 87.8 | 90.9 | 90.0 | 93.3 |
| 180º | 83.2 | 81.7 | 87.4 | 76.7 | 93.3 |
| 195º | 90.6 | 86.7 | 91.8 | 89.5 | 89.6 |
| 210º | 92.2 | 89.0 | 89.3 | 95.0 | 88.0 |
| 225º | 91.9 | 89.3 | 88.7 | 94.9 | 89.2 |
| 240º | 87.1 | 87.2 | 90.8 | 88.0 | 92.7 |
| 255º | 88.3 | 87.8 | 91.6 | 90.8 | 89.0 |
| 270º | 86.4 | 86.2 | 87.7 | 89.8 | 92.3 |
| AVG | 87.7 | 87.1 | 89.9 | 89.0 | 90.4 |
In terms of efficiency, we found that the standard CNN model can also give significant results in less time. However, time was not a consideration of this research; reducing the time and resources used in processing could be the aim of future work.
Compared with the state of the art, our approach outperforms the other approaches at the 0º, 15º, 30º, 75º, 90º, 180º, 240º, and 270º view degrees. At 45º it still scores a significant 90.03%, against GaitSet [47] and SelfGait [48], which scored 90.20% and 90.69%, respectively. Averaged over all views, the approach also overcomes ACL [49], which scored the highest average among the compared methods in Table 4; at 225º and 255º it scored about 90.0% and 89.0%, respectively, still competing strongly with the other approaches and exceeding ViTs16 [45] and GaitSet [47]. At 195º, the approach is close to 90% and overcomes two of the four compared approaches; the highest score, 91.80%, was achieved by SelfGait [48], followed by ViTs16 [45] with 90.57%.
At the remaining view angles, the approach stays close to the other approaches, with only a slight and highly acceptable difference.
The minimum score, 76.64%, occurred at the 60º view angle. The authors note that at this angle every approach scored below 90%, including methods that win at other angles. This indicates that the silhouette offers less texture to be extracted at this view angle, an observation corroborated by the other state-of-the-art results.
Overall, our approach scored the best of all, with an average of 90.35%. The proposed methodology thus yields highly significant results.
The findings of this research contribute to the growing body of knowledge in gait analysis and age estimation. Future studies can build upon these results by exploring additional factors that influence age prediction accuracy, expanding the dataset to include more diverse samples, and investigating potential applications in real-world environments.
In conclusion, this study introduced a new technique that can estimate a person's age group solely by analyzing their gait. Combining the GEI, the AFDEI, and the invariant moments of the image, the proposed method outperformed state-of-the-art methods in accuracy: a convolutional neural network evaluated under 5-fold cross-validation across 14 camera viewpoints achieved an average accuracy of 90.35 percent. These findings show promise for using gait analysis to predict age. Notably, the method's highest average score (94.68%) was achieved at the 30º view degree, while the lowest (76.64%) occurred at the 60º view degree. This study opens up potential applications in healthcare, security systems, and biometric identification.
[1] Makhdoomi, N.A., Gunawan, T.S., Habaebi, M.H. (2013). Human gait recognition and classification using similarity index for various conditions. IOP Conference Series: Materials Science and Engineering, 53: 012069. https://doi.org/10.1088/1757-899X/53/1/012069
[2] Long, Y., Chen, X., Xu, J. (2015). Application of gait factor in reducing gait interferences. In 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, pp. 103-107. https://doi.org/10.1109/IHMSC.2015.147
[3] Yodpijit, N., Jongprasithporn, M., Pongmit, K., Sittiwanchai, T. (2017). A low-cost portable 3D human motion analysis system: An application of gait analysis. In 2017 IEEE International Conference on Industrial Engineering and Engineering Management, Singapore, pp. 1164-1168. https://doi.org/10.1109/IEEM.2017.8290075
[4] Wang, X., Feng, S., Yan, W. Q. (2019). Human gait recognition based on self-adaptive hidden Markov model. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18(3): 963-972. https://doi.org/10.1109/TCBB.2019.2951146
[5] Waheed, S., Amin, R., Iqbal, J., Hussain, M., Bashir, M.A. (2023). An automated human action recognition and classification framework using deep learning. In 2023 4th International Conference on Computing, Mathematics and Engineering Technologies, Sukkur, Pakistan, pp. 1-5. https://doi.org/10.1109/iCoMET57998.2023.10099190
[6] Wang, X., Yan, W.Q. (2020). Human gait recognition based on frame-by-frame gait energy images and convolutional long short-term memory. International Journal of Neural Systems, 30(1): 1950027. https://doi.org/10.1142/S0129065719500278
[7] Jawed, B., Khalifa, O.O., Bhuiyan, S.S.N. (2018). Human gait recognition system. In 2018 7th International Conference on Computer and Communication Engineering, Kuala Lumpur, Malaysia, pp. 89-92. https://doi.org/10.1109/ICCCE.2018.8539245
[8] Yang, L., Song, Q., Wang, Z., Hu, M., Liu, C. (2020). Hier R-CNN: Instance-level human parts detection and a new benchmark. IEEE Transactions on Image Processing, 30: 39-54. https://doi.org/10.1109/TIP.2020.3029901
[9] Tran, T.H., Nguyen, D.T., Nguyen, T.P. (2021). Human posture classification from multiple viewpoints and application for fall detection. In 2020 IEEE Eighth International Conference on Communications and Electronics, Phu Quoc Island, Vietnam, pp. 262-267. https://doi.org/10.1109/ICCE48956.2021.9352140
[10] Kharb, A., Saini, V., Jain, Y.K., Dhiman, S. (2011). A review of gait cycle and its parameters. IJCEM International Journal of Computational Engineering & Management, 13: 78-83.
[11] Dormans, J.P. (1993). Orthopedic management of children with cerebral palsy. Pediatric Clinics of North America, 40(3): 645-657. https://doi.org/10.1016/s0031-3955(16)38556-x
[12] Training Portable Lab - Walking, Running, Marching in Place. http://www.optojump.com/Applications/Gait-Analysis.aspx, accessed on 8 Jan. 2024.
[13] Wang, X., Wang, J., Yan, K. (2018). Gait recognition based on Gabor wavelets and (2D) 2 PCA. Multimedia Tools and Applications, 77: 12545-12561. https://doi.org/10.1007/s11042-017-4903-7
[14] Zhang, S., Wang, Y., Li, A. (2019). Gait-based age estimation with deep convolutional neural network. In 2019 International Conference on Biometrics, Crete, Greece, pp. 1-8. https://doi.org/10.1109/ICB45273.2019.8987240
[15] Sakata, A., Makihara, Y., Takemura, N., Muramatsu, D., Yagi, Y. (2020). How confident are you in your estimate of a human age? Uncertainty-aware gait-based age estimation by label distribution learning. In 2020 IEEE International Joint Conference on Biometrics, Houston, TX, USA, pp. 1-10. https://doi.org/10.1109/IJCB48548.2020.9304914
[16] Ebenezer, V., Edwin, B., Sharan, D., Thanka, R. (2023). Pose estimation approach for gait analysis using machine learning. In 2023 Second International Conference on Electronics and Renewable Systems, Tuticorin, India, pp. 1071-1075. https://doi.org/10.1109/ICEARS56392.2023.10085311
[17] Kondragunta, J., Hirtz, G. (2020). Gait parameter estimation of elderly people using 3D human pose estimation in early detection of dementia. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Montreal, QC, Canada, pp. 5798-5801. https://doi.org/10.1109/EMBC44109.2020.9175766
[18] Xu, C., Makihara, Y., Liao, R., Niitsuma, H., Li, X., Yagi, Y., Lu, J. (2021). Real-Time gait-based age estimation and gender classification from a single image. In 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, pp. 3459-3469. https://doi.org/10.1109/WACV48630.2021.00350
[19] Xu, C., Sakata, A., Makihara, Y., Takemura, N., Muramatsu, D., Yagi, Y., Lu, J. (2021). Uncertainty-aware gait-based age estimation and its applications. IEEE Transactions on Biometrics, Behavior, and Identity Science, 3(4): 479-494. https://doi.org/10.1109/TBIOM.2021.3080300
[20] Chen, Y., Ou, R., Deng, Y., Yin, X. (2021). WIAGE: A gait-based age estimation system using wireless signals. In 2021 IEEE Global Communications Conference, Madrid, Spain, pp. 1-6. https://doi.org/10.1109/GLOBECOM46510.2021.9685336
[21] Riaz, Q., Hashmi, M.Z.U.H., Hashmi, M.A., Shahzad, M., Errami, H., Weber, A. (2019). Move your body: Age estimation based on chest movement during normal walk. IEEE Access, 7: 28510-28524. https://doi.org/10.1109/ACCESS.2019.2901959
[22] Khabir, K.M., Siraj, M.S., Ahmed, M., Ahmed, M.U. (2019). Prediction of gender and age from inertial sensor-based gait dataset. In 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Spokane, WA, USA, pp. 371-376. https://doi.org/10.1109/ICIEV.2019.8858521
[23] Li, X., Makihara, Y., Xu, C., Yagi, Y., Ren, M. (2018). Gait-based human age estimation using age group-dependent manifold learning and regression. Multimedia tools and applications, 77: 28333-28354. https://doi.org/10.1007/s11042-018-6049-7
[24] Sakata, A., Takemura, N., Yagi, Y. (2019). Gait-based age estimation using multi-stage convolutional neural network. IPSJ Transactions on Computer Vision and Applications, 11: 4. https://doi.org/10.1186/s41074-019-0054-2
[25] Russel, N.S., Selvaraj, A. (2021). Gender discrimination, age group classification and carried object recognition from gait energy image using fusion of parallel convolutional neural network. IET Image Processing, 15(1): 239-251. https://doi.org/10.1049/ipr2.12024
[26] Li, X., Makihara, Y., Xu, C., Yagi, Y., Ren, M. (2019). Make the bag disappear: Carrying status-invariant gait-based human age estimation using parallel generative adversarial networks. In 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), Tampa, FL, USA, pp. 1-9. https://doi.org/10.1109/BTAS46853.2019.9185973
[27] Zhu, H., Zhang, Y., Li, G., Zhang, J., Shan, H. (2020). Ordinal distribution regression for gait-based age estimation. Science China Information Sciences, 63: 1-14. https://doi.org/10.1007/s11432-019-2733-4
[28] Li, Y., Lu, J., Song, Y., Zhang, Z., Liu, K. (2022). Automobile Connectors Recognition Algorithm Based on Improved Hu Invariant Moments. In 2022 International Conference on Informatics, Networking and Computing, Nanjing, China, pp. 178-182. https://doi.org/10.1109/ICINC58035.2022.00043
[29] Zhang, R., Wang, L. (2011). An image matching evolutionary algorithm based on Hu invariant moments. In 2011 International Conference on Image Analysis and Signal Processing, Wuhan, China, pp. 113-117. https://doi.org/10.1109/IASP.2011.6109009
[30] Hassanien, A.E., Kim, T.H., Kacprzyk, J., Awad, A.I. (2014). Bio-inspiring Cyber Security and Cloud Services: Trends and Innovations. Springer.
[31] Recfaces. https://recfaces.com/articles/what-is-gait-recognition, accessed on 8 Jan. 2024.
[32] Bayometric. https://www.bayometric.com/gait-recognition-identify-with-manner/, accessed on 8 Jan. 2024.
[33] Aung, H.M.L., Pluempitiwiriyawej, C. (2020). Gait biometric-based human recognition system using deep convolutional neural network in surveillance system. In 2020 Asia Conference on Computers and Communications, Singapore, pp. 47-51. https://doi.org/10.1109/ACCC51160.2020.9347899
[34] Nadhif, M.H., Hadiputra, A.P., Whulanza, Y., Supriadi, S. (2019). Gait analysis for biometric surveillances using Kinect™: A study case of axial skeletal movements. In 2019 16th International Conference on Quality in Research (QIR): International Symposium on Electrical and Computer Engineering, Padang, Indonesia, pp. 1-4. https://doi.org/10.1109/QIR.2019.8898273
[35] Majeed, A., Chong, A.K. (2021). Study of CCTV footage based on lower-limb gait measure for forensic application. In 2021 IEEE 12th Control and System Graduate Research Colloquium, Shah Alam, Malaysia, pp. 160-164. https://doi.org/10.1109/ICSGRC53186.2021.9515212
[36] Lee, S., Lee, S., Park, E., Lee, J., Kim, I. Y. (2022). Gait-based continuous authentication using a novel sensor compensation algorithm and geometric features extracted from wearable sensors. IEEE Access, 10: 120122-120135. https://doi.org/10.1109/ACCESS.2022.3221813
[37] Bruesch, A., Nguyen, N., Schürmann, D., Sigg, S., Wolf, L. (2019). Security properties of gait for mobile device pairing. IEEE Transactions on Mobile Computing, 19(3): 697-710. https://doi.org/10.1109/TMC.2019.2897933
[38] Chen, X., Weng, J., Lu, W., Xu, J. (2017). Multi-gait recognition based on attribute discovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(7): 1697-1710. https://doi.org/10.1109/TPAMI.2017.2726061
[39] Ko, S.U., Hausdorff, J.M., Ferrucci, L. (2010). Age-associated differences in the gait pattern changes of older adults during fast-speed and fatigue conditions: Results from the Baltimore longitudinal study of ageing. Age and Ageing, 39(6): 688-694. https://doi.org/10.1093/ageing/afq113
[40] Sanders, R.D., Gillig, P.M. (2010). Gait and its assessment in psychiatry. Psychiatry (Edgmont), 7(7): 38.
[41] Son, S., Noh, H. (2013). Gait changes caused by the habits and methods of carrying a handbag. Journal of Physical Therapy Science, 25(8): 969-971. https://doi.org/10.1589/jpts.25.969
[42] McDermott, P., Wolfe, E., Lowry, C., Robinson, K., French, H.P. (2018). Evaluating the immediate effects of wearing foot orthotics in children with Joint Hypermobility Syndrome (JHS) by analysis of temperospatial parameters of gait and dynamic balance: A preliminary study. Gait & Posture, 60: 61-64. https://doi.org/10.1016/j.gaitpost.2017.11.005
[43] Hemler, S.L., Sider, J.R., Redfern, M.S., Beschorner, K.E. (2021). Gait kinetics impact shoe tread wear rate. Gait & Posture, 86: 157-161. https://doi.org/10.1016/j.gaitpost.2021.03.006
[44] Luo, J., Zhang, J., Zi, C., Niu, Y., Tian, H., Xiu, C. (2015). Gait recognition using GEI and AFDEI. International Journal of Optics, 2015: 763908. https://doi.org/10.1155/2015/763908
[45] Pinčić, D., Sušanj, D., Lenac, K. (2022). Gait recognition with self-supervised learning of gait features based on vision transformers. Sensors, 22(19): 7140. https://doi.org/10.3390/s22197140
[46] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. https://doi.org/10.48550/arXiv.2010.11929
[47] Chao, H., He, Y., Zhang, J., Feng, J. (2019). Gaitset: Regarding gait as a set for cross-view gait recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1): 8126-8133. https://doi.org/10.1609/aaai.v33i01.33018126
[48] Liu, Y., Zeng, Y., Pu, J., Shan, H., He, P., Zhang, J. (2021). Selfgait: A spatiotemporal representation learning method for self-supervised gait recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, pp. 2570-2574. https://doi.org/10.1109/ICASSP39728.2021.9413894
[49] Zhang, Y., Huang, Y., Yu, S., Wang, L. (2019). Cross-view gait recognition by discriminative feature learning. IEEE Transactions on Image Processing, 29: 1001-1015. https://doi.org/10.1109/TIP.2019.2926208
[50] Chen, X., Luo, X., Weng, J., Luo, W., Li, H., Tian, Q. (2021). Multi-view gait image generation for cross-view gait recognition. IEEE Transactions on Image Processing, 30: 3041-3055. https://doi.org/10.1109/TIP.2021.3055936
[51] Teepe, T., Gilg, J., Herzog, F., Hörmann, S., Rigoll, G. (2022). Towards a deeper understanding of skeleton-based gait recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, pp. 1569-1577. https://doi.org/10.1109/CVPRW56347.2022.00163
[52] Xu, C., Makihara, Y., Ogi, G., Li, X., Yagi, Y., Lu, J. (2017). The OU-ISIR gait database comprising the large population dataset with age and performance evaluation of age estimation. IPSJ Transactions on Computer Vision and Applications, 9(1): 24. https://doi.org/10.1186/s41074-017-0035-2
[53] Stöckel, T., Jacksteit, R., Behrens, M., Skripitz, R., Bader, R., Mau-Moeller, A. (2015). The mental representation of the human gait in young and older adults. Frontiers in Psychology, 6: 943. https://doi.org/10.3389/fpsyg.2015.00943
[54] Chen, C., Liang, J., Zhao, H., Hu, H., Tian, J. (2009). Frame difference energy image for gait recognition with incomplete silhouettes. Pattern Recognition Letters, 30(11): 977-984. https://doi.org/10.1016/j.patrec.2009.04.012
[55] Kim, D. (2013). Sobel operator and canny edge detector. http://www.egr.msu.edu/classes/ece480/capstone/fall13/group04/docs/danapp.pdf.
[56] Torre, V., Poggio, T. A. (1986). On edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2): 147-163. https://doi.org/10.1109/TPAMI.1986.4767769
[57] Muthukrishnan, R., Radha, M. (2011). Edge detection techniques for image segmentation. International Journal of Computer Science & Information Technology, 3(6): 259-267.
[58] Nasrudin, M.W., Yaakob, N. S., Rahim, N.A.A., Ahmad, M.Z.Z., Ramli, N., Rashid, M.S.A. (2021). Moment invariants technique for image analysis and its applications: A review. Journal of Physics: Conference Series, 1962: 012028. https://doi.org/10.1088/1742-6596/1962/1/012028
[59] Flusser, J., Zitova, B., Suk, T. (2009). Moments and Moment Invariants in Pattern Recognition. John Wiley & Sons.
[60] Lango, M., Stefanowski, J. (2022). What makes multi-class imbalanced problems difficult? An experimental study. Expert Systems with Applications, 199: 116962. https://doi.org/10.1016/j.eswa.2022.116962
[61] Johnson, J.M., Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. Journal of Big Data, 6(1): 27. https://doi.org/10.1186/s40537-019-0192-5