© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Respiratory diseases account for 14.1% of global deaths annually, as reported by the World Health Organization, underscoring the urgent need for advanced diagnostic tools. This study investigates the classification of respiratory sounds using machine learning techniques applied to the ICBHI 2017 dataset. To address class imbalance and improve generalizability, augmentation methods such as time masking and stretching were implemented. Feature extraction techniques, including Mel-Frequency Cepstral Coefficients (MFCCs), were combined with lightweight classifiers like KNN and SVM, achieving a classification accuracy of 99%. The proposed approach surpasses prior benchmarks while maintaining computational efficiency, making it suitable for deployment on edge devices in resource-constrained healthcare environments.
audio augmentation, edge devices, feature extraction techniques, ICBHI 2017 dataset, machine learning classifiers, MFCC mean, MFCC (Mean+Std), respiratory diseases
Respiratory diseases remain a significant global health burden, accounting for a considerable portion of global morbidity and mortality. According to the World Health Organization (WHO) Global Health Statistics Report 2022 [1], respiratory diseases contribute to approximately 14.1% of global deaths annually, underscoring the urgent need for advanced diagnostic tools. This study focuses on leveraging machine learning techniques to enhance the efficiency, accuracy, and objectivity of respiratory sound analysis, particularly in resource-constrained environments.
Machine learning offers a promising avenue for enhancing respiratory sound analysis. Automating the analysis of respiratory sounds may improve diagnostic accuracy, objectivity, and efficiency, potentially minimizing patient interventions and improving patient care. This article aims to develop a robust and computationally efficient classification model that uses machine learning techniques to classify respiratory sounds while accounting for class imbalance and computational cost.
One such public dataset is the ICBHI 2017 Respiratory Sound Database [2], which contains a wide range of respiratory sounds (tailed breaths, wheezes, and crackles) and serves as a test case for developing and comparing respiratory sound classifiers. So far, several studies have exploited machine learning methods [3] to analyze respiratory sounds derived from this dataset, including wavelet coefficients [4], convolutional neural networks, and deep neural networks [5]. However, difficulties, including class imbalance and computational cost, hinder large-scale deployment.
The selected classifiers were hyperparameter-tuned on the ICBHI 2017 dataset. The proposed model uses the MFCC Mean, MFCC, and Chroma (Mean+Std) feature sets to obtain high accuracy at low computational cost on low-specification hardware. The approach is best suited to a limited set of applications in resource-constrained environments. Early test results are promising and suggest potential for clinical application.
Introduction to Respiratory Sound Analysis and Edge Computing: Respiratory diseases are a significant global health burden that requires rapid and precise diagnosis. Conventional auscultation methods are subjective and variable [6]. Automated respiratory sound analysis via machine learning is a promising approach toward more objective and efficient assessments. The growing availability of mobile and edge computing devices enables point-of-care diagnostics, underscoring the need for computationally efficient solutions.
Deep Learning for Classifying Respiratory Sound: Deep learning models, especially Convolutional Neural Networks, have shown remarkable performance in respiratory sound classification [7-11]. However, their computational demands can hinder deployment on edge devices. Hybrid CNN-RNN architectures [12, 13] and other deep learning methods [14, 15] have been explored, but computational efficiency remains a critical challenge.
Efficient Feature Extraction for Edge Devices: Consequently, the ratio between accuracy and computational cost relies on effective feature extraction. Features from the spectrum domain, including Mel-Frequency Cepstral Coefficients [16, 17], are employed. Feature selection is essential for optimising performance on resource-limited devices. Alternative methods for extracting relevant information from respiratory sounds include wavelet transforms and various signal-processing techniques [18-22]. It is crucial to investigate lower complexity features for implementation in edge devices.
Machine Learning on Edge Devices: Efficient feature extraction and less computationally demanding machine learning models simplify respiratory sound classification on edge devices. Such a practical point-of-care diagnostic solution is likely to improve access to timely and accurate respiratory assessment. A recent study [23] demonstrated the feasibility of abnormal sound detection with deep learning and a vest-coat stethoscope, a practical application. Such systems provide evidence for integrating wearables and edge computing in respiratory health monitoring. Further research is needed to optimize machine learning models so that they run at lower energy levels or on devices with limited resources.
Public Datasets and Challenges: The ICBHI challenge, which targets the automatic detection of adventitious respiratory sounds on a large, standardized dataset, and the emergence of public respiratory sound datasets [24] have considerably propelled research in this area. These datasets support detailed evaluations and comparisons between methods, leading to collaborative innovations.
Addressing Challenges (Noise Reduction and Data Augmentation): Noise reduction methods such as spectral subtraction and wavelet denoising, together with data augmentation methods such as adding noise, time stretching, and pitch shifting, are essential for the robustness and generalizability of respiratory sound classification models [25]. Overcoming these challenges can improve the performance of ML models, especially in practical clinical scenarios.
This research focuses on developing a state-of-the-art machine learning model that combines good performance with low computational cost for the automatic classification of respiratory sounds, trained and evaluated on the ICBHI 2017 dataset. Class imbalance is tackled through data augmentation and hyperparameter tuning, which improves real-world clinical usability relative to traditional analysis methods. In particular, the aims are to:
Table 1. Classification of respiratory sounds by number of cycles

| Disease/Class | Number of Cycles | Percentage of Total (%) |
|---|---|---|
| Healthy/Normal | 1,200 | 17.4 |
| Crackles (e.g., Pneumonia, COPD) | 1,720 | 25.0 |
| Wheezes (e.g., Asthma, Bronchitis) | 920 | 13.3 |
| Crackles+Wheezes (Mixed Respiratory Conditions) | 3,058 | 44.3 |
| Total | 6,898 | 100 |
This dataset was specifically designed for benchmarking machine learning models in respiratory sound classification, providing a diverse set of examples for model evaluation.
We used several audio data augmentation methods to counter class imbalance and increase the robustness of the trained models [29-31]. These methods enlarge the training dataset while preserving significant content, and they mitigate the risks of data shortage and overfitting. The following augmentation techniques were applied using the librosa library in Python:
These augmentation techniques expand the training set, balance the class distributions, and reduce the risk of overfitting associated with limited data [32, 33].
The proposed method for respiratory sound classification outlines a structured methodology for classifying ICBHI lung sound data into normal and abnormal categories. The process begins with normalisation of the raw lung sound recordings to ensure uniformity across the dataset.
Figure 1. Workflow of the proposed respiratory sound classification method
To improve the dataset's variability and resilience, audio augmentation techniques such as time masking, time shifting, and time stretching are applied. The augmented audio is then transformed into spectrograms, enabling visual representation of the frequency and temporal features of the sounds. Following this, 80 feature extraction techniques are utilised to extract meaningful characteristics from the spectrograms.
A feature selection process is then employed to identify the most relevant features by evaluating their effectiveness across various machine learning (ML) models. These optimised features are subsequently used as input for ML classifiers, which categorise the lung sounds as either normal or abnormal, facilitating effective diagnosis and analysis. The proposed method for respiratory sound classification is illustrated in Figure 1.
The audio database ICBHI 2017 was employed in the current study with 920 audio signals and 6898 respiratory cycles [34]. The samples were collected in a way that the COPD class has more samples as compared to the Asthma and LRTI classes which have a paucity of samples. To this end, data augmentation was used.
4.1 Normalizing and augmentation
The respiratory sound signals were segmented into individual breaths, each 4 seconds long, at a sampling rate of 16,000 Hz. To address class imbalance and improve model robustness, we applied several data augmentation techniques using the librosa library in Python:
Time Masking: Random sections of the audio were silenced, simulating missing information and encouraging the model to learn more robust features. The duration of each masked section was chosen at random between 0.1 and 0.2 seconds.
Time Shifting: The audio was randomly shifted in time by up to ±50 ms, increasing the model's tolerance to variations in the timing of respiratory events.
These augmentations were applied to the under-represented classes to bring the class distribution to approximately 1:1 with the majority class of the ICBHI 2017 dataset. In addition, the amplitude of each breath segment was peak-normalized to ensure consistent input levels to the model.
Table 2 gives the number of records in each class. Table 3 shows the number of records selected for augmentation and the number of times each of them was augmented.
Table 2. Number of records in each class

| Class | No. of Records |
|---|---|
| Healthy | 35 |
| COPD | 793 |
| Bronchiectasis | 16 |
| Bronchiolitis | 13 |
| Pneumonia | 37 |
| URTI | 23 |
| Asthma | 1 |
| LRTI | 2 |
| Total | 920 |
Table 3. Number of times each record is augmented

| Class | Selected Records | No. of Times Each Record is Augmented | Total Records |
|---|---|---|---|
| Healthy | 35 | 30 | 1050 |
| Bronchiectasis | 16 | 13 | 208 |
| Bronchiolitis | 13 | 16 | 208 |
| Pneumonia | 37 | 6 | 222 |
| URTI | 23 | 9 | 207 |
| COPD | 200 | 1 | 200 |
| Total records for training and testing | | | 3140 |
Time Stretching: The duration of the audio was altered without changing the pitch, using a random stretching factor between 0.8 and 1.2. This augmentation further enhances the model's robustness to variations in the timing of respiratory events.
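The peak normalization mentioned above can be illustrated with a minimal sketch. It assumes a breath segment is held as a plain list of float samples; the function name `peak_normalize` and the target peak of 1.0 are illustrative choices, not values taken from the study.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale a segment so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent segment: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical breath segment with a peak amplitude of 0.5
segment = [0.1, -0.25, 0.5, -0.05]
normalized = peak_normalize(segment)
```

After normalization every segment shares the same maximum amplitude, so differences in recording gain do not reach the feature extractor.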
4.1.1 Time masking
Time masking hides a portion of the signal for a specific interval. Randomly chosen segments of 100-200 ms were masked using the librosa library to simulate missing information. The samples in each masked segment are set to zero, so the temporal structure in that part of the audio signal is lost and the model must rely on other properties of the sound.
$$y(t) = x(t) \cdot m(t) \tag{1}$$
In Eq. (1), y(t) denotes the masked signal, x(t) is the original audio signal, and m(t) is the mask, which equals 1 where a segment is preserved and 0 where it is masked.
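Eq. (1) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the signal is a plain list of samples, a single contiguous segment is masked, and the helper name `time_mask` is hypothetical. The 0.1-0.2 s mask duration and the 16 kHz sampling rate follow the text.

```python
import random


def time_mask(samples, sample_rate, min_s=0.1, max_s=0.2, rng=None):
    """Return (y, m) with y[t] = x[t] * m[t] for one random masked segment."""
    rng = rng or random.Random()
    n = len(samples)
    # Mask length in samples, drawn uniformly from [min_s, max_s] seconds
    mask_len = min(n, int(rng.uniform(min_s, max_s) * sample_rate))
    start = rng.randint(0, n - mask_len)
    # m[t] is 0 inside the masked segment and 1 elsewhere
    mask = [0 if start <= t < start + mask_len else 1 for t in range(n)]
    return [x * m for x, m in zip(samples, mask)], mask


sr = 16000
x = [1.0] * (4 * sr)  # stand-in for a 4-second breath segment
y, m = time_mask(x, sr, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which helps when augmented copies need to be regenerated.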
4.1.2 Time shifting
Time shifting moves an audio signal forward or backward along the time axis without changing its content.
$$y(t) = x(t - \Delta t) \tag{2}$$
In Eq. (2), y(t) is the shifted signal, x(t) is the original signal, and Δt is the time shift. If Δt is positive, the signal is delayed; if Δt is negative, the signal is advanced.
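A minimal sketch of Eq. (2) on a sampled signal follows. It assumes the shift is expressed in whole samples and that positions with no source sample are filled with zeros; wrap-around would be an equally plausible choice, and the text does not say which one was used. The name `time_shift` is illustrative.

```python
def time_shift(samples, shift):
    """Return y with y[t] = x[t - shift]; uncovered positions become 0.0."""
    n = len(samples)
    return [samples[t - shift] if 0 <= t - shift < n else 0.0 for t in range(n)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
delayed = time_shift(x, 2)    # positive shift delays the signal
advanced = time_shift(x, -2)  # negative shift advances it

# A 50 ms limit at a 16 kHz sampling rate corresponds to this many samples
max_shift = 50 * 16000 // 1000
```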
4.1.3 Time stretching
Time stretching alters the duration of an audio signal without altering its frequency content. In mathematical terms, time stretching can be expressed as a modification of the Short-Time Fourier Transform (STFT) of the sound signal.
$$\mathrm{STFT}\{x\}(n, \omega) = \sum_{t=-\infty}^{\infty} x(t)\, w(t - nH)\, e^{-j\omega t} \tag{3}$$
In Eq. (3), x(t) is the input audio signal and w(t − nH) is the window function that isolates a short portion of the signal around time nH. The parameter H is the hop size, that is, the step by which the window advances between frames; n is the integer frame index that shifts the window across the signal; and e^{−jωt} is the complex exponential that represents the frequency component at ω.
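To make the STFT concrete, the sketch below computes it frame by frame with the standard library only. It treats n as the frame index and H as the hop between successive windows, uses a Hann window, and measures time from the start of each frame, which changes only the phase. These are assumptions of the sketch; it shows the transform itself, not the stretching step that would then resample the frames in time.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def stft(x, win_len, hop):
    """Return one list of complex bins (0 .. win_len/2) per frame."""
    w = hann(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(win_len)]
        frames.append([
            sum(seg[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                for i in range(win_len))
            for k in range(win_len // 2 + 1)
        ])
    return frames


# Test tone that sits exactly on frequency bin 4 of a 32-sample window
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
frames = stft(signal, win_len=32, hop=16)
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]
```

In practice a library routine would replace this direct summation, which costs O(N²) per frame.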
4.1.4 The pseudocode for data preparation
x(n): Input audio signal
T1, T2, T3, T4: Time masking, shifting, time stretching augmentation functions.
x'(n) = T4(T3(T2(T1(x(n)))))
x''(n) = concat(x'(n), x'(n), ..., x'(n))
y(n) = truncate_or_pad(x''(n), L)
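The data preparation steps above can be sketched as runnable Python. The sketch assumes that x''(n) denotes the augmented signal repeated back to back until it covers the target length L, and it uses two trivial stand-in transforms in place of the real augmentation functions; both choices are assumptions made for illustration.

```python
def truncate_or_pad(samples, length):
    """Cut the signal to `length` samples or right-pad it with zeros."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def prepare(samples, transforms, length):
    """Apply the transforms in order, tile the result, then fix the length."""
    for transform in transforms:  # x'(n) = T4(T3(T2(T1(x(n)))))
        samples = transform(samples)
    repeats = -(-length // len(samples)) if samples else 0  # ceiling division
    tiled = samples * repeats  # x''(n): x'(n) repeated back to back
    return truncate_or_pad(tiled, length)  # y(n)


# Stand-in transforms (hypothetical): halve the amplitude, drop one sample
halve = lambda s: [v * 0.5 for v in s]
drop_first = lambda s: s[1:]

y = prepare([2.0, 4.0, 6.0], [halve, drop_first], length=5)
```

Fixing the output length this way lets every augmented segment produce a feature matrix of the same shape.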
Figure 2 shows the waveform and spectrogram of an audio sample before any audio augmentation is applied. Figure 3 shows the waveform and spectrogram of the audio after the augmentation techniques are applied.
Figure 2. Waveform and spectrogram before audio augmentation
Figure 3. Waveform and spectrogram after audio augmentation
Feature extraction is a fundamental stage of audio processing that turns the unprocessed input into a numerical representation. A total of 80 features were extracted from the audio signals in this study.
The features fall into the following categories:
The systematic feature extraction approach facilitates the analysis of each acoustic property's contribution to classification. This analysis enables identifying and eliminating superfluous features, leading to a more robust and interpretable model. Table 4 delineates the 80 features retrieved from the audio signals. Table 5 presents samples of extracted feature values alongside their corresponding labels.
Table 4. The 80 features extracted from the audio signals

| Sl. No | Feature | Sl. No | Feature | Sl. No | Feature | Sl. No | Feature |
|---|---|---|---|---|---|---|---|
| 1 | chroma_cens_mean | 21 | rolloff_var | 41 | mfccs_mean_10 | 61 | chroma_mean_4 |
| 2 | chroma_cens_std | 22 | zcr_mean | 42 | mfccs_mean_11 | 62 | chroma_mean_5 |
| 3 | chroma_cens_var | 23 | zcr_std | 43 | mfccs_mean_12 | 63 | chroma_mean_6 |
| 4 | mel_mean | 24 | zcr_var | 44 | mfccs_std_0 | 64 | chroma_mean_7 |
| 5 | mel_std | 25 | harm_mean | 45 | mfccs_std_1 | 65 | chroma_mean_8 |
| 6 | mel_var | 26 | harm_std | 46 | mfccs_std_2 | 66 | chroma_mean_9 |
| 7 | mfcc_mean | 27 | harm_var | 47 | mfccs_std_3 | 67 | chroma_mean_10 |
| 8 | mfcc_std | 28 | perc_mean | 48 | mfccs_std_4 | 68 | chroma_mean_11 |
| 9 | mfcc_var | 29 | perc_std | 49 | mfccs_std_5 | 69 | chroma_std_0 |
| 10 | mfcc_delta_mean | 30 | perc_var | 50 | mfccs_std_6 | 70 | chroma_std_1 |
| 11 | mfcc_delta_std | 31 | mfccs_mean_0 | 51 | mfccs_std_7 | 71 | chroma_std_2 |
| 12 | mfcc_delta_var | 32 | mfccs_mean_1 | 52 | mfccs_std_8 | 72 | chroma_std_3 |
| 13 | cent_mean | 33 | mfccs_mean_2 | 53 | mfccs_std_9 | 73 | chroma_std_4 |
| 14 | cent_std | 34 | mfccs_mean_3 | 54 | mfccs_std_10 | 74 | chroma_std_5 |
| 15 | cent_var | 35 | mfccs_mean_4 | 55 | mfccs_std_11 | 75 | chroma_std_6 |
| 16 | spec_bw_mean | 36 | mfccs_mean_5 | 56 | mfccs_std_12 | 76 | chroma_std_7 |
| 17 | spec_bw_std | 37 | mfccs_mean_6 | 57 | chroma_mean_0 | 77 | chroma_std_8 |
| 18 | spec_bw_var | 38 | mfccs_mean_7 | 58 | chroma_mean_1 | 78 | chroma_std_9 |
| 19 | rolloff_mean | 39 | mfccs_mean_8 | 59 | chroma_mean_2 | 79 | chroma_std_10 |
| 20 | rolloff_std | 40 | mfccs_mean_9 | 60 | chroma_mean_3 | 80 | chroma_std_11 |
Table 5. Extracted feature values and labels from the sample records

| Sl. No | Feature / File_Name | 144_1b1_Al_sc_aug_3.wav | 201_1b2_Ar_sc_aug_9.wav | 125_1b1_Tc_sc_aug_27.wav | 150_1b2_Al_sc_aug_2.wav | 221_2b3_Lr_mc_LittC2SE_aug_1.wav |
|---|---|---|---|---|---|---|
| 1 | chroma_cens_mean | 0.26506 | 0.27453 | 0.25834 | 0.27745 | 0.27594 |
| 2 | chroma_cens_std | 0.11436 | 0.08927 | 0.12882 | 0.07973 | 0.0848 |
| 3 | chroma_cens_var | 0.01308 | 0.00797 | 0.0166 | 0.00636 | 0.00719 |
| 4 | melspectrogram_mean | 0.6469 | 0.07583 | 1.39133 | 2.88862 | 0.99446 |
| 5 | melspectrogram_std | 7.67458 | 0.94867 | 24.7201 | 34.8482 | 13.5884 |
| 6 | melspectrogram_var | 58.89925 | 0.899968 | 611.0847 | 1214.395 | 184.6454 |
| 7 | mfcc_mean | -9.48785 | -13.6381 | -4.5474 | -2.44224 | -6.56647 |
| 8 | mfcc_std | 141.9098 | 159.665 | 115.2643 | 128.0599 | 139.7678 |
| 9 | mfcc_var | 20138.39 | 25492.9 | 13285.86 | 16399.33 | 19535.04 |
| 10 | mfcc_delta_mean | 0.121052 | 0.046281 | 0.05109 | 0.007315 | 0.006629 |
| 11 | mfcc_delta_std | 2.399807 | 2.465145 | 3.753295 | 3.847119 | 2.602856 |
| 12 | mfcc_delta_var | 5.75907 | 6.07694 | 14.0872 | 14.8003 | 6.77486 |
| 13 | mfccs_mean_0 | -487.79 | -541.46 | -366.46 | -427.67 | -469.72 |
| 14 | mfccs_mean_1 | 124.148 | 180.208 | 152.144 | 122.285 | 129.259 |
| 15 | mfccs_mean_2 | 51.6949 | 46.0076 | 12.7227 | 67.2487 | 88.7582 |
Figure 4. Heatmap for all the features
Feature extraction converts the raw audio data into numerical form. The study extracted 80 features, categorized as follows:
$$\mathrm{MFCC}_k(t) = \sum_{m=1}^{M} \log(S_m) \cos\left[\frac{k\pi(m - 0.5)}{M}\right] \tag{4}$$
where S_m is the mel spectrum and M is the number of mel filters.
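Eq. (4) can be evaluated directly for a single frame, as the sketch below shows. It assumes the mel spectrum of the frame is available as a list of positive filter-bank energies and applies no scaling or liftering, so the values will differ by constant factors from those of libraries that use an orthonormal DCT. The function name and the choice of 26 filters in the example are illustrative.

```python
import math


def mfcc_from_mel(mel_energies, n_coeffs=13):
    """Evaluate Eq. (4) for one frame: a cosine transform of the log mel spectrum."""
    M = len(mel_energies)
    log_s = [math.log(s) for s in mel_energies]
    return [
        sum(log_s[m - 1] * math.cos(k * math.pi * (m - 0.5) / M)
            for m in range(1, M + 1))
        for k in range(n_coeffs)
    ]


# A flat spectrum (every log energy equal to 1) has only a k = 0 component
coeffs = mfcc_from_mel([math.e] * 26)
```

The flat-spectrum case is a handy sanity check: all the energy lands in coefficient 0 and the higher coefficients vanish, which is why the low-order MFCCs describe the overall spectral envelope.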
This systematic feature extraction provides a comprehensive representation of the audio signals, facilitating high-accuracy classification. It also enables the identification and potential removal of less relevant features, leading to a more efficient and interpretable model. Table 4 lists the 80 features extracted from the audio signals. Table 5 presents examples of extracted feature values and their corresponding labels. Figure 4 provides a heatmap visualization of the relationships between all features used in the model.
Table 6 shows the grouping of the 80 features into 9 feature sets. Grouping similar features together allows for more efficient data management and analysis [36]. This approach enhances the interpretability of the data, making it easier to understand and visualize the relationships between different features. In addition, it can improve the performance of machine learning models by reducing redundancy and focusing on the most relevant feature subsets for specific tasks [37].
Table 6. Feature groups

| Sl. No | Grouped Features |
|---|---|
| 1 | df_feature_all = df3.iloc[:, 1:-1] |
| 2 | df_feature_mel_chroma = df3.iloc[:, 2:7] |
| 3 | df_feature_mfcc_mean = df3.iloc[:, 13:26] |
| 4 | df_feature_mfcc_std = df3.iloc[:, 26:39] |
| 5 | df_feature_mfcc = df3.iloc[:, 13:39] |
| 6 | df_feature_chroma_mean = df3.iloc[:, 39:51] |
| 7 | df_feature_chroma_std = df3.iloc[:, 51:63] |
| 8 | df_feature_chroma = df3.iloc[:, 39:63] |
| 9 | df_feature_csrzhp = df3.iloc[:, 63:81] |
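The column slices in Table 6 can be checked against a column layout with plain Python slicing, which follows the same half-open convention as `iloc`. The layout below is an assumption inferred from the row order of Table 5: a file-name column first, the twelve summary statistics next, then the per-coefficient MFCC and chroma statistics, the remaining spectral features, and a label column last. It is not confirmed by the source.

```python
stats = ["mean", "std", "var"]

# Hypothetical column order of df3 (82 columns: file name, 80 features, label)
cols = ["file_name"]
cols += [f"{g}_{s}" for g in ["chroma_cens", "mel", "mfcc", "mfcc_delta"] for s in stats]
cols += [f"mfccs_mean_{i}" for i in range(13)] + [f"mfccs_std_{i}" for i in range(13)]
cols += [f"chroma_mean_{i}" for i in range(12)] + [f"chroma_std_{i}" for i in range(12)]
cols += [f"{g}_{s}" for g in ["cent", "spec_bw", "rolloff", "zcr", "harm", "perc"] for s in stats]
cols += ["label"]

# The same slice bounds as Table 6, applied to the column names
groups = {
    "all": cols[1:-1],
    "mfcc_mean": cols[13:26],
    "mfcc_std": cols[26:39],
    "mfcc": cols[13:39],
    "chroma_mean": cols[39:51],
    "chroma_std": cols[51:63],
    "chroma": cols[39:63],
    "csrzhp": cols[63:81],
}
sizes = {name: len(members) for name, members in groups.items()}
```

Under this layout the slices yield 13 MFCC means, 13 MFCC standard deviations, 12 + 12 chroma statistics, and 18 spectral and harmonic-percussive statistics, which together with the 12 summary statistics account for all 80 features.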
$$\mathrm{MFCC}_k(t) = \sum_{m=0}^{M-1} \log\big(M[m,t]\big) \cdot \cos\left(\frac{\pi k}{M}\left(m + \frac{1}{2}\right)\right), \quad k = 0, 1, \dots, 12 \tag{5}$$
where MFCC_k(t) denotes the k-th Mel-Frequency Cepstral Coefficient at time frame t, and the sum runs over the M mel filter banks. The term log(M[m,t]) is the logarithm of the mel filter bank output at index m and time frame t, the cosine term is the basis function of the Discrete Cosine Transform (DCT), and k = 0, 1, …, 12 indicates that the MFCCs are calculated for k values ranging from 0 to 12.
Combining the 13 MFCC features into a single feature vector has several advantages. The resulting representation is a more succinct and efficient form of the signal that preserves the most pertinent elements of its spectral content. It reduces the dimensionality of the data, which simplifies processing and analysis. The mean values of the MFCCs are also calculated; they describe the general spectral envelope of the signal and help to discriminate between sounds and extract patterns. The method enables the extraction of stable characteristics from dynamic and transient inputs, such as heartbeats, voice, and music, enhancing model performance in classification tasks. It also helps to reduce noise and variability, leading to a more stable and precise evaluation.
$$\mu_{\mathrm{MFCC}_i} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{MFCC}_i(t), \quad i = 0, 1, \dots, 12 \tag{6}$$
Eq. (6) gives the mean value μ_{MFCC_i} of each MFCC coefficient, obtained by averaging its values across all T time frames.
$$\mathrm{mfccs\_std}_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \big(\mathrm{MFCC}_i(t) - \mathrm{mfccs\_mean}_i\big)^2}, \quad i = 0, 1, \dots, 12 \tag{7}$$
where mfccs_std_i is the standard deviation of the i-th Mel-Frequency Cepstral Coefficient (MFCC) across all time frames. The term 1/T is the normalization factor, where T is the total number of time frames, and the sum aggregates the squared differences between the i-th MFCC values and their mean mfccs_mean_i, given by Eq. (6), across all frames.
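The per-coefficient mean and standard deviation over time frames can be sketched with the standard library. The sketch assumes the MFCCs of one recording are stored as one list per coefficient, each holding T frame values; `statistics.pstdev` divides by T, which matches the 1/T normalization described above. The input values are made up for illustration.

```python
import statistics


def summarize(mfcc_rows):
    """Return per-coefficient (means, stds) across time frames.

    mfcc_rows[i] holds the values of coefficient i over all T frames.
    """
    means = [statistics.fmean(row) for row in mfcc_rows]
    stds = [statistics.pstdev(row) for row in mfcc_rows]  # population std, 1/T
    return means, stds


# Two hypothetical coefficients observed over T = 4 frames
mfcc_rows = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
means, stds = summarize(mfcc_rows)
feature_vector = means + stds  # the "mean+std" style of feature set
```

Whatever the recording length, each coefficient collapses to two numbers, so 13 coefficients give a fixed 26-dimensional vector.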
$$\mathrm{chroma\_mean}_i = \frac{1}{T} \sum_{t=1}^{T} \mathrm{Chroma}_i(t), \quad i = 0, 1, \dots, 11 \tag{8}$$
csrzhp(x)={Cs(x),Bw(x),Rf(x),Zr(x),Hp(x)} (9)
Here csrzhp(x) groups the spectral feature extractors applied to the input audio signal x, where Cs is the spectral centroid, Bw is the spectral bandwidth, Rf is the spectral roll-off, Zr is the zero-crossing rate, and Hp is the harmonic and percussive content. Eq. (9) integrates multiple spectral and temporal aspects into a unified feature vector, offering an extensive description of the audio signal's attributes.
One significant problem is the variability of respiratory sounds, which is physiological and may depend on age, gender, and other factors. In addition, there is no standard data collection protocol, and the available data are limited and not very diverse, which hampers the development of better models. Given the clinical need for rapid and precise identification, fast and non-invasive diagnostic systems that can operate under high noise and variability are required [39-41].
This article presents a framework for comparing the effectiveness of several machine learning algorithms in classifying lung sounds. The experimental design used several classifiers, including Decision Trees, Random Forest, and Support Vector Machines. To assess the impact of feature representation, various feature sets were constructed from features extracted from the audio signals, including Mel-Frequency Cepstral Coefficients, chroma features, and other acoustic features. The models were trained and tested with different train/test split ratios (0.2, 0.3, 0.4, and 0.5) to evaluate model stability.
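The evaluation protocol can be illustrated with a small, self-contained sketch. It assumes that each split ratio denotes the held-out test fraction, uses a hand-written k-nearest-neighbours classifier as a stand-in for the library classifiers, and runs on synthetic two-dimensional points rather than the real feature vectors. None of these details come from the study.

```python
import math
import random
from collections import Counter


def split(rows, test_ratio, seed=0):
    """Shuffle the rows and hold out `test_ratio` of them for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def knn_predict(train, features, k=3):
    """Majority label among the k training rows closest to `features`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic, well-separated stand-ins for "normal" and "abnormal" features
rows = [((0.1 * i, 0.1 * i), "normal") for i in range(10)]
rows += [((10 + 0.1 * i, 10 + 0.1 * i), "abnormal") for i in range(10)]

results = {}
for ratio in (0.2, 0.3, 0.4, 0.5):
    train, test = split(rows, ratio)
    correct = sum(knn_predict(train, f) == y for f, y in test)
    results[ratio] = (len(test), correct / len(test))
```

Repeating the evaluation at several ratios shows how accuracy changes as the training set shrinks, which is the stability question the split ratios are meant to answer.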
The feature sets MFCC Mean, MFCC Mean+Std, and Chroma Mean+Std also yielded good results; however, the most comprehensive feature sets, All Features and MFCC (Mean+Std), achieved the highest accuracy. Tables 7-10 present the average accuracy of each model and feature group at split ratios of 0.20, 0.30, 0.40, and 0.50, respectively. Table 11 shows the accuracy, F1-score, precision, and recall for the All Features, MFCC Mean, MFCC (Mean+Std), and Chroma (Mean+Std) groups, with All Features and MFCC (Mean+Std) achieving good accuracies.
The combination of MFCC Mean+Std features with KNN and SVM classifiers achieved the highest accuracy (99%). This performance can be attributed to the robust feature representation of MFCCs, which effectively capture spectral properties crucial for distinguishing respiratory sounds.
Compared to state-of-the-art methods:
Comparative Analysis:
Table 7. Average accuracy of the ML models at a split ratio of 0.20

| Sl. No | Classifier | All | Mel + Chroma | MFCC mean | MFCC std | MFCC (mean+std) | Chroma mean | Chroma std | Chroma (mean+std) | c+s+r+z+h+p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Decision Tree | 0.98 | 0.98 | 0.98 | 0.97 | 0.98 | 0.90 | 0.83 | 0.92 | 0.98 |
| 2 | Random Forest | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.98 | 0.90 | 0.98 | 0.99 |
| 3 | Gradient Boost | 0.99 | 0.98 | 0.98 | 0.99 | 1.00 | 0.935 | 0.84 | 0.93 | 0.99 |
| 4 | XGBoost | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.98 | 0.91 | 0.97 | 1.00 |
| 5 | Ada Boost | 0.99 | 0.90 | 0.97 | 0.93 | 0.96 | 0.83 | 0.72 | 0.87 | 0.98 |
| 6 | Extra Tree | 1.00 | 0.99 | 1.00 | 1.00 | 0.99 | 0.98 | 0.94 | 0.99 | 0.99 |
| 7 | K-Neighbors | 0.98 | 0.84 | 0.99 | 0.98 | 0.99 | 0.97 | 0.93 | 0.97 | 0.92 |
| 8 | Support Vector | 0.99 | 0.64 | 0.97 | 0.96 | 0.99 | 0.95 | 0.90 | 0.98 | 0.87 |
| 9 | Gaussian Naïve | 0.62 | 0.54 | 0.76 | 0.69 | 0.79 | 0.58 | 0.66 | 0.67 | 0.57 |
| 10 | Multi-layer | 0.99 | 0.65 | 0.98 | 0.98 | 0.99 | 0.94 | 0.84 | 0.96 | 0.91 |
| 11 | Logistic Reg | 0.96 | 0.58 | 0.77 | 0.65 | 0.84 | 0.59 | 0.68 | 0.70 | 0.78 |
| | Maximum | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.98 | 0.94 | 0.99 | 1.00 |
| | Average accuracy | 0.95 | 0.82 | 0.94 | 0.92 | 0.96 | 0.87 | 0.83 | 0.90 | 0.91 |
Table 8. Average accuracy of the ML models at a split ratio of 0.30

| Sl. No | Classifier | All | Mel + Chroma | MFCC mean | MFCC std | MFCC (mean+std) | Chroma mean | Chroma std | Chroma (mean+std) | c+s+r+z+h+p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Decision Tree | 0.99 | 0.96 | 0.97 | 0.96 | 0.97 | 0.93 | 0.79 | 0.90 | 0.98 |
| 2 | Random Forest | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 0.97 | 0.89 | 0.97 | 0.99 |
| 3 | Gradient Boost | 1.00 | 0.97 | 0.98 | 0.99 | 1.00 | 0.94 | 0.82 | 0.93 | 0.99 |
| 4 | XGBoost | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.97 | 0.93 | 0.97 | 0.99 |
| 5 | Ada Boost | 0.99 | 0.90 | 0.95 | 0.93 | 0.98 | 0.82 | 0.73 | 0.83 | 0.97 |
| 6 | Extra Tree | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 0.98 | 0.93 | 0.99 | 0.99 |
| 7 | K-Neighbors | 0.99 | 0.82 | 0.99 | 0.98 | 0.99 | 0.96 | 0.93 | 0.96 | 0.91 |
| 8 | Support Vector | 1.00 | 0.62 | 0.95 | 0.94 | 0.99 | 0.94 | 0.89 | 0.97 | 0.86 |
| 9 | Gaussian Naïve | 0.60 | 0.53 | 0.73 | 0.68 | 0.77 | 0.57 | 0.65 | 0.67 | 0.56 |
| 10 | Multi-layer | 0.99 | 0.62 | 0.96 | 0.96 | 0.99 | 0.93 | 0.84 | 0.96 | 0.91 |
| 11 | Logistic Reg | 0.94 | 0.57 | 0.76 | 0.65 | 0.81 | 0.59 | 0.66 | 0.71 | 0.77 |
| | Maximum | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 0.98 | 0.93 | 0.99 |
| | Average accuracy | 0.95 | 0.95 | 0.81 | 0.93 | 0.92 | 0.95 | 0.87 | 0.83 | 0.90 |
Table 9. Average accuracy of the ML models at a split ratio of 0.40

| Sl. No | Classifier | All | Mel + Chroma | MFCC mean | MFCC std | MFCC (mean+std) | Chroma mean | Chroma std | Chroma (mean+std) | c+s+r+z+h+p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Decision Tree | 0.98 | 0.97 | 0.97 | 0.94 | 0.98 | 0.87 | 0.79 | 0.88 | 0.98 |
| 2 | Random Forest | 1.00 | 0.98 | 0.99 | 0.99 | 0.99 | 0.97 | 0.89 | 0.98 | 0.99 |
| 3 | Gradient Boost | 0.99 | 0.98 | 0.98 | 0.98 | 0.99 | 0.92 | 0.83 | 0.93 | 0.99 |
| 4 | XGBoost | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.96 | 0.91 | 0.96 | 0.99 |
| 5 | Ada Boost | 0.99 | 0.92 | 0.95 | 0.93 | 0.97 | 0.81 | 0.74 | 0.83 | 0.95 |
| 6 | Extra Tree | 0.99 | 0.98 | 1.00 | 0.99 | 1.00 | 0.99 | 0.93 | 0.99 | 0.99 |
| 7 | K-Neighbors | 0.98 | 0.80 | 0.98 | 0.97 | 0.99 | 0.96 | 0.93 | 0.96 | 0.90 |
| 8 | Support Vector | 1.00 | 0.61 | 0.94 | 0.94 | 0.98 | 0.94 | 0.88 | 0.97 | 0.85 |
| 9 | Gaussian Naïve | 0.58 | 0.52 | 0.71 | 0.66 | 0.74 | 0.58 | 0.65 | 0.66 | 0.54 |
| 10 | Multi-layer | 0.99 | 0.625 | 0.95 | 0.96 | 0.99 | 0.91 | 0.81 | 0.96 | 0.88 |
| 11 | Logistic Reg | 0.94 | 0.57 | 0.76 | 0.65 | 0.81 | 0.61 | 0.66 | 0.71 | 0.75 |
| | Maximum | 1.00 | 1.00 | 0.98 | 1.00 | 0.99 | 1.00 | 0.99 | 0.93 | 0.99 |
| | Average accuracy | 0.95 | 0.95 | 0.81 | 0.93 | 0.91 | 0.95 | 0.86 | 0.82 | 0.89 |
Table 10. Average accuracy of the ML models at a split ratio of 0.50

| Sl. No | Classifier | All | Mel + Chroma | MFCC mean | MFCC std | MFCC (mean+std) | Chroma mean | Chroma std | Chroma (mean+std) | c+s+r+z+h+p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Decision Tree | 0.97 | 0.96 | 0.98 | 0.94 | 0.97 | 0.88 | 0.77 | 0.87 | 0.97 |
| 2 | Random Forest | 1.00 | 0.98 | 0.99 | 0.99 | 0.99 | 0.96 | 0.89 | 0.97 | 0.99 |
| 3 | Gradient Boost | 0.99 | 0.97 | 0.98 | 0.97 | 0.99 | 0.90 | 0.82 | 0.92 | 0.99 |
| 4 | XGBoost | 0.99 | 0.97 | 0.99 | 0.99 | 0.99 | 0.95 | 0.90 | 0.95 | 0.99 |
| 5 | Ada Boost | 0.98 | 0.91 | 0.95 | 0.92 | 0.97 | 0.80 | 0.72 | 0.82 | 0.96 |
| 6 | Extra Tree | 0.99 | 0.98 | 1.00 | 0.99 | 0.99 | 0.98 | 0.93 | 0.98 | 0.99 |
| 7 | K-Neighbors | 0.97 | 0.79 | 0.97 | 0.96 | 0.97 | 0.94 | 0.93 | 0.95 | 0.88 |
| 8 | Support Vector | 0.99 | 0.62 | 0.93 | 0.92 | 0.98 | 0.92 | 0.87 | 0.97 | 0.85 |
| 9 | Gaussian Naïve | 0.60 | 0.51 | 0.71 | 0.66 | 0.74 | 0.59 | 0.65 | 0.68 | 0.54 |
| 10 | Multi-layer | 0.99 | 0.61 | 0.93 | 0.93 | 0.99 | 0.90 | 0.80 | 0.95 | 0.87 |
| 11 | Logistic Reg | 0.94 | 0.57 | 0.75 | 0.66 | 0.82 | 0.62 | 0.65 | 0.72 | 0.77 |
| | Maximum | 1.00 | 1.00 | 0.98 | 1.00 | 0.99 | 0.99 | 0.98 | 0.93 | 0.98 |
| | Average accuracy | 0.95 | 0.95 | 0.81 | 0.93 | 0.90 | 0.95 | 0.86 | 0.81 | 0.89 |
Table 11. Average accuracy, F1-score, precision, and recall of the ML models on All features, MFCC mean, MFCC (mean+std), and Chroma (mean+std)

| Sl. No | Classifier | All: acc | All: f1 | All: prec | All: recall | MFCC mean: acc | MFCC mean: f1 | MFCC mean: prec | MFCC mean: recall | MFCC (mean+std): acc | MFCC (mean+std): f1 | MFCC (mean+std): prec | MFCC (mean+std): recall | Chroma (mean+std): acc | Chroma (mean+std): f1 | Chroma (mean+std): prec | Chroma (mean+std): recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Decision Tree | 0.98 | 0.98 | 0.99 | 0.98 | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.83 | 0.82 | 0.85 | 0.81 |
| 2 | Random Forest | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 0.91 | 0.91 | 0.91 | 0.90 |
| 3 | Gradient Boost | 0.99 | 0.99 | 1.00 | 0.99 | 0.98 | 0.98 | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 0.84 | 0.83 | 0.85 | 0.82 |
| 4 | XGBoost | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 0.91 | 0.91 | 0.93 | 0.89 |
| 5 | Ada Boost | 0.99 | 0.99 | 0.99 | 0.99 | 0.97 | 0.96 | 0.98 | 0.95 | 0.96 | 0.96 | 0.97 | 0.96 | 0.72 | 0.72 | 0.71 | 0.73 |
| 6 | Extra Tree | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 | 0.94 | 0.94 | 0.94 | 0.93 |
| 7 | K-Neighbors | 0.98 | 0.98 | 1.00 | 0.97 | 0.99 | 0.99 | 1.00 | 0.98 | 0.99 | 0.99 | 1.00 | 0.99 | 0.93 | 0.92 | 0.98 | 0.87 |
| 8 | Support Vector | 0.99 | 0.99 | 1.00 | 0.99 | 0.97 | 0.96 | 1.00 | 0.94 | 0.99 | 0.99 | 1.00 | 0.98 | 0.90 | 0.90 | 0.90 | 0.90 |
| 9 | Gaussian Naive | 0.62 | 0.47 | 0.72 | 0.35 | 0.76 | 0.74 | 0.76 | 0.72 | 0.79 | 0.78 | 0.78 | 0.79 | 0.66 | 0.62 | 0.67 | 0.59 |
| 10 | Multi-layer | 0.99 | 0.99 | 1.00 | 0.99 | 0.96 | 0.98 | 1.00 | 0.93 | 0.99 | 1.00 | 1.00 | 0.99 | 0.84 | 0.86 | 0.85 | 0.84 |
| 11 | Logistic Reg | 0.96 | 0.96 | 0.97 | 0.94 | 0.77 | 0.77 | 0.76 | 0.77 | 0.84 | 0.83 | 0.85 | 0.81 | 0.68 | 0.66 | 0.67 | 0.64 |
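For reference, the four metrics reported in Table 11 can be computed from predicted and true labels as in the sketch below. It assumes a binary task with one class treated as positive; the labels in the example are made up for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"acc": accuracy, "prec": precision, "recall": recall, "f1": f1}


# Illustrative labels: 1 = abnormal, 0 = normal
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Precision and recall separate the two kinds of error, which matters for rows such as Gaussian Naive Bayes on all features, where accuracy (0.62) hides a much lower recall (0.35).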
Error Analysis
Confusion matrices reveal that misclassifications primarily occurred in overlapping classes (e.g., Crackles+Wheezes). Augmentation mitigated errors in minority classes.
To compare the classification results across feature sets and machine learning models, the results tables were cross-compared. The resulting matrices compare the performance of the models, including Decision Tree and Random Forest, on feature sets such as All Features, MFCC mean, MFCC mean+std, and Chroma (mean+std); see Figures 5-7. The average accuracies across models, features, and split ratios suggest that the best-performing configuration uses all features with a split ratio of 0.20. The study in [4] achieved 97% accuracy using CNNs, whereas our approach achieved 99% with KNN and SVM, indicating higher accuracy at lower computational cost.
Figure 5. Cross-matrix representation of ML models using all features (df_feature_all) group
Figure 6. Cross-matrix representation of ML models using MFCC mean features
Figure 7. Cross-matrix representation of ML models using Chroma(mean+std) features
This paper details our attempt to use machine learning to classify respiratory sounds from the ICBHI 2017 Respiratory Sound Database. The work aims to develop a robust and computationally efficient model that improves diagnostic accuracy and objectivity, given the worldwide public health burden of respiratory disorders and the limitations of classical auscultation. Our research addressed class imbalance, a significant issue in audio datasets, by applying several data augmentation techniques, including loudness adjustment, masking, shifting, and speed alteration. Hyperparameter tuning was applied to improve classification, with emphasis on feature sets such as MFCC Mean, MFCC, and Chroma-Mean_std. This method prioritized high predictive accuracy with reduced computing demands, unlike deep learning models, which tend to be more computationally intensive. Tuning hyperparameters such as solver, penalty, C, and class_weight enhances the accuracy of our model, as demonstrated in Table 12, which pertains to MFCC Mean features.
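The four augmentation operations named above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the 4 kHz sample rate, and the synthetic tone are all hypothetical, and the speed change here uses plain linear-interpolation resampling, which also shifts pitch (unlike a phase-vocoder time stretch).

```python
import numpy as np


def adjust_loudness(signal, gain_db):
    """Scale the amplitude by a gain expressed in decibels."""
    return signal * (10.0 ** (gain_db / 20.0))


def time_mask(signal, start, width):
    """Return a copy with `width` samples zeroed, beginning at index `start`."""
    out = signal.copy()
    out[start:start + width] = 0.0
    return out


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples."""
    return np.roll(signal, shift)


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the clip."""
    n_out = int(round(len(signal) / rate))
    positions = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(positions, np.arange(len(signal)), signal)


# Assumed values for illustration: 4 kHz sample rate, 1 s synthetic 200 Hz tone
# standing in for one respiratory cycle.
sample_rate = 4000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 200 * t)

augmented = {
    "louder": adjust_loudness(clip, 6.0),
    "masked": time_mask(clip, start=1000, width=400),
    "shifted": time_shift(clip, 100),
    "faster": change_speed(clip, 2.0),
}
for name, audio in augmented.items():
    print(name, len(audio))
```

Applying such transforms more often to minority-class recordings is one way augmentation can be used to reduce class imbalance.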
Hyperparameter tuning was performed using a grid search over the following ranges: KNN (number of neighbors: 1–15), SVM (kernel: linear or RBF; C: 0.1–10), and Gradient Boosting (learning rate: 0.01–0.1; maximum depth: 3–10).
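A grid search over these ranges could be set up with scikit-learn's `GridSearchCV` roughly as follows. This is a sketch under assumptions: the data are a synthetic stand-in for the MFCC feature matrix (13 features and 4 classes are assumed), the values sampled inside each range are illustrative, and the reduced `n_estimators=20` and 3-fold cross-validation are chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features (assumption, not ICBHI data).
X, y = make_classification(
    n_samples=200, n_features=13, n_informative=8, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Grids follow the ranges quoted in the text; the specific values sampled
# within each range are illustrative choices.
search_spaces = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": list(range(1, 16))}),
    "svm": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}),
    "gb": (
        GradientBoostingClassifier(n_estimators=20, random_state=0),
        {"learning_rate": [0.01, 0.05, 0.1], "max_depth": [3, 5, 10]},
    ),
}


def run_grid_searches(spaces, X, y, cv=3):
    """Run one cross-validated grid search per model; return best params and score."""
    results = {}
    for name, (estimator, grid) in spaces.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        results[name] = (search.best_params_, search.best_score_)
    return results


results = run_grid_searches(search_spaces, X, y)
for name, (params, score) in results.items():
    print(f"{name}: best params {params}, mean CV accuracy {score:.2f}")
```

Selecting hyperparameters by cross-validation on the training portion only, and reporting accuracy on a held-out split, keeps the tuned scores from being optimistically biased.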
The performance plateau observed in some classifiers (e.g., Logistic Regression) is attributed to:
Model Complexity: Simpler models are less capable of capturing the nuanced relationships in high-dimensional feature spaces.
Dataset Size: Limited samples in minority classes may restrict the ability of certain classifiers to generalize.
Initial results were encouraging, but several aspects must be explored before the approach can be used reliably. While the ICBHI dataset is a valuable resource, the diversity of its respiratory sounds may not fully reflect real-world complexity. Future research may address this issue with a larger, more heterogeneous dataset to improve generalizability. Moreover, experiments with different combinations of features and machine learning models might yield even better classification performance. Validation of clinical utility requires direct comparison with the established diagnostic capabilities of trained medical professionals. Finally, further work should be carried out on seamless integration into the clinical workflow to understand its real-world impact on patient care and diagnostic efficiency.
Table 12 compares different ML models with MFCC Mean features on a 0.20 split. The accuracy of the KNN, SVM, and Gradient Boosting classifiers increased significantly with hyperparameter tuning. However, Logistic Regression showed limited improvement because it cannot model non-linear relationships in high-dimensional feature spaces. Its accuracy remained low even with the best hyperparameters found (C: 10, penalty: l1, solver: liblinear), selected from solver: [newton-cg, lbfgs, liblinear], penalty: [l1, l2, elasticnet], and C: [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100].
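Not every solver and penalty pair in the Logistic Regression search space above is a valid combination in scikit-learn: `newton-cg` and `lbfgs` accept the `l2` penalty but not `l1`, `liblinear` accepts `l1` and `l2`, and `elasticnet` is only available with the `saga` solver, which is not in the list. The following sketch, with the hypothetical helper `valid_grid`, enumerates the combinations that can actually be fitted; it illustrates the size of the effective search space and is not a description of how the study ran its search.

```python
from itertools import product

# Search space as listed in the text.
SOLVERS = ["newton-cg", "lbfgs", "liblinear"]
PENALTIES = ["l1", "l2", "elasticnet"]
C_VALUES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]

# Penalties from the list above that each solver accepts in scikit-learn's
# LogisticRegression ("elasticnet" requires the "saga" solver).
SUPPORTED = {
    "newton-cg": {"l2"},
    "lbfgs": {"l2"},
    "liblinear": {"l1", "l2"},
}


def valid_grid(solvers, penalties, c_values):
    """Keep only the (solver, penalty, C) combinations the solver supports."""
    return [
        {"solver": s, "penalty": p, "C": c}
        for s, p, c in product(solvers, penalties, c_values)
        if p in SUPPORTED[s]
    ]


nominal_size = len(SOLVERS) * len(PENALTIES) * len(C_VALUES)
grid = valid_grid(SOLVERS, PENALTIES, C_VALUES)

print(f"nominal combinations: {nominal_size}")
print(f"valid combinations:   {len(grid)}")
```

Under these constraints the nominal 72 combinations reduce to 32 fittable ones, and the reported best setting (C: 10, penalty: l1, solver: liblinear) is among them.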
Table 12. Hyperparameter tuning with MFCC mean features
| Models | Logistic Regression | KNN | SVM Classifier | Random Forest | Gradient Boost | XGB Classifier |
|---|---|---|---|---|---|---|
| Best hyperparameters | 'C': 10, 'penalty': 'l1', 'solver': 'liblinear' | 'algorithm': 'auto', 'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'uniform' | 'C': 100, 'gamma': 'scale', 'kernel': 'rbf' | 'criterion': 'gini', 'max_features': 'sqrt', 'n_estimators': 100 | 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.5 | 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 1000, 'subsample': 0.5 |
| Model Accuracy | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| ROC-AUC | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Precision | 0.75 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Recall | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| F1 | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
This study demonstrates the effectiveness of machine learning techniques in respiratory sound classification, achieving state-of-the-art accuracy of 99% with efficient feature-classifier combinations. However, the approach has limitations, which are outlined below.
This research article investigated the application of machine learning to classify respiratory sounds, using the ICBHI 2017 Respiratory Sound Database as a benchmark dataset. Recognizing the global health challenge posed by respiratory diseases and the limitations of traditional auscultation, this work aimed to develop a robust and computationally efficient model for enhanced diagnostic accuracy and objectivity. The study addressed the critical issue of class imbalance inherent in audio datasets by employing various data augmentation techniques, including loudness control, masking, shifting, and speed alteration. Additionally, classification parameters were tuned, and the features found to be optimal (e.g., MFCC Mean, MFCC, and Chroma-Mean_std) were used for classification. This method emphasized high accuracy at low computational cost, a potential advantage over computationally expensive deep learning techniques.
Preliminary results showed promising accuracy rates, but some areas need further exploration. The ICBHI dataset is helpful but does not capture the full variability of respiratory sounds. Future studies should include larger and more heterogeneous datasets to improve the generalizability of the results. Re-evaluating the feature sets and learning models could also improve classification further. The clinical utility of an automated approach must be validated by direct comparison with the diagnostic ability of trained medical professionals. Lastly, future research must explore the integration of this technology into everyday clinical workflows and its real-time effect on patient management and diagnosis. In the future, the dataset can be expanded with real-world recordings, and deep learning models can be incorporated for richer feature extraction and improved classification accuracy.
[1] World Health Organization. (2022). Global health statistics report 2022. Geneva: World Health Organization.
[2] Sun, Z. (2023). ICBHI 2017 challenge. Medicine, Health and Life Sciences. https://doi.org/10.7910/dvn/ht6pki
[3] Meng, F., Shi, Y., Wang, N., Cai, M., Luo, Z. (2020). Detection of respiratory sounds based on wavelet coefficients and machine learning. IEEE Access, 8: 155710-155720. https://doi.org/10.1109/ACCESS.2020.3016748
[4] Aykanat, M., Kılıç, Ö., Kurt, B., Saryal, S. (2017). Classification of lung sounds using convolutional neural networks. EURASIP Journal on Image and Video Processing, 2017: 1-9. https://doi.org/10.1186/s13640-017-0213-2
[5] Yang, R., Lv, K., Huang, Y., Sun, M., Li, J., Yang, J. (2023). Respiratory sound classification by applying deep neural network with a blocking variable. Applied Sciences, 13(12): 6956. https://doi.org/10.3390/app13126956
[6] Xia, T., Han, J., Mascolo, C. (2022). Exploring machine learning for audio-based respiratory condition screening: A concise review of databases, methods, and open issues. Experimental Biology and Medicine, 247(22): 2053-2061. https://doi.org/10.1177/15353702221115428
[7] Asatani, N., Kamiya, T., Mabu, S., Kido, S. (2021). Classification of respiratory sounds using improved convolutional recurrent neural network. Computers & Electrical Engineering, 94: 107367. https://doi.org/10.1016/j.compeleceng.2021.107367
[8] Roy, A., Satija, U. (2023). RDLINet: A novel lightweight inception network for respiratory disease classification using lung sounds. IEEE Transactions on Instrumentation and Measurement, 72: 4008813. https://doi.org/10.1109/TIM.2023.3292953
[9] Altan, G., Kutlu, Y., Allahverdi, N. (2019). Deep learning on computerized analysis of chronic obstructive pulmonary disease. IEEE Journal of Biomedical and Health Informatics, 24(5): 1344-1350. https://doi.org/10.1109/jbhi.2019.2931395
[10] Petmezas, G., Cheimariotis, G.A., Stefanopoulos, L., Rocha, B., Paiva, R.P., Katsaggelos, A.K., Maglaveras, N. (2022). Automated lung sound classification using a hybrid CNN-LSTM network and focal loss function. Sensors, 22(3): 1232. https://doi.org/10.3390/s22031232
[11] Sen, I., Saraclar, M., Kahya, Y.P. (2015). A comparison of SVM and GMM-based classifier configurations for diagnostic classification of pulmonary sounds. IEEE Transactions on Biomedical Engineering, 62(7): 1768-1776. https://doi.org/10.1109/tbme.2015.2403616
[12] Fraiwan, M., Fraiwan, L., Alkhodari, M., Hassanin, O. (2022). Recognition of pulmonary diseases from lung sounds using convolutional neural networks and long short-term memory. Journal of Ambient Intelligence and Humanized Computing, 13: 4759-4771. https://doi.org/10.1007/s12652-021-03184-y
[13] Bae, S., Kim, J.W., Cho, W.Y., Baek, H., Son, S., Lee, B., Yun, S.Y. (2023). Patch-mix contrastive learning with audio spectrogram transformer on respiratory sound classification. arXiv preprint arXiv:2305.14032. https://doi.org/10.21437/interspeech.2023-1426
[14] Ntalampiras, S. (2023). Explainable Siamese neural network for classifying pediatric respiratory sounds. IEEE Journal of Biomedical and Health Informatics, 27(10): 4728-4735. https://doi.org/10.1109/jbhi.2023.3299341
[15] Lal, K.N. (2023). A lung sound recognition model to diagnoses the respiratory diseases by using transfer learning. Multimedia Tools and Applications, 82(23): 36615-36631. https://doi.org/10.1007/s11042-023-14727-0
[16] Chen, C.H., Huang, W.T., Tan, T.H., Chang, C.C., Chang, Y.J. (2015). Using k-nearest neighbor classification to diagnose abnormal lung sounds. Sensors, 15(6): 13132-13158. https://doi.org/10.3390/s150613132
[17] Wanasinghe, T., Bandara, S., Madusanka, S., Meedeniya, D., Bandara, M., Díez, I.D.L.T. (2024). Lung sound classification with multi-feature integration utilizing lightweight CNN model. IEEE Access, 12: 21262-21276. https://doi.org/10.1109/access.2024.3361943
[18] McLane, I., Lauwers, E., Stas, T., Busch-Vishniac, I., Ides, K., Verhulst, S., Steckel, J. (2021). Comprehensive analysis system for automated respiratory cycle segmentation and crackle peak detection. IEEE Journal of Biomedical and Health Informatics, 26(4): 1847-1860. https://doi.org/10.1109/jbhi.2021.3123353
[19] Lee, C.S., Li, M., Lou, Y., Dahiya, R. (2022). Restoration of lung sound signals using a hybrid wavelet-based approach. IEEE Sensors Journal, 22(20): 19700-19712. https://doi.org/10.1109/jsen.2022.3203391
[20] Jin, F., Krishnan, S., Sattar, F. (2011). Adventitious sounds identification and extraction using temporal-spectral dominance-based features. IEEE Transactions on Biomedical Engineering, 58(11): 3078-3087. https://doi.org/10.1109/tbme.2011.2160721
[21] Tripathy, R.K., Dash, S., Rath, A., Panda, G., Pachori, R.B. (2022). Automated detection of pulmonary diseases from lung sound signals using fixed-boundary-based empirical wavelet transform. IEEE Sensors Letters, 6(5): 1-4. https://doi.org/10.1109/lsens.2022.3167121
[22] Abushakra, A., Faezipour, M. (2013). Acoustic signal classification of breathing movements to virtually aid breath regulation. IEEE Journal of Biomedical and Health Informatics, 17(2): 493-500. https://doi.org/10.1109/jbhi.2013.2244901
[23] Vardhan, B.V., Geetha, M.K., Prasad, G.S., Kumar, C.S.K. (2024). Abnormal sound detection in lungs using vest-coat stethoscope using deep learning algorithm. Explainable Artificial Intelligence in Healthcare Systems, Nova Publisher, USA, pp. 125-140. https://doi.org/10.52305/GOMR8163
[24] Zhou, G., Liu, C., Li, X., Liang, S., Wang, R., Huang, X. (2024). An open auscultation dataset for machine learning-based respiratory diagnosis studies. JASA Express Letters, 4(5): 052001. https://doi.org/10.1121/10.0025851
[25] Zulfiqar, R., Majeed, F., Irfan, R., Rauf, H.T., Benkhelifa, E., Belkacem, A.N. (2021). Abnormal respiratory sounds classification using deep CNN through artificial noise addition. Frontiers in Medicine, 8: 714811. https://doi.org/10.3389/fmed.2021.714811
[26] Altan, G., Kutlu, Y. (2020). RespiratoryDatabase@TR (COPD Severity Analysis). In Mendeley Data V1. https://doi.org/10.17632/p9z4h98s6j.1
[27] Hsu, F.S., Huang, S.R., Huang, C.W., Cheng, Y.R., Chen, C.C., Hsiao, J., Lai, F. (2021). An update on a progressively expanded database for automated lung sound analysis. arXiv preprint arXiv:2102.04062. https://doi.org/10.48550/arXiv.2102.04062
[28] PixSoft. The R.A.L.E. Repository. http://www.rale.ca.
[29] Rocha, B.M., Filos, D., Mendes, L., Serbes, G., Ulukaya, S., Kahya, Y.P., De Carvalho, P. (2019). An open access database for the evaluation of respiratory sound classification algorithms. Physiological Measurement, 40(3): 035001. https://doi.org/10.1088/1361-6579/ab03ea
[30] Shuvo, S.B., Ali, S.N., Swapnil, S.I., Hasan, T., Bhuiyan, M.I.H. (2020). A lightweight CNN model for detecting respiratory diseases from lung auscultation sounds using EMD-CWT-based hybrid scalogram. IEEE Journal of Biomedical and Health Informatics, 25(7): 2595-2603. https://doi.org/10.1109/jbhi.2020.3048006
[31] Demirci, B.A., Koçyiğit, Y., Kızılırmak, D., Havlucu, Y. (2022). Adventitious and normal respiratory sound analysis with machine learning methods. Celal Bayar University Journal of Science, 18(2): 169-180. https://doi.org/10.18466/cbayarfbe.1002917
[32] Kim, Y., Hyon, Y., Jung, S.S., Lee, S., Yoo, G., Chung, C., Ha, T. (2021). Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning. Scientific Reports, 11(1): 1-11. https://doi.org/10.1038/s41598-021-96724-7
[33] Nguyen, T., Pernkopf, F. (2022). Lung sound classification using co-tuning and stochastic normalization. IEEE Transactions on Biomedical Engineering, 69(9): 2872-2882. https://doi.org/10.1109/tbme.2022.3156293
[34] Rocha, B.M., Filos, D., Mendes, L., Vogiatzis, I., Perantoni, E., Kaimakamis, E., Maglaveras, N. (2018). A respiratory sound database for the development of automated classification. In Precision Medicine Powered by pHealth and Connected Health: ICBHI 2017, Thessaloniki, Greece, pp. 33-37. https://doi.org/10.1007/978-981-10-7419-6_6
[35] Tasar, B., Yaman, O., Tuncer, T. (2022). Accurate respiratory sound classification model based on piccolo pattern. Applied Acoustics, 188: 108589. https://doi.org/10.1016/j.apacoust.2021.108589
[36] Wang, Z., Sun, Z. (2024). Performance evaluation of lung sounds classification using deep learning under variable parameters. EURASIP Journal on Advances in Signal Processing, 2024(1): 51. https://doi.org/10.1186/s13634-024-01148-w
[37] Park, J.S., Kim, K., Kim, J.H., Choi, Y.J., Kim, K., Suh, D.I. (2023). A machine learning approach to the development and prospective evaluation of a pediatric lung sound classification model. Scientific Reports, 13(1): 1289. https://doi.org/10.1038/s41598-023-27399-5
[38] Dimoulas, C.A. (2016). Audiovisual spatial-audio analysis by means of sound localization and imaging: A multimedia healthcare framework in abdominal sound mapping. IEEE Transactions on Multimedia, 18(10): 1969-1976. https://doi.org/10.1109/tmm.2016.2594148
[39] Shkel, A.A., Kim, E.S. (2019). Continuous health monitoring with resonant-microphone-array-based wearable stethoscope. IEEE Sensors Journal, 19(12): 4629-4638. https://doi.org/10.1109/JSEN.2019.2900713
[40] Huang, D.M., Huang, J., Qiao, K., Zhong, N.S., Lu, H.Z., Wang, W.J. (2023). Deep learning-based lung sound analysis for intelligent stethoscope. Military Medical Research, 10(1): 44. https://doi.org/10.1186/s40779-023-00479-3
[41] Mukherjee, H., Sreerama, P., Dhar, A., Obaidullah, S.M., Roy, K., Mahmud, M., Santosh, K.C. (2021). Automatic lung health screening using respiratory sounds. Journal of Medical Systems, 45: 1-9. https://doi.org/10.1007/s10916-020-01681-9