
ECG Arrhythmia Classification Using Vague C-Means Clustering Based on Multiresolution Analysis

Rahime Ceylan

Department of Electrical-Electronics Engineering, Konya Technical University, Konya 42250, Turkey

Corresponding Author Email: rceylan@ktun.edu.tr

Received: 3 June 2024

Revised: 4 November 2024

Accepted: 13 December 2024

Available online: 28 February 2025

OPEN ACCESS

Abstract:

The accurate classification of ECG arrhythmias is crucial for diagnosing heart diseases. The detection and classification of arrhythmia depend on several factors, including the specialist's experience level, workload, and available time. These factors determine the accuracy and effectiveness of the diagnostic process, which in turn directly affects the patient's health outcomes. Artificial intelligence-based computer-aided diagnosis systems have made great progress in ECG arrhythmia classification in recent years. In this study, ECG arrhythmia classification was performed using a vague c-means clustering algorithm. The data set was obtained from the MIT-BIH ECG Arrhythmia Database and consisted of 318 patterns, each of which is an RR interval. Experiments were performed with different parameters of vague c-means clustering to achieve the highest classification performance. In addition, the experiments were repeated using fuzzy c-means clustering for comparison. Furthermore, a frequency-based feature set was obtained using multiresolution analysis based on the discrete wavelet transform, and the ECG classification task was carried out with vague c-means clustering on this frequency-based data set. The best results were 87.5%, 80%, and 84% for classification accuracy, sensitivity, and positive predictive value, respectively.

Keywords:

electrocardiography, vague c-means clustering, discrete wavelet transform

1. Introduction

Electrocardiography is an important non-invasive technique for the diagnosis of heart disease because it shows whether the circulatory system is working properly. The electrocardiogram (ECG) is a signal graph that reflects the variation of bioelectric potential in the heart, so it provides vital clinical information about the state of cardiac health. This information can be extracted from the shape of the ECG waveform. Early detection of abnormalities in ECG signals can prolong life and improve the quality of life. The drawbacks of manual ECG interpretation are that it is expertise-dependent and time-consuming [1-6]. For this reason, computer-aided diagnostic (CAD) systems based on artificial intelligence and machine learning have been proposed for ECG classification for more than five decades. These techniques include artificial neural networks, machine learning algorithms, and fuzzy clustering algorithms. The first study on the classification of ECG arrhythmias with fuzzy clustering was carried out by Osowski and Linh [1], who proposed a fuzzy hybrid neural network that performs the self-organization part of the neural network with fuzzy c-means clustering. The resulting hybrid structure includes a fuzzy self-organization layer and a multilayer perceptron used as the final classifier. The classification accuracy was 97.45% in training and 96.06% in testing.

Yeh et al. [2] proposed a new method based on fuzzy c-means clustering to classify heartbeat cases in the ECG signal and achieved 93.57% classification accuracy. Doğan and Korürek [3] suggested the combined use of kernelized fuzzy c-means clustering and hybrid ant colony optimization for ECG beat classification. They created a dataset containing 6 types of ECG beats and extracted four time-domain features for each beat. The proposed model reached 93.76% sensitivity and 98.76% specificity. Haldar et al. [6] proposed a model for arrhythmia classification in mobile health monitoring systems that uses an improved fuzzy c-means clustering based on the Mahalanobis distance. To reduce the number of iterations, the cluster centers found with traditional FCM were assigned as the initial cluster centers of the proposed algorithm. Roopa et al. [7] proposed the robust spatial kernel fuzzy c-means clustering algorithm for ECG arrhythmia classification. Their study consists of two steps: feature selection and clustering. In the first step, feature selection was performed using principal component analysis, linear discriminant analysis, and regularized locality-preserving indexing. The reduced data were then clustered with robust spatial kernel FCM. The experiments showed that regularized locality-preserving indexing is superior to the other methods in the feature selection step. Pander [8] proposed an adaptive threshold-based method for QRS detection with fuzzy clustering. QRS detection studies in the literature generally check whether an amplitude threshold is met and, if it is, extract the QRS complex. However, the presence of noise in ECG signals reduces the sensitivity of such an approach. The study therefore determines an adaptive threshold for QRS detection using fuzzy c-means clustering. The recommended QRS detector achieved 99.82% sensitivity and 99.88% positive predictive value.

Monedero proposed an ECG diagnostic system based on the wavelet transform and decision trees for the detection of 13 different diseases [9]. The results produced by the system were re-evaluated by an expert, and a reliability of 80.8% was reported. Another system for the diagnosis of heart diseases was presented by Malakouti (2023), who used Gaussian Naïve Bayes, random forest, logistic regression, linear discriminant analysis, and a dummy classifier [10]. In that study, the Gaussian NB algorithm distinguished individuals with heart disease from healthy individuals with 96% accuracy.

Challenges and complexities in studies of ECG classification are as follows:

• The amplitudes of ECG signals are at the mV level. Many different noise interferences can distort the signal and damage the information contained in the signal, thus reducing the classifier's performance.

• While different ECG arrhythmia types may have the same morphology in different patients, the same ECG arrhythmia type may have different morphology in the same patient at different times.

• It may not be possible to detect momentary arrhythmias in the ECG signal during the patient’s examination. For this reason, long-term recordings are taken with wearable ECG monitoring devices without needing a hospital environment. However, it is difficult for experts to examine and evaluate these long-term records.

This study aimed to assess the performance of the vague c-means clustering algorithm on the ECG classification problem. The ECG signals taken from the MIT-BIH ECG Arrhythmia Database were split into training and test sets. In addition, results were obtained for traditional fuzzy c-means clustering. The contributions of the study are as follows:

• Vague c-means clustering is applied to ECG classification for the first time.

• Among the studies on ECG classification with fuzzy clustering, this study uses the data set with the highest number of ECG signal classes. The dataset includes 12 different ECG signal classes.

• The results of the traditional fuzzy c-means clustering algorithm were also evaluated, which revealed the superiority of vague c-means.

• In addition to the results obtained with the time-based feature set, the frequency-based feature set obtained using multiresolution analysis based on the discrete wavelet transform was also classified with vague c-means clustering.

• The quantitative evaluation in the study was diversified by using three different evaluation metrics.

2. Material and Methods

2.1 Dataset description

The data set used in the experiments contains normal sinus rhythm and eleven arrhythmia types and was collected from the MIT-BIH ECG Arrhythmia Database [9]. This database includes 48 records taken from 47 subjects (25 men and 22 women). Each record has two channels, is sampled at 360 Hz, and is 30 minutes long. The following rhythm types were considered: normal sinus rhythm (NSR), sinus bradycardia (SB), ventricular tachycardia (VT), sinus arrhythmia (SA), atrial premature contraction (APC), paced beat (PB), right bundle branch block (RBB), left bundle branch block (LBB), atrial fibrillation (AFib), atrial flutter (AFlut), atrial couplet (ACoup), and ventricular trigeminy (VTrig). The morphologies of the RR intervals of the twelve rhythm types are shown in Figure 1. Because some rhythm types have similar frequency and amplitude ranges, it is difficult to differentiate one from another. First, the desired rhythm was obtained by extracting the specified time interval (Table 1) from the relevant records of nine patients. Patterns were then created by extracting RR intervals from the obtained signal fragments. The Ahlstrom and Tompkins algorithm [10, 11], a first-derivative-based algorithm, was used to detect R peaks. Each RR interval was resampled to 200 samples.

Figure 1. The morphologies of ECG signals used (a) NSR, (b) SB, (c) VT, (d) SA, (e) APC, (f) PB, (g) RBB, (h) LBB, (i) AFib, (j) AFlut, (k) ACoup, (l) VTrig

2.2 Preprocessing

The ECG signal is periodic, so it can be divided into periods. The process of dividing ECG signals into periods is called QRS detection. Before QRS detection, the ECG signal should be filtered to eliminate noise. For this purpose, a high-pass filter with a corner frequency of 0.09 Hz and a low-pass filter with a corner frequency of 30 Hz are used. The filtered signals are then passed to QRS detection, which divides them into RR intervals. There are many QRS detection methods in the literature, and they can be grouped into three main classes [11]. The first class of QRS detection algorithms is based on statistical methods [11]. The second class is based on time- and frequency-domain analysis such as the Hilbert transform and the wavelet transform [12, 13]. The third class consists of artificial intelligence systems such as artificial neural networks and fuzzy logic, which have been widely used in recent years [8, 14]. In this study, the Ahlstrom and Tompkins algorithm [11], a statistical method based on first- and second-order derivatives, is utilized.
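The idea of derivative-based R-peak detection followed by RR-interval extraction can be sketched as follows. This is a deliberately simplified stand-in and not the Ahlstrom and Tompkins algorithm itself: the function names, the threshold ratio, the search window, and the refractory period are all assumptions chosen for illustration, and the filtering step is omitted.

```python
def detect_r_peaks(signal, fs, threshold_ratio=0.6, search_s=0.1, refractory_s=0.2):
    """Simplified slope-threshold R-peak detector (illustrative only)."""
    n = len(signal)
    # Central first difference as a rough estimate of the slope magnitude.
    slope = [0.0] * n
    for i in range(1, n - 1):
        slope[i] = abs(signal[i + 1] - signal[i - 1])
    threshold = threshold_ratio * max(slope)  # assumed fixed threshold
    search = max(1, int(search_s * fs))
    refractory = max(1, int(refractory_s * fs))
    peaks = []
    i = 0
    while i < n:
        if threshold > 0 and slope[i] >= threshold:
            # A steep slope marks the QRS onset; the R peak is taken as the
            # largest amplitude inside a short window after the onset.
            window_end = min(i + search, n)
            peak = max(range(i, window_end), key=lambda k: signal[k])
            peaks.append(peak)
            i = peak + refractory  # skip the rest of this beat
        else:
            i += 1
    return peaks


def rr_intervals(signal, peaks):
    """Cut the signal into segments between consecutive R peaks."""
    return [signal[a:b] for a, b in zip(peaks, peaks[1:])]


# Synthetic test signal: three narrow spikes on a flat baseline, 360 Hz.
fs = 360
signal = [0.0] * (3 * fs)
for center in (100, 400, 700):
    for offset, value in zip(range(-2, 3), (0.25, 0.6, 1.0, 0.6, 0.25)):
        signal[center + offset] = value

peaks = detect_r_peaks(signal, fs)
patterns = rr_intervals(signal, peaks)
print(peaks, [len(p) for p in patterns])
```

On real recordings the fixed threshold would need to be replaced by something more robust, which is one reason the published detectors are more elaborate than this sketch.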

Each extracted RR interval is called a pattern, so a data set that includes 318 patterns is formed in this study (Table 1) [9, 15].

Table 1. ECG signals taken from MIT-BIH arrhythmia database [9, 15]

Rhythm Type	Record	Time	NP
Normal Sinus Rhythm (NSR)	103	1.09-17.21	40
Sinus Bradycardia (SB)	202	18.22-18.45	15
Ventricular Tachycardia (VT)	200	1.45-5.38	15
Sinus Arrhythmia (SA)	113	12.27-22.10	30
Atrial Premature Contraction (APC)	202	12.24-12.41	8
Paced Beat (PB)	107	0.44-12.30	30
Right Bundle Branch Block (RBB)	118	13.47-22.32	30
Left Bundle Branch Block (LBB)	109	17.08-17.50	30
Atrial Fibrillation (AFib)	202	29.35-30.06	30
Atrial Flutter (AFlut)	202	25.58-27.55	30
Atrial Couplet (ACoup)	220	25.44-29.40	30
Ventricular Trigeminy (VTrig)	119	2.38-4.51	30

*NP: Number of patterns

2.3 Vague c-means clustering

The most widely used clustering algorithm is fuzzy c-means (FCM) clustering. This algorithm, which was introduced by Dunn and later extended by Bezdek, is an iterative clustering technique. It divides the data into $c$ fuzzy partitions that minimize an objective function. Let $S=\left\{ {{s}_{1}},{{s}_{2}},\ldots ,{{s}_{n}} \right\}$ be a set of $n$ data points, each of which is a $d$-dimensional feature vector. The data can be separated into $c$ clusters ($1<c<n$), and each pattern has a membership value for each cluster. The membership function is defined as follows [1-5]:

$u_{ij}=\dfrac{1}{\sum\limits_{k=1}^{c}\left( \dfrac{\left\| s_{j}-v_{i} \right\|}{\left\| s_{j}-v_{k} \right\|} \right)^{2/\left( m-1 \right)}}$ (1)

In Eq. (1), $v_{i}$ is the center of cluster $i$ and $m$ is the fuzzifier coefficient. Cluster centers in the FCM clustering algorithm are calculated as in Eq. (2).

$v_{i}=\dfrac{\sum\limits_{j=1}^{n}u_{ij}^{m}\,s_{j}}{\sum\limits_{j=1}^{n}u_{ij}^{m}}$ (2)

Each cluster center is initialized randomly, and the iteration continues until the objective function reaches a local minimum. The objective function is calculated as in Eq. (3).

$J_{FCM}=\sum\limits_{i=1}^{c}\sum\limits_{j=1}^{n}u_{ij}^{m}\left\| s_{j}-v_{i} \right\|^{2}$ (3)
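The FCM iteration defined by Eqs. (1)-(3) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function names, the fixed iteration count, the small constant `eps` that guards against division by zero, and the toy two-dimensional data are all introduced here for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(data, centers, m, eps=1e-12):
    """Eq. (1): u[i][j] is the membership of pattern j in cluster i."""
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(data) for _ in centers]
    for j, s in enumerate(data):
        d = [max(distance(s, v), eps) for v in centers]  # eps is an assumption
        for i in range(len(centers)):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** power for k in range(len(centers)))
    return u


def fcm_centers(data, u, m):
    """Eq. (2): membership-weighted mean of the patterns."""
    dim = len(data[0])
    centers = []
    for row in u:
        w = [x ** m for x in row]
        total = sum(w)
        centers.append([sum(wj * s[k] for wj, s in zip(w, data)) / total
                        for k in range(dim)])
    return centers


def fcm_objective(data, centers, u, m):
    """Eq. (3): weighted sum of squared distances to the cluster centers."""
    return sum(u[i][j] ** m * distance(s, v) ** 2
               for i, v in enumerate(centers) for j, s in enumerate(data))


def fcm(data, c, m=2.0, iterations=50, seed=0):
    """Alternate Eq. (1) and Eq. (2) from randomly chosen initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(data, c)
    for _ in range(iterations):
        u = fcm_memberships(data, centers, m)
        centers = fcm_centers(data, u, m)
    return centers, fcm_memberships(data, centers, m)


# Toy data: two well-separated groups of 2-D points (hypothetical values).
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, u = fcm(data, c=2)
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(data))]
print(labels, round(fcm_objective(data, centers, u, 2.0), 3))
```

A fixed number of iterations keeps the sketch short; a practical version would usually stop when the change in the objective function or in the centers falls below a tolerance.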

In the vague c-means (VCM) clustering algorithm [16], which was introduced by Xu et al., the single membership function of Eq. (1) is replaced by a truth membership function ( ${{t}_{ij}}$ ) and a false membership function ( ${{f}_{ij}}$ ), defined as follows, where $\beta$ is a positive constant.

$t_{ij}=\dfrac{\left\| s_{j}-v_{i} \right\|^{-\beta }}{\sum\limits_{k=1}^{c}\left\| s_{j}-v_{k} \right\|^{-\beta }}$ (4)

$f_{ij}=\dfrac{\left\| s_{j}-v_{i} \right\|^{\beta }}{\sum\limits_{k=1}^{c}\left\| s_{j}-v_{k} \right\|^{\beta }}$ (5)

In VCM (Figure 2), the cluster centers ( ${{v}_{i}}$ ) are calculated with Eq. (2), as in FCM, but with the membership function defined as ${{u}_{ij}}={{t}_{ij}}$ . The objective function of the VCM clustering algorithm is given in Eq. (6) [16], where $\lambda$ is a balancing factor.

$J_{VCM}=\sum\limits_{i=1}^{c}\sum\limits_{j=1}^{n}t_{ij}^{m}\left\| s_{j}-v_{i} \right\|^{2}+\lambda \sum\limits_{j=1}^{n}\left( \dfrac{1}{\max\limits_{1\le i\le c}\left( 1-f_{ij} \right)} \right)$ (6)

Figure 2. Flowchart of vague c-means clustering [16, 17]

The vague c-means clustering method can partition nonlinearly distributed data better than fuzzy c-means clustering [16, 17]. The interval-based membership generalization in VCM is more expressive than the single membership value in FCM when describing the vagueness of the data [16].
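As a small worked illustration of Eqs. (4) and (5), the sketch below computes the truth and false memberships of a single pattern with respect to a set of cluster centers. The function name, the guard constant `eps`, and the example values are assumptions made for illustration; the cluster-center update itself would reuse Eq. (2) with ${{u}_{ij}}={{t}_{ij}}$.

```python
import math


def vague_memberships(pattern, centers, beta, eps=1e-12):
    """Return (t, f): truth and false memberships of one pattern, Eqs. (4)-(5)."""
    # Euclidean distance to every cluster center; eps avoids division by zero.
    d = [max(math.sqrt(sum((x - y) ** 2 for x, y in zip(pattern, v))), eps)
         for v in centers]
    t_den = sum(x ** -beta for x in d)
    f_den = sum(x ** beta for x in d)
    t = [x ** -beta / t_den for x in d]  # large when the center is close
    f = [x ** beta / f_den for x in d]   # large when the center is far
    return t, f


# Hypothetical example: one pattern at distance 1 and 2 from two centers.
t, f = vague_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]], beta=3)
print([round(x, 4) for x in t], [round(x, 4) for x in f])
```

With $\beta = 3$ the distances 1 and 2 give weights 1 and 1/8 for the truth memberships, so the nearer center receives 8/9 and the farther one 1/9, while the false memberships are the reverse.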

2.4 Evaluation metrics

In this study, three evaluation metrics are used for the quantitative evaluation of the clustering phase: classification accuracy (CA), sensitivity, and positive predictive value. The equations for these metrics are given in Eqs. (7)-(9) [18-20], respectively. In the equations, TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively. For normal sinus rhythm, for example, TP counts NSR patterns that the model predicts as NSR, TN counts patterns of another rhythm that the model predicts as another rhythm, FN counts NSR patterns that the model predicts as another rhythm, and FP counts patterns of another rhythm that the model predicts as NSR.

$CA=\frac{\text{Number of correctly classified patterns}}{\text{Number of patterns in the dataset}}\times 100$ (7)

$Sensitivity=\frac{TP}{TP+FN}$ (8)

$Positive~Predictive~Value\left( PPV \right)=\frac{TP}{TP+FP}$ (9)
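Eqs. (7)-(9) translate directly into code. The sketch below uses hypothetical function names and, as an assumption mirrored from the way Table 2 reports classes with no detections, returns 0 when a denominator is zero. The example values are the NSR row and the overall count reported for VCM in Table 2.

```python
def classification_accuracy(n_correct, n_total):
    """Eq. (7): percentage of correctly classified patterns."""
    return 100.0 * n_correct / n_total


def sensitivity(tp, fn):
    """Eq. (8): TP / (TP + FN); 0.0 for an empty denominator (assumption)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_predictive_value(tp, fp):
    """Eq. (9): TP / (TP + FP); 0.0 for an empty denominator (assumption)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# NSR under VCM: 20 test patterns, all detected, plus 20 false positives.
nsr_sensitivity = sensitivity(tp=20, fn=0)
nsr_ppv = positive_predictive_value(tp=20, fp=20)
# Overall VCM result: 121 of 152 test patterns assigned to the true class.
vcm_ca = classification_accuracy(121, 152)
print(nsr_sensitivity, nsr_ppv, round(vcm_ca, 2))
```

The same accuracy function reproduces the other reported values, for example 133 of 152 correct patterns gives 87.5%.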

3. Experiments

In the study, the ECG signals were clustered with the vague c-means clustering algorithm. The pipeline of the process is presented in Figure 3.

As can be seen in Figure 3, the ECG signal record was first preprocessed. In the preprocessing stage, to remove possible noise, the record was filtered with a high-pass filter with a cut-off frequency of 0.9 Hz and a low-pass filter with a cut-off frequency of 30 Hz. In the second part of the preprocessing stage, R peaks were detected by performing QRS detection on the filtered ECG signal records. Based on the detected R peaks, RR intervals were extracted, and each RR interval was resampled to 200 samples and normalized to the interval [0, 1]. Training and test data sets were created from these RR intervals, i.e. patterns, each consisting of 200 samples. Patterns in the training data set were classified using the vague c-means clustering algorithm without supervision. The patterns in the test data set were then classified using the cluster center values obtained for each class during training. The classification results obtained in the training and testing phases were interpreted using performance metrics.

In this section, the results obtained with the vague c-means clustering algorithm are presented, interpreted, and compared with the results of the fuzzy c-means clustering algorithm. Furthermore, the performance of the clustering algorithm was examined on dimension-reduced signals. In this case, after the preprocessing phase, the discrete wavelet transform (DWT) is applied to reduce the dimension of the patterns. Several Daubechies wavelet types are used to reduce the number of samples in an RR interval, and their results are compared.
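The resampling and normalization step of the pipeline can be sketched as follows. The text does not state which resampling method was used, so linear interpolation is an assumption here, as are the function names and the synthetic input segment.

```python
import math


def resample_linear(samples, n_out=200):
    """Resample a segment to n_out points by linear interpolation (assumed method)."""
    n_in = len(samples)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def min_max_normalize(samples):
    """Scale a segment linearly so that its values lie in [0, 1]."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0] * len(samples)  # flat segment: nothing to scale
    return [(x - lo) / (hi - lo) for x in samples]


# Hypothetical RR interval of 287 samples (about 0.8 s at 360 Hz).
rr = [math.sin(2.0 * math.pi * i / 287.0) for i in range(287)]
pattern = min_max_normalize(resample_linear(rr, 200))
print(len(pattern), min(pattern), max(pattern))
```

Fixing the length at 200 samples gives every pattern the same dimension, which the distance computations in the clustering step require.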

Figure 3. The ECG signal clustering method used

Figure 4. In VCM, the variation of classification accuracy according to (a) $m$ and (b) $\beta$

First, the parameters of the vague c-means clustering algorithm were tuned. The results of these experiments are presented in Figure 4. The most important parameters in the vague c-means clustering algorithm are the fuzzifier constant ( $m$ ) and the positive constant ( $\beta$ ) used in the calculation of the truth and false membership functions. The ability of the clustering algorithm to collect data belonging to the same class in the same cluster was measured as classification accuracy. Figure 4 shows how $m$ and $\beta$ affect the classification capability of the VCM clustering algorithm. As can be seen from Figure 4, the highest classification accuracy was obtained when both $m$ and $\beta$ were set to 3.

Figure 5. In FCM, the variation of classification accuracy according to $m$

In FCM, the most important parameter is the fuzzifier constant ( $m$ ). The variation of classification accuracy according to $m$ can be seen in Figure 5. With the FCM clustering algorithm, the best result was 72.3%, obtained when $m$ was set to 8. With VCM clustering, using the optimum values $m=3$ and $\beta =3$, the classification accuracy was 77.1%.

Dimension reduction is a useful tool for extracting the important features of a signal. Multiresolution analysis decomposes a signal into components at different scales, and these components provide information about the attributes of the physical data. The discrete wavelet transform (DWT) is an essential method for multiresolution analysis. DWT analyses the signal by dividing it into low- and high-frequency bands with filter banks. As can be seen in Figure 6, different frequency components were extracted from an RR interval of the ECG signal by using a two-level DWT. In this study, the approximation coefficients obtained from the RR interval at both Level 1 and Level 2 were classified using VCM. Thus, the performance of VCM on a dataset composed of approximation coefficients (low-frequency features) was also evaluated.

In training, the results obtained with the data set formed from the Level 1 approximation coefficients of different Daubechies wavelet types are shown in Figure 7(a). When db4 was chosen as the wavelet type, the classification accuracy was 81.3%. When the Level 2 approximation coefficients were used as the data set, VCM reached 75.3% with the db7 and db8 wavelets (Figure 7(b)).
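The way approximation coefficients shrink a pattern can be illustrated with the simplest Daubechies wavelet, db1 (Haar). This is an assumption made to keep the sketch dependency-free; the study compares several Daubechies wavelets, and longer filters with boundary padding typically produce slightly different coefficient counts. The function names and the synthetic pattern are hypothetical.

```python
import math


def haar_dwt(signal):
    """One DWT level with the Haar (db1) filters: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums; high-pass branch: scaled differences.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def approximation(signal, level):
    """Approximation coefficients after `level` successive decompositions."""
    coeffs = list(signal)
    for _ in range(level):
        coeffs, _detail = haar_dwt(coeffs)
    return coeffs


# Hypothetical 200-sample pattern standing in for a normalized RR interval.
pattern = [0.5 + 0.5 * math.sin(2.0 * math.pi * i / 200.0) for i in range(200)]
level1 = approximation(pattern, 1)
level2 = approximation(pattern, 2)
print(len(pattern), len(level1), len(level2))
```

With the Haar filters each level halves the number of samples, so a 200-sample pattern becomes 100 features at Level 1 and 50 at Level 2 while the transform preserves the signal energy.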

After the training phase, the test data set was classified with all of the implemented models, and the results are presented in Table 2. The vague c-means clustering algorithm achieved 79.61% classification accuracy in separating ECG signals into 12 classes: 121 of 152 RR intervals were assigned to the true class, and 26 patterns were false positives. The number of false positives with the fuzzy c-means clustering algorithm is considerably higher than with VCM, so FCM reached only 74.34% classification accuracy. Sensitivity and positive predictive value are also better with VCM than with FCM. As can be seen from Table 2, PPV is 0.83 with VCM but 0.61 with FCM.

The classification model in which VCM uses the data set formed from the Level 1 approximation coefficients of the Daubechies wavelets is called DWT1-VCM, and the model in which VCM uses the data set created by DWT Level 2 is called DWT2-VCM. The DWT1-VCM model achieved a PPV of 0.84 and an accuracy of 87.5%. The sensitivity of the DWT1-VCM model is superior to the results of VCM, FCM, and DWT2-VCM. The DWT2-VCM model, which uses the Level 2 approximation coefficients, reached 84.21% accuracy, but its PPV was lower than that of VCM.

Figure 6. Frequency range on each level of DWT

Figure 7. The results obtained on VCM with dataset formed by (a) Level 1 of DWT (b) Level 2 of DWT

Table 2. The results observed with the test dataset

	VCM				FCM				DWT1-VCM				DWT2-VCM
	TP	FP	Sensitivity	PPV	TP	FP	Sensitivity	PPV	TP	FP	Sensitivity	PPV	TP	FP	Sensitivity	PPV
NSR	20	20	1.00	0.50	19	24	0.95	0.44	20	6	1.00	0.77	20	6	1.00	0.77
SB	5	1	1.00	0.83	0	0	0.00	0.00	5	1	1.00	0.83	5	3	1.00	0.63
VT	2	0	0.40	1.00	0	0	0.00	0.00	0	0	0.00	0.00	2	1	0.40	0.67
SA	0	0	0.00	0.00	0	0	0.00	0.00	14	0	0.93	1.00	14	0	0.93	1.00
APC	1	0	0.50	1.00	2	2	1.00	0.50	1	0	0.50	1.00	0	0	0.00	0.00
PB	15	0	1.00	1.00	15	0	1.00	1.00	15	0	1.00	1.00	12	0	0.80	1.00
RBB	15	2	1.00	0.88	15	2	1.00	0.88	15	2	1.00	0.88	15	2	1.00	0.88
LBB	15	0	1.00	1.00	15	2	1.00	0.88	15	2	1.00	0.88	15	0	1.00	1.00
AFib	13	0	0.87	1.00	13	4	0.87	0.76	13	0	0.87	1.00	10	5	0.67	0.67
AFlut	15	1	1.00	0.94	15	2	1.00	0.88	15	1	1.00	0.94	15	2	1.00	0.88
ACoup	15	1	1.00	0.94	15	0	1.00	1.00	15	1	1.00	0.94	15	1	1.00	0.94
VTrig	5	1	0.33	0.83	4	0	0.27	1.00	5	1	0.33	0.83	5	0	0.33	1.00
Total	121	26	0.76	0.83	113	36	0.67	0.61	133	14	0.80	0.84	128	20	0.76	0.79
CA (%)	79.61				74.34				87.50				84.21

3. Discussion

Since the data set contains RR intervals from multiple classes extracted from the ECG signal, the classification models must produce a decision sequence that shows separately which classes the test data set contains.

In multi-class classification tasks, these decision sequences correspond to confusion matrices that contain the TP, FP, TN, and FN values and show how well a classification model recognizes the patterns of each class. TP, FP, TN, and FN are therefore the most valuable indicators for evaluating the success of classification models. Table 2 shows that the best model for the 12-class ECG data set is DWT1-VCM. The confusion matrices obtained with DWT1-VCM are presented for each arrhythmia type in Figure 8. Figure 8 shows that RR intervals belonging to the NSR, SB, PB, RBB, LBB, AFlut, and ACoup signal types were detected with 100% accuracy by the DWT1-VCM model. The average percentage of true negatives over all ECG signal types was 80%. In contrast, patterns of the VT signal type could not be collected in a single cluster with DWT1-VCM.

Figure 8. Class-dependent confusion matrices for the best classifier model DWT1-VCM

VT patterns were assigned to clusters containing the RBB, LBB, and ACoup signal types. However, as can be seen in Table 2, the VCM and DWT2-VCM models classified 2 of these VT patterns into the same class. This arrhythmia type has quite different signal morphologies in the data set, so the classification models had difficulty collecting all VT patterns in the same class. The same holds for VTrig patterns, which the DWT1-VCM model classified with 33.3% accuracy. With VCM, FCM, and DWT2-VCM, the classification accuracy for VTrig patterns is about the same. Although 100% classification accuracy could not be achieved for SA and AFib, these signal types were classified with high accuracies of 93.33% and 86.67% by the DWT1-VCM classifier. The false negative rate of the DWT1-VCM model was 12.5%. This value can be interpreted as the classifier assigning 12.5% of the data to a cluster that consists of data from a different class. These false negative and false positive rates are the limitations of vague c-means clustering in ECG classification. Furthermore, a significance test (ttest2) was applied to the results of DWT1-VCM and returned h = 0, which means that the test does not reject the null hypothesis at the default 5% significance level.

In addition to these experiments, another experiment was conducted with several 30-minute ECG records. In this experiment, the trained DWT1-VCM was used to identify the ECG signal classes in recordings #100, 103, 113, 107, and 109. The results are presented in Table 3. Recording #100 contains 2273 beats in three signal classes: 2239 NSR beats, 33 APC beats, and 1 premature ventricular contraction beat. DWT1-VCM, the best classification model in this study, detected 2211 NSR beats, whereas the APC beats could not be classified. The sensitivity for NSR beats is 98.75%. In recording #103, which contains 2082 NSR beats, 2036 were detected correctly with DWT1-VCM. Additionally, two APC beats were detected in this recording. Sensitivity is 100% for APC and 97.8% for NSR. Recording #113 also contains predominantly the NSR signal class: 1789 of its 1795 beats belong to NSR, and the remaining 6 are APC. When this recording was classified with DWT1-VCM, 1744 of the 1789 NSR beats were detected, but only 1 of the 6 APC beats was classified correctly. The sensitivity for NSR was 97.5%. Likewise, 1820 of the 2078 PB beats in recording #107 were found correctly with DWT1-VCM, which corresponds to a sensitivity of 87.6% for PB detection in this PB-dominated recording. Recording #109 was also tested with the trained DWT1-VCM because it contains a different ECG signal class. Of the 2492 LBB beats in this record, 2412 were classified correctly with DWT1-VCM, so the sensitivity for LBB beats is 96.8%. The experimental results presented in Table 3 indicate that the vague c-means clustering algorithm proposed in this study can classify different rhythm types in a clinical patient record.

Table 3. Test results of DWT1-VCM for some ECG records in MIT-BIH Arrhythmia Database [9]

Predicted class (columns) for each record (rows)
Record	NSR	SB	VT	SA	APC	PB	RBB	LBB	AFib	AFlut	ACoup	VTrig
100	2211	26	12	0	0	0	1	0	12	0	2	0
103	2036	5	10	0	2	0	0	0	1	0	28	0
107	4	5	5	0	153	1820	25	1	0	0	10	0
109	5	0	9	0	49	0	22	2412	1	7	6	14
113	1744	8	5	0	1	0	0	0	0	0	0	0
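The per-record sensitivities quoted in the discussion follow from the beat counts given in the text and the detections in Table 3. The short check below recomputes them; the dictionary layout and the names are assumptions made for illustration, while the numbers are taken from the text above.

```python
# record: (dominant class, correctly detected beats, beats of that class)
records = {
    "100": ("NSR", 2211, 2239),
    "103": ("NSR", 2036, 2082),
    "113": ("NSR", 1744, 1789),
    "107": ("PB", 1820, 2078),
    "109": ("LBB", 2412, 2492),
}

# Sensitivities as reported in the discussion, in percent.
reported = {"100": 98.75, "103": 97.8, "113": 97.5, "107": 87.6, "109": 96.8}


def sensitivity_percent(detected, total):
    """Share of the beats of one class that were assigned to that class."""
    return 100.0 * detected / total


computed = {rec: sensitivity_percent(hit, total)
            for rec, (_cls, hit, total) in records.items()}
for rec, (cls, _hit, _total) in records.items():
    print(rec, cls, round(computed[rec], 2), reported[rec])
```

The recomputed values agree with the reported ones to within rounding.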

4. Conclusion

This paper proposes a robust unsupervised ECG arrhythmia classification method based on vague c-means clustering. The experiments were carried out on a dataset containing 12 ECG signal classes (normal sinus rhythm and 11 different ECG arrhythmia types) taken from the MIT-BIH ECG Arrhythmia Database. The vague c-means algorithm is an unsupervised clustering algorithm based on fuzzy c-means. For this reason, ECG classification was also performed with FCM. Additionally, to improve the efficiency of VCM, a new frequency-based dataset was created from the dataset using the discrete wavelet transform. The feature set consisting of the low-frequency components, called approximation coefficients, was then classified with VCM, which yields the DWT-VCM classifier model for the ECG classification task. With test results of 87.5% accuracy, 80% sensitivity, and 84% positive predictive value, DWT1-VCM emerges as the superior classifier for ECG arrhythmia classification when compared to the other models (VCM, FCM, DWT2-VCM). While VCM reaches 79.61% accuracy, DWT1-VCM and DWT2-VCM reach 87.5% and 84.21%, respectively. These results indicate that the use of DWT increases the performance of VCM in distinguishing arrhythmia types from each other.

Furthermore, a brief comparison with the most recent studies in the literature that use the same database is presented in Table 4. There are many ECG arrhythmia classification studies in the literature that use fuzzy clustering and its derivatives. As can be seen from Table 4, the fuzzy clustering method was combined with an artificial neural network model or a feature extraction algorithm in all except two studies. In study [2], a novel FCM clustering algorithm is proposed to classify five heartbeat cases. In addition, four arrhythmia types are clustered by Mahalanobis distance-based FCM [6]. The results of this study are significant in that they propose an unsupervised classifier that can classify 12 arrhythmia classes with high accuracy.

VCM is a fuzzy clustering technique. The results of clustering are often easier to interpret because the cluster membership degrees can be inspected, whereas neural network models are black boxes, so it is not clear how they arrive at their conclusions. In addition, VCM requires less computational time, while neural networks and similar algorithms need more computational time because of training.

In future work, optimization algorithms can be used for the selection of VCM parameters to improve results.

Table 4. A brief comparison with the literature

Study	Database	Rhythm (Class) Types	Classification Model	CA (%)
[1]	MIT-BIH Arrhythmia Database	6	Fuzzy Hybrid Neural Network	96.06
[2]	MIT-BIH Arrhythmia Database	5	A novel FCM Clustering (FCMM)	93.57
[4]	MIT-BIH Arrhythmia Database	10	Type-2 Fuzzy Clustering Neural Network	99
[5]	MIT-BIH Arrhythmia Database	10	Wavelet transform and Type-2 Fuzzy Clustering Neural Network	99
[6]	MIT-BIH Arrhythmia Database	4	Mahalanobis distance based FCM	80.1
[7]	MIT-BIH Arrhythmia Database	6	Spatial Kernel Fuzzy C-Means Clustering, Principal Component Analysis, Linear Discriminant Analysis, Regularized Locality Preserving Indexing	96.41
[21]	MIT-BIH Arrhythmia Database	10	Fuzzy Clustering Neural Network	99.81
This study	MIT-BIH Arrhythmia Database	12	Vague C-Means Clustering	79.61
This study	MIT-BIH Arrhythmia Database	12	Discrete Wavelet Transform (Level-1) and Vague C-Means Clustering	87.5
This study	MIT-BIH Arrhythmia Database	12	Discrete Wavelet Transform (Level-2) and Vague C-Means Clustering	84.21

References

[1] Osowski, S., Linh, T.H. (2001). ECG beat recognition using fuzzy hybrid neural network. IEEE Transactions on Biomedical Engineering, 48(11): 1265-1271. https://doi.org/10.1109/10.959322

[2] Yeh, Y.C., Wang, W.J., Chiou, C.W. (2010). A novel fuzzy c-means method for classifying heartbeat cases from ECG signals. Measurement, 43(10): 1542-1555. https://doi.org/10.1016/j.measurement.2010.08.019

[3] Doğan, B., Korürek, M. (2012). A new ECG beat clustering method based on kernelized fuzzy c-means and hybrid ant colony optimization for continuous domains. Applied Soft Computing, 12(11): 3442-3451. https://doi.org/10.1016/j.asoc.2012.07.007

[4] Ceylan, R., Özbay, Y., Karlik, B. (2009). A novel approach for classification of ECG arrhythmias: Type-2 fuzzy clustering neural network. Expert Systems with Applications, 36(3): 6721-6726. https://doi.org/10.1016/j.eswa.2008.08.028

[5] Özbay, Y., Ceylan, R., Karlik, B. (2011). Integration of type-2 fuzzy clustering and wavelet transform in a neural network based ECG classifier. Expert Systems with Applications, 38(1): 1004-1010. https://doi.org/10.1016/j.eswa.2010.07.118

[6] Haldar, N.A.H., Khan, F.A., Ali, A., Abbas, H. (2017). Arrhythmia classification using Mahalanobis distance based improved Fuzzy C-Means clustering for mobile health monitoring systems. Neurocomputing, 220: 221-235. https://doi.org/10.1016/j.neucom.2016.08.042

[7] Roopa, C.K., Harish, B.S., Kumar, S.A. (2018). A novel method of clustering ECG arrhythmia data using robust spatial kernel fuzzy c-means. Procedia Computer Science, 143: 133-140. https://doi.org/10.1016/j.procs.2018.10.361

[8] Pander, T. (2022). A new approach to adaptive threshold based method for QRS detection with fuzzy clustering. Biocybernetics and Biomedical Engineering, 42(1): 404-425. https://doi.org/10.1016/j.bbe.2022.02.007

[9] Physiobank Archive Index, MIT-BIH Arrhythmia Database. http://www.physionet.org/physiobank/database.

[10] Pan, J., Tompkins, W.J. (1985). A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, BME-32(3): 230-236. https://doi.org/10.1109/TBME.1985.325532

[11] Friesen, G.M., Jannett, T.C., Jadallah, M.A., Yates, S.L., Quint, S.R., Nagle, H.T. (1990). A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Transactions on Biomedical Engineering, 37(1): 85-98. https://doi.org/10.1109/10.43620

[12] Szilagyi, L. (1999). Wavelet-transform-based QRS complex detection in on-line Holter systems. In 1999 IEEE Engineering in Medicine and Biology 21st Annual Conference and the 1999 Annual Fall Meeting of the Biomedical Engineering Society, Atlanta, GA, USA, p. 271. https://doi.org/10.1109/IEMBS.1999.802340

[13] Benitez, D.S., Gaydecki, P.A., Zaidi, A., Fitzpatrick, A.P. (2000). A new QRS detection algorithm based on the Hilbert transform. In Computers in Cardiology 2000, Cat. 00CH37163, Cambridge, MA, USA, pp. 379-382. https://doi.org/10.1109/CIC.2000.898536

[14] Chromik, J., Pirl, L., Beilharz, J., Arnrich, B., Polze, A. (2021). Certainty in QRS detection with artificial neural networks. Biomedical Signal Processing and Control, 68: 102628. https://doi.org/10.1016/j.bspc.2021.102628

[15] Ceylan, R., Özbay, Y., Karlik, B. (2014). Comparison of type-2 fuzzy clustering-based cascade classifier models for ECG arrhythmias. Biomedical Engineering: Applications, Basis and Communications, 26(6): 1450075. https://doi.org/10.4015/S1016237214500756

[16] Xu, C., Zhang, P., Li, B., Wu, D., Fan, H. (2013). Vague C-means clustering algorithm. Pattern Recognition Letters, 34(5): 505-510. https://doi.org/10.1016/j.patrec.2012.12.001

[17] Mukherjee, S., Das, A. (2020). Vague set theory based segmented image fusion technique for analysis of anatomical and functional images. Expert Systems with Applications, 159: 113592. https://doi.org/10.1016/j.eswa.2020.113592

[18] Barstuğan, M., Ceylan, R. (2020). The effect of dictionary learning on weight update of AdaBoost and ECG classification. Journal of King Saud University-Computer and Information Sciences, 32(10): 1149-1157. https://doi.org/10.1016/j.jksuci.2018.11.007

[19] Solak, A., Ceylan, R. (2022). Pneumonia detection with chest-caps. Traitement du Signal, 39(6): 2211-2216. https://doi.org/10.18280/ts.390636

[20] Koyuncu, H., Ceylan, R., Asoglu, S., Cebeci, H., Koplay, M. (2019). An extensive study for binary characterisation of adrenal tumours. Medical & Biological Engineering & Computing, 57: 849-862. https://doi.org/10.1007/s11517-018-1923-z

[21] Özbay, Y., Ceylan, R., Karlık, B. (2006). A fuzzy clustering neural network architecture for classification of ECG arrhythmias. Computers in Biology and Medicine, 36(4): 376-388. https://doi.org/10.1016/j.compbiomed.2005.01.006
