Fine-grained electrocardiosignal classification method based on deep convolutional neural network and online decision fusion
    
      Technical Field
      The invention belongs to the field of signal classification, and particularly relates to a fine-grained electrocardiosignal classification method based on a deep convolutional neural network and on-line decision fusion.
    
    
      Background
      Electrocardiograms (ECGs), which record the depolarization and repolarization of the heart's electrical activity during the cardiac cycle, are widely used to monitor or diagnose heart conditions in patients. Patients usually have to go to a hospital and be diagnosed by a trained, experienced cardiologist, which is expensive and inconvenient. Automated monitoring and diagnostic systems are therefore highly desirable in clinics, community medical centers, and home healthcare programs. Although great progress has been made over the past decades in the filtering, detection, and classification of electrocardiosignals, effective and accurate classification remains challenging because of noise and because the types of symptoms vary from patient to patient.
      Prior to classification, filtering is typically required to remove various kinds of noise from the ECG signal, including power line interference, baseline wander, and muscle contraction noise. Conventional approaches such as low-pass filters and filter banks can reduce noise but may also introduce artifacts. Combining signal modeling with filtering can alleviate this problem, but it is limited to a single type of noise. In recent years, various noise cancellation methods based on the wavelet transform have been proposed, which show great advantages in multi-resolution signal analysis. For example, S. Poungponsri and X. H. Yu proposed an adaptive filtering method based on the wavelet transform and an artificial neural network, which can effectively remove different types of noise.
      For ECG classification, the classical approach typically includes two parts: feature extraction and classifier training. Feature extraction is usually performed in the time domain or the frequency domain, and the features include amplitudes, intervals, higher-order statistics, and the like. Common feature extraction methods are filter banks, Kalman filters, principal component analysis, frequency analysis, wavelet transforms, and statistical methods. Common classification methods include support vector machines, artificial neural networks, hidden Markov models, and mixture-of-experts methods. Among them, a large number of methods build on the strong modeling capability of artificial neural networks. For example, M. Engin proposed an ECG signal classification method based on a fuzzy neural network, which uses autoregressive model coefficients, higher-order cumulants, and wavelet transform variances as features. L. Y. Shyu et al. proposed a method for detecting ventricular premature contractions (VPCs) using the wavelet transform and a fuzzy neural network, which reduces computational complexity by using the same wavelet for QRS detection and VPC classification. I. Güler and E. D. Übeyli proposed classifying ECG signals with a combined neural network model: statistical features based on the discrete wavelet transform are extracted as the input of the first-level network, and the second-level network is then trained using the outputs of the first-level network as input. Ince et al. presented an approach that uses a robust and versatile artificial neural network architecture and trains a patient-specific model for each patient with morphological wavelet transform features and temporal features.
      Although the above methods work well, there are some common drawbacks:
      1) Hand-crafted features rely on expert knowledge or experience and require careful design and testing. Moreover, the classifier must have modeling capability appropriate to the features.
      2) The set of ECG signal types is typically small or coarse-grained, e.g., 2 to 5 classes. On the one hand, for a new type of electrocardiographic waveform, one must first examine the existing discriminative features and then redesign new features. On the other hand, these methods still have difficulty with fine-grained classification, which requires more discriminative features and classifiers with better modeling capability.
    
    
      Disclosure of Invention
      To address the defects of the prior art, the invention provides a fine-grained electrocardiosignal classification method based on a deep convolutional neural network (CNN) and online decision fusion. The invention converts the raw ECG signal to the time-frequency domain by a short-time Fourier transform (STFT). Then, the time-frequency features of the signal are learned by a CNN with two-dimensional convolutions. Finally, an online decision fusion method is proposed to fuse the past and current decisions of different models into a more accurate result.
      The invention comprises the following steps:
      Step (1) acquiring and processing the electrocardiosignal waveform:
      Step (1-1) acquisition of a data set: an SKX-2000 ECG signal simulator is used for ECG waveform generation. The simulator can produce more than 20 types of electrocardiographic waveform with various amplitudes and frequencies, corresponding to different symptoms, including but not limited to normal, coarse atrial fibrillation, fine atrial fibrillation, and atrial flutter. For each of the 20 types of ECG waveform used, a certain number of waveform signals are acquired under different parameters. The mathematical representation is:
      $$X = \{(x_i(t), y_i) \mid i \in \Lambda\} \tag{1}$$
      wherein $x_i(t)$ is the $i$-th sample, $y_i \in \{0, \ldots, C-1\}$ is the class label of the electrocardiosignal $x_i(t)$, there are $C$ classes of electrocardiosignal in total, and $\Lambda$ is the index set of the samples. $x_i(t) = [x_i(0), x_i(1), \ldots, x_i(N-1)]^T$ is the time-series representation of the $i$-th signal, which has $N$ sampling points.
      Step (1-2) short-time Fourier transform: the original electrocardiosignal is first converted into the time-frequency domain through the short-time Fourier transform to obtain its spectrogram. The mathematical representation is as follows:
      $$s_i(k, m) = \sum_{n=0}^{N-1} x_i(n)\, w(n - m)\, e^{-j 2\pi k n / N} \tag{2}$$
      wherein $w(\cdot)$ is a window function. The invention adopts a Hamming window with a size of 256 sampling points and an overlap of 128 sampling points. $s_i(k, m)$ is the spectrogram of $x_i(t)$, which has a two-dimensional structure.
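      As an illustrative, non-limiting sketch, the short-time Fourier transform of step (1-2) may be computed as follows with the stated Hamming window of 256 points and overlap of 128 points. The use of NumPy, the function name `ecg_spectrogram`, the choice to keep only the magnitude, the absence of padding, and the synthetic test signal with its sampling rate are assumptions made for illustration and are not specified by the invention.

```python
import numpy as np

def ecg_spectrogram(x, win_len=256, overlap=128):
    """Magnitude STFT of a 1-D signal using a Hamming window.

    Returns an array of shape (frequency bins, time frames).
    Assumption: no padding, so trailing samples that do not fill
    a whole window are dropped.
    """
    x = np.asarray(x, dtype=float)
    hop = win_len - overlap
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping frames and apply the window
    frames = np.stack([x[m * hop:m * hop + win_len] * window
                       for m in range(n_frames)])
    # One-sided FFT of every frame; transpose so columns are time frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic stand-in for an ECG trace (not simulator data)
fs = 500                      # hypothetical sampling rate in Hz
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 10 * t)
s = ecg_spectrogram(x)
print(s.shape)
```

      With a 2048-point input, the sketch yields 129 frequency bins and 15 time frames; this two-dimensional array plays the role of the spectrogram that is fed to the network.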
      Step (2) design of network structure: since two-dimensional spectrograms are used as the network input, a deep convolutional neural network with two-dimensional convolutions is designed. The designed network comprises 3 convolutional layers, 2 fully connected layers, and 1 max pooling layer. The concrete structure is as follows:
      
      Taking the spectrogram as input, the deep neural network predicts, for the classification problem, a probability vector $p_i = h(s_i, \theta_h) \in \mathbb{R}^C$ with $\|p_i\|_1 = 1$, where $\theta_h$ denotes the parameters of the network to be learned. The network can be trained by minimizing a cross-entropy loss function, expressed as follows:
      $$L(\theta_h) = -\frac{1}{|\Lambda|} \sum_{i \in \Lambda} q_i^T \log p_i \tag{3}$$
      wherein $q_i$ is the one-hot vector corresponding to the class label $y_i$.
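      By way of a minimal sketch, the cross-entropy loss described above may be evaluated as follows for a batch of network outputs. The softmax output layer, the averaging over the batch, the small constant added inside the logarithm, and the example logits are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, so that every row is a probability vector."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y, num_classes):
    """Mean cross-entropy between predictions p and integer labels y."""
    q = np.eye(num_classes)[y]             # one-hot vectors q_i
    # The small constant guards against log(0); averaging is an assumption
    return float(-(q * np.log(p + 1e-12)).sum(axis=1).mean())

# Hypothetical network outputs for two samples and three classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
p = softmax(logits)
loss = cross_entropy(p, np.array([0, 2]), num_classes=3)
print(p.sum(axis=1), loss)
```

      For a uniform prediction over 20 classes, this loss equals log 20, which is the value expected from an untrained 20-class classifier.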
      In fact, for a given window function, the width of the spectrogram depends on the length of the electrocardiographic waveform signal. At a given sampling rate, a longer signal contains more beats. Conventionally, detection and classification are performed on single-beat signals, but a signal with more beats carries more information and therefore allows more accurate detection and classification. In the invention, after the sequence length and the sampling rate are determined, each sample is divided into a plurality of subsamples of equal length, and the designed deep neural network model is then trained on the resulting data set. In addition, in order to compare the performance of the model on longer samples, the samples are divided into subsamples of different lengths, and a corresponding deep neural network model is trained for each length. These models are denoted h1 to h6 in order of increasing subsample length. Although training samples of different lengths correspond to spectrograms of different widths, the same architecture is used for all of these models: only the pooling stride along the columns is changed accordingly, while the fully connected layers are kept fixed.
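      The division of a sample into equal-length subsamples for the models h1 to h6 may be sketched as follows. The subsample lengths are not listed explicitly; the sketch assumes that h1 takes 512 points and that each subsequent model takes twice as many, which is consistent with the worked example given for the fusion step (a 2048-point signal yields 4, 2, and 1 parts for h1, h2, and h3). The names `BASE_LEN`, `subsample_length`, and `split_into_subsamples` are hypothetical.

```python
BASE_LEN = 512  # assumed subsample length for model h1

def subsample_length(s):
    """Assumed input length of model h_s: doubles with each level."""
    return BASE_LEN * 2 ** (s - 1)

def split_into_subsamples(x, s):
    """Split x into non-overlapping parts of the length used by h_s."""
    n = subsample_length(s)
    # Trailing points that do not fill a whole part are dropped
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]

signal = list(range(2048))  # stand-in for a 2048-point ECG sample
counts = {s: len(split_into_subsamples(signal, s)) for s in range(1, 7)}
print(counts)  # {1: 4, 2: 2, 3: 1, 4: 0, 5: 0, 6: 0}
```

      Models whose input length exceeds the available signal receive no parts, which is why only h1 to h3 can be evaluated on a 2048-point signal.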
      Step (3) online decision fusion algorithm: for online testing, the above models can be applied sequentially to the signal at different times as the length of the signal increases. When the sample length is short, the decision is based only on that short segment; when the sample length is longer, the decision is determined by the whole signal. These models can be viewed as different experts, each focusing on a different amount of information. The decisions of different experts may be complementary and may be fused into a more accurate decision. Therefore, an online fusion scheme is proposed, expressed as:
      $$\hat{p} = \sum_{s=1}^{s_l} \omega_s \, \frac{1}{k_s} \sum_{k=1}^{k_s} h_s(x_{sk}, \theta_{h_s}) \tag{4}$$
      wherein $\hat{p}$ is the fusion result; $s_l \in \{1,2,3,4,5,6\}$ is the index of the model with the longest subsample length into which a signal $x$ of the given length can be divided; $x_{sk}$ is the $k$-th part of sample $x$ when the model is $h_s$; and $k_s$ is the number of subsamples of $x$ at the length corresponding to $h_s$. For example, if the total length of the sample at the current time is 2048, then $s_l = 3$, $k_1 = 4$, $k_2 = 2$, and $k_3 = 1$. $\omega_s$ is the fusion weight of model $h_s$, and $\sum_{s=1}^{s_l} \omega_s = 1$.
      As can be seen from equation (4), the predictions for the parts at the same subsample length are averaged with equal weights. This is reasonable because every part is processed by the same model, with no priority among parts. Across different subsample lengths, each model is given its own weight, which determines its influence on the final fused result.
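      The online fusion described above may be sketched as follows: each usable model's predictions are averaged over its parts, and the per-model averages are then combined by a weighted sum. This is an illustrative sketch only. The subsample lengths (512 points for h1, doubling per level) are inferred from the 2048-point example, the stub models returning fixed probability vectors stand in for trained networks, and the names `fuse_online` and `make_stub` are hypothetical.

```python
BASE_LEN = 512  # assumed input length of h1, inferred from the 2048-point example

def model_length(s):
    """Assumed input length of model h_s: doubles with each level."""
    return BASE_LEN * 2 ** (s - 1)

def fuse_online(x, models, weights=None):
    """Fuse the decisions of all models whose input fits in signal x.

    models[s - 1] is a callable standing in for h_s; it maps one
    subsample to a list of class probabilities.
    """
    usable = [s for s in range(1, len(models) + 1) if model_length(s) <= len(x)]
    if not usable:
        raise ValueError("signal is shorter than the shortest model input")
    if weights is None:
        weights = [1.0 / len(usable)] * len(usable)   # uniform fusion weights
    if len(weights) != len(usable):
        raise ValueError("one weight is required per usable model")
    fused = None
    for w, s in zip(weights, usable):
        n = model_length(s)
        parts = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        preds = [models[s - 1](part) for part in parts]
        # Equal-weight average over the k_s parts handled by the same model
        mean = [sum(col) / len(preds) for col in zip(*preds)]
        if fused is None:
            fused = [0.0] * len(mean)
        fused = [f + w * m for f, m in zip(fused, mean)]
    return fused

def make_stub(p0):
    """Hypothetical two-class model that ignores its input."""
    return lambda part: [p0, 1.0 - p0]

models = [make_stub(p) for p in (0.6, 0.8, 0.9, 0.7, 0.7, 0.7)]
signal = [0.0] * 2048          # only h1, h2, and h3 fit this length
fused = fuse_online(signal, models)
print(fused)
```

      As the observed signal grows, calling the function again with the longer signal brings further models into the fusion, so that past and current decisions are combined at each time.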
      To verify the effectiveness of the proposed algorithm, an SKX-2000 ECG signal simulator was used for ECG waveform generation. The simulator can produce more than 20 types of electrocardiographic waveform with various amplitudes and frequencies, corresponding to different symptoms, including but not limited to normal, coarse atrial fibrillation, fine atrial fibrillation, and atrial flutter. Nineteen types of symptomatic electrocardiographic waveform and the normal electrocardiographic waveform were selected for the simulation experiments.
      For each of the above 20 types of ECG waveform, a certain number of waveform signals were collected under different parameters. After the shorter and unwanted signals were removed, a total of 2426 samples remained, averaging about 120 samples per class, with each sample containing at most 16384 points. In the following experiments, 3-fold cross-validation was used to evaluate the proposed method.
      In the short-time Fourier transform, a Hamming window with a size of 256 sampling points and an overlap of 128 sampling points was adopted. The designed network was then trained on the resulting spectrograms. The CNN model was trained for 20000 iterations with a batch size of 128. The base learning rate was 0.01 and was multiplied by 0.5 every 5000 iterations; the momentum and weight decay parameters were set to 0.9 and $5 \times 10^{-6}$, respectively. The results of the different models were then fused using the fusion method described above.
      All of the above methods were implemented in Caffe, and all experiments were performed on a workstation equipped with an NVIDIA GeForce GTX Titan X (Maxwell) GPU.
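      The stepwise learning-rate schedule stated above (base rate 0.01, halved every 5000 iterations over 20000 iterations) may be written out as the following short sketch. The function name `learning_rate` is hypothetical, and the formula assumes that the reduction is applied at exact multiples of 5000 iterations.

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.5, step=5000):
    """Step schedule: base_lr * gamma ** floor(iteration / step)."""
    return base_lr * gamma ** (iteration // step)

for it in (0, 4999, 5000, 10000, 19999):
    print(it, learning_rate(it))
```

      Under this assumption the rate takes four values during training: 0.01, 0.005, 0.0025, and 0.00125.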
      The beneficial effects of the invention are as follows: by computing the short-time Fourier transform of the original signal, a discriminative feature representation can be learned in the time-frequency domain. An online decision fusion method is proposed to fuse past and present decisions from different models into a more accurate decision. Experimental results on the synthesized 20-class ECG data set demonstrate the effectiveness and efficiency of the proposed method. Furthermore, the proposed method is computationally efficient and has the potential to be integrated into a portable ECG monitoring instrument with limited computational resources.
    
    
      Drawings
      FIG. 1 is a flow chart of an algorithmic implementation of the method.
      FIG. 2 is a waveform diagram of an electrocardiographic signal.
      FIG. 3 is the spectrogram corresponding to FIG. 2.
      FIG. 4 shows the fusion results and the single-model results corresponding to each $s_l$.
    
    
      Detailed Description
      The invention is further described below with reference to the accompanying drawings.
      As shown in FIG. 1, the invention provides a fine-grained electrocardiosignal classification method based on a deep convolutional neural network and online decision fusion, which includes the following steps:
      Step (1) acquiring and processing the electrocardiosignal waveform:
      Step (1-1) acquisition of a data set: an SKX-2000 ECG signal simulator is used for ECG waveform generation. The simulator can produce more than 20 types of electrocardiographic waveform with various amplitudes and frequencies, corresponding to different symptoms, including but not limited to normal, coarse atrial fibrillation, fine atrial fibrillation, and atrial flutter. For each of the 20 types of ECG waveform used, a certain number of waveform signals are acquired under different parameters. The mathematical representation is:
      $$X = \{(x_i(t), y_i) \mid i \in \Lambda\} \tag{1}$$
      wherein $x_i(t)$ is the $i$-th sample, $y_i \in \{0, \ldots, C-1\}$ is the class label of the electrocardiosignal $x_i(t)$, there are $C$ classes of electrocardiosignal in total, and $\Lambda$ is the index set of the samples. $x_i(t) = [x_i(0), x_i(1), \ldots, x_i(N-1)]^T$ is the time-series representation of the $i$-th signal, which has $N$ sampling points. Specifically, an electrocardiographic signal waveform is shown in FIG. 2, where the horizontal axis represents time in s and the vertical axis represents amplitude in μV.
      Step (1-2) short-time Fourier transform: the original electrocardiosignal is first converted into the time-frequency domain through the short-time Fourier transform to obtain its spectrogram. The mathematical representation is as follows:
      $$s_i(k, m) = \sum_{n=0}^{N-1} x_i(n)\, w(n - m)\, e^{-j 2\pi k n / N} \tag{2}$$
      wherein $w(\cdot)$ is a window function. The invention adopts a Hamming window with a size of 256 sampling points and an overlap of 128 sampling points. $s_i(k, m)$ is the spectrogram of $x_i(t)$, which has a two-dimensional structure. FIG. 3 shows the spectrogram corresponding to FIG. 2, where the horizontal axis represents time and the vertical axis represents frequency, so that the time and frequency characteristics of the signal can be observed simultaneously.
      Step (2) design of network structure: since two-dimensional spectrograms are used as the network input, a deep convolutional neural network with two-dimensional convolutions is designed. The designed network comprises 3 convolutional layers, 2 fully connected layers, and 1 max pooling layer.
      Table 1. Designed network architecture
      
      Taking the spectrogram as input, the deep neural network predicts, for the classification problem, a probability vector $p_i = h(s_i, \theta_h) \in \mathbb{R}^C$ with $\|p_i\|_1 = 1$, where $\theta_h$ denotes the parameters of the network to be learned. The network can be trained by minimizing a cross-entropy loss function, expressed as follows:
      $$L(\theta_h) = -\frac{1}{|\Lambda|} \sum_{i \in \Lambda} q_i^T \log p_i \tag{3}$$
      wherein $q_i$ is the one-hot vector corresponding to the class label $y_i$.
      In fact, for a given window function, the width of the spectrogram depends on the length of the electrocardiographic waveform signal. At a given sampling rate, a longer signal contains more beats. Conventionally, detection and classification are performed on single-beat signals, but a signal with more beats carries more information and therefore allows more accurate detection and classification. In the invention, after the sequence length and the sampling rate are determined, each sample is divided into a plurality of subsamples of equal length, and the designed deep neural network model (CNN model) is then trained on the resulting data set. In addition, in order to compare the performance of the model on longer samples, the samples are divided into subsamples of different lengths, and a corresponding CNN model is trained for each length. These models are denoted h1 to h6 in order of increasing subsample length. Although training samples of different lengths correspond to spectrograms of different widths, the same architecture is used for all of these models: only the pooling stride along the columns is changed accordingly, while the fully connected layers are kept fixed.
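      The relation between signal length and spectrogram width noted above may be made concrete with the following sketch, which counts the time frames produced by a 256-point window with a 128-point overlap. The function name `spectrogram_width` is hypothetical, and the count assumes that the signal is not padded, so incomplete trailing windows are discarded.

```python
def spectrogram_width(n_samples, win_len=256, overlap=128):
    """Number of STFT frames (spectrogram columns), assuming no padding."""
    hop = win_len - overlap
    if n_samples < win_len:
        return 0
    return 1 + (n_samples - win_len) // hop

for n in (2048, 16384):
    print(n, spectrogram_width(n))
```

      Under this assumption a 2048-point signal gives 15 columns and the longest 16384-point sample gives 127, which illustrates why the pooling stride along the columns must grow with the subsample length if the fully connected layers are to keep a fixed input size.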
      To verify the validity of the method in learning a feature representation, the learned features are inspected. For example, the learned features of all training data are obtained by computing the responses of the penultimate layer, and the first three principal component vectors are then obtained by principal component analysis.
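      The inspection of learned features by principal component analysis may be sketched as follows. The random matrix stands in for the penultimate-layer responses, whose actual dimensionality is not stated; the feature size of 64, the sample count of 200, the use of a singular value decomposition, and the function name `top_principal_components` are assumptions made for illustration.

```python
import numpy as np

def top_principal_components(features, n_components=3):
    """Return the leading principal directions and the projected data."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, svals, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    # Fraction of the total variance captured by each kept component
    explained = svals[:n_components] ** 2 / (svals ** 2).sum()
    return components, projected, explained

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))   # stand-in for penultimate-layer responses
components, projected, explained = top_principal_components(features)
print(components.shape, projected.shape)
```

      Plotting the three projected coordinates, colored by class label, is one way to judge whether samples of the same class cluster together in the learned feature space.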
      Step (3) online decision fusion algorithm: for online testing, the above models can be applied sequentially to the signal at different times as the length of the signal increases. When the sample length is short, the decision is based only on that short segment; when the sample length is longer, the decision is determined by the whole signal. These models can be viewed as different experts, each focusing on a different amount of information. The decisions of different experts may be complementary and may be fused into a more accurate decision. Therefore, an online fusion scheme is proposed, expressed as:
      $$\hat{p} = \sum_{s=1}^{s_l} \omega_s \, \frac{1}{k_s} \sum_{k=1}^{k_s} h_s(x_{sk}, \theta_{h_s}) \tag{4}$$
      wherein $\hat{p}$ is the fusion result; $s_l \in \{1,2,3,4,5,6\}$ is the index of the model with the longest subsample length into which a signal $x$ of the given length can be divided; $x_{sk}$ is the $k$-th part of sample $x$ when the model is $h_s$; and $k_s$ is the number of subsamples of $x$ at the length corresponding to $h_s$. For example, if the total length of the sample at the current time is 2048, then $s_l = 3$, $k_1 = 4$, $k_2 = 2$, and $k_3 = 1$. $\omega_s$ is the fusion weight of model $h_s$, and $\sum_{s=1}^{s_l} \omega_s = 1$.
      As can be seen from equation (4), the predictions for the parts at the same subsample length are averaged with equal weights. This is reasonable because every part is processed by the same model, with no priority among parts. Across different subsample lengths, each model is given its own weight, which determines its influence on the final fused result.
      Two definitions of the fusion weights are considered: uniform weights, and weights that increase with the model level (i.e., with the subsample length). In the latter case, the fusion weights are calculated according to the following equation:
      
      FIG. 4 shows the fusion results and the single-model results corresponding to each $s_l$. First, it can be seen that model h4 achieves the best performance among the models at all levels, perhaps because it balances the length of the input data against the number of decisions. Compared to model h1, the input data is 16 times longer. Compared to model h6, which can make only a single decision on the entire sequence, model h4 can make 4 decisions on four different subsequences, and these can be fused into a more accurate decision. Second, it can be seen that the fused results are consistently better than the single-model results, and that performance continues to improve as the data length grows. This demonstrates that fusing the decisions of different models yields a more robust and more accurate decision, because the models cover different ranges of the original data during training. Furthermore, non-uniform weights show no advantage over uniform weights. A possible explanation is that the non-uniform weighting strategy is biased toward the model with the largest amount of data and neglects the decisions of the models with less data. Although it outperforms the single-model case, the improvement is very limited, especially at higher levels, where performance depends heavily on the high-level model.
      To verify the validity of the algorithm proposed by the invention, an SKX-2000 ECG signal simulator was used for ECG waveform generation. The simulator can produce more than 20 types of electrocardiographic waveform with various amplitudes and frequencies, corresponding to different symptoms, including but not limited to normal, coarse atrial fibrillation, fine atrial fibrillation, and atrial flutter. Nineteen types of symptomatic electrocardiographic waveform and the normal electrocardiographic waveform were selected for the simulation experiments.
      For each of the above 20 types of ECG waveform, a certain number of waveform signals were collected under different parameters. After the shorter and unwanted signals were removed, a total of 2426 samples remained, averaging about 120 samples per class, with each sample containing at most 16384 points. In the following experiments, 3-fold cross-validation was used to evaluate the proposed method.
      In the short-time Fourier transform, a Hamming window with a size of 256 sampling points and an overlap of 128 sampling points was adopted. The designed network was then trained on the resulting spectrograms. The CNN model was trained for 20000 iterations with a batch size of 128. The base learning rate was 0.01 and was multiplied by 0.5 every 5000 iterations; the momentum and weight decay parameters were set to 0.9 and $5 \times 10^{-6}$, respectively. The results of the different models were then fused using the fusion method described above.
      All of the above methods were implemented in Caffe, and all experiments were performed on a workstation equipped with an NVIDIA GeForce GTX Titan X (Maxwell) GPU.
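      The 3-fold cross-validation mentioned above may be organized as in the following sketch, which partitions the 2426 sample indices into three folds and uses each fold once for testing. How the folds were actually drawn is not stated; the random shuffling, the fixed seed, the absence of per-class stratification, and the name `k_fold_indices` are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # assumed: random, unstratified split
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

folds = list(k_fold_indices(2426, k=3))
for train, test in folds:
    print(len(train), len(test))
```

      Each sample appears in exactly one test fold, so averaging the three test accuracies gives an estimate based on every sample in the data set.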
      The simulation results are as follows: the test results of the proposed method and other models are shown in table 2, and the run times of the proposed method models at different levels are shown in table 3.
      TABLE 2 test results of different models
      
      
      TABLE 3 run time of different level models