CN112989971A - Electrocardiogram data fusion method and device for different data sources - Google Patents

Publication number
CN112989971A
Authority
CN
China
Prior art keywords
data
electrocardiogram
label
electrocardiographic
different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110224552.0A
Other languages
Chinese (zh)
Other versions
CN112989971B (en)
Inventor
朱佳兵
朱涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zoncare Bio Medical Electronics Co ltd
Original Assignee
Wuhan Zoncare Bio Medical Electronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zoncare Bio Medical Electronics Co ltd filed Critical Wuhan Zoncare Bio Medical Electronics Co ltd
Priority to CN202110224552.0A priority Critical patent/CN112989971B/en
Publication of CN112989971A publication Critical patent/CN112989971A/en
Application granted granted Critical
Publication of CN112989971B publication Critical patent/CN112989971B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Psychiatry (AREA)
  • Pathology (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Physiology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Signal Processing (AREA)
  • Surgery (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Biology (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention relates to a method for fusing electrocardiogram data from different data sources, comprising the following steps: acquiring electrocardiogram data marked with initial labels from different data sources, and establishing an electrocardiogram data set; preprocessing the electrocardiogram data in the data set; clustering the preprocessed data set through unsupervised deep clustering to obtain a plurality of clusters; for each cluster, counting the probability of each type of label among the initial labels of its electrocardiogram data; and fusing the electrocardiogram data in each cluster based on the probability statistics to obtain a fused electrocardiogram data set. The method can fuse electrocardiogram data from different data sources, remove noisy labels, and facilitate subsequent model training.

Description

Electrocardiogram data fusion method and device for different data sources
Technical Field
The invention relates to the technical field of classification and labeling processing of electrocardiogram data, in particular to an electrocardiogram data fusion method and device of different data sources and a computer storage medium.
Background
In practice, the electrocardiogram data we acquire may come from multiple sources, such as hospital A, hospital B, or hospital C. Within the same hospital, the data may come from different departments, such as the inpatient department, the electrocardiogram room, the chest pain center, the physical examination center, and the emergency center, and possibly also from a primary hospital or the 120 emergency service. The data include both static 12-lead and dynamic 12-lead recordings. These data often differ in sampling frequency, signal quality, and the wording and conventions of the diagnostic conclusions. Taking ventricular escape rhythm as an example, some older electrocardiographers may still follow earlier habit and label it "ventricular escape rhythm", while younger physicians may label it "ventricular spontaneous rhythm". At present, there are two main approaches to the problem that "similar" electrocardiograms carry inconsistent initial labels because of differences in physicians' individual skill and in hospitals' inherited conventions:
1. Remove the initial labels from the electrocardiogram data, then randomly assign the data to two experienced electrocardiographers, who label them independently. If the two labeling results are inconsistent, the record is handed to an arbitrator of labeling disagreements for further processing;
2. First build an electrocardiogram classification algorithm, which may be a traditional algorithm or a neural network algorithm, and use it to predict labels for the electrocardiogram data. Then compare the predicted label output by the model with the initial label: if they are the same, the initial label is kept; otherwise, the corresponding electrocardiogram data are passed to a physician for secondary calibration.
With the first method, the labor and financial cost of labeling becomes large when the data volume is large. With the second method, the training data are multi-source and the model is optimized using the initial labels as guidance. The initial labels are often very noisy: labeling errors arise from non-uniform diagnostic labels across hospitals, differences in physicians' individual skill, and the difficulty of interpreting certain electrocardiograms. As a result, the trained model has limited generalization ability, and the amount of electrocardiogram data it flags for secondary calibration by physicians is large.
Disclosure of Invention
In view of the above, there is a need for a method and device for fusing electrocardiogram data from different data sources, and a computer storage medium, to solve the problem that, when electrocardiogram data from multiple data sources are involved, noisy labels degrade the generalization ability of the trained model, so that a large amount of data requires secondary calibration.
The invention provides a method for fusing electrocardiogram data from different data sources, comprising the following steps:
acquiring electrocardiogram data marked with initial labels from different data sources, and establishing an electrocardiogram data set;
preprocessing the electrocardiogram data in the electrocardiogram data set;
clustering the preprocessed electrocardiogram data set through unsupervised deep clustering to obtain a plurality of clusters;
for each cluster, counting the probability of each type of label among the initial labels of its electrocardiogram data;
and fusing the electrocardiogram data in each cluster based on the probability statistics to obtain a fused electrocardiogram data set.
Further, acquiring the electrocardiographic data marked with the initial tags from different data sources, and establishing an electrocardiographic data set, specifically:
acquiring different types of electrocardiogram data from different data sources, wherein the quantity of the selected electrocardiogram data of the same type from the different data sources is within the same set range, and obtaining the electrocardiogram data set.
Further, preprocessing the electrocardiographic data in the electrocardiographic data set specifically comprises:
converting the electrocardiogram data into space vector data for extracting the space characteristics of the electrocardiogram data;
extracting second lead data in the electrocardiogram data for extracting time domain characteristics of the electrocardiogram data;
and acquiring a spectrogram of the electrocardiogram data, and extracting frequency domain characteristics of the electrocardiogram data.
Further, clustering the preprocessed electrocardiogram data set through unsupervised deep clustering to obtain a plurality of clusters specifically comprises:
selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiogram data, respectively;
building an electrocardiogram network based on the different neural network structures;
training the electrocardiogram network with the electrocardiogram data in the electrocardiogram data set to obtain a classification model;
clustering the electrocardiogram data set based on the features extracted by the electrocardiogram network;
assigning pseudo labels to the electrocardiogram data based on the clustering result;
comparing the pseudo labels with the prediction labels output by the classification model, and calculating a loss value;
performing backpropagation training on the classification model based on the loss value;
and judging whether a termination condition is reached; if so, stopping training and outputting the clustering result to obtain a plurality of clusters; otherwise, continuing to train the electrocardiogram network with the next electrocardiogram data in the electrocardiogram data set.
Further, different neural network structures are selected and used for extracting spatial features, time domain features and frequency domain features of the electrocardiographic data respectively, and the method specifically comprises the following steps:
selecting a CNN network for extracting spatial features of the electrocardiogram data;
selecting an LSTM network for extracting time domain characteristics of the electrocardio data;
and selecting a CNN network for extracting the frequency domain characteristics of the electrocardio data.
Further, building an electrocardiogram network based on the different neural network structures specifically comprises:
arranging a first CNN network, an LSTM network, a second CNN network, and a fully connected layer in sequence to obtain the electrocardiogram network.
Further, the probability of each type of label in the initial label of the electrocardiographic data of each cluster is counted, and the method specifically comprises the following steps:
judging whether the initial label of the electrocardiogram data is a single label; if so, keeping the single label unchanged; otherwise, splitting the initial label into a plurality of single labels;
counting the occurrence frequency of each type of label in the single label;
and calculating the probability of the occurrence of each type of label according to the statistical times.
Further, fusing the electrocardiogram data in each cluster based on the probability statistics to obtain a fused electrocardiogram data set specifically comprises:
classifying electrocardiogram data whose label probability is greater than the upper limit value as high-quality label data;
classifying electrocardiogram data whose label probability is smaller than the lower limit value as noise label data, and re-calibrating the noise label data;
classifying electrocardiogram data whose label probability lies between the upper limit value and the lower limit value as clinical label data;
and combining the high-quality label data, the re-calibrated noise label data, and the clinical label data to obtain the fused electrocardiogram data set.
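The three-way division above can be illustrated with a minimal sketch. The function name, the record layout, and the threshold values 0.1 and 0.8 are assumptions made only for illustration; the text here does not state the upper and lower limit values, and records whose probability equals a limit are placed in the clinical group by choice of this sketch.

```python
def partition_cluster(records, label_probability, lower=0.1, upper=0.8):
    """Split one cluster's records into the three groups named in the text.

    records: list of (record_id, label) pairs, one single label per record.
    label_probability: mapping from label to its probability in this cluster.
    lower, upper: hypothetical limit values, not taken from the source.
    """
    groups = {"high_quality": [], "noise": [], "clinical": []}
    for record_id, label in records:
        p = label_probability.get(label, 0.0)
        if p > upper:
            groups["high_quality"].append(record_id)
        elif p < lower:
            # These records would be sent back for re-calibration.
            groups["noise"].append(record_id)
        else:
            groups["clinical"].append(record_id)
    return groups


# Illustrative data only.
records = [
    ("r1", "atrial fibrillation"),
    ("r2", "atrial fibrillation"),
    ("r3", "sinus rhythm"),
    ("r4", "atrial flutter"),
]
probabilities = {
    "atrial fibrillation": 0.85,
    "sinus rhythm": 0.05,
    "atrial flutter": 0.10,
}
groups = partition_cluster(records, probabilities)
print(groups)
```

The fused data set would then be the union of the three groups once the noise group has been re-labeled.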
The invention also provides an electrocardiogram data fusion device with different data sources, which comprises a processor and a memory, wherein the memory is stored with a computer program, and the computer program is executed by the processor to realize the electrocardiogram data fusion method with the different data sources.
The invention also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for fusing the electrocardiogram data of the different data sources is realized.
Beneficial effects: the invention provides a two-stage multi-source data fusion method based on deep clustering and statistics. In the first stage, electrocardiogram data from different sources are preprocessed, and a clustering result is obtained by training a model through unsupervised deep clustering. In the second stage, the likely labels of each cluster are counted from the initial labels, and data whose labels are likely to be wrong are screened out according to the statistics and passed to physicians for secondary calibration. Because model training uses only the features of the electrocardiogram data, and not the initial labels that contain noisy labels, the negative influence of noisy labels on the trained model is avoided.
Drawings
FIG. 1 is a flowchart of a first embodiment of a method for fusing ECG data from different data sources according to the present invention;
FIG. 2 is a superimposed view of different leads of the same ECG data;
FIG. 3 is a schematic diagram of Z-axis data of space vector data obtained by transforming electrocardiographic data according to the present invention;
FIG. 4a is a schematic diagram of 8 neighborhoods of neural network convolution in the present invention;
FIG. 4b is a schematic diagram of a cross convolution according to the present invention;
FIG. 5 is a schematic diagram of the ECG network of the present invention;
FIG. 6 is a schematic diagram of the training process of the ECG network of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Example 1
As shown in fig. 1, embodiment 1 of the present invention provides an electrocardiographic data fusion method with different data sources, including the following steps:
S1, acquiring electrocardiogram data marked with initial labels from different data sources, and establishing an electrocardiogram data set;
S2, preprocessing the electrocardiogram data in the electrocardiogram data set;
S3, clustering the preprocessed electrocardiogram data set through unsupervised deep clustering to obtain a plurality of clusters;
S4, for each cluster, counting the probability of each type of label among the initial labels of its electrocardiogram data;
S5, fusing the electrocardiogram data in each cluster based on the probability statistics to obtain a fused electrocardiogram data set.
This embodiment provides a two-stage multi-source data fusion method based on deep clustering and statistics. In the first stage, electrocardiogram data from different sources are preprocessed, and a clustering result is obtained by training a model through unsupervised deep clustering. Because the initial labels, which contain noisy labels, are not used as the training basis and only the features of the electrocardiogram data are used, the negative influence of noisy labels on the trained model is avoided. In the second stage, the likely labels of each cluster are counted from the initial labels, so that each specific label is determined statistically and random noise in the labels is eliminated to a certain extent; finally, data whose labels are likely to be wrong are screened out according to the statistics and passed to physicians for secondary calibration.
Preferably, the electrocardiographic data marked with the initial tag is collected from different data sources, and an electrocardiographic data set is established, specifically:
acquiring different types of electrocardiogram data from different data sources, wherein the quantity of the selected electrocardiogram data of the same type from the different data sources is within the same set range, and obtaining the electrocardiogram data set.
In this embodiment, the electrocardiogram data come from M different data sources. We first determine the electrocardiogram types to be processed, such as sinus rhythm, sinus bradycardia/tachycardia/arrhythmia, atrial premature beat/tachycardia, atrial flutter, atrial fibrillation, junctional escape, ventricular escape, accelerated escape heart rate, ventricular premature beat/tachycardia, supraventricular tachycardia, left bundle branch block, ventricular block, right bundle branch block, etc. Then the same quantity of each type of electrocardiogram data is selected from each data source to form the initial electrocardiogram data set for analysis, and the corresponding initial labels are recorded. Once the electrocardiogram data set is established, data preprocessing is performed on it.
Preferably, the preprocessing is performed on the electrocardiographic data in the electrocardiographic data set, and specifically comprises the following steps:
converting the electrocardiogram data into space vector data for extracting the space characteristics of the electrocardiogram data;
extracting second lead data in the electrocardiogram data for extracting time domain characteristics of the electrocardiogram data;
and acquiring a spectrogram of the electrocardiogram data, and extracting frequency domain characteristics of the electrocardiogram data.
To make it easier to extract features of the electrocardiogram data during subsequent model training, the electrocardiogram data are preprocessed first. Specifically, as shown in fig. 6, in the data preprocessing stage this embodiment converts the original electrocardiogram data, which contain a plurality of leads (such as lead I, lead II, and lead V6), into three types of data: 1. space vector data, which better reflect spatial characteristics; they are established in an orthogonal coordinate system and comprise data on the three axes X, Y, and Z; 2. lead II data, which retain most of the time-domain information (lead II is the second lead); 3. spectrograms of the lead II and lead V5 data, which carry rich frequency-domain information.
Converting the electrocardiogram data into space vector data, specifically comprising:
X=-0.172*V1-0.074*V2+0.122*V3+0.231*V4+0.239*V5+0.194*V6+0.156*I-0.010*II;
Y=0.057*V1-0.019*V2-0.106*V3-0.022*V4+0.041*V5+0.048*V6-0.227*I+0.887*II;
Z=-0.229*V1-0.310*V2-0.246*V3-0.063*V4+0.055*V5+0.108*V6+0.022*I+0.102*II;
where X, Y, and Z are the three dimensions of the space vector data, and V1, V2, V3, V4, V5, V6, I, and II denote the voltage values of leads V1, V2, V3, V4, V5, V6, I, and II of the electrocardiogram data, respectively.
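As a minimal sketch of the conversion above, the three formulas can be written as one matrix product. The coefficients are copied from the formulas; the array layout (one row per sample, columns in the order V1, V2, V3, V4, V5, V6, I, II) and the function name are assumptions made for illustration.

```python
import numpy as np

# Column order assumed for the input array; it matches the order of the
# coefficients in the three formulas above.
LEAD_ORDER = ["V1", "V2", "V3", "V4", "V5", "V6", "I", "II"]

# Rows are X, Y, Z; columns follow LEAD_ORDER.
COEFFS = np.array([
    [-0.172, -0.074,  0.122,  0.231, 0.239, 0.194,  0.156, -0.010],
    [ 0.057, -0.019, -0.106, -0.022, 0.041, 0.048, -0.227,  0.887],
    [-0.229, -0.310, -0.246, -0.063, 0.055, 0.108,  0.022,  0.102],
])


def to_space_vector(leads):
    """Map an (n_samples, 8) array of lead voltages to (n_samples, 3) X, Y, Z."""
    leads = np.asarray(leads, dtype=float)
    if leads.ndim != 2 or leads.shape[1] != len(LEAD_ORDER):
        raise ValueError("expected an array of shape (n_samples, 8)")
    return leads @ COEFFS.T


# Synthetic input only; real data would be the eight measured leads.
rng = np.random.default_rng(0)
ecg = rng.normal(size=(1000, 8))
xyz = to_space_vector(ecg)
print(xyz.shape)  # (1000, 3)
```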
FIG. 2 shows the different leads (I, II, V1, V2, V3, V4, V5, and V6) of the same electrocardiogram data superimposed. It can be seen directly from fig. 2 that, for the same electrocardiogram data, some leads are highly correlated with one another, and that, within one lead, the data at different times are correlated as a time series. Many studies have diagnosed electrocardiographic rhythm well using only the lead II data of the MIT-BIH data set, so we select lead II to extract the rhythm features, i.e. the time-domain features.
Since the industry generally selects only lead II and lead V5 for spectral electrocardiogram analysis, we follow this convention and extract frequency-domain features from the data of these two leads. Acquiring a spectrogram of the electrocardiogram data specifically comprises the following steps:
equally dividing the electrocardiogram data into a plurality of segments, and generally taking 1-5 seconds as one segment;
performing fast Fourier transform on each segment to obtain a spectrogram of each segment;
normalizing the spectrogram of each segment of the same lead:
$$G_i = \frac{\mathrm{FFT}(EW_i)}{\max\left(\mathrm{FFT}(EW_j)\right)}, \quad j = 1, 2, \ldots, N$$
where G_i is the normalized spectrogram of the i-th segment of the lead data, EW_i is the i-th segment of the lead data, FFT() denotes the fast Fourier transform, max() denotes the maximum value, EW_j is the j-th segment of the lead data, j = 1, 2, …, N, and N is the total number of segments of the lead;
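the window function used with the fast Fourier transform is a Hamming window:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
where w(n) is the window function value, n is the index of a data point within a segment, and N is the window length, i.e. the number of data points in a segment (the source text reuses the symbol N, defined above as the total number of segments of the lead);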
splicing the spectrograms of all segments of the same lead to obtain the spectrograms of all leads;
the spectrogram of each lead is set to the same dimension.
Because the energy of electrocardiogram data is concentrated mainly in the low-frequency range of 0-25 Hz, only the first 25% of the spectral coefficients are kept to reduce the dimensionality of the input data. The segment spectra of each lead are spliced directly to obtain the spectrogram of the whole lead. In this embodiment, for ease of calculation, the spectrogram of each lead is set to the same dimension: 125 × 200.
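The spectrogram steps above can be sketched as follows, under stated assumptions: a sampling rate of 500 Hz and 2-second segments are hypothetical values (the text only says 1-5 seconds per segment), the magnitude of the FFT is used, and the normalization divides by the largest magnitude over all segments of the lead. The sketch does not attempt to reproduce the 125 × 200 dimension, since the text does not say how that size is reached.

```python
import numpy as np


def lead_spectrogram(signal, fs=500, segment_seconds=2.0, keep_fraction=0.25):
    """Spectrogram of one lead: segment, window, FFT, normalize, truncate, splice.

    fs and segment_seconds are illustrative assumptions, not source values.
    Returns an array of shape (n_kept_coefficients, n_segments).
    """
    signal = np.asarray(signal, dtype=float)
    seg_len = int(round(fs * segment_seconds))
    n_segments = len(signal) // seg_len
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")
    # Trailing samples that do not fill a whole segment are dropped.
    segments = signal[: n_segments * seg_len].reshape(n_segments, seg_len)
    window = np.hamming(seg_len)  # 0.54 - 0.46 * cos(2*pi*n / (seg_len - 1))
    spectra = np.abs(np.fft.rfft(segments * window, axis=1))
    # Normalize by the maximum over all segments of this lead.
    peak = spectra.max()
    if peak > 0:
        spectra = spectra / peak
    # Keep only the first 25% of the spectral coefficients.
    n_keep = max(1, int(spectra.shape[1] * keep_fraction))
    # Splice the segments side by side: one column per segment.
    return spectra[:, :n_keep].T


# Synthetic 10-second, 5 Hz sine wave standing in for lead II.
fs = 500
t = np.arange(10 * fs) / fs
lead_ii = np.sin(2 * np.pi * 5.0 * t)
spec = lead_spectrogram(lead_ii, fs=fs)
print(spec.shape)  # (125, 5)
```

Spectrograms of different leads could then be padded or resampled to a common size before being fed to the second CNN network; how the source does this is not stated here.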
After data preprocessing is finished, an electrocardiogram model is built for cluster training.
Preferably, clustering the preprocessed electrocardiogram data set through unsupervised deep clustering to obtain a plurality of clusters specifically comprises:
selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiogram data, respectively;
building an electrocardiogram network based on the different neural network structures;
training the electrocardiogram network with the electrocardiogram data in the electrocardiogram data set to obtain a classification model;
clustering the electrocardiogram data set based on the features extracted by the electrocardiogram network;
assigning pseudo labels to the electrocardiogram data based on the clustering result;
comparing the pseudo labels with the prediction labels output by the classification model, and calculating a loss value;
performing backpropagation training on the classification model based on the loss value;
and judging whether a termination condition is reached; if so, stopping training and outputting the clustering result to obtain a plurality of clusters; otherwise, continuing to train the electrocardiogram network with the next electrocardiogram data in the electrocardiogram data set.
Different types of classifiers differ in their ability to express features of the same data, so three neural networks are set up from the three perspectives of time-domain, frequency-domain, and spatial information, each extracting different features of the electrocardiogram data. Introducing deep clustering into the electrocardiogram classification task also avoids the negative effects of highly noisy labels in multi-source data. Specifically, as shown in fig. 6, the preprocessed electrocardiogram data are first input into an electrocardiogram network built from three different neural networks. In this embodiment, the electrocardiogram network is a ResNet electrocardiogram network built from a first CNN network, an LSTM network, and a second CNN network, and the first CNN network is specifically an FCN network. The features obtained from the three neural networks are merged and then fused through two fully connected layers (the FC layers in fig. 6). The fused features are sent to a clustering network for clustering; the clustering method can be k-means clustering, with k equal to the number of categories of the electrocardiogram data, or hierarchical clustering, density clustering, and the like. The clustering result serves as the pseudo label of the electrocardiogram data, and a loss value (the Loss in fig. 6) is calculated from the pseudo label and the prediction label given by the classification model. Backpropagation by gradient descent then corrects the classification model. When the loss value falls below a set value or the specified number of training iterations is reached, the clustering result is extracted for further analysis.
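The alternation between clustering and classifier training described above can be sketched in a few lines. This is a toy illustration, not the patented network: a single tanh layer stands in for the three-branch feature extractor, a plain k-means replaces the clustering network, and all names and hyperparameters are hypothetical. Re-initializing the classification head after each clustering pass is a choice of this sketch (cluster indices are arbitrary from one pass to the next) and is not stated in the source.

```python
import numpy as np


def kmeans(feats, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster index per row of feats."""
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = feats[assign == c].mean(axis=0)
    return assign


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def deep_cluster(x, k, epochs=10, steps=50, lr=0.1, hidden=16, seed=0):
    """Alternate clustering (pseudo labels) with supervised updates."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(scale=0.1, size=(d, hidden))  # stand-in feature extractor
    losses = []
    for _ in range(epochs):
        # 1. Cluster the current features to obtain pseudo labels.
        pseudo = kmeans(np.tanh(x @ w1), k, rng)
        onehot = np.eye(k)[pseudo]
        # 2. Fresh classification head (sketch-specific choice, see lead-in).
        w2 = rng.normal(scale=0.1, size=(hidden, k))
        for _ in range(steps):
            h = np.tanh(x @ w1)
            p = softmax(h @ w2)
            # 3. Cross-entropy between predictions and pseudo labels.
            loss = -np.log(p[np.arange(n), pseudo] + 1e-12).mean()
            # 4. Backpropagation with plain gradient descent.
            g_logits = (p - onehot) / n
            g_w2 = h.T @ g_logits
            g_w1 = x.T @ ((g_logits @ w2.T) * (1.0 - h ** 2))
            w1 -= lr * g_w1
            w2 -= lr * g_w2
        losses.append(float(loss))
    # Final clustering result on the trained features.
    return kmeans(np.tanh(x @ w1), k, rng), losses


# Two synthetic, well-separated groups standing in for ECG feature vectors.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-3, 0.5, size=(50, 4)),
               rng.normal(3, 0.5, size=(50, 4))])
clusters, losses = deep_cluster(x, k=2)
```

Note that the initial labels never enter this loop; only the data and the cluster assignments drive the updates, which is the property the text relies on to avoid noisy labels.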
Preferably, different neural network structures are selected and used for extracting spatial features, time domain features and frequency domain features of the electrocardiographic data respectively, and specifically the method comprises the following steps:
selecting a CNN network for extracting spatial features of the electrocardiogram data;
selecting an LSTM network for extracting time domain characteristics of the electrocardio data;
and selecting a CNN network for extracting the frequency domain characteristics of the electrocardio data.
The CNN network is suitable for extracting spatial features, and the LSTM network can effectively extract time domain features. Therefore, three different networks are used for respectively extracting the spatial characteristics, the time domain characteristics and the frequency domain characteristics of the electrocardio data.
1. For the space vector data, a CNN network (for example DCN, FCN, ResNet, AlexNet, or VGG) is mainly used to extract the spatial features; it is referred to as the first CNN network. In this embodiment, the first CNN network is an FCN network.
2. For the lead II data, an LSTM network is mainly used to extract the time-domain features; it is referred to as the LSTM network.
3. For the spectral features, we again use a CNN network, referred to as the second CNN network.
Preferably, building the electrocardiogram network based on the different neural network structures specifically comprises:
arranging a first CNN network, an LSTM network, a second CNN network, and a fully connected layer in sequence to obtain the electrocardiogram network.
The space vector data obtained by converting the electrocardiogram data can still be regarded as time-series data, as shown in fig. 3, which illustrates the Z axis of electrocardiogram data after conversion into space vector data. For the first CNN network, this embodiment therefore adopts a cross convolution that is better suited to such data. Ordinary convolution in a neural network uses an 8-neighborhood structure, as shown in fig. 4a. Cross convolution only requires that the values at the four corners always be kept at 0 during the convolution calculation, as shown in fig. 4b. Replacing the 8-neighborhood structure used for computing the convolution in the network with the cross-shaped 4-neighborhood structure effectively reduces information interference.
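As a minimal sketch of the cross convolution described above, the four corners of a kernel can be zeroed with a fixed mask before the convolution is computed. The 3 × 3 kernel size is inferred from the 8-neighborhood description and is an assumption, as are the function and variable names; the operation is written in the cross-correlation form that CNN layers normally use.

```python
import numpy as np

# Center plus its four edge neighbors; the four corners are always zero.
CROSS_MASK = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]], dtype=float)


def cross_conv2d(image, kernel):
    """'Valid' 3x3 convolution with the kernel corners forced to zero."""
    image = np.asarray(image, dtype=float)
    k = np.asarray(kernel, dtype=float) * CROSS_MASK
    rows, cols = image.shape[0] - 2, image.shape[1] - 2
    out = np.zeros((rows, cols))
    # Accumulate one shifted copy of the image per kernel position.
    for dr in range(3):
        for dc in range(3):
            out += k[dr, dc] * image[dr:dr + rows, dc:dc + cols]
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
out = cross_conv2d(image, np.ones((3, 3)))
print(out)  # [[25. 30.] [45. 50.]]
```

In a trainable layer the same mask would be applied to the weights on every forward pass, so the corner weights never contribute regardless of how they are updated.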
For the second CNN network, since the spectral data used for training the network has been processed like a picture, the convolution here still uses the conventional neural network convolution, i.e. the structure of 8 neighborhoods.
Based on the three different neural networks, a 34-layer ResNet electrocardiogram network is built in this embodiment; its structure is shown in FIG. 5. As can be seen from FIG. 5, the electrocardiographic data first enters the input layer. It then enters the first CNN network, which comprises three convolutional layers, a batch normalization layer, and a ReLU layer. It then enters the LSTM network, which comprises three convolutional layers, a batch normalization layer, a ReLU layer, a Dropout layer, three convolutional layers, and a max pooling layer. It then enters the second CNN network, which comprises a batch normalization layer, a ReLU layer, three convolutional layers, a batch normalization layer, a ReLU layer, a Dropout layer, three batch normalization layers, and a max pooling layer. Finally, it enters the fully connected layer, which comprises a batch normalization layer, a ReLU layer, and a Dense layer, and the result is output through the output layer.
Preferably, counting the probability of each type of label among the initial labels of the electrocardiographic data in each cluster specifically comprises:
judging whether the initial label of the electrocardiographic data is a single label; if so, keeping the single label unchanged; otherwise, splitting the initial label into a plurality of single labels;
counting the number of occurrences of each type of single label;
and calculating the probability of occurrence of each type of label from the counted occurrences.
Each cluster obtained after clustering is analyzed separately, as follows:
1. Count the distinct single labels among the initial labels of the electrocardiographic data in the cluster. An initial label that contains multiple labels must be split into single labels, where a single label is a label of a single type. Then count the number of occurrences of each type of single label. The statistical results for this embodiment are shown in the following table:
TABLE 1 Statistics of single-label occurrences
[Table 1 is provided as an image in the original publication: Figures BDA0002956604240000101 and BDA0002956604240000111]
2. Sort the single labels in descending order of their number of occurrences, and calculate the corresponding probability of occurrence:
$$p_i = \frac{n_i}{\sum_{j=1}^{N} n_j}$$
where $p_i$ is the probability of occurrence of the i-th class of single label, $n_i$ is the number of occurrences of the i-th class of single label, and $N$ is the number of single-label categories, which is 17 in this embodiment;
the statistical probability of occurrence of each type of single tag in this embodiment is shown in the following table:
TABLE 2 Statistics of single-label occurrence probability
[Table 2 is provided as an image in the original publication: Figures BDA0002956604240000113 and BDA0002956604240000121]
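The counting and probability steps for one cluster can be sketched as follows. This is an illustrative sketch only: it assumes that a multi-label is stored as a separator-delimited string and that the probability of a single label is its count divided by the total count of all single labels in the cluster. The function name `single_label_probabilities`, the `sep` parameter, and the label names in the usage example are hypothetical.

```python
from collections import Counter


def single_label_probabilities(initial_labels, sep=","):
    """Return (label, count, probability) tuples sorted by descending count."""
    counts = Counter()
    for label in initial_labels:
        # A multi-label is split into single labels; a single label
        # passes through unchanged because it contains no separator.
        for single in label.split(sep):
            single = single.strip()
            if single:
                counts[single] += 1
    total = sum(counts.values())
    # most_common() orders the labels from most to least frequent.
    return [(name, n, n / total) for name, n in counts.most_common()]


# Usage with made-up labels for a single cluster.
cluster_labels = ["AF", "AF,PVC", "PVC", "AF", "LBBB"]
for name, n, p in single_label_probabilities(cluster_labels):
    print(name, n, round(p, 3))
```

Sorting by count and sorting by probability give the same order here, since every probability is the count divided by the same total.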
Preferably, fusing the electrocardiographic data in each cluster based on the probability statistical result to obtain a fused electrocardiographic data set specifically comprises:
dividing the electrocardiogram data with the probability of the occurrence of the label larger than the upper limit value into high-quality label data;
dividing the electrocardiogram data with the probability of the occurrence of the label smaller than the lower limit value into noise label data, and re-calibrating the noise label data;
dividing the electrocardiogram data with the probability of the occurrence of the label between an upper limit value and a lower limit value into clinical label data;
and combining the high-quality label data, the noise label data after re-calibration and the clinical label data to obtain a fused electrocardiogram data set.
Specifically, the fusion steps are as follows:
1. Select the electrocardiographic data whose initial labels contain only single labels ranked in the bottom 30% by probability. Because such single labels occur with low probability in the cluster, electrocardiographic data that contains only these low-probability single labels can be regarded as label-noise data. This data is submitted to a doctor for secondary calibration and then stored in the database, annotated as expert label data.
2. Select the electrocardiographic data whose initial labels contain only single labels ranked in the top 40% by probability. The initial labels of this data are considered correct with high probability. This data is stored in the database, annotated as high-quality clinical label data.
3. Store the remaining electrocardiographic data in the database, annotated as clinical label data.
With this method, the electrocardiographic data screened out for secondary calibration (data whose labels are probably wrong) accounts for about 9% of the total, so the proportion of data that requires secondary calibration is low.
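The three fusion steps can be sketched as a simple partition by label rank. The sketch assumes that the single labels of a cluster are already sorted by descending probability, that the 40% and 30% cut-offs are applied to the number of label categories and rounded to the nearest integer (the rounding rule is not stated in the text), and that records with no label fall into the remaining tier. The function name `partition_by_label_rank` and the tier keys are hypothetical.

```python
def partition_by_label_rank(records, ranked_labels, top_frac=0.4, bottom_frac=0.3):
    """Split records into three tiers from the rank of their single labels.

    records: iterable of (record_id, collection_of_single_labels)
    ranked_labels: single labels sorted by descending probability
    """
    n = len(ranked_labels)
    # Assumed cut-off rule: round the fraction of label categories.
    top = set(ranked_labels[:round(n * top_frac)])
    bottom = set(ranked_labels[n - round(n * bottom_frac):])
    tiers = {"expert": [], "high_quality": [], "clinical": []}
    for rec_id, labels in records:
        labels = set(labels)
        if labels and labels <= bottom:
            # Only low-probability labels: treat as label noise and
            # send to a doctor for secondary calibration.
            tiers["expert"].append(rec_id)
        elif labels and labels <= top:
            # Only high-probability labels: initial label is likely correct.
            tiers["high_quality"].append(rec_id)
        else:
            tiers["clinical"].append(rec_id)
    return tiers
```

A record that mixes a top-ranked label with a bottom-ranked one lands in the remaining clinical tier, which matches the "only contain" wording of steps 1 and 2.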
Example 2
Embodiment 2 of the present invention provides an electrocardiographic data fusion apparatus with different data sources, which includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the electrocardiographic data fusion method with different data sources provided in embodiment 1 is implemented.
The electrocardiographic data fusion apparatus for different data sources provided by this embodiment of the invention is used to implement the electrocardiographic data fusion method for different data sources. The apparatus therefore has the same technical effects as the method, and the details are not repeated here.
Example 3
Embodiment 3 of the present invention provides a computer storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the method for fusing electrocardiographic data of different data sources provided in embodiment 1.
The computer storage medium provided by this embodiment of the invention is used to implement the electrocardiographic data fusion method for different data sources. The computer storage medium therefore has the same technical effects as the method, and the details are not repeated here.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. An electrocardiogram data fusion method of different data sources is characterized by comprising the following steps:
acquiring the electrocardiogram data marked with the initial label from different data sources, and establishing an electrocardiogram data set;
preprocessing the electrocardio data in the electrocardio data set;
clustering the preprocessed electrocardiogram data set through unsupervised deep clustering to obtain a plurality of clustering clusters;
respectively counting the probability of each type of label in the electrocardio data initial label of each cluster;
and fusing the electrocardiographic data in each cluster based on the probability statistical result to obtain a fused electrocardiographic data set.
2. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein the electrocardiographic data marked with the initial tags are collected from different data sources to establish an electrocardiographic data set, and specifically, the method comprises the following steps:
acquiring different types of electrocardiogram data from different data sources, wherein the quantity of the selected electrocardiogram data of the same type from the different data sources is within the same set range, and obtaining the electrocardiogram data set.
3. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein the preprocessing is performed on the electrocardiographic data in the electrocardiographic data set, and specifically comprises:
converting the electrocardiogram data into space vector data for extracting the space characteristics of the electrocardiogram data;
extracting lead II data from the electrocardiogram data for extracting time domain characteristics of the electrocardiogram data;
and acquiring a spectrogram of the electrocardiogram data, and extracting frequency domain characteristics of the electrocardiogram data.
4. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein the preprocessed electrocardiographic data set is clustered through unsupervised deep clustering to obtain a plurality of clusters, specifically:
selecting different neural network structures, and respectively extracting spatial features, time domain features and frequency domain features of the electrocardio data;
building an electrocardiogram network based on different neural network structures;
training the electrocardiogram network by adopting the electrocardiogram data in the electrocardiogram data set to obtain a classification model;
clustering the electrocardiogram data set based on the electrocardiogram data features extracted by the electrocardiogram network;
labeling the electrocardio data with a pseudo label based on the clustering result;
comparing the pseudo label with a prediction label output by the classification model, and calculating a loss value;
performing back propagation training on the classification model based on the loss value;
and judging whether a termination condition is reached, if so, stopping training, outputting a clustering result to obtain a plurality of clustering clusters, and otherwise, training the electrocardiogram network by adopting the next electrocardiogram data in the electrocardiogram data set.
5. The method for fusing the electrocardiographic data of different data sources according to claim 4, wherein different neural network structures are selected and used for extracting spatial features, time domain features and frequency domain features of the electrocardiographic data, and specifically:
selecting a CNN network for extracting spatial features of the electrocardiogram data;
selecting an LSTM network for extracting time domain characteristics of the electrocardio data;
and selecting a CNN network for extracting the frequency domain characteristics of the electrocardio data.
6. The method for fusing the electrocardiographic data of different data sources according to claim 5, wherein an electrocardiographic network is built based on different neural network structures, and specifically comprises the following steps:
and sequentially setting a first CNN network, an LSTM network, a second CNN network and a full connection layer to obtain the electrocardiogram network.
7. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein counting the probability of each type of label among the initial labels of the electrocardiographic data in each cluster specifically comprises:
judging whether the initial label of the electrocardiographic data is a single label; if so, keeping the single label unchanged; otherwise, splitting the initial label into a plurality of single labels;
counting the number of occurrences of each type of single label;
and calculating the probability of occurrence of each type of label from the counted occurrences.
8. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein fusing the electrocardiographic data in each cluster based on the probability statistical result to obtain a fused electrocardiographic data set specifically comprises:
dividing the electrocardiogram data with the probability of the occurrence of the label larger than the upper limit value into high-quality label data;
dividing the electrocardiogram data with the probability of the occurrence of the label smaller than the lower limit value into noise label data, and re-calibrating the noise label data;
dividing the electrocardiogram data with the probability of the occurrence of the label between an upper limit value and a lower limit value into clinical label data;
and combining the high-quality label data, the noise label data after re-calibration and the clinical label data to obtain a fused electrocardiogram data set.
9. An apparatus for fusing electrocardiographic data of different data sources, comprising a processor and a memory, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the method for fusing electrocardiographic data of different data sources according to any one of claims 1 to 8.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method for fusing electrocardiographic data of different data sources according to any one of claims 1-8.
CN202110224552.0A 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources Active CN112989971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224552.0A CN112989971B (en) 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110224552.0A CN112989971B (en) 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources

Publications (2)

Publication Number Publication Date
CN112989971A true CN112989971A (en) 2021-06-18
CN112989971B CN112989971B (en) 2024-03-22

Family

ID=76351460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110224552.0A Active CN112989971B (en) 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources

Country Status (1)

Country Link
CN (1) CN112989971B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114190950A (en) * 2021-11-18 2022-03-18 电子科技大学 Intelligent electrocardiogram analysis method and electrocardiograph for containing noise label
CN118177830A (en) * 2024-05-15 2024-06-14 济南宝林信息技术有限公司 Heart function real-time monitoring data optimization processing method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460795A (en) * 2018-12-17 2019-03-12 北京三快在线科技有限公司 Classifier training method, apparatus, electronic equipment and computer-readable medium
CN110032639A (en) * 2018-12-27 2019-07-19 中国银联股份有限公司 By the method, apparatus and storage medium of semantic text data and tag match
CN111191726A (en) * 2019-12-31 2020-05-22 浙江大学 Fault classification method based on weak supervised learning multi-layer perceptron
CN111700608A (en) * 2020-07-24 2020-09-25 武汉中旗生物医疗电子有限公司 Multi-classification method and device for electrocardiosignals
CN112232241A (en) * 2020-10-22 2021-01-15 华中科技大学 Pedestrian re-identification method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘伟涛 等: "一种使用未标记样本聚类信息的自训练方法", 计算机应用研究, vol. 27 *
张岩金 等: "一种基于符号关系图的快速符号数据聚类算法", 计算机科学, vol. 48 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114190950A (en) * 2021-11-18 2022-03-18 电子科技大学 Intelligent electrocardiogram analysis method and electrocardiograph for containing noise label
CN118177830A (en) * 2024-05-15 2024-06-14 济南宝林信息技术有限公司 Heart function real-time monitoring data optimization processing method based on artificial intelligence
CN118177830B (en) * 2024-05-15 2024-08-09 济南宝林信息技术有限公司 Heart function real-time monitoring data optimization processing method based on artificial intelligence

Also Published As

Publication number Publication date
CN112989971B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant