CN112989971B - Electrocardiogram data fusion method and device for different data sources - Google Patents

Info

Publication number
CN112989971B
Authority
CN
China
Prior art keywords
data
electrocardiographic
label
different
electrocardio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110224552.0A
Other languages
Chinese (zh)
Other versions
CN112989971A (en
Inventor
朱佳兵
朱涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zoncare Bio Medical Electronics Co ltd
Original Assignee
Wuhan Zoncare Bio Medical Electronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zoncare Bio Medical Electronics Co ltd filed Critical Wuhan Zoncare Bio Medical Electronics Co ltd
Priority to CN202110224552.0A priority Critical patent/CN112989971B/en
Publication of CN112989971A publication Critical patent/CN112989971A/en
Application granted granted Critical
Publication of CN112989971B publication Critical patent/CN112989971B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B5/72: Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235: Details of waveform analysis
    • A61B5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08: Feature extraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12: Classification; Matching
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention relates to a method for fusing electrocardiographic data from different data sources, comprising the following steps: acquiring electrocardiographic data marked with initial labels from different data sources, and establishing an electrocardiographic data set; preprocessing the electrocardiographic data in the electrocardiographic data set; clustering the preprocessed electrocardiographic data set through unsupervised deep clustering to obtain a plurality of clusters; counting, for each cluster, the probability of each type of label among the initial labels of its electrocardiographic data; and fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set. The invention can fuse electrocardiographic data from different data sources and remove noisy labels, which facilitates the training of subsequent models.

Description

Electrocardiogram data fusion method and device for different data sources
Technical Field
The invention relates to the technical field of classification and labeling of electrocardiographic data, and in particular to a method and a device for fusing electrocardiographic data from different data sources, and to a computer storage medium.
Background
In practice, the electrocardiographic data we acquire may come from multiple sources, such as hospital A, hospital B, or hospital C. Electrocardiographic data from the same hospital may come from different departments, such as the inpatient department, the electrocardiogram room, the chest pain center, the physical examination center, or the emergency center, and may also come from primary hospitals or from 120 emergency services. These sources include both static 12-lead and dynamic 12-lead recordings. The data often differ in sampling frequency, signal quality, and the wording and habits of the diagnostic conclusions. Taking ventricular escape rhythm as an example, some older electrocardiographers may still follow earlier habits and label it "ventricular escape rhythm", while younger physicians may label it "idioventricular rhythm". At present, there are two main ways of dealing with inconsistent initial labels on "similar" electrocardiograms, which arise from differences in physicians' individual skill and in the conventions handed down within each hospital:
1. The initial labels are first removed from the electrocardiographic data, and each record is then randomly assigned to two experienced electrocardiographers, who label it independently. If their labels disagree, the record is passed to an arbitrator for resolution;
2. An electrocardiogram classification algorithm, which may be a traditional algorithm or a neural network algorithm, is first built and used to predict labels for the electrocardiographic data. The predicted label output by the model is then compared with the initial label: if they are the same, the initial label is kept; otherwise, the corresponding electrocardiographic data is handed to a physician for secondary calibration.
In the first method, labeling requires substantial labor and money when the data volume is large. In the second method, the training data are multi-source and the model is optimized under the guidance of the initial labels, which tend to be noisy: labeling errors may arise from differing diagnostic labels between hospitals, from the individual skill of physicians, or from electrocardiograms that are difficult to interpret. As a result, the generalization ability of the trained model is limited, and a large amount of electrocardiographic data is flagged for secondary calibration by physicians.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method, a device, and a computer storage medium for fusing electrocardiographic data from different data sources, so as to solve the problem that, when electrocardiographic data from multiple data sources are involved, noisy labels degrade the generalization ability of the trained model and a large amount of data therefore requires secondary calibration.
The invention provides a method for fusing electrocardiographic data from different data sources, comprising the following steps:
acquiring electrocardiographic data marked with initial labels from different data sources, and establishing an electrocardiographic data set;
preprocessing the electrocardiographic data in the electrocardiographic data set;
clustering the preprocessed electrocardiographic data set through unsupervised deep clustering to obtain a plurality of clusters;
counting, for each cluster, the probability of each type of label among the initial labels of its electrocardiographic data;
and fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set.
Further, acquiring electrocardiographic data marked with initial labels from different data sources and establishing an electrocardiographic data set specifically comprises:
acquiring electrocardiographic data of different categories from different data sources, wherein the number of electrocardiographic data of the same category selected from each data source falls within the same set range, so as to obtain the electrocardiographic data set.
Further, preprocessing the electrocardiographic data in the electrocardiographic data set specifically comprises:
converting the electrocardiographic data into space vector data, used for extracting the spatial features of the electrocardiographic data;
extracting the second lead data from the electrocardiographic data, used for extracting the time-domain features of the electrocardiographic data;
and obtaining a spectrogram of the electrocardiographic data, used for extracting the frequency-domain features of the electrocardiographic data.
Further, clustering the preprocessed electrocardiographic data set through unsupervised deep clustering to obtain a plurality of clusters specifically comprises:
selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiographic data, respectively;
building an electrocardiogram network based on the different neural network structures;
training the electrocardiogram network with the electrocardiographic data in the electrocardiographic data set to obtain a classification model;
clustering the electrocardiographic data set based on the electrocardiographic data features extracted by the electrocardiogram network;
assigning pseudo labels to the electrocardiographic data based on the clustering result;
comparing the pseudo labels with the predicted labels output by the classification model, and calculating a loss value;
performing back-propagation training on the classification model based on the loss value;
judging whether a termination condition is met; if so, stopping training and outputting the clustering result to obtain a plurality of clusters; otherwise, training the electrocardiogram network with the next electrocardiographic data in the electrocardiographic data set.
Further, selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiographic data, respectively, specifically comprises:
selecting a CNN network for extracting the spatial features of the electrocardiographic data;
selecting an LSTM network for extracting the time-domain features of the electrocardiographic data;
and selecting a CNN network for extracting the frequency-domain features of the electrocardiographic data.
Further, building an electrocardiogram network based on the different neural network structures specifically comprises:
arranging a first CNN network, an LSTM network, a second CNN network, and a fully connected layer in sequence to obtain the electrocardiogram network.
Further, counting the probability of each type of label among the initial labels of the electrocardiographic data in each cluster specifically comprises:
judging whether the initial label of the electrocardiographic data is a single label; if so, keeping the single label unchanged; otherwise, splitting the initial label into a plurality of single labels;
counting the number of occurrences of each type of label among the single labels;
and calculating the occurrence probability of each type of label from the counts.
Further, fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set specifically comprises:
classifying electrocardiographic data whose label occurrence probability is greater than the upper limit value as high-quality label data;
classifying electrocardiographic data whose label occurrence probability is smaller than the lower limit value as noise label data, and recalibrating the noise label data;
classifying electrocardiographic data whose label occurrence probability lies between the upper limit value and the lower limit value as clinical label data;
and combining the high-quality label data, the recalibrated noise label data, and the clinical label data to obtain the fused electrocardiographic data set.
The invention also provides a device for fusing electrocardiographic data from different data sources, comprising a processor and a memory, wherein the memory stores a computer program, and the above method for fusing electrocardiographic data from different data sources is implemented when the computer program is executed by the processor.
The invention also provides a computer storage medium storing a computer program which, when executed by a processor, implements the above method for fusing electrocardiographic data from different data sources.
Beneficial effects: the invention provides a two-stage multi-source data fusion method based on deep clustering and statistics. In the first stage, electrocardiographic data from different sources are preprocessed, and a clustering result is obtained by training a model with unsupervised deep clustering. In the second stage, the possible labels of each cluster are counted according to the initial labels, and data with a high probability of label error are screened out based on the statistics and handed to a physician for secondary calibration. Because the model is trained only on the features of the electrocardiographic data, and not on the initial labels that contain noisy labels, the negative influence of noisy labels on the trained model is avoided.
Drawings
FIG. 1 is a flow chart of a first embodiment of the method for fusing electrocardiographic data from different data sources provided by the present invention;
FIG. 2 is a superimposed view of different leads of the same piece of electrocardiographic data according to the present invention;
FIG. 3 is a schematic diagram of the Z-axis data of the space vector data obtained by converting the electrocardiographic data according to the present invention;
FIG. 4a is a schematic diagram of the 8-neighborhood structure of neural network convolution in accordance with the present invention;
FIG. 4b is a schematic diagram of cross convolution in accordance with the present invention;
FIG. 5 is a schematic diagram of the structure of the electrocardiogram network according to the present invention;
FIG. 6 is a schematic diagram of the training process of the electrocardiogram network of the present invention.
Detailed Description
Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and together with the description serve to explain the principles of the invention, and are not intended to limit the scope of the invention.
Example 1
As shown in FIG. 1, embodiment 1 of the present invention provides a method for fusing electrocardiographic data from different data sources, comprising the following steps:
S1, acquiring electrocardiographic data marked with initial labels from different data sources, and establishing an electrocardiographic data set;
S2, preprocessing the electrocardiographic data in the electrocardiographic data set;
S3, clustering the preprocessed electrocardiographic data set through unsupervised deep clustering to obtain a plurality of clusters;
S4, counting, for each cluster, the probability of each type of label among the initial labels of its electrocardiographic data;
S5, fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set.
This embodiment provides a two-stage multi-source data fusion method based on deep clustering and statistics. In the first stage, electrocardiographic data from different sources are preprocessed and a model is trained by unsupervised deep clustering to obtain a clustering result. In the second stage, the possible labels of each cluster are counted according to the initial labels, and data with a high probability of label error are screened out based on the statistics and handed to a physician for secondary calibration. Because training uses only the features of the electrocardiographic data, and not the initial labels that contain noisy labels, the negative influence of noisy labels on the trained model is avoided; and because the label of each specific class is determined statistically in the second stage, random noise in the labels is eliminated to some extent.
Preferably, acquiring electrocardiographic data marked with initial labels from different data sources and establishing an electrocardiographic data set specifically comprises:
acquiring electrocardiographic data of different categories from different data sources, wherein the number of electrocardiographic data of the same category selected from each data source falls within the same set range, so as to obtain the electrocardiographic data set.
In this embodiment there are M different sources of electrocardiographic data. We first determine the electrocardiogram classes to be processed, such as sinus rhythm, sinus bradycardia/tachycardia/arrhythmia, atrial premature beat/tachycardia, atrial flutter, atrial fibrillation, junctional escape, ventricular escape, accelerated escape rhythm, ventricular premature beat/tachycardia, supraventricular tachycardia, left bundle branch block, intraventricular block, right bundle branch block, etc. Then approximately the same number of electrocardiographic data of each class is selected from each data source to form the initial electrocardiographic data set for analysis, and the corresponding initial labels are recorded. After the electrocardiographic data set is established, it is preprocessed.
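The selection step can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(source, label, ecg_id)`, the function name `build_balanced_dataset`, and the choice to draw at most `per_class` records per class from each source are all assumptions made for the example.

```python
import random
from collections import defaultdict

def build_balanced_dataset(records, per_class, seed=0):
    """Draw up to `per_class` records for every (source, label) pair.

    records: iterable of (source, label, ecg_id) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for source, label, ecg_id in records:
        buckets[(source, label)].append(ecg_id)

    dataset = []
    for (source, label), ids in sorted(buckets.items()):
        # Take roughly the same number per class from each source;
        # a source with fewer records contributes everything it has.
        chosen = rng.sample(ids, min(per_class, len(ids)))
        dataset.extend((source, label, ecg_id) for ecg_id in chosen)
    return dataset
```

The initial label travels with every selected record, so it remains available for the statistics stage even though it is not used during clustering.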
Preferably, preprocessing the electrocardiographic data in the electrocardiographic data set specifically comprises:
converting the electrocardiographic data into space vector data, used for extracting the spatial features of the electrocardiographic data;
extracting the second lead data from the electrocardiographic data, used for extracting the time-domain features of the electrocardiographic data;
and obtaining a spectrogram of the electrocardiographic data, used for extracting the frequency-domain features of the electrocardiographic data.
To facilitate feature extraction during subsequent model training, the electrocardiographic data are preprocessed. As shown in FIG. 6, in the data preprocessing stage of this embodiment the original electrocardiographic data containing multiple leads (lead I, lead II, …, lead V6) are converted into three types of data: 1. space vector data, established in an orthogonal coordinate system and comprising X, Y, and Z axis data, which better reflect spatial features; 2. lead II data, which retain most of the time-domain information (lead II is the second lead); 3. spectrograms of the lead II and lead V5 data, which are rich in frequency-domain information.
Converting the electrocardiographic data into space vector data, specifically:
X=-0.172*V1-0.074*V2+0.122*V3+0.231*V4+0.239*V5+0.194*V6+0.156*I-0.010*II;
Y=0.057*V1-0.019*V2-0.106*V3-0.022*V4+0.041*V5+0.048*V6-0.227*I+0.887*II;
Z=-0.229*V1-0.310*V2-0.246*V3-0.063*V4+0.055*V5+0.108*V6+0.022*I+0.102*II;
where X, Y, and Z are the three dimensions of the space vector data, and V1, V2, V3, V4, V5, V6, I, II denote the voltage values of leads V1, V2, V3, V4, V5, V6, I, and II of the electrocardiographic data, respectively.
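As an illustration, the three formulas can be applied sample by sample. The sketch below copies the coefficients from the X, Y, Z formulas above; the input layout (a dict mapping lead name to a list of voltage samples) and the function name `to_space_vector` are assumptions made for the example.

```python
# Coefficients copied from the X, Y, Z formulas, keyed by lead name.
COEFFS = {
    "X": {"V1": -0.172, "V2": -0.074, "V3": 0.122, "V4": 0.231,
          "V5": 0.239, "V6": 0.194, "I": 0.156, "II": -0.010},
    "Y": {"V1": 0.057, "V2": -0.019, "V3": -0.106, "V4": -0.022,
          "V5": 0.041, "V6": 0.048, "I": -0.227, "II": 0.887},
    "Z": {"V1": -0.229, "V2": -0.310, "V3": -0.246, "V4": -0.063,
          "V5": 0.055, "V6": 0.108, "I": 0.022, "II": 0.102},
}

def to_space_vector(leads):
    """Convert eight leads into X, Y, Z series.

    leads: dict mapping lead name -> list of voltage samples, all the same
    length (hypothetical layout). Returns a dict with keys "X", "Y", "Z".
    """
    n = len(leads["II"])
    return {
        axis: [sum(w * leads[name][t] for name, w in weights.items())
               for t in range(n)]
        for axis, weights in COEFFS.items()
    }
```

Each output sample is a weighted sum of the eight lead voltages at the same instant, so the X, Y, and Z series keep the length and time ordering of the input.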
FIG. 2 is a superimposed plot of different leads (I, II, V1, V2, V3, V4, V5, V6) of the same piece of electrocardiographic data. FIG. 2 shows intuitively that, for different leads of the same piece of electrocardiographic data, some leads are highly correlated, and that for the same lead, the data at different times exhibit time-series correlation. Many studies have shown that diagnosis of the electrocardiogram rhythm can be accomplished well using only the lead II data in the MIT-BIH data set, so we select lead II to extract the rhythm features, namely the time-domain features.
Since the industry generally selects only leads II and V5 for spectral analysis of electrocardiograms, we follow this convention and extract frequency-domain features from the data of these two leads. The spectrogram of the electrocardiographic data is obtained as follows:
dividing the electrocardiographic data into a plurality of segments at equal intervals, generally taking 1-5 seconds as one segment;
performing a fast Fourier transform on each segment to obtain the spectrogram of each segment;
normalizing the spectrograms of the segments of the same lead:
G_i=FFT(EW_i)/max(FFT(EW_j)), j=1, 2, …, N;
where G_i is the normalized spectrogram of the i-th segment of the lead data, EW_i denotes the i-th segment of the lead data, FFT() denotes the fast Fourier transform, max() denotes taking the maximum value, EW_j denotes the j-th segment of the lead data, j=1, 2, …, N, and N is the total number of segments of the lead;
the window function used for the fast Fourier transform is a Hamming window:
w(n)=0.54-0.46*cos(2πn/(N-1)), n=0, 1, …, N-1;
where w(n) is the window function value, n is the index of a data value within the segment, and N is the total number of data values in the segment;
splicing the spectrograms of the segments of the same lead to obtain the spectrogram of the lead;
setting the spectrograms of the leads to the same dimension.
Because the energy of electrocardiographic data is concentrated mainly in the low-frequency range of 0-25 Hz, we keep only the first 25% of the spectral coefficients to reduce the dimension of the input data. The segment spectrograms of a lead are spliced directly to obtain the spectrogram of the whole lead. In this embodiment, for ease of computation, the spectrogram of each lead is set to the same dimension: 125*200.
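The spectrogram steps can be sketched with NumPy as below. Several details are assumptions made for the example rather than statements of the patented method: the magnitude of the transform is used, normalization is applied after truncating to the first 25% of coefficients, the 2-second default segment length is one arbitrary choice within the 1-5 second range, and the final resizing to 125*200 is left out. The function name `lead_spectrogram` is hypothetical.

```python
import numpy as np

def lead_spectrogram(signal, fs, segment_seconds=2.0, keep_fraction=0.25):
    """Return a (kept_bins, n_segments) spectrogram for one lead.

    signal: 1-D sequence of voltage samples; fs: sampling frequency in Hz.
    """
    signal = np.asarray(signal, dtype=float)
    seg_len = int(round(fs * segment_seconds))
    n_segments = len(signal) // seg_len
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")

    # Equal-length segments; trailing samples that do not fill a segment are dropped.
    segments = signal[: n_segments * seg_len].reshape(n_segments, seg_len)

    # Hamming window, then FFT magnitude of every segment.
    spectra = np.abs(np.fft.rfft(segments * np.hamming(seg_len), axis=1))

    # Keep only the low-frequency part of the spectrum.
    keep = max(1, int(spectra.shape[1] * keep_fraction))
    spectra = spectra[:, :keep]

    # Normalize by the maximum over all segments of this lead.
    peak = spectra.max()
    if peak > 0:
        spectra = spectra / peak

    # Columns are the segments, spliced in time order.
    return spectra.T
```

In this sketch the spectrograms of leads II and V5 would each be computed by a separate call and then brought to a common shape before being passed to the second CNN network.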
After data preprocessing is complete, an electrocardiogram model is built for clustering training.
Preferably, clustering the preprocessed electrocardiographic data set through unsupervised deep clustering to obtain a plurality of clusters specifically comprises:
selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiographic data, respectively;
building an electrocardiogram network based on the different neural network structures;
training the electrocardiogram network with the electrocardiographic data in the electrocardiographic data set to obtain a classification model;
clustering the electrocardiographic data set based on the electrocardiographic data features extracted by the electrocardiogram network;
assigning pseudo labels to the electrocardiographic data based on the clustering result;
comparing the pseudo labels with the predicted labels output by the classification model, and calculating a loss value;
performing back-propagation training on the classification model based on the loss value;
judging whether a termination condition is met; if so, stopping training and outputting the clustering result to obtain a plurality of clusters; otherwise, training the electrocardiogram network with the next electrocardiographic data in the electrocardiographic data set.
Different types of classifiers differ in their ability to express features of the same data, so three neural networks are built from the three perspectives of time-domain, frequency-domain, and spatial information, each extracting different features of the electrocardiographic data. At the same time, deep clustering is introduced into the electrocardiographic data classification task, which effectively avoids the negative effects of highly noisy labels in multi-source data. Specifically, as shown in FIG. 6, the preprocessed electrocardiographic data are first input into an electrocardiogram network built from three different neural networks. In this embodiment the electrocardiogram network is a ResNet electrocardiogram network built from a first CNN network, an LSTM network, and a second CNN network, where the first CNN network is specifically an FCN network. The features obtained from the three neural networks are combined and then fused through two fully connected layers (the FC layers in FIG. 6), and the fused features are sent to a clustering network for clustering. The clustering method may be k-means clustering (with k equal to the number of electrocardiographic data categories), hierarchical clustering, density clustering, or the like. The clustering result serves as the pseudo label of the electrocardiographic data, and a loss value (the Loss value in FIG. 6) is computed from the pseudo label and the predicted label given by the classification model. Back propagation is then performed by gradient descent to correct the classification model. Once the classification model has been trained until the loss value is smaller than a set value or the specified number of training iterations is reached, the clustering result is extracted for the next stage of analysis.
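The pseudo-label part of this loop can be sketched with NumPy as below. The sketch leaves out the electrocardiogram network and the back-propagation step: `features` stands in for the fused features the network would produce and `probs` for the classifier's softmax output. The loss is assumed to be cross-entropy, which the text does not specify, and all function names are hypothetical.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns one cluster index (pseudo label) per sample."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct samples; fancy indexing copies, so `features` is untouched.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return labels

def cross_entropy(probs, pseudo_labels):
    """Mean negative log-probability the classifier assigns to the pseudo labels."""
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(pseudo_labels)), pseudo_labels]
    return float(-np.log(np.clip(picked, 1e-12, 1.0)).mean())

def should_stop(loss, epoch, loss_threshold, max_epochs):
    """Termination test: loss below the set value, or training count reached."""
    return loss < loss_threshold or epoch >= max_epochs
```

In a full training loop, each round would recompute the features with the current network, re-cluster them, compute the loss against the new pseudo labels, and update the network by gradient descent until `should_stop` returns true.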
Preferably, selecting different neural network structures for extracting the spatial features, time-domain features, and frequency-domain features of the electrocardiographic data, respectively, specifically comprises:
selecting a CNN network for extracting the spatial features of the electrocardiographic data;
selecting an LSTM network for extracting the time-domain features of the electrocardiographic data;
and selecting a CNN network for extracting the frequency-domain features of the electrocardiographic data.
CNN networks are well suited to extracting spatial features, and LSTM networks can effectively extract time-domain features. Therefore, three different networks are used to extract the spatial, time-domain, and frequency-domain features of the electrocardiographic data, respectively.
1. For the space vector data, a CNN network, for example DCN, FCN, ResNet, AlexNet, or VGG, is used to extract the spatial features and is denoted the first CNN network. In this embodiment, the first CNN network is an FCN network.
2. For the lead II data, an LSTM network is used to extract the time-domain features and is denoted the LSTM network.
3. For the spectral features, a CNN network is also used and is denoted the second CNN network.
Preferably, building an electrocardiogram network based on the different neural network structures specifically comprises:
arranging a first CNN network, an LSTM network, a second CNN network, and a fully connected layer in sequence to obtain the electrocardiogram network.
The space vector data obtained by converting the electrocardiographic data can still be regarded as time-series data, as shown in FIG. 3, which illustrates the Z axis of the converted space vector data. Therefore, for the first CNN network, this embodiment adopts cross convolution, which is better suited to such data. Convolution in a neural network conventionally uses an 8-neighborhood structure, as shown in FIG. 4a. In cross convolution, the values at the four corners are always kept at 0 during the convolution calculation, as shown in FIG. 4b. Replacing the 8-neighborhood structure used for computing convolution in the network with the cross-shaped 4-neighborhood structure effectively reduces information interference.
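A minimal sketch of the cross-shaped kernel, assuming a 3x3 kernel and plain Python lists: the four corner weights are multiplied by zero so that only the center and its 4-neighborhood contribute. The names `CROSS_MASK`, `cross_kernel`, and `conv2d_valid` are hypothetical.

```python
# 1 marks the center and its 4-neighborhood, 0 marks the four corners.
CROSS_MASK = [[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]]

def cross_kernel(kernel):
    """Zero the four corner weights of a 3x3 kernel."""
    return [[kernel[r][c] * CROSS_MASK[r][c] for c in range(3)] for r in range(3)]

def conv2d_valid(image, kernel):
    """2-D cross-correlation with a 3x3 kernel, no padding, stride 1."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out.append([
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(3) for j in range(3))
            for c in range(cols - 2)
        ])
    return out
```

In a trainable layer the mask would have to be re-applied after every weight update, otherwise gradient descent would move the corner weights away from zero.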
For the second CNN network, conventional neural network convolution, that is, the 8-neighborhood structure, is still used, because the spectral data used to train this network have already been processed into a picture-like form.
Based on the three different neural networks, this embodiment builds a 34-layer ResNet electrocardiogram network, whose structure is shown in FIG. 5. As can be seen from FIG. 5, the electrocardiographic data first enter the input layer; they then enter the first CNN network, which comprises three convolution layers, a batch normalization layer, and a ReLU layer; then the LSTM network, which comprises three convolution layers, a batch normalization layer, a ReLU layer, a Dropout layer, three convolution layers, and a max pooling layer; then the second CNN network, which comprises a batch normalization layer, a ReLU layer, three convolution layers, a batch normalization layer, a ReLU layer, a Dropout layer, three batch normalization layers, and a max pooling layer; and finally the fully connected layer, which comprises a batch normalization layer, a ReLU layer, and a Dense layer, after which the result is output through the output layer.
Preferably, counting the probability of each type of label among the initial labels of the electrocardiographic data in each cluster specifically comprises:
judging whether the initial label of the electrocardiographic data is a single label; if so, keeping the single label unchanged; otherwise, splitting the initial label into a plurality of single labels;
counting the number of occurrences of each type of label among the single labels;
and calculating the occurrence probability of each type of label from the counts.
Each cluster obtained after clustering is analyzed separately, as follows:
1. Count the single labels in the initial labels of the electrocardiographic data in the cluster; if a multi-label exists, split it into single labels, where a single label is a label of a single type. Then count the number of occurrences of each type of single label. The counting result of this embodiment is shown in the following table:
Table 1. Number of occurrences of single labels
2. Sort the single labels by number of occurrences in descending order, and calculate the corresponding occurrence probability:
$$p_i = \frac{n_i}{\sum_{j=1}^{N} n_j}$$
where p_i represents the occurrence probability of the i-th type of single label, n_i represents the number of occurrences of the i-th type of single label, and N is the number of types of single labels, which is 17 in this embodiment;
The occurrence probability of each type of single label counted in this embodiment is shown in the following table:
Table 2. Occurrence probability of single labels
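The counting steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's code: it assumes that a multi-label is stored as one string with a comma separator, and that a label's probability is its count divided by the total count of all single labels in the cluster. The names `split_label` and `label_probabilities`, the separator, and the example label strings are all hypothetical.

```python
from collections import Counter


def split_label(initial_label, sep=","):
    """Split an initial label into single labels.

    A label that is already single passes through unchanged as a one-item list.
    Assumption: multi-labels are joined with `sep`.
    """
    return [part.strip() for part in initial_label.split(sep) if part.strip()]


def label_probabilities(initial_labels, sep=","):
    """Return (label, count, probability) tuples, most frequent label first."""
    counts = Counter()
    for label in initial_labels:
        counts.update(split_label(label, sep))
    total = sum(counts.values())
    # most_common() sorts by count in descending order.
    return [(name, n, n / total) for name, n in counts.most_common()]


# Hypothetical initial labels of the records in one cluster.
cluster_labels = ["AF", "AF,PVC", "normal", "AF"]
stats = label_probabilities(cluster_labels)
```

Here `stats` lists the most frequent label first, and the probabilities over all single labels in the cluster sum to 1.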
Preferably, fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set specifically comprises:
classifying electrocardiographic data whose label occurrence probability is greater than the upper limit value as high-quality label data;
classifying electrocardiographic data whose label occurrence probability is smaller than the lower limit value as noise label data, and recalibrating the noise label data;
classifying electrocardiographic data whose label occurrence probability lies between the upper limit value and the lower limit value as clinical label data;
and combining the high-quality label data, the recalibrated noise label data and the clinical label data to obtain the fused electrocardiographic data set.
Specifically, the fusion procedure is as follows:
1. Select the electrocardiographic data whose initial labels contain a single label ranked in the bottom 30% by probability. Such single labels occur with low probability within the cluster, so a piece of electrocardiographic data containing one of them can be considered label noise data. These electrocardiographic data are handed to doctors for secondary calibration, then stored in the database and annotated as expert label data.
2. Select the electrocardiographic data whose initial labels contain only single labels ranked in the top 40% by probability. The initial labels of these electrocardiographic data can be considered very likely to be correct. These electrocardiographic data are stored in the database and annotated as high-quality clinical label data.
3. The remaining electrocardiographic data are stored in the database and annotated as clinical label data.
With this method, the electrocardiographic data screened out for secondary calibration (whose labels are very likely to be wrong) account for about 9% of the data, so the proportion of data requiring secondary calibration is low.
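The three-way split described above can be sketched in a few lines. This is a hedged illustration only: the function name `fuse_cluster`, the dictionary keys, and the rounding rule (top share rounded up, bottom share rounded down) are assumptions that the source does not specify, and the reading of the 30% figure as the lowest-ranked labels is also an assumption. The input `ranked_labels` is the list of single labels of one cluster, sorted by occurrence probability in descending order.

```python
def fuse_cluster(records, ranked_labels, top_pct=40, bottom_pct=30):
    """Partition the records of one cluster by the rank of their single labels.

    records: list of (record_id, [single labels]).
    ranked_labels: single labels sorted by probability, highest first.
    Rounding is an assumption: the top share is rounded up, the bottom share down.
    """
    n = len(ranked_labels)
    n_top = -(-n * top_pct // 100)   # ceiling division with integers
    n_bottom = n * bottom_pct // 100  # floor division
    top = set(ranked_labels[:n_top])
    bottom = set(ranked_labels[n - n_bottom:]) if n_bottom else set()

    fused = {"needs_recalibration": [], "high_quality": [], "clinical": []}
    for record_id, labels in records:
        label_set = set(labels)
        if label_set & bottom:
            # Contains at least one low-probability label: treat as label noise
            # and send for secondary calibration by a doctor.
            fused["needs_recalibration"].append(record_id)
        elif label_set <= top:
            # Contains only high-probability labels.
            fused["high_quality"].append(record_id)
        else:
            fused["clinical"].append(record_id)
    return fused


# Hypothetical cluster with ten label types, L0 (most frequent) to L9 (rarest).
ranked = ["L%d" % i for i in range(10)]
records = [("a", ["L0"]), ("b", ["L0", "L9"]), ("c", ["L1", "L5"]), ("d", ["L7"])]
result = fuse_cluster(records, ranked)
```

The noise check is applied first, so a record that mixes a frequent label with a rare one is still sent for recalibration, which matches the order of the steps in the text.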
Example 2
Embodiment 2 of the present invention provides an electrocardiographic data fusion apparatus of different data sources, including a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the electrocardiographic data fusion method of different data sources provided in embodiment 1 is implemented.
The electrocardiographic data fusion apparatus of different data sources provided by this embodiment is used to implement the electrocardiographic data fusion method of different data sources. It therefore has the same technical effects as that method, which are not repeated here.
Example 3
Embodiment 3 of the present invention provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method for electrocardiographic data fusion of different data sources provided in embodiment 1.
The computer storage medium provided by this embodiment is used to implement the electrocardiographic data fusion method of different data sources. It therefore has the same technical effects as that method, which are not repeated here.
The present invention is not limited to the above embodiments; any change or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention is intended to fall within the scope of the present invention.

Claims (8)

1. An electrocardiographic data fusion method for different data sources, characterized by comprising the following steps:
acquiring electrocardiographic data marked with an initial tag from different data sources, and establishing an electrocardiographic data set;
preprocessing the electrocardiographic data in the electrocardiographic data set;
clustering the preprocessed electrocardiograph data set through unsupervised deep clustering to obtain a plurality of clusters;
counting, for each cluster separately, the probability of each type of label in the initial labels of the electrocardiographic data;
fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set;
selecting different neural network structures, and respectively extracting spatial features, time domain features and frequency domain features of the electrocardio data;
building an electrocardiogram network based on different neural network structures;
training the electrocardiogram network with the electrocardiographic data in the electrocardiographic data set to obtain a classification model;
clustering an electrocardiographic data set based on electrocardiographic data features extracted by the electrocardiographic network;
labeling pseudo tags for the electrocardiographic data based on the clustering result;
comparing the pseudo tag with a prediction tag output by the classification model, and calculating a loss value;
performing back propagation training on the classification model based on the loss value;
judging whether a termination condition is met, if so, stopping training, outputting a clustering result to obtain a plurality of clusters, otherwise, training the electrocardiogram network by adopting the next electrocardiograph data in the electrocardiograph data set;
wherein fusing the electrocardiographic data in each cluster based on the probability statistics to obtain a fused electrocardiographic data set specifically comprises:
dividing electrocardiographic data with the occurrence probability of the label being larger than the upper limit value into high-quality label data;
dividing electrocardiograph data with the probability of occurrence of the label smaller than the lower limit value into noise label data, and recalibrating the noise label data;
dividing electrocardiographic data of the probability of occurrence of the label between an upper limit value and a lower limit value into clinical label data;
and combining the high-quality label data, the recalibrated noise label data and the clinical label data to obtain a fused electrocardiographic data set.
2. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein acquiring electrocardiographic data marked with an initial label from different data sources and establishing an electrocardiographic data set specifically comprises:
acquiring electrocardiographic data of different categories from the different data sources, wherein the numbers of electrocardiographic data of the same category selected from the different data sources fall within the same set range, so as to obtain the electrocardiographic data set.
3. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein preprocessing the electrocardiographic data in the electrocardiographic data set specifically comprises:
converting the electrocardio data into space vector data for extracting the space characteristics of the electrocardio data;
extracting second lead data in the electrocardio data, wherein the second lead data is used for extracting time domain characteristics of the electrocardio data;
and acquiring a spectrogram of the electrocardio data, and extracting frequency domain characteristics of the electrocardio data.
4. The method for fusing electrocardiographic data of different data sources according to claim 1, wherein different neural network structures are selected and respectively used for extracting spatial features, time domain features and frequency domain features of electrocardiographic data, specifically:
selecting a CNN network for extracting the spatial characteristics of the electrocardio data;
selecting an LSTM network for extracting time domain characteristics of the electrocardio data;
and selecting a CNN network for extracting the frequency domain characteristics of the electrocardio data.
5. The method for fusing electrocardiographic data of different data sources according to claim 4, wherein the electrocardiographic network is built based on different neural network structures, specifically:
and sequentially setting a first CNN network, an LSTM network, a second CNN network and a full connection layer to obtain the electrocardiogram network.
6. The method for fusing the electrocardiographic data of different data sources according to claim 1, wherein the statistics of the probability of each type of label in the electrocardiographic data initial label of each cluster is specifically:
judging whether the initial label of the electrocardiograph data is a single label, if so, keeping the single label unchanged, otherwise, splitting the initial label into a plurality of single labels;
counting the occurrence times of various labels in the single label;
and calculating the occurrence probability of various labels according to the statistics times.
7. An electrocardiographic data fusion apparatus of different data sources, comprising a processor and a memory, wherein the memory has a computer program stored thereon, which when executed by the processor, implements the electrocardiographic data fusion method of different data sources according to any one of claims 1-6.
8. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method of electrocardiographic data fusion of different data sources according to any one of claims 1-6.
CN202110224552.0A 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources Active CN112989971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224552.0A CN112989971B (en) 2021-03-01 2021-03-01 Electrocardiogram data fusion method and device for different data sources

Publications (2)

Publication Number Publication Date
CN112989971A CN112989971A (en) 2021-06-18
CN112989971B true CN112989971B (en) 2024-03-22

Family

ID=76351460

Country Status (1)

Country Link
CN (1) CN112989971B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114190950B (en) * 2021-11-18 2023-07-28 电子科技大学 Electrocardiogram intelligent analysis method for noise-containing label and electrocardiograph

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460795A (en) * 2018-12-17 2019-03-12 北京三快在线科技有限公司 Classifier training method, apparatus, electronic equipment and computer-readable medium
CN110032639A (en) * 2018-12-27 2019-07-19 中国银联股份有限公司 By the method, apparatus and storage medium of semantic text data and tag match
CN111191726A (en) * 2019-12-31 2020-05-22 浙江大学 Fault classification method based on weak supervised learning multi-layer perceptron
CN111700608A (en) * 2020-07-24 2020-09-25 武汉中旗生物医疗电子有限公司 Multi-classification method and device for electrocardiosignals
CN112232241A (en) * 2020-10-22 2021-01-15 华中科技大学 Pedestrian re-identification method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
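A self-training method using clustering information of unlabeled samples; 刘伟涛 et al.; Application Research of Computers (计算机应用研究); Vol. 27; full text *
A fast symbolic data clustering algorithm based on a symbolic relation graph; 张岩金 et al.; Computer Science (计算机科学); Vol. 48; full text *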

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant