CN113679393B - ECG data feature generation model based on contrast predictive coding - Google Patents

Info

Publication number
CN113679393B
Authority
CN
China
Prior art keywords
data
model
training
trained
ecg
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110978438.7A
Other languages
Chinese (zh)
Other versions
CN113679393A (en)
Inventor
孙乐
任超旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110978438.7A
Publication of CN113679393A
Application granted
Publication of CN113679393B
Legal status: Active (current)
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/318 Heart-related electrical modalities, e.g. electrocardiography [ECG]
    • A61B5/346 Analysis of electrocardiograms
    • A61B5/349 Detecting specific parameters of the electrocardiograph cycle
    • A61B5/35 Detecting specific parameters of the electrocardiograph cycle by template matching
    • A61B5/364 Detecting abnormal ECG interval, e.g. extrasystoles, ectopic heartbeats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Cardiology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Medical Informatics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Pathology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an ECG data feature generation model based on contrastive predictive coding (CPC), comprising the following steps. First, the ECG training data are partitioned. Viewed horizontally, they form positive sample pairs (data of the same category) and negative sample pairs (data of different categories); viewed vertically, each pair consists of training data and data to be trained. Both the training data and the data to be trained are encoded by an encoder. The encoding of the training data is fed into an autoregressive model to obtain context information (Context), and the Context is fed into a prediction model to obtain multi-step future predictions. Finally, the dot product of the predictions and the encoded data to be trained is computed to obtain a loss value. The invention can augment classes that have too few samples and improve the generalization ability of downstream tasks.

Description

ECG data feature generation model based on contrast predictive coding
Technical Field
The invention belongs to the technical field of computer software, and in particular relates to an ECG data feature generation model based on contrastive predictive coding.
Background
Contrastive predictive coding is a form of self-supervised learning. Current self-supervised learning methods fall into three main types: context-based, temporal (sequence-based), and contrastive. Contrastive self-supervised learning builds representations by learning to encode the similarity or dissimilarity between two inputs, and has shown strong performance. Self-supervised algorithms no longer rely on human labels; instead, they derive labels from the data itself by exploiting the relationships between parts of the data. Moreover, data problems are ubiquitous in current deep learning applications, and ECG, as one type of medical data, suffers from imbalanced class distributions, missing labels, and similar problems.
The electrocardiogram (ECG) is effective for diagnosing and analysing various arrhythmias and conduction blocks, and is of great significance in diagnosing coronary heart disease. The ECG mainly reflects the electrical activity of cardiac excitation; myocardial damage, insufficient blood supply, drugs, and electrolyte disturbances can all cause characteristic ECG changes, and such characteristic changes are a reliable means of diagnosing myocardial infarction. At present, models for ECG classification share a common problem: the class distribution of ECG data is extremely imbalanced, with normal beats vastly outnumbering abnormal beats, so supervised networks cannot obtain enough data for training and model performance cannot be guaranteed. Contrastive predictive coding can generate high-dimensional features consistent with the original categories of the ECG data, thereby expanding the number of samples. At the same time, a scoring function assigns high scores to pairs of samples of the same class and low scores to pairs of different classes, further separating the sample categories. The resulting features can be used in downstream tasks such as classification, where they help prevent overfitting, speed up convergence of the downstream model, and improve classification accuracy.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an ECG data feature generation model based on contrastive predictive coding. It introduces contrastive predictive coding, a self-supervised learning model, to predict high-dimensional features of the same category as the original ECG data. This enlarges the sample set and reduces manual labeling cost; used together with a downstream classification task, it also helps other classification models reduce overfitting and improve classification accuracy.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An ECG data feature generation model based on contrastive predictive coding, comprising the following steps:
S1: obtaining a data set and preprocessing it;
S2: dividing the ECG training data into positive sample pairs, which contain data of the same category, and negative sample pairs, which contain data of different categories, and splitting each positive and negative sample pair into training data and data to be trained;
S3: constructing a contrastive predictive coding (CPC) model and feeding it the training data and the data to be trained;
the training data and the data to be trained are both encoded by an encoder, the encoding of the training data is fed into an autoregressive model to obtain context information (Context), and the Context is fed into a prediction model to obtain multi-step future predictions;
S4: computing the dot product of the predicted values and the encoded data to be trained, from which the loss value is obtained;
S5: training the CPC model;
S6: applying the trained CPC model to a downstream classification task.
To further optimize the technical scheme, the following specific measures are also adopted:
Further, the preprocessing of the data set in S1 includes:
S11: extracting heartbeats using the R-peak positions annotated in the data set;
S12: resampling the heartbeats;
S13: filtering with a wavelet transform;
S14: relabeling and shuffling the data set, dividing it into a training set and a validation set, and splitting the training set into two parts, training data and data to be trained, while constructing positive and negative sample pairs.
Further, the construction of the autoregressive model in S3 includes:
a GRU is used as the autoregressive model to fuse historical information; its output dimension is 256, and only the output of the last time step is returned.
Further, the construction of the prediction model includes:
each fully connected layer has an output dimension of 10 and a linear activation function; because the four fully connected layers are held in a list, a Lambda layer stacks their outputs side by side into a single network.
Further, in S4, the dot-product score is mapped into the range [0, 1] by a sigmoid function and used as the output of the CPC model.
Further, the CPC model is trained as follows:
S51: initializing the model parameters;
S52: feeding the data into the model for training;
S53: saving the model and plotting the training-set and validation-set accuracy.
Further, S6 includes the following steps:
S61: dividing the training data; to stay consistent with the trained CPC model, the data set is divided into 5 parts;
S62: constructing a classification model that takes three identical copies of the training data as input; each copy passes through the CPC encoder, a 1D convolutional layer, a ReLU activation layer, a 1D max-pooling layer, a second 1D convolutional layer, a ReLU activation layer, and a second 1D max-pooling layer; the three outputs are concatenated and followed by a Flatten layer and two fully connected layers, and the classification result is produced by the fully connected layer with softmax activation;
S63: training the classification model with the categorical cross-entropy loss and the RMSprop optimizer, a batch size of 64, for 10 epochs.
The beneficial effects of the invention are as follows:
(1) The invention is suitable for ECG data with imbalanced classes.
It applies to rare arrhythmias for which little ECG data has been collected. When collected data are scarce, CPC mitigates the problems caused by insufficient data by maximizing mutual information, so that classes with too few samples can be augmented and the generalization ability of downstream tasks is improved.
(2) The invention improves the accuracy of generating ECG data features of the same class.
The encoder extracts effective features and removes unnecessary noise, making the features more distinctive and easier to process downstream. Contrastive predictive coding exploits the mutual information within the data to improve its own predictive ability while strengthening the feature-extraction ability of the encoder. The model achieves good results on the MIT-BIH Arrhythmia Database.
(3) The invention speeds up training of downstream ECG classification models.
Contrastive predictive coding encodes the ECG data with the encoder and separates data of different classes, so a classification model trained on these encodings converges faster and trains more quickly.
Drawings
FIG. 1 is a flowchart of applying contrastive predictive coding to ECG data according to the invention.
FIG. 2 is a schematic diagram of the encoder model structure of the invention.
FIG. 3 is a schematic structural diagram of the prediction model of the invention.
FIG. 4 is a schematic diagram of the relationship between the training data, the data to be trained, and the positive and negative sample pairs.
FIG. 5 is a plot of the training-set and validation-set accuracy recorded during training.
FIG. 6 is a schematic structural diagram of the classification model according to an embodiment of the invention.
FIG. 7 is a plot of model accuracy during classification-model training according to an embodiment of the invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front", and "rear" are used for clarity of description only and are not intended to limit the scope in which the invention may be practiced; changes or adjustments to their relative relationships, without substantive change to the technical content, are also regarded as within the scope of the invention.
The invention provides an ECG data feature generation model based on contrastive predictive coding. It introduces contrastive predictive coding, a self-supervised learning model, to predict features of the same category as the original ECG data, which enlarges the sample set, reduces manual labeling cost, and helps other classification models reduce overfitting and improve classification accuracy. The invention aims to solve the following problems:
1) Imbalanced amounts of ECG data. Data are the material for model training, and the amount of data often determines model performance. Many models now exist for ECG classification; however, owing to the nature of ECG data, some heartbeat types have very few samples, so models cannot be trained on enough data, which causes various problems for classification.
2) High labeling cost. Fully supervised training requires large manually labeled data sets, and labeling consumes substantial human and material resources. ECG data in particular require domain expertise to label, which raises the barrier to manual labeling and makes labeling large, more complex data sets increasingly difficult.
3) Low prediction accuracy. The ultimate goal of predicting ECG data features with deep learning models is fast and accurate prediction. Most existing models predict under the assumption of a single shared probability distribution and do not mine the feature differences between data of different types, which limits prediction accuracy.
4) Slow convergence of downstream tasks. Contrastive predictive coding makes the encodings of same-class samples increasingly similar and those of different-class samples increasingly different, so different types of data are already separated before formal downstream training, which speeds up downstream convergence.
The invention mainly comprises the following steps:
FIG. 1 illustrates the workflow for applying contrastive predictive coding to ECG data. First, the ECG training data are partitioned: viewed horizontally, they form positive sample pairs (same class) and negative sample pairs (different classes); viewed vertically, they consist of training data and data to be trained. Both the training data and the data to be trained are encoded by the encoder. The encoding of the training data is fed into the autoregressive model to obtain context information (Context), which is fed into the prediction model to obtain multi-step future predictions. Finally, the dot product of the predictions and the encoded data to be trained is computed to obtain the loss value, and gradients computed by back-propagation update the parameters so that the loss gradually decreases. The implementation proceeds as follows:
Step 1: data preprocessing
1.1 Extract heartbeats using the R-peak positions annotated in the data set.
1.2 Resample the heartbeats.
1.3 Filter using a wavelet transform.
1.4 Relabel and shuffle the data set, divide it into a training set and a validation set, and split the training set into two parts, training data and data to be trained, while constructing positive and negative sample pairs.
Step 2: constructing CPC model
2.1 Build the encoder model to extract data features.
2.2 Build the autoregressive network model: the features extracted in 2.1 are fed into it, and a GRU fuses the historical information to obtain a feature fusion vector c.
2.3 Build the prediction model: the feature fusion vector from 2.2 is fed into it to generate predictions.
2.4 Build the model output: compute the dot product of the predictions from 2.3 and the encoder output for the data to be trained. Because a dot product is unbounded, a sigmoid function maps the result into [0, 1], and this value is the model output; a larger value indicates that the prediction and the encoded data to be trained are more similar, i.e. more likely to form a positive pair.
2.5 Build the CPC model: its inputs are the training data and the data to be trained, and its output is computed as in 2.4.
Step 3: training the CPC model
3.1 Initialize the model parameters.
3.2 Feed the data into the model for training.
3.3 Save the model and plot the training-set and validation-set accuracy. At this point the model can generate features of the same class as the original ECG data, which can be used to train downstream tasks such as classification.
Step 4: classification task
4.1 Randomly shuffle the training data and divide it into 5 folds, following the idea of K-fold cross-validation.
4.2 Build the classification model; its distinguishing feature is that it takes three identical copies of the training data as input rather than only one.
4.3 Train the model.
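The 5-fold division in 4.1 can be sketched as shuffling the sample indices once and dealing them into five folds, each of which can then serve once as the validation split. This is a minimal illustration; `k_fold_indices` and `fold_split` are hypothetical names, and the fixed seed is an assumption added for reproducibility.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fold_split(folds, val_fold):
    """Use one fold for validation and all remaining folds for training."""
    val = folds[val_fold]
    train = [i for j, f in enumerate(folds) if j != val_fold for i in f]
    return train, val
```

Dealing every k-th shuffled index keeps fold sizes within one sample of each other.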
The following is one embodiment of the present invention.
Step 1: data preprocessing
1.1 Segment heartbeats at the R-peak positions. This experiment uses the MIT-BIH Arrhythmia Database as the data set. It contains 48 two-lead ECG recordings; in all but a few recordings the first lead is lead II. Each recording is just over 30 minutes long, sampled at 360 Hz, and contains 650,000 samples. Because records 102, 104, 107, and 217 contain paced beats, these four records are removed. Taking the R peak as the reference point, the 0.4 s before and 0.5 s after it form one heartbeat; at 360 Hz the heartbeat length is 0.4×360 + 0.5×360 = 144 + 180 = 324 samples, and each heartbeat is then resampled to 251 samples. According to the standard of the Association for the Advancement of Medical Instrumentation (AAMI), all heartbeats can be grouped into five classes: normal beats (N), supraventricular ectopic beats (SVEB), ventricular ectopic beats (VEB), fusion beats (F), and unclassifiable beats (Q). The heartbeats are classified as they are segmented, yielding 90,081, 2,781, 7,008, 802, and 15 beats of classes N, SVEB, VEB, F, and Q, respectively. Because Q beats are unclassifiable, the final data set contains only the four major classes N, SVEB, VEB, and F, labeled 0, 1, 2, and 3.
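The segmentation arithmetic in 1.1 can be sketched in plain Python. This is a minimal illustration under assumptions: the lead signal and the R-peak annotations are taken to be already loaded as lists (loading is not shown), linear interpolation stands in for whatever resampling method the embodiment used, and the symbol-to-class table `AAMI_LABEL` is the commonly used AAMI grouping of MIT-BIH beat symbols, not a mapping stated in this document. `resample_linear` and `extract_beats` are hypothetical names.

```python
# Sketch of step 1.1: cut a 0.4 s + 0.5 s window around each annotated R peak,
# map the beat symbol to an AAMI class, and resample the window to 251 samples.
FS = 360                  # MIT-BIH sampling rate (Hz)
PRE_S, POST_S = 0.4, 0.5  # seconds before / after the R peak
TARGET_LEN = 251          # resampled beat length in the embodiment

# Assumed symbol-to-class table (common AAMI grouping, not given in the patent).
# Q-class and non-beat annotations are absent, so those beats are skipped.
AAMI_LABEL = {
    **dict.fromkeys("NLRej", 0),  # N: normal, bundle branch block, escape beats
    **dict.fromkeys("AaJS", 1),   # SVEB: supraventricular ectopic beats
    **dict.fromkeys("VE", 2),     # VEB: ventricular ectopic beats
    "F": 3,                       # F: fusion of ventricular and normal beat
}

def resample_linear(x, n):
    """Linearly interpolate sequence x onto n evenly spaced points (n >= 2)."""
    m = len(x)
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

def extract_beats(signal, r_peaks, symbols, fs=FS):
    """Return (beats, labels); beats too close to the record edges are skipped."""
    pre, post = round(PRE_S * fs), round(POST_S * fs)  # 144 and 180 samples
    beats, labels = [], []
    for r, sym in zip(r_peaks, symbols):
        if sym not in AAMI_LABEL:
            continue
        start, end = r - pre, r + post  # 324-sample window
        if start < 0 or end > len(signal):
            continue
        beats.append(resample_linear(signal[start:end], TARGET_LEN))
        labels.append(AAMI_LABEL[sym])
    return beats, labels
```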
1.2 Shuffle all data and their labels together, preserving their one-to-one correspondence. The first 90% of the shuffled data is used as the training set and the last 10% as the validation set.
1.3 Filter the signal with a wavelet transform. The wavelet basis is db6; filtering sets the wavelet coefficients below 5 Hz and above 90 Hz to 0, keeping only the coefficients of the 3rd to 6th detail subbands for reconstruction.
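To relate the subband indices in 1.3 to frequencies, the nominal band of each level of a dyadic wavelet decomposition can be computed from the 360 Hz sampling rate. This sketch assumes ideal, non-overlapping bands; real db6 filters have transition regions, so the band edges are approximate. `detail_band` and `approx_band` are illustrative names.

```python
# Nominal frequency band of each DWT subband for a signal sampled at fs Hz.
# Ideal (brick-wall) bands are an approximation: real db6 filters overlap.
def detail_band(level, fs=360.0):
    """Detail subband D_level nominally covers [fs / 2**(level+1), fs / 2**level]."""
    return fs / 2 ** (level + 1), fs / 2 ** level

def approx_band(level, fs=360.0):
    """Approximation subband A_level nominally covers [0, fs / 2**(level+1)]."""
    return 0.0, fs / 2 ** (level + 1)

if __name__ == "__main__":
    for j in range(1, 7):
        lo, hi = detail_band(j)
        print(f"D{j}: {lo:.4g} to {hi:.4g} Hz")
    print("A6: %.4g to %.4g Hz" % approx_band(6))
```

Under this idealization D1 spans 90 to 180 Hz, and D3 to D6 together span roughly 2.8 to 45 Hz.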
1.4 Generate positive and negative samples. A positive sample pair consists of data of the same class and a negative pair of data of different classes; the sample data are split into training data and data to be trained. Both the training data and the data to be trained have shape (32, 4, 151).
Step 2: constructing the CPC model
2.1 Build the encoder model. Its overall architecture is shown in FIG. 2. The first half consists of four blocks, each made of a fully connected layer with output dimension 64, a batch normalization layer, and a LeakyReLU activation layer. After a Flatten layer comes one more such block whose fully connected layer has output dimension 256, followed by a final fully connected layer with output dimension 10. All fully connected layers use a linear activation. The encoder's role is to extract features from the training data.
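One encoder block from 2.1 (fully connected layer, batch normalization, LeakyReLU) is sketched below in dependency-free Python for a batch of feature vectors. It assumes Keras-style defaults that the document does not state: batch normalization in training mode with gamma = 1, beta = 0 and epsilon = 0.001, and a LeakyReLU slope of 0.3. A real implementation would use a deep-learning framework; the helper names are hypothetical.

```python
import math

def dense(batch, weights, bias):
    """Linear fully connected layer: y = x W + b for each row x of the batch."""
    return [[sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
             for j in range(len(bias))] for x in batch]

def batch_norm_train(batch, eps=1e-3):
    """Normalize each feature over the batch (training mode, gamma=1, beta=0)."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
            for row in batch]

def leaky_relu(batch, alpha=0.3):
    """Pass positives through; scale negatives by alpha (assumed slope 0.3)."""
    return [[v if v > 0 else alpha * v for v in row] for row in batch]

def encoder_block(batch, weights, bias):
    """One Dense -> BatchNorm -> LeakyReLU block as described in step 2.1."""
    return leaky_relu(batch_norm_train(dense(batch, weights, bias)))
```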
2.2 Build the autoregressive network model. It uses a GRU (Gated Recurrent Unit) with output dimension 256 that returns only the output of the last time step. This part produces a feature fusion vector c that fuses the historical information.
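The GRU in 2.2 consumes the encoded segments one step at a time, and because only the last output is returned, the whole history collapses into the single vector c. The sketch below shows that behaviour with the classic GRU equations, in which the reset gate is applied before the recurrent matrix (the `reset_after=False` variant in Keras; the TensorFlow 2 default applies it afterwards). The parameter-dictionary layout and the function names are assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_last_state(xs, params, hidden):
    """Run a GRU over the sequence xs and return only the final hidden state.

    params holds W_* (hidden x input) and U_* (hidden x hidden) matrices and
    b_* bias vectors for the update (z), reset (r) and candidate (h) gates.
    """
    h = [0.0] * hidden
    for x in xs:
        z = [sigmoid(a + b + c) for a, b, c in
             zip(matvec(params["W_z"], x), matvec(params["U_z"], h), params["b_z"])]
        r = [sigmoid(a + b + c) for a, b, c in
             zip(matvec(params["W_r"], x), matvec(params["U_r"], h), params["b_r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied before U_h
        cand = [math.tanh(a + b + c) for a, b, c in
                zip(matvec(params["W_h"], x), matvec(params["U_h"], rh), params["b_h"])]
        # Keras convention: h_t = z * h_{t-1} + (1 - z) * candidate
        h = [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]
    return h  # only the last state, like return_sequences=False
```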
2.3 Build the prediction model, shown in FIG. 3. Each fully connected layer has output dimension 10 and a linear activation. Because the four fully connected layers are held in a list, a Lambda layer stacks their outputs side by side into one network. The feature fusion vector output by the autoregressive model is fed into the prediction model, which outputs the predictions.
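The prediction model in 2.3 can be read as one independent linear layer per future step, whose outputs are stacked into a (steps, code_size) array; with four steps and a code size of 10 this gives a 4 x 10 prediction. A minimal sketch, with the hypothetical name `predict_future` and the stacking done by building a list:

```python
def predict_future(context, head_weights, head_biases):
    """Apply one linear head per future step and stack the results.

    context: the GRU output vector (length 256 in the embodiment).
    head_weights[k]: code_size x len(context) matrix for step k.
    head_biases[k]: code_size bias vector for step k.
    Returns len(head_weights) vectors, i.e. shape (steps, code_size).
    """
    preds = []
    for w, b in zip(head_weights, head_biases):
        preds.append([sum(wi * ci for wi, ci in zip(row, context)) + bj
                      for row, bj in zip(w, b)])
    return preds  # plays the role of the Lambda stacking layer
```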
2.4 Build the output model. It computes the dot product between the encoder output for the data to be trained and the predictions generated by the prediction model, averages it, and maps the averaged value into [0, 1] with a sigmoid function.
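The output computation in 2.4 and 2.5 can be sketched as follows. The document says the dot product is averaged before the sigmoid but not over which axes; this sketch assumes the element-wise product is averaged over the code dimension and then over the prediction steps. `cpc_score` is a hypothetical name. A score above 0.5 means the prediction and the encoded data to be trained point in similar directions, which training associates with a positive pair.

```python
import math

def cpc_score(preds, encoded_targets):
    """Average the element-wise products over code dims, then over steps, then sigmoid.

    preds, encoded_targets: (steps, code_size) nested lists for one sample.
    """
    per_step = [sum(p * t for p, t in zip(ps, ts)) / len(ps)
                for ps, ts in zip(preds, encoded_targets)]
    mean_dot = sum(per_step) / len(per_step)
    return 1.0 / (1.0 + math.exp(-mean_dot))  # probability of a positive pair
```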
2.5 Build the overall CPC model. Its inputs are the training data and the data to be trained. The training data pass through the encoder, the autoregressive model, and the prediction model to give result 1; the data to be trained pass through the encoder to give result 2. The dot product of results 1 and 2 is computed and output after the sigmoid.
Step 3: initialize the model parameters and feed the ECG data into the model for training.
3.1 Initialize the model parameters. The learning rate is set to 0.001, the batch size to 32, and the number of epochs to 10. Adam is used as the optimizer and binary cross-entropy (`binary_crossentropy`) as the loss function. To make the model converge faster, the learning rate is reduced to 1/3 of its current value whenever model performance has not improved for 2 epochs.
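The loss and learning-rate rule in 3.1 are sketched below. In Keras the rule corresponds to the `ReduceLROnPlateau` callback with `factor=1/3` and `patience=2`; the class here is a simplified stand-in (no `min_delta` or cooldown) and its name is hypothetical. The binary cross-entropy compares the sigmoid output of 2.4 with the 1/0 label of a positive or negative pair.

```python
import math

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean BCE between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    return total / len(y_true)

class PlateauScheduler:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-3, factor=1 / 3, patience=2):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.wait = float("inf"), 0

    def step(self, monitored_loss):
        """Call once per epoch with the monitored loss; returns the current rate."""
        if monitored_loss < self.best:
            self.best, self.wait = monitored_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr
```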
3.2 Generate the training data and data to be trained. The relationships between the training data, the data to be trained, and the positive and negative sample pairs are shown in FIG. 4. Positive and negative pairs are generated in equal numbers. Because the classes in the training set differ in size, classes N, SVEB, VEB, and F are sampled with probabilities 0.1, 0.3, 0.2, and 0.4, respectively, so that classes with fewer samples are trained sufficiently. The output of the training data after the encoder, autoregressive model, and prediction model is the prediction; the output of the data to be trained after the encoder is the target to be predicted. Both are fed into the model for training.
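The balanced pair sampling of 3.2 might look like the sketch below. Assumptions not stated in the document: each side of a pair holds `terms = 4` beats drawn from one class (matching the (32, 4, 151) shape in 1.4), the class probabilities weight the choice of the class of the training data, and a negative pair draws its data to be trained uniformly from the other three classes. `CLASS_PROBS`, `sample_pair`, and `sample_batch` are hypothetical names.

```python
import random

# Sampling probabilities for N, SVEB, VEB, F (labels 0-3), from step 3.2.
CLASS_PROBS = {0: 0.1, 1: 0.3, 2: 0.2, 3: 0.4}

def sample_pair(by_class, positive, rng, terms=4):
    """Return (training_data, data_to_be_trained, label) for one pair.

    by_class maps a label to a list of preprocessed beats (assumed layout).
    Positive pairs draw both sides from one class; negatives use two classes.
    """
    classes = list(CLASS_PROBS)
    anchor = rng.choices(classes, weights=[CLASS_PROBS[c] for c in classes])[0]
    other = anchor if positive else rng.choice([c for c in classes if c != anchor])
    context = rng.choices(by_class[anchor], k=terms)
    target = rng.choices(by_class[other], k=terms)
    return context, target, int(positive)

def sample_batch(by_class, rng, batch_size=32):
    """Equal numbers of positive and negative pairs, in shuffled order."""
    flags = [True] * (batch_size // 2) + [False] * (batch_size - batch_size // 2)
    rng.shuffle(flags)
    return [sample_pair(by_class, flag, rng) for flag in flags]
```

Weighting the class choice rather than sampling beats uniformly is what lets the rare SVEB and F classes appear often enough in training.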
3.3 Save the model. The trained model is saved, and the training-set and validation-set accuracy during training is plotted, as shown in FIG. 5; the plot shows that the model performs well.
Step 4: applying the trained CPC model to the downstream classification task
4.1 Divide the training data. To stay consistent with the trained CPC model, the training-set data from the MIT-BIH Arrhythmia Database are also divided into 5 parts.
4.2 Build the classification model. It takes three identical copies of the training data as input. Each copy passes through the encoder portion of the CPC model, a 1D convolutional layer, a ReLU activation layer, a 1D max-pooling layer, a second 1D convolutional layer, a ReLU activation layer, and a second 1D max-pooling layer. The three outputs are concatenated and followed by a Flatten layer and two fully connected layers, and the classification result is produced by the fully connected layer with softmax activation. The concrete model structure is shown in FIG. 6.
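The per-input branch of 4.2 (convolution, ReLU, max pooling, repeated twice) and the concatenation of the three branches can be sketched in plain Python. The document does not give kernel sizes, filter counts, pooling sizes, or whether the three branches share weights, so the sketch takes all weights as arguments and assumes "valid" convolution and non-overlapping pooling of size 2. The input is assumed to be the CPC encoder output as a sequence of feature vectors; all function names are hypothetical.

```python
def conv1d_valid(x, kernel, bias):
    """'valid' 1D convolution (cross-correlation, as in Keras Conv1D).

    x: T time steps, each a list of C_in channels.
    kernel: K x C_in x C_out nested lists; bias: C_out floats.
    """
    k, c_in, c_out = len(kernel), len(x[0]), len(bias)
    return [[bias[o] + sum(x[t + i][c] * kernel[i][c][o]
                           for i in range(k) for c in range(c_in))
             for o in range(c_out)]
            for t in range(len(x) - k + 1)]

def relu(x):
    return [[max(0.0, v) for v in step] for step in x]

def max_pool1d(x, pool=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(x[t + i][c] for i in range(pool)) for c in range(len(x[0]))]
            for t in range(0, len(x) - pool + 1, pool)]

def branch(x, conv1, conv2):
    """Conv -> ReLU -> MaxPool, twice; conv1 and conv2 are (kernel, bias) pairs."""
    x = max_pool1d(relu(conv1d_valid(x, *conv1)))
    return max_pool1d(relu(conv1d_valid(x, *conv2)))

def three_branch_features(x, branches):
    """Run the same input through each branch and concatenate the flattened outputs."""
    feats = []
    for conv1, conv2 in branches:
        feats.extend(v for step in branch(x, conv1, conv2) for v in step)
    return feats  # would feed the dense layers and the softmax output
```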
4.3 Train the classification model. The loss function is categorical cross-entropy and the optimizer is RMSprop, with a batch size of 64, for 10 epochs. The model accuracy during training is shown in FIG. 7.
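Step 4.3 trains with categorical cross-entropy and RMSprop. The sketch below shows softmax, the cross-entropy loss, and a textbook RMSprop update (rho = 0.9 and epsilon = 1e-7 are assumed, matching common Keras defaults; Keras' exact epsilon placement may differ), and checks that one update on a toy set of logits lowers the loss. Batching (64 samples) and the 10-epoch loop are omitted; the class name is hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

class RMSprop:
    """Textbook RMSprop: v = rho*v + (1-rho)*g^2; w -= lr * g / (sqrt(v) + eps)."""

    def __init__(self, lr=1e-3, rho=0.9, eps=1e-7):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.v = None  # running mean of squared gradients, one per weight

    def step(self, weights, grads):
        if self.v is None:
            self.v = [0.0] * len(weights)
        new_w = []
        for i, (w, g) in enumerate(zip(weights, grads)):
            self.v[i] = self.rho * self.v[i] + (1 - self.rho) * g * g
            new_w.append(w - self.lr * g / (math.sqrt(self.v[i]) + self.eps))
        return new_w
```

For softmax followed by categorical cross-entropy, the gradient with respect to the logits is simply `probs - one_hot`, which is what a usage step would pass to `step`.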
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to the above example; all technical solutions under the concept of the invention fall within its scope of protection. It should be noted that improvements and modifications made without departing from the principle of the invention are also regarded as within the scope of protection of the invention as set forth in the following claims.

Claims (7)

1. An ECG data feature generation model based on contrastive predictive coding, comprising the following steps:
s1, acquiring a data set and preprocessing it;
s2, dividing the ECG training data into positive sample pairs and negative sample pairs, where a positive pair consists of data of the same category and a negative pair consists of data of different categories; each positive and negative pair is further divided into training data and data to be trained;
s3, constructing a contrastive predictive coding (CPC) model and feeding it the training data and the data to be trained;
encoding the training data and the data to be trained with an encoder, then feeding the encoded training data into an autoregressive model to obtain context information (Context), which is passed to a prediction model to obtain multi-step predictions of future values;
s4, computing the dot product of the predicted values and the encoded data to be trained to obtain a loss value;
s5, training the CPC model;
s6, applying the trained CPC model to an arrhythmia classification task.
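Step s2 (building positive and negative sample pairs, each split into training data and data to be trained) can be sketched as follows. This is a hypothetical reading of the claim: `n_context` beats of one class form the training data, and `n_future=4` beats (one per prediction layer in claim 4) form the data to be trained, drawn from the same class for a positive pair (target 1) or from other classes for a negative pair (target 0). The value of `n_context`, the sampling with replacement and the alternating positive/negative batch layout are assumptions.

```python
import numpy as np

def make_pair(beats, labels, positive, n_context=4, n_future=4, rng=None):
    """Build one CPC sample (x, y, target) under the assumptions stated above.

    beats: (n_beats, beat_len) array; labels: (n_beats,) class labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    anchor_class = rng.choice(np.unique(labels))
    same = np.flatnonzero(labels == anchor_class)
    other = np.flatnonzero(labels != anchor_class)
    if not positive and other.size == 0:
        raise ValueError("negative pairs need at least two classes")
    x = beats[rng.choice(same, size=n_context, replace=True)]  # "training data"
    pool = same if positive else other
    y = beats[rng.choice(pool, size=n_future, replace=True)]   # "data to be trained"
    return x, y, int(positive)

def make_batch(beats, labels, batch_size=8, seed=0, **kw):
    """Alternate positive (even index) and negative (odd index) pairs."""
    rng = np.random.default_rng(seed)
    xs, ys, ts = zip(*(make_pair(beats, labels, positive=(i % 2 == 0), rng=rng, **kw)
                       for i in range(batch_size)))
    return np.stack(xs), np.stack(ys), np.array(ts)
```

A balanced mix of targets keeps the binary score of claim 5 from collapsing to a single answer; how the patent actually balances pairs is not stated.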
2. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein preprocessing the data set in s1 comprises:
s11, segmenting heartbeats using the R-peak positions annotated in the data set;
s12, resampling each heartbeat;
s13, filtering with a wavelet transform;
s14, relabeling and shuffling the data set, splitting it into a training set and a validation set, and further splitting the training set into two parts, training data and data to be trained; positive and negative sample pairs are constructed at the same time.
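Steps s11, s12 and s14 can be sketched in NumPy as below; the wavelet filtering of s13 is left out. The window of 90 samples before and 110 after each R peak, the resampled length of 128, the 20 % validation fraction and the even split of the training set into training data and data to be trained are all assumptions, not values given in the claim.

```python
import numpy as np

def segment_beats(signal, r_peaks, before=90, after=110):
    """s11: cut one window per annotated R peak; windows running off the record are skipped."""
    beats, kept = [], []
    for i, r in enumerate(r_peaks):
        if r - before >= 0 and r + after <= len(signal):
            beats.append(signal[r - before:r + after])
            kept.append(i)  # remember which annotation (and label) survived
    return np.array(beats), np.array(kept, dtype=int)

def resample_beat(beat, target_len=128):
    """s12: linear-interpolation resampling to a fixed length."""
    old = np.linspace(0.0, 1.0, num=len(beat))
    new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new, old, beat)

def shuffle_split(beats, labels, val_frac=0.2, seed=0):
    """s14: shuffle, hold out a validation set, then halve the training set
    into 'training data' and 'data to be trained' (the halving is an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(beats))
    n_val = int(round(len(beats) * val_frac))
    val, train = order[:n_val], order[n_val:]
    half = len(train) // 2
    part_a, part_b = train[:half], train[half:]
    return ((beats[part_a], labels[part_a]),
            (beats[part_b], labels[part_b]),
            (beats[val], labels[val]))
```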
3. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein constructing the autoregressive model in s3 comprises:
a GRU is used as the autoregressive model to fuse historical information; its output dimension is 256 and only the output of the last time step is returned.
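A minimal NumPy GRU that, like the claimed autoregressive model, has 256 units and returns only the last hidden state is sketched below. It uses one common GRU formulation with random, untrained weights; frameworks such as Keras use a slightly different default gate arrangement, so this is illustrative rather than a drop-in equivalent. The input dimension of 10 in the usage note assumes the encoder emits 10-dimensional codes, matching the output size of the prediction layers in claim 4.

```python
import numpy as np

class MinimalGRU:
    """Single-layer GRU returning only the last hidden state (like return_sequences=False).
    Weights are random: illustrative only."""

    def __init__(self, input_dim, units=256, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(units)
        # One weight block per gate: update (z), reset (r), candidate (h)
        self.W = rng.uniform(-scale, scale, size=(3, input_dim, units))
        self.U = rng.uniform(-scale, scale, size=(3, units, units))
        self.b = np.zeros((3, units))
        self.units = units

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def __call__(self, seq):
        """seq: (time_steps, input_dim) -> context vector of shape (units,)."""
        h = np.zeros(self.units)
        for x_t in seq:
            z = self._sigmoid(x_t @ self.W[0] + h @ self.U[0] + self.b[0])
            r = self._sigmoid(x_t @ self.W[1] + h @ self.U[1] + self.b[1])
            h_tilde = np.tanh(x_t @ self.W[2] + (r * h) @ self.U[2] + self.b[2])
            h = (1.0 - z) * h + z * h_tilde  # blend previous state and candidate
        return h
```

Usage: `context = MinimalGRU(input_dim=10)(encoded_terms)`, where `encoded_terms` is a hypothetical `(time_steps, 10)` array of encoder outputs for the training data.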
4. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein constructing the prediction model comprises:
each fully connected layer has an output dimension of 10 and a linear activation function; because the four fully connected layers are held in a list, a Lambda layer is used to join the four fully connected layers side by side into one network.
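The prediction model of claim 4 can be sketched as four independent linear layers applied to the same 256-dimensional context vector, with `np.stack` standing in for the Lambda layer that joins the list of layers. The weight initialization and the helper names are hypothetical.

```python
import numpy as np

def make_prediction_heads(context_dim=256, code_size=10, n_steps=4, seed=0):
    """One linear (no activation) fully connected layer per future step."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.05, size=(context_dim, code_size)), np.zeros(code_size))
            for _ in range(n_steps)]

def predict_future(context, heads):
    """Apply every head to the same context and stack the outputs.

    context: (batch, context_dim) -> predictions: (batch, n_steps, code_size).
    np.stack(axis=1) plays the role of the Lambda layer joining the list of layers.
    """
    outputs = [context @ W + b for W, b in heads]  # linear activation
    return np.stack(outputs, axis=1)
```

Keeping the heads separate lets each one specialize in a different prediction distance, which is why the claim builds a list of layers rather than a single wider layer.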
5. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein in s4 a sigmoid function maps the dot-product loss value into the range [0, 1], and this value is used as the output of the CPC model.
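The scoring in s4 and claim 5 can be sketched as below. The claim only states that a dot product is passed through a sigmoid; averaging the element-wise products over the code dimension and over the four predicted steps (a scaled dot product) is an assumption, as is training the score against the pair target with binary cross-entropy.

```python
import numpy as np

def cpc_probability(preds, y_encoded):
    """preds, y_encoded: (batch, n_steps, code_size) -> (batch,) scores in [0, 1].

    Element-wise products are averaged over the code dimension and then over
    the predicted steps (a scaled dot product) before the sigmoid.
    """
    dot = np.mean(preds * y_encoded, axis=-1)  # (batch, n_steps)
    dot = np.mean(dot, axis=-1)                # (batch,)
    return 1.0 / (1.0 + np.exp(-dot))

def binary_cross_entropy(target, prob, eps=1e-7):
    """Assumed training loss: target 1 for positive pairs, 0 for negative pairs."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))
```

A high score means the predicted codes agree with the encoded data to be trained, which should happen for positive (same-class) pairs.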
6. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein the CPC model is trained as follows:
s51, initializing the model parameters;
s52, feeding the data into the model for training;
s53, saving the model and plotting the accuracy on the training set and the validation set.
7. The ECG data feature generation model based on contrastive predictive coding according to claim 1, wherein s6 comprises the following steps:
s61, partitioning the training data: to stay consistent with the trained CPC model, the data set is divided into 5 parts;
s62, constructing a classification model that takes three identical training-data inputs; each input passes in turn through the encoder part of the CPC, a one-dimensional convolution layer, a ReLU activation layer, a one-dimensional max-pooling layer, a one-dimensional convolution layer, a ReLU activation layer and a one-dimensional max-pooling layer; the results of the three inputs are concatenated and followed by a Flatten layer and two fully connected layers, and the classification result is finally obtained from a fully connected layer with a softmax activation;
s63, training the classification model with categorical cross-entropy as the loss function and RMSprop as the optimizer, with a batch size of 64, for 10 epochs.
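The classification model in s62 can be sketched as a NumPy forward pass. Assumptions: the CPC encoder stage is omitted and each branch receives already-encoded inputs of shape `(length, channels)`; kernel size 5, filter counts 16 and 32, pool size 2, 64 hidden units and 5 output classes are hypothetical; and the two fully connected layers are read as a ReLU hidden layer followed by the softmax output layer. Training (s63) is not shown.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """'valid' 1-D convolution (cross-correlation, as in deep-learning layers).
    x: (length, in_ch); kernels: (k, in_ch, out_ch) -> (length - k + 1, out_ch)."""
    k = kernels.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L', k, in_ch)
    return np.einsum("lki,kio->lo", windows, kernels) + bias

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, pool=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = x.shape[0] // pool
    return x[:n * pool].reshape(n, pool, x.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def branch(x, p):
    """Conv -> ReLU -> MaxPool, twice (encoder stage omitted)."""
    x = max_pool1d(relu(conv1d(x, p["k1"], p["b1"])))
    return max_pool1d(relu(conv1d(x, p["k2"], p["b2"])))

def classify(inputs, params, dense):
    """inputs: three (length, channels) arrays -> class probabilities."""
    flat = np.concatenate([branch(x, p).ravel() for x, p in zip(inputs, params)])  # splice + Flatten
    hidden = relu(flat @ dense["W1"] + dense["b1"])
    return softmax(hidden @ dense["W2"] + dense["b2"])

def init_params(length=128, in_ch=10, filters=(16, 32), k=5, hidden=64, n_classes=5, seed=0):
    """Random weights for three branches plus the dense head (all sizes hypothetical)."""
    rng = np.random.default_rng(seed)
    params = [{"k1": rng.normal(0, 0.1, (k, in_ch, filters[0])), "b1": np.zeros(filters[0]),
               "k2": rng.normal(0, 0.1, (k, filters[0], filters[1])), "b2": np.zeros(filters[1])}
              for _ in range(3)]
    l2 = ((length - k + 1) // 2 - k + 1) // 2  # length after two conv + pool stages
    flat_dim = 3 * l2 * filters[1]
    dense = {"W1": rng.normal(0, 0.05, (flat_dim, hidden)), "b1": np.zeros(hidden),
             "W2": rng.normal(0, 0.05, (hidden, n_classes)), "b2": np.zeros(n_classes)}
    return params, dense
```

With the defaults, each branch maps a `(128, 10)` input to `(29, 32)`, so the concatenated feature vector has 3 × 29 × 32 = 2784 entries.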
CN202110978438.7A 2021-08-25 2021-08-25 ECG data feature generation model based on contrast predictive coding Active CN113679393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110978438.7A CN113679393B (en) 2021-08-25 2021-08-25 ECG data feature generation model based on contrast predictive coding

Publications (2)

Publication Number Publication Date
CN113679393A (en) 2021-11-23
CN113679393B (en) 2023-05-26

Family

ID=78582137

Country Status (1)

Country Link
CN (1) CN113679393B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663719B (en) * 2022-01-26 2024-03-22 Hefei University of Technology Data scarcity-oriented self-supervision data mining method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111783884A (en) * 2020-06-30 2020-10-16 Shandong Women's University Unsupervised hyperspectral image classification method based on deep learning
WO2021062366A1 (en) * 2019-09-27 2021-04-01 The Brigham And Women's Hospital, Inc. Multimodal fusion for diagnosis, prognosis, and therapeutic response prediction
CN113178189A (en) * 2021-04-27 2021-07-27 iFLYTEK Co., Ltd. Information classification method and device and information classification model training method and device

Non-Patent Citations (3)

Title
He, J.Y., Rong, J., Sun, L., et al. An advanced two-step DNN-based framework for arrhythmia detection. In: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020, vol. 12085, pp. 422-434. *
Lorre, G., Rabarisoa, J., Orcesi, A., et al. Temporal Contrastive Pretraining for Video Action Recognition. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020 (full text). *
Wang, J., Zheng, Y., Chen, X., et al. Semi-supervised Learning with Contrastive Predicative Coding. arXiv, 2019 (full text). *

Also Published As

Publication number Publication date
CN113679393A (en) 2021-11-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant