CN117421678B - Single-lead atrial fibrillation recognition system based on knowledge distillation - Google Patents
- Publication number: CN117421678B (application CN202311750008.5A)
- Authority
- CN
- China
- Prior art keywords
- lead
- training module
- student
- identification network
- network training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- A61B5/346—Analysis of electrocardiograms
- A61B5/361—Detecting fibrillation
- A61B5/7235—Details of waveform analysis
- A61B5/7267—Classification of physiological signals or data, e.g. using neural networks, involving training the classification device
- G06F18/10—Pre-processing; Data cleansing
- G06F18/213—Feature extraction, e.g. by transforming the feature space
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2431—Classification techniques: multiple classes
- G06F18/253—Fusion techniques of extracted features
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
- G06N3/096—Transfer learning
- G06F2218/04—Denoising
- G06F2218/08—Feature extraction
- G06F2218/12—Classification; Matching
Abstract
The invention relates to the field of artificial intelligence, and in particular to a knowledge-distillation-based single-lead atrial fibrillation recognition system, comprising: a twelve-lead preprocessing module for preprocessing the original twelve-lead data; a II-lead extraction module for extracting lead II from the data processed by the twelve-lead preprocessing module; a teacher recognition network training module, which trains on the twelve-lead data using a ResNet34 network as the teacher network; a student recognition network training module, which uses a ResNet10 network as the student network and performs knowledge distillation with the knowledge of the teacher recognition network training module; and an attention-feature-fusion-based intermediate-layer knowledge transfer module, which uses an attention mechanism to fit the features of pairs of adjacent layers of the student recognition network training module into new intermediate layers. By distilling a single-lead electrocardiograph recognition model, the invention reduces the network's memory requirement while maintaining high performance, so that the model can be deployed in resource-constrained environments and predicts faster.
Description
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a knowledge distillation-based single-lead atrial fibrillation recognition system.
Background
Electrocardiography (ECG) is one of the most commonly used noninvasive techniques in medical diagnosis for recording the fluctuations of cardiac bioelectric activity; it plays a significant role in detecting arrhythmias and can help doctors deliver targeted therapy to patients. In practical clinical diagnosis, however, even experienced doctors often spend a great deal of time accurately interpreting an electrocardiogram, which makes long-term monitoring inefficient and difficult. As the population ages, the number of arrhythmia patients is also growing explosively, forcing the search for more efficient, accurate and cost-effective automated electrocardiographic diagnostic tools; the development of such tools is crucial for finding and treating heart-related problems in time, improving patients' quality of life and reducing the adverse consequences of cardiovascular disease.
Computer-aided electrocardiographic interpretation systems for heart-rhythm diagnosis have therefore existed since the 1960s, and with upgrades in software and hardware and the gradual development of artificial intelligence, the degree of automation in electrocardiographic recognition has kept increasing, especially in the field of deep learning.
Deep learning models have great potential for handling arrhythmia, because they learn features automatically and can process large amounts of data, improving accuracy and robustness. Traditional electrocardiographic recognition models, however, are either multi-lead or single-lead: research shows that ECG classification performs excellently with multi-lead models, but their memory footprint is huge, while single-lead models lack the information carried by the other leads and therefore perform poorly, so both kinds are difficult to deploy on edge devices. The invention therefore provides a knowledge-distillation-based single-lead atrial fibrillation recognition system, which can effectively solve or alleviate these problems.
Disclosure of Invention
Based on the above problems, the invention provides a knowledge-distillation-based single-lead atrial fibrillation recognition system, which overcomes the difficulty that huge model parameters limit the deployment of deep-learning multi-lead models on edge devices, and effectively improves the performance of the single-lead model. The system comprises:
the twelve-lead preprocessing module, used for preprocessing each electrocardiographic data lead of the original twelve-lead electrocardiograph database;
the II-lead extraction module, used for extracting lead II from the preprocessed twelve-lead data, for training the student recognition network training module;
the teacher recognition network training module, used for training on the preprocessed twelve-lead electrocardiograph data with a ResNet34 deep neural network to obtain the teacher recognition network;
the intermediate-layer knowledge transfer module, used for reconstructing the intermediate layers of the student recognition network training module, fusing the features of adjacent intermediate layers of the student recognition network training module with an attention mechanism to form the new intermediate layers of the student recognition network training module;
the student recognition network training module, used for training on the preprocessed II-lead electrocardiograph data with a ResNet10 deep neural network, through the attention-feature-fusion-based intermediate-layer knowledge transfer module and under the intermediate-layer and output guidance of the teacher recognition network training module, to obtain the student recognition network;
in an embodiment, the twelve-lead preprocessing module is specifically configured to use a butterworth band-pass filtering method and smooth convolution for twelve-lead data in the original twelve-lead electrocardiograph database, and perform standardized processing on all data, so as to remove noise and burrs of the data and improve generalization of the data;
in an embodiment, the II-lead extraction module is specifically configured to extract, according to the position index, all the II-lead data of the twelve leads from the data processed by the twelve-lead preprocessing module, and only retain the data shape of a single lead;
in one embodiment, the teacher recognition network training module uses the twelve-lead data set processed by the twelve-lead preprocessing module to train and predict, in the data set, 8384 pieces of data and corresponding labels marked by the professional doctor are used as training sets, and 2099 pieces of data and corresponding labels marked by the professional doctor are used as test sets; the teacher identification network training module uses a ResNet34 deep neural network, all network layers are used for extracting characteristics of electrocardiographic data, each layer except a convolution layer and a feedforward neural network consists of a plurality of residual blocks so as to avoid the problem of gradient disappearance, and multi-classification cross entropy loss is used as a loss function during model training;
in an embodiment, the middle layer knowledge transfer module based on attention feature fusion is configured to perform feature fusion on each pair of adjacent residual layers in the student identification network training module by using an attention mechanism-based method to form a middle layer of a new student identification network training module, and perform co-location matching with the middle layer of the teacher identification network training module, so as to enrich feature tensor information of the middle layer of the student identification network;
in one embodiment, the student identification network training module uses all the II lead data sets extracted by the II lead extraction module to train and predict, in the data sets, 8384 pieces of single lead data and corresponding labels marked by the specialist doctor are used as training sets, and 2099 pieces of single lead data and corresponding labels marked by the specialist doctor are used as test sets; the student identification network training module uses a ResNet10 deep neural network, all network layers are used for extracting characteristics of electrocardiographic data, wherein each layer except a convolution layer and a feedforward neural network consists of a plurality of residual blocks so as to avoid the problem of gradient disappearance, and the residual layers are modified by an intermediate layer knowledge transfer module based on attention characteristic fusion; the model training process adopts multi-classification cross entropy loss, information tensor transmitted by the middle layer of the teacher recognition network training module and information tensor transmitted by the middle layer of the student recognition network training module under the middle layer knowledge transmission module based on attention feature fusion to perform loss calculation by using a feature pyramid formula, and performs KL divergence calculation based on the output probability of the teacher recognition network training module and the output probability of the student recognition network training module, wherein the three are combined to serve as a loss function.
Compared with the prior art, the system first preprocesses the original twelve-lead data set with the twelve-lead preprocessing module, which eliminates noise and burrs in the twelve leads and improves generalization. The preprocessed twelve-lead data are split into a training set and a test set; the training set is fed into the teacher recognition network training module for training and the test set is used for verification, yielding an offline teacher recognition model. Lead II is then extracted from the preprocessed twelve leads by the II-lead extraction module and finally fed, in the same way, into the student recognition network training module for final feature classification, the residual layers of the student recognition network training module using the attention-feature-fusion-based intermediate-layer knowledge transfer module. The technique can therefore effectively reduce the size of the multi-lead data model while improving accuracy.
Drawings
To illustrate the technical solution of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below:
fig. 1 is a schematic structural diagram of the knowledge-distillation-based single-lead atrial fibrillation recognition system, in which the numbers mark the modules and, together with the line segments, indicate the execution order of the system: 101 is the first data-processing operation, obtaining the preprocessed twelve-lead data; 102 is the second data-processing operation, extracting lead II from the preprocessed twelve leads; 103 sends the twelve-lead data obtained in 101 to the teacher recognition network training module for training; 105 sends the II-lead data obtained in 102 to the student recognition network training module for training; and 104 distills the knowledge of the network obtained in 103 into the student recognition network through the attention-feature-fusion-based intermediate-layer knowledge transfer module, finally yielding the knowledge-distillation-based single-lead atrial fibrillation recognition network;
FIG. 2 is a graph of the results of data processed by the twelve lead preprocessing module in one embodiment, wherein the labels in the graph represent 6 limb leads (I, II, III, aVR, aVL, aVF) and 6 chest leads (V1, V2, V3, V4, V5, V6);
FIG. 3 is a diagram of the extraction results of the II lead extraction module in one embodiment;
FIG. 4 is a schematic diagram of a knowledge distillation network in one embodiment;
FIG. 5 is a schematic diagram of a network structure of an intermediate layer knowledge transfer module based on attention-based feature fusion in an embodiment.
Description of the embodiments
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. It is to be understood that the drawings in the following description are merely examples or embodiments of the invention, and that those skilled in the art can apply the invention to other similar situations in light of these drawings without inventive effort; unless obvious from the context or otherwise indicated, like reference numerals in the drawings refer to like structures or operations.
As used in the specification and claims, the terms "a", "an", "the" and/or "said" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. Generally, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
It will be understood that when an element or module is referred to as being "connected," "coupled" to another element, module, or block, it can be directly connected or coupled or in communication with the other element, module, or block, or intervening elements, modules, or blocks may be present unless the context clearly dictates otherwise; the term "and/or" as used herein may include any and all combinations of one or more of the associated listed items.
As shown in fig. 1, an embodiment of the present invention provides a knowledge-distillation-based single-lead atrial fibrillation recognition system, which includes: a twelve-lead preprocessing module 101 for preprocessing each electrocardiographic data lead of the original twelve-lead electrocardiograph database; a II-lead extraction module 102 for extracting lead II from the preprocessed twelve-lead data, for training the student recognition network training module; a teacher recognition network training module 103 for training on the preprocessed twelve-lead electrocardiograph data with a ResNet34 deep neural network to obtain the teacher recognition network; an intermediate-layer knowledge transfer module 104 for reconstructing the intermediate layers of the student recognition network training module, fusing the features of adjacent intermediate layers of the student network with an attention mechanism to form the new intermediate layers of the student recognition network training module; and a student recognition network training module 105 for training on the preprocessed II-lead electrocardiograph data with a ResNet10 deep neural network, through the attention-feature-fusion-based intermediate-layer knowledge transfer module and under the intermediate-layer and output guidance of the teacher recognition network training module, to obtain the student recognition network.
In this embodiment, the original twelve-lead data set is first preprocessed with the twelve-lead preprocessing module; this process eliminates noise and burrs in the twelve leads and improves generalization. Next, the preprocessed twelve-lead data are split into a training set and a test set; the training set is fed into the teacher recognition network training module for training and the test set is used for verification, yielding an offline teacher recognition model. Lead II is then extracted from the preprocessed twelve-lead data with the II-lead extraction module and fed, in the same way, into the student recognition network training module for final feature classification, the residual layers of the student recognition network training module adopting the attention-feature-fusion-based intermediate-layer knowledge transfer module. The technique can therefore effectively reduce the size of the multi-lead data model while improving accuracy;
specifically, the original twelve-lead data set includes twelve leads of information data: I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5 and V6.
In an embodiment, the twelve-lead preprocessing module is specifically configured to apply a Butterworth band-pass filter and a smoothing convolution to the twelve-lead data in the original twelve-lead electrocardiograph database, and to standardize all data, so as to remove noise and burrs from the data and improve its generalization;
as shown in fig. 2, fig. 2 is a plot of the data processed by the twelve-lead preprocessing module of the invention. Besides the processing techniques above, on the premise that the PQRST waves of lead II are the most evident known information, all leads are uniformly segmented according to the R-wave index positions of lead II, ensuring that the data waveforms match the R-wave index positions and thus providing reliable data prediction results;
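The R-wave-aligned segmentation described above could look roughly like the following. The patent does not name an R-peak detector, so this hedged sketch uses a simple scipy peak finder on lead II, with a hypothetical sampling rate, window size and threshold:

```python
import numpy as np
from scipy.signal import find_peaks

def segment_by_lead_ii_r_waves(sample, fs=500, half_window=250):
    """Cut every lead of a (12, N) sample into windows centred on the
    R-wave indices found in lead II (row index 1). fs and half_window
    are illustrative assumptions, not values from the patent."""
    lead_ii = sample[1]
    # crude R-peak detection: prominent peaks at least 0.4 s apart
    peaks, _ = find_peaks(lead_ii, distance=int(0.4 * fs),
                          prominence=lead_ii.std())
    segments = []
    for p in peaks:
        lo, hi = p - half_window, p + half_window
        if lo >= 0 and hi <= sample.shape[1]:
            # the same window is applied to all twelve leads, so every
            # lead stays aligned with the lead-II R-wave index
            segments.append(sample[:, lo:hi])
    return segments
```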
the specific preprocessing process comprises the following steps:
each sample is input in twelve-lead order, and since electrocardiograph data must exclude electromyographic interference, power-frequency interference and baseline drift, a Butterworth band-pass filter is used to cut off signal frequencies outside the 1 Hz-50 Hz band;
burrs are removed from each lead with a smoothing convolution window of size 5, yielding a relatively smooth curve and preventing burrs from excessively interfering with model recognition;
finally, each twelve-lead sample is standardized to improve the generalization of the samples.
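The three preprocessing steps above can be sketched like this; the sampling rate and filter order are assumptions, since the patent fixes only the 1 Hz-50 Hz band and the window size of 5:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 500  # assumed sampling rate in Hz (not stated in the patent text)

def preprocess_lead(sig, fs=FS, low=1.0, high=50.0, win=5):
    """Band-pass filter, smooth, and standardize one ECG lead."""
    # 1) Butterworth band-pass: keep 1-50 Hz, suppressing EMG interference,
    #    power-frequency interference and baseline drift
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    sig = filtfilt(b, a, sig)
    # 2) smoothing convolution window of size 5 to remove burrs
    kernel = np.ones(win) / win
    sig = np.convolve(sig, kernel, mode="same")
    # 3) z-score standardization to improve generalization
    return (sig - sig.mean()) / (sig.std() + 1e-8)

def preprocess_twelve_lead(sample):
    """Apply the same pipeline to each of the 12 leads of one sample."""
    return np.stack([preprocess_lead(lead) for lead in sample])
```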
In an embodiment, the II-lead extraction module is specifically configured to extract, according to the position index of lead II, all II-lead data from the data processed by the twelve-lead preprocessing module, retaining only the data shape of a single lead;
as shown in fig. 3, fig. 3 is the extraction-result diagram of the II-lead extraction module applied after the twelve-lead preprocessing module; the extraction steps are as follows:
the II-lead data obtained by indexing each twelve-lead record in turn are stored directly as a one-dimensional data tensor, without clearing the data of the other leads;
all extracted II-lead records are stacked along the same dimension to form a new II-lead data set.
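A minimal sketch of the two extraction steps, assuming lead II occupies row index 1 of each (12, N) record (following the standard lead order I, II, III, ...):

```python
import numpy as np

LEAD_II_INDEX = 1  # assumed position of lead II in the lead order

def extract_lead_ii(samples):
    """samples: array of shape (num_samples, 12, N) -> (num_samples, 1, N).

    Indexes lead II from every record; the source array is left untouched,
    and only a single-lead-shaped copy is returned.
    """
    lead_ii = samples[:, LEAD_II_INDEX, :]   # (num_samples, N)
    # stack along the same dimension, keeping the single-lead data shape
    return lead_ii[:, np.newaxis, :]         # (num_samples, 1, N)
```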
In one embodiment, the teacher recognition network training module trains and predicts with the twelve-lead data set processed by the twelve-lead preprocessing module; within this data set, 8384 records and their labels annotated by specialist doctors serve as the training set, and 2099 records and their labels annotated by specialist doctors serve as the test set. The teacher recognition network training module uses a ResNet34 deep neural network whose layers all extract features from the electrocardiograph data; every layer except the first convolution layer and the feedforward neural network consists of several residual blocks, so as to avoid the vanishing-gradient problem, and multi-class cross-entropy loss is adopted as the loss function during model training;
the multi-class cross-entropy loss function is:

L_CE = -Σ_{c=1}^{C} y_c · log(p_c)

where C represents the number of categories (the output of the model is a vector containing C category scores); L_CE is the loss function; y_c is the c-th element of the true class-label vector, equal to 1 if the sample belongs to the c-th category and 0 otherwise; and p_c is the c-th element of the model's output vector, representing the model's predicted probability for the c-th category;
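A small numeric illustration of this loss:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Multi-class cross-entropy: y_true is one-hot, y_pred is a
    probability vector; eps guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps))

# Example: 3 rhythm classes, and the sample belongs to class 1 (e.g. AF).
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.1])
loss = cross_entropy(y_true, y_pred)  # = -log(0.7)
```

Only the probability assigned to the true class contributes, so a confident correct prediction drives the loss toward zero.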
as shown by the ResNet34 (teacher recognition network) at the top of fig. 4, the network receives data as a tensor of shape (12, 2500); the data first pass through a convolution layer with kernel size 7 and stride 2, then a max-pooling layer with kernel size 3 and stride 2, then four residual layers consisting of 3, 4, 6 and 3 residual blocks respectively, and finally an average-pooling layer combined with a fully connected layer produces the result. A residual block adds its input feature tensor to the feature tensor obtained by passing that input through convolution layers with kernel size 3, and the output of each residual layer is taken as an intermediate-layer output tensor of the teacher recognition network training module.
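The residual block with identity shortcut can be illustrated in numpy as follows. This is a shape-level sketch with single hand-chosen kernels, not the multi-channel learned convolutions of the actual ResNet34:

```python
import numpy as np

def conv1d_same(x, kernel):
    """'Same'-padded 1-D convolution applied to each channel of a (C, L)
    feature map independently."""
    return np.stack([np.convolve(ch, kernel, mode="same") for ch in x])

def residual_block(x, k1, k2):
    """y = ReLU(x + F(x)), where F is two kernel-3 convolutions with an
    intermediate ReLU; the identity shortcut keeps gradients flowing."""
    out = np.maximum(conv1d_same(x, k1), 0.0)   # conv + ReLU
    out = conv1d_same(out, k2)                  # conv
    return np.maximum(x + out, 0.0)             # shortcut add + ReLU
```

Because the block outputs `x + F(x)`, the input shape is preserved, which is what lets ResNet stack many such blocks without the vanishing-gradient problem the text mentions.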
In an embodiment, the attention-feature-fusion-based middle-layer knowledge transfer module is configured to fuse each pair of adjacent residual-stage outputs in the student identification network training module using an attention-based method, forming new intermediate layers of the student identification network training module that are matched position-by-position with the intermediate layers of the teacher identification network training module, thereby enriching the feature-tensor information of the student network's intermediate layers;
as shown in Fig. 5, the module fuses the feature tensors of adjacent residual-stage outputs of the student identification network training module with an attention-based method, redistributes the fused tensor back to the two layers as weights, and then forms a new intermediate-layer output of the student identification network training module, as shown by the ResNet10 (student recognition network) block in the lower part of Fig. 4. The formula steps are as follows:
First, feature matching is applied to the two adjacent intermediate layers to be output by the student identification network training module so that their tensor shapes agree; the two tensors are then fused with an attention-based method, and the fused result is passed back to the original two layers as weights, finally yielding a new integrated output layer:

f_i = interpolate(f_j);

f_{i+1} = interpolate(f_{j+1});

where interpolate is used to match adjacent intermediate-layer feature tensors, and f_j and f_{j+1} are the front and rear feature tensors of the adjacent intermediate layers;

F_j = AT(unsqueeze(f_i), unsqueeze(f_{i+1}));

where AT performs attention-based fusion on the shape-matched feature layers, and unsqueeze adds a feature dimension to facilitate the fusion;

finally, the new front and rear feature tensors are recombined with the original front and rear layers in order, yielding enriched feature tensors.
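The fusion steps above can be sketched in NumPy. This is a hedged illustration: the patent does not define the AT operator precisely, so a per-position softmax over the two layers is assumed here, and simple linear resampling stands in for the tensor-matching interpolate step.

```python
import numpy as np

def interpolate(x, new_len):
    """Resample a 1-D feature vector to new_len points (stands in for
    the tensor-matching `interpolate` step above)."""
    return np.interp(np.linspace(0, 1, new_len), np.linspace(0, 1, len(x)), x)

def attention_fuse(f_j, f_j1):
    """Fuse two adjacent intermediate-layer features and hand the fused
    tensor back to both layers as weights. The AT operator is not
    defined in the patent; a per-position softmax over the two layers
    is assumed."""
    n = max(len(f_j), len(f_j1))
    f_i, f_i1 = interpolate(f_j, n), interpolate(f_j1, n)
    stacked = np.stack([f_i, f_i1])                    # unsqueeze + stack
    w = np.exp(stacked) / np.exp(stacked).sum(axis=0)  # attention weights
    fused = (w * stacked).sum(axis=0)                  # F_j
    return f_i * fused, f_i1 * fused                   # redistribute as weights

front, rear = attention_fuse(np.ones(4), np.full(8, 2.0))
```

Both returned layers share the longer layer's length, and each carries the fused information of the pair.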
In one embodiment, the student identification network training module trains and predicts on the lead-II data set extracted by the lead-II extraction module; within this data set, 8384 single-lead records with labels annotated by specialist physicians serve as the training set, and 2099 single-lead records with corresponding specialist-annotated labels serve as the test set. The student identification network training module uses a ResNet10 deep neural network whose layers all extract features from the electrocardiographic data; apart from the initial convolution layer and the final feedforward layer, each stage consists of several residual blocks to mitigate vanishing gradients, and the residual stages are modified by the attention-feature-fusion-based middle-layer knowledge transfer module. During model training, the loss function combines three terms: the multi-class cross-entropy loss; a feature-pyramid loss computed between the information tensors transferred from the middle layers of the teacher identification network training module and those of the student identification network training module under the middle-layer knowledge transfer module; and a KL-divergence term computed from the output probabilities of the teacher and student identification network training modules. The formula steps are as follows:
The intermediate-layer outputs of the new student identification network are matched with the corresponding intermediate-layer outputs of the offline teacher identification network, fitted using a feature pyramid, and the loss function is then calculated:

S_z = s_1, s_2, s_3, ..., s_n;

T_z = t_1, t_2, t_3, ..., t_n;

where S_z denotes the intermediate layers of the new student identification network training module obtained through the attention-feature-fusion-based middle-layer knowledge transfer module, with n = 5; T_z likewise denotes the intermediate-layer outputs of the offline teacher identification network training module matched to the tensor sizes of the new student intermediate layers; the loss is computed over the hierarchical-structure context of the two networks' corresponding layers in the form of a feature pyramid.
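A hedged sketch of this layer matching and the summed per-layer loss follows; the exact feature-pyramid (knowledge-review) distance is not fully specified in the patent, so plain mean-squared error stands in for it.

```python
import numpy as np

def layer_loss(s, t):
    """Per-layer distance between matched student and teacher features.
    The patent's feature-pyramid distance is not fully specified;
    mean-squared error is substituted here."""
    return float(np.mean((s - t) ** 2))

def review_loss(S_z, T_z):
    """Sum the per-layer loss over all n matched layers (n = 5 above)."""
    return sum(layer_loss(s, t) for s, t in zip(S_z, T_z))

rng = np.random.default_rng(0)
S = [rng.normal(size=16) for _ in range(5)]  # student intermediate layers
T = [rng.normal(size=16) for _ in range(5)]  # matched teacher layers
total = review_loss(S, T)
```

A student whose intermediate features exactly match the teacher's incurs zero loss; any mismatch contributes positively, layer by layer.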
finally, calculating the total loss of the distillation network:
First, the hierarchical-context loss computed for one pair of corresponding layers of the teacher identification network training module and the student identification network training module (after the attention-feature-fusion-based middle-layer knowledge transfer module) is extended to all required layers, giving the total loss:

Loss_review = Σ_{j=1}^{n} Loss(s_j, t_j);

where Loss(s_j, t_j) denotes the feature-pyramid loss between the j-th matched student and teacher layers.
Then, response-based (output-probability) knowledge distillation is used to transfer knowledge from the output of the offline teacher identification network to the student network, and the output-probability loss is calculated:

Distillation_Loss = KLDivLoss(f_student, f_teacher) * (T^2) / f_student.shape[0];

L_KD = α*Stu_Loss + (1-α)*Distillation_Loss;

where T is the temperature and α the mixing proportion, both hyper-parameters; KLDivLoss denotes the KL-divergence calculation; f_student and f_teacher represent the output tensors of the student and teacher network output layers respectively; and Stu_Loss represents the student network's output prediction loss.
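The temperature-scaled distillation term and the α-weighted combination can be sketched as follows (a NumPy illustration for a single sample, so the batch normalisation by f_student.shape[0] is omitted; softening both outputs by T before the KL divergence is the standard construction and our assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(f_student, f_teacher, T):
    """Temperature-scaled KL divergence between the softened teacher
    and student outputs, multiplied by T^2 (single sample, so the
    batch-size division is omitted)."""
    p_t = softmax(f_teacher / T)
    p_s = softmax(f_student / T)
    return float(np.sum(p_t * np.log(p_t / p_s))) * T * T

def kd_loss(stu_loss, dist_loss, alpha):
    """L_KD = alpha * Stu_Loss + (1 - alpha) * Distillation_Loss."""
    return alpha * stu_loss + (1 - alpha) * dist_loss

s = np.array([1.0, 0.2, -0.3])  # student output logits (illustrative)
t = np.array([1.2, 0.1, -0.5])  # teacher output logits (illustrative)
L = kd_loss(stu_loss=0.7, dist_loss=distillation_loss(s, t, T=4.0), alpha=0.5)
```

When the student exactly reproduces the teacher's outputs the KL term vanishes, leaving only the student's own prediction loss.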
Finally, the total loss is calculated:

Loss_all = γ*L_KD + (1-γ)*Loss_review;
where γ is a hyper-parameter, and L_KD and Loss_review are respectively the output-probability-based knowledge-distillation loss and the intermediate-layer-feature-tensor-based knowledge-distillation loss; taking these as the optimization target, the loss function is minimized and the distillation network model is optimized.
In one embodiment, roughly ten thousand samples are collected from the original twelve-lead data set, comprising 3889 sinus bradycardia, 1826 sinus rhythm, 1780 atrial fibrillation, 1568 sinus tachycardia, 445 atrial flutter, 399 sinus arrhythmia and 587 supraventricular tachycardia records. All samples are split 8:2 into a training set and a test set. The training set is fed, via the knowledge-distillation method, to the teacher identification network training module and to the student identification network training module equipped with the attention-feature-fusion-based middle-layer knowledge transfer module; the teacher identification network training module transfers knowledge to the student identification network training module, finally yielding the knowledge-distillation-based single-lead atrial fibrillation identification network. This network is tested on the test set, with performance evaluated for both binary atrial fibrillation classification and seven-class classification, as shown in Table 1 below.
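The 8:2 split described in this embodiment can be sketched with a few lines of standard-library Python (a hedged illustration; the patent does not specify the shuffling procedure or seed):

```python
import random

def split_80_20(samples, seed=0):
    """Shuffle and split a list of samples 8:2 into train/test sets
    (the patent's split ratio; the seed is our choice, not the patent's)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(samples) * 0.8)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

# The seven class counts above sum to 10,494 recordings
data = list(range(10494))
train, test = split_80_20(data)
```

Every sample lands in exactly one of the two sets, with 80% of the records reserved for training.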
table one: performance of single-lead atrial fibrillation recognition system based on knowledge distillation under two-classification and seven-classification standards of atrial fibrillation
;
In summary, the invention provides a knowledge distillation-based single-lead atrial fibrillation recognition system with better classification performance.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination that contains no contradiction should be considered within the scope of this description.
The foregoing examples merely represent several embodiments of the present application; their descriptions are relatively detailed but should not be construed as limiting the scope of the invention. A person skilled in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (1)
1. A knowledge distillation-based single-lead atrial fibrillation recognition system, comprising at least the following modules:
the twelve-lead preprocessing module is used for realizing noise reduction preprocessing on each electrocardiographic data lead based on an original twelve-lead electrocardiographic database;
the lead-II extraction module is used for extracting lead II from the twelve-lead data according to its position index and stacking the lead-II signals along the same dimension to form a new lead-II data set for training the student identification network training module;
the teacher identification network training module is used for training the preprocessed twelve-lead electrocardiograph data through a ResNet34 deep neural network formed by two-dimensional convolution blocks to obtain a teacher identification network;
the attention-feature-fusion-based middle-layer knowledge transfer module is used for reconstructing the middle layers of the student identification network training module: adjacent middle layers of the student identification network training module are feature-fused using an attention mechanism, redistributed back to those layers as weights, and finally formed into a new middle-layer output of the student identification network training module;
the student identification network training module is transformed by the attention-feature-fusion-based middle-layer knowledge transfer module; under the guidance of the middle layers and outputs of the teacher identification network training module, the preprocessed lead-II electrocardiographic data is fed into a ResNet10 deep neural network formed by one-dimensional convolution blocks and trained to obtain a student identification network, which is finally used for two-class and seven-class atrial fibrillation identification, as detailed below:
the residual layers of the student identification network training module are transformed by the attention-feature-fusion-based middle-layer knowledge transfer module to obtain new middle layers of the student identification network training module; during model training, the loss function then combines three terms: the multi-class cross-entropy loss; a feature-pyramid loss computed between the information tensors transferred from the middle layers of the teacher identification network training module and those of the student identification network training module under the middle-layer knowledge transfer module; and a KL-divergence term computed from the output probabilities of the teacher and student identification network training modules; the steps and calculation formulas are as follows:
firstly, feature matching is performed on the two adjacent middle layers to be output by the student identification network training module so that their tensor shapes agree; the two tensors are then fused with an attention-based method, and the fused results are passed back to the original two layers as weights, finally obtaining a new integrated output layer:
f_i = interpolate(f_j);
f_{i+1} = interpolate(f_{j+1});
wherein interpolate is used to match adjacent intermediate-layer feature tensors, and f_j and f_{j+1} are the front and rear feature tensors of adjacent intermediate layers respectively;
F_j = AT(unsqueeze(f_i), unsqueeze(f_{i+1}));
wherein AT performs attention-based fusion on the feature layers after tensor matching, and unsqueeze increases the feature dimension to facilitate the fusion;
finally, the new front and rear feature tensors are recombined with the original front and rear layers in order, yielding enriched feature tensors;
the middle-layer outputs of the new student identification network are then matched with the corresponding middle-layer outputs of the offline teacher identification network, fitted using a feature pyramid, and the loss function is finally calculated:
S_z = s_1, s_2, s_3, ..., s_n;
T_z = t_1, t_2, t_3, ..., t_n;
wherein S_z denotes the middle layers of the new student identification network training module obtained through the attention-feature-fusion-based middle-layer knowledge transfer module, with n = 5; T_z likewise denotes the middle-layer outputs of the offline teacher identification network training module matched to the tensor sizes of the new student middle layers; the loss is computed over the hierarchical-structure context of the two networks' corresponding layers in the form of a feature pyramid;
finally, calculating the total loss of the distillation network:
firstly, the hierarchical-context loss computed for one pair of corresponding layers of the teacher identification network training module and the student identification network training module (after the attention-feature-fusion-based middle-layer knowledge transfer module) is extended to all required layers to give the total loss; then, response-based (output-probability) knowledge distillation is used to transfer knowledge from the output of the offline teacher identification network to the student network, and the output-probability loss is calculated:
Distillation_Loss = KLDivLoss(f_student, f_teacher) * (T^2) / f_student.shape[0];
L_KD = α*Stu_Loss + (1-α)*Distillation_Loss;
wherein T is the temperature and α the proportion, both hyper-parameters; KLDivLoss denotes the KL-divergence calculation; f_student and f_teacher represent the output tensors of the student and teacher network output layers respectively; and Stu_Loss represents the output prediction loss of the student network;
finally, the total loss is calculated:
Loss_all = γ*L_KD + (1-γ)*Loss_review;
wherein γ is a hyper-parameter, and L_KD and Loss_review are respectively the output-probability-based knowledge-distillation loss and the intermediate-layer-feature-tensor-based knowledge-distillation loss; taking these as the optimization target, the loss function is minimized and the distillation network model is optimized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311750008.5A CN117421678B (en) | 2023-12-19 | 2023-12-19 | Single-lead atrial fibrillation recognition system based on knowledge distillation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117421678A CN117421678A (en) | 2024-01-19 |
CN117421678B true CN117421678B (en) | 2024-03-22 |
Family
ID=89525216
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110379506A (en) * | 2019-06-14 | 2019-10-25 | 杭州电子科技大学 | The cardiac arrhythmia detection method of binaryzation neural network is used for ECG data |
CN111488793A (en) * | 2020-02-21 | 2020-08-04 | 广州视源电子科技股份有限公司 | Electrocardiosignal classification method and device, electronic equipment and storage medium |
CN112418343A (en) * | 2020-12-08 | 2021-02-26 | 中山大学 | Multi-teacher self-adaptive joint knowledge distillation |
CN113288162A (en) * | 2021-06-03 | 2021-08-24 | 北京航空航天大学 | Short-term electrocardiosignal atrial fibrillation automatic detection system based on self-adaptive attention mechanism |
CN116071635A (en) * | 2023-03-06 | 2023-05-05 | 之江实验室 | Image recognition method and device based on structural knowledge propagation |
CN116090503A (en) * | 2022-12-23 | 2023-05-09 | 北京鹰瞳科技发展股份有限公司 | Method for training neural network model based on knowledge distillation and related products |
CN116257751A (en) * | 2023-02-23 | 2023-06-13 | 安徽理工大学 | Distillation method and device based on online cooperation and feature fusion |
CN116260642A (en) * | 2023-02-27 | 2023-06-13 | 南京邮电大学 | Knowledge distillation space-time neural network-based lightweight Internet of things malicious traffic identification method |
CN116796810A (en) * | 2023-06-28 | 2023-09-22 | 河海大学 | Deep neural network model compression method and device based on knowledge distillation |
CN116824334A (en) * | 2023-06-25 | 2023-09-29 | 中国科学院软件研究所 | Model back door attack countermeasure method based on frequency domain feature fusion reconstruction |
CN116913504A (en) * | 2023-07-13 | 2023-10-20 | 重庆理工大学 | Self-supervision multi-view knowledge distillation method for single-lead arrhythmia diagnosis |
WO2023212997A1 (en) * | 2022-05-05 | 2023-11-09 | 五邑大学 | Knowledge distillation based neural network training method, device, and storage medium |
CN117132870A (en) * | 2023-10-25 | 2023-11-28 | 西南石油大学 | Wing icing detection method combining CenterNet and mixed attention |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201910657D0 (en) * | 2019-07-25 | 2019-09-11 | Univ Oxford Innovation Ltd | Deep end-to-end classification of electrocardiogram data |
Non-Patent Citations (2)
Title |
---|
Distilling Knowledge via Knowledge Review; Pengguang Chen et al.; arXiv:2104.09044v1; 2021-04-19; pp. 1-10, sections 1-3 *
Research on Knowledge Distillation Methods for Atrial Fibrillation Signal Extraction Networks; Xie Zhuoyan; China Master's Theses Full-text Database, Medicine and Health Sciences; 2023-01-15; E062-475 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kachuee et al. | Ecg heartbeat classification: A deep transferable representation | |
JP7091451B2 (en) | Automatic ECG analysis method based on artificial intelligence self-learning, equipment used to execute the analysis method, computer program products and computer-readable storage media | |
JP6986724B2 (en) | ECG interference identification method based on artificial intelligence | |
US11617528B2 (en) | Systems and methods for reduced lead electrocardiogram diagnosis using deep neural networks and rule-based systems | |
Alquran et al. | ECG classification using higher order spectral estimation and deep learning techniques | |
CN111990989A (en) | Electrocardiosignal identification method based on generation countermeasure and convolution cyclic network | |
CN112906748A (en) | 12-lead ECG arrhythmia detection classification model construction method based on residual error network | |
Musa et al. | A systematic review and Meta-data analysis on the applications of Deep Learning in Electrocardiogram | |
Zhang et al. | Semi-supervised learning for automatic atrial fibrillation detection in 24-hour Holter monitoring | |
Zhou et al. | ECG quality assessment using 1D-convolutional neural network | |
JP7487965B2 (en) | Prediction method of electrocardiogram heart rate multi-type based on graph convolution | |
CN114041800A (en) | Electrocardiosignal real-time classification method and device and readable storage medium | |
Deevi et al. | HeartNetEC: a deep representation learning approach for ECG beat classification | |
Al-Huseiny et al. | Diagnosis of arrhythmia based on ECG analysis using CNN | |
Wang et al. | Multiscale residual network based on channel spatial attention mechanism for multilabel ECG classification | |
Tung et al. | Multi-lead ECG classification via an information-based attention convolutional neural network | |
Xu et al. | Arrhythmia detection using gated recurrent unit network with ECG signals | |
CN117582235A (en) | Electrocardiosignal classification method based on CNN-LSTM model | |
Zhang et al. | Multi-scale and multi-channel information fusion for exercise electrocardiogram feature extraction and classification | |
CN117421678B (en) | Single-lead atrial fibrillation recognition system based on knowledge distillation | |
CN113171102B (en) | ECG data classification method based on continuous deep learning | |
Makhir et al. | Comprehensive Cardiac Ischemia Classification Using Hybrid CNN-Based Models. | |
Yin et al. | An algorithm for locating pvc and spb in wearable ecgs | |
Shafaatfar et al. | Improving the Efficiency of Automatic Cardiac Arrhythmias Classification by a Novel Patient-Specific Feature Space Mapping | |
Xie et al. | Arrhythmia Detection Based on Semantic Segmentation for Multi-lead ECG |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||