CN116649899A - Electrocardiogram signal classification method based on attention mechanism feature fusion

Publication number: CN116649899A
Application number: CN202310602673.3A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: attention mechanism; instance; feature fusion; visual; expression
Legal status: Pending
Inventors: 臧俊斌, 廉城, 药帅, 张志东, 薛晨阳
Applicant/Assignee: North University of China
Application filed by North University of China on 2023-05-26; published as CN116649899A on 2023-08-29


Classifications

    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification involving training the classification device
    • A61B5/02405 Determining heart rate variability
    • A61B5/318 Heart-related electrical modalities, e.g. electrocardiography [ECG]
    • A61B5/346 Analysis of electrocardiograms
    • A61B5/349 Detecting specific parameters of the electrocardiograph cycle
    • A61B5/358 Detecting ST segments
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F2123/02 Data types in the time domain, e.g. time-series data
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
    • G06F2218/12 Classification; Matching
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The application relates to the technical field of artificial intelligence and provides an ECG signal classification method based on attention mechanism feature fusion, comprising the following steps: S1, acquiring multi-modal input; S2, performing multi-instance segmentation on the multi-modal input and extracting instance features; S3, inputting the instance features into a preset neural network to obtain bag features; S4, obtaining a classification result from the bag features. The method classifies long-term ECG signals. The acquired multi-modal input contains both time-domain and spatial-domain information, so the model can learn complementary information across modality features; instance features are obtained through multi-instance segmentation and feature extraction, fused into bag features, and the bag features are input into a classifier to obtain the classification result. With the attention-mechanism feature fusion method, instances more similar to the top best-activated instance receive larger attention weights, which improves classification accuracy and enables the result to accurately reflect arrhythmia and ST-segment changes.

Description

Electrocardiogram signal classification method based on attention mechanism feature fusion
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to an ECG signal classification method based on attention mechanism feature fusion.
Background
Various cardiovascular diseases exhibit different characteristics in electrocardiogram (ECG) signals, and classifying electrocardiograms with deep neural networks is a foundation of medical digitization. It is therefore necessary to provide an ECG signal classification method based on deep neural networks.
ECG signal classification is divided into short-term and long-term classification according to the acquisition time. Short-term classification deals with signals of roughly 10 s or a single heart beat, i.e., short data lengths; long-term classification deals with ECG signals spanning hours or even days, i.e., long data lengths. Because only short data can be input, the existing deep neural networks used for ECG classification, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), Transformers, and their variants, are mainly applied to short-term classification; that is, ECG recordings of one hour and above are too long to be fed into a deep learning model. Existing classification methods extract features from the raw ECG data and learn a classification model on those features, which captures only the time-domain characteristics of the signal. They neither exploit the complementary information of multiple modality features well nor handle long-term ECG signals. Yet it is the long-term ECG signal that accurately captures arrhythmias and ST-segment changes, anomalies that are difficult to detect in a short-term ECG signal.
In summary, the existing classification methods cannot classify long-term ECG signals and extract only time-domain features of the electrocardiographic signal, so the classification result cannot accurately reflect arrhythmia and ST-segment changes.
Disclosure of Invention
The application aims, in view of the defects of the prior art, to provide an ECG signal classification method based on attention mechanism feature fusion, so as to solve the problems that existing classification methods cannot classify long-term ECG signals and extract only time-domain features, with the result that the classification output cannot accurately reflect arrhythmia and ST-segment changes.
In order to achieve the above purpose, the technical scheme adopted by the application is as follows:
The application provides an ECG signal classification method based on attention mechanism feature fusion, comprising the following steps: S1, acquiring multi-modal input; S2, performing multi-instance segmentation on the multi-modal input and extracting instance features; S3, inputting the instance features into a preset neural network to obtain bag features; S4, obtaining a classification result from the bag features.
Further, the multi-modal input in step S1 comprises a time-series modality and a visual modality, namely an ECG signal and a GAF image, respectively.
Further, the GAF image is obtained by performing normalization, polar coordinate transformation and Gramian matrix operation on the ECG signal.
Further, the normalization is completed by the expression $\tilde{x}_i = a + \frac{(b-a)(x_i - \min(x))}{\max(x) - \min(x)}$, where $-1 \le a \le b \le 1$, $a$ and $b$ denote parameters, $n$ denotes the length of the ECG signal, $x_i$ denotes the $i$-th sample before normalization, $\tilde{x}_i$ denotes the normalized sample, and $\max(x)$ and $\min(x)$ denote the maximum and minimum values of the signal.
Further, the coordinate angle of the polar coordinate transformation is $\phi_i = \arccos(\tilde{x}_i)$, $-1 \le \tilde{x}_i \le 1$, where $\tilde{x}_i$ denotes the normalized sample; the coordinate radius is $r_i = \frac{t_i}{M}$, $t_i \le M$, where $M$ is a constant regularizing the span of the polar coordinate space, $M = 1$, and $t_i$ is the timestamp.
Further, the time-series modality segmentation in step S2 is expressed as $X_i^t = \{x_1, x_2, \ldots, x_K\} \in \mathbb{R}^{K \times C \times Z}$, where $t$ denotes the time-series modality, $C$ denotes the number of channels, $Z$ denotes the length of each heart beat, $K$ denotes the number of heart beats in the bag, and $x_k$ denotes the $k$-th heart-beat instance; the visual modality segmentation is expressed as $X_i^v = \{x_1, x_2, \ldots, x_J\} \in \mathbb{R}^{J \times C \times H \times W}$, where $v$ denotes the visual modality, $C$ denotes the number of channels, $H$ and $W$ denote the size of the image, $J$ denotes the number of non-overlapping image blocks each GAF image is cut into, and $x_j$ denotes the $j$-th visual instance.
Further, the time-series modality extracts features by $h_k = f_t(x_k)$, where $f_t$ denotes ResNet1d_Wang, $x_k$ is the input heart-beat instance whose feature is to be extracted, and $h_k$ is the corresponding extracted heart-beat instance feature; the visual modality extracts features by $h_j = f_v(x_j)$, where $f_v$ denotes SENet, $x_j$ is the input visual instance whose feature is to be extracted, and $h_j$ is the corresponding extracted visual instance feature.
Further, the preset neural network in step S3 is a feature fusion network based on the attention mechanism.
Further, the bag feature obtained through the attention-mechanism-based feature fusion network is expressed as $C = W_C(B_t \oplus B_v)$, where $\oplus$ denotes the concatenation operation, $W_C$ denotes a weight matrix, $C$ denotes the bag feature, $B_t$ denotes the bag feature of the time-series modality, and $B_v$ denotes the bag feature of the visual modality.
Further, the bag feature of the time-series modality is expressed as $B_t = \lambda Q_{k\max} + (1-\lambda) b_t$ and the bag feature of the visual modality is expressed as $B_v = \lambda Q_{j\max} + (1-\lambda) b_v$, where $\lambda$ is a hyperparameter, $Q_{k\max}$ and $Q_{j\max}$ are the top best-activated instance features corresponding to the time-series and visual modalities, respectively, and $b_t$ and $b_v$ are the feature vectors corresponding to the time-series and visual modalities, respectively.
Compared with the prior art, the application has the following beneficial effects. The method classifies long-term ECG signals. First, multi-modal input is acquired: the ECG signal contains time-domain information and the GAF image contains spatial-domain information, so the model can learn complementary information across modality features. Multi-instance segmentation and feature extraction are then performed on the multi-modal input to obtain instance features; the instance features obtained from the ECG signal and the GAF image, i.e., the time-series instance features and the visual instance features, are input into the attention-mechanism-based feature fusion network and fused into bag features. Finally, the bag features are input into a classifier to obtain the classification result. With the attention-mechanism feature fusion method, instances more similar to the top best-activated instance receive larger attention weights, which improves classification accuracy and enables the result to accurately reflect arrhythmia and ST-segment changes.
Drawings
FIG. 1 is a schematic diagram of the ECG signal classification method based on attention mechanism feature fusion provided by the application;
FIG. 2 is a flow chart of the ECG signal classification method based on attention mechanism feature fusion;
FIG. 3 is a schematic diagram of the attention-mechanism-based feature fusion network in the ECG signal classification method provided by the application;
FIG. 4 is a schematic diagram of the classifier in the ECG signal classification method based on attention mechanism feature fusion.
Detailed Description
In order to make the implementation of the present application more clear, the following detailed description will be given with reference to the accompanying drawings.
The application provides an ECG signal classification method based on attention mechanism feature fusion which, as shown in FIG. 1 and FIG. 2, comprises the following steps:
s1, acquiring multi-mode input;
the classification of ECG signals depends on the waveform characteristics of the different disorders, and thus it is crucial to extract the integrated characteristics from the ECG signals. Existing methods use only the original ECG signal as input, which ignores spatially related information in the time series signal. The application uses the original ECG signal and the gram angle field (Gramian Angular Field, GAF) image after the ECG signal conversion as the multi-mode input, so that the model can learn the complementary information among different modes. That is, using as input the original ECG signal and the multiple modalities of the corresponding GAF image, features extracted from the ECG signal and the GAF image have different characteristics, the former containing time domain information and the latter containing spatial domain information, which enables the model to learn complete information from the different modalities.
The ECG time series is transformed into a GAF image, i.e., the time series into an image signal, by the GAF transform, which carries out the normalization, coordinate transformation, and Gramian matrix operations. Specifically, each time point of the ECG signal is mapped into a polar coordinate system as another time-series representation through the inverse cosine (arccos) of its value; the GAF image therefore contains spatial-domain information between the sampling points, providing a basis for extracting and learning spatial-domain information.
Specifically, let $E$ be a single-lead ECG signal of length $n$, expressed as $E = \{x_1, x_2, \ldots, x_n\}$. First, $E$ is normalized within the range $[a, b]$: $\tilde{x}_i = a + \frac{(b-a)(x_i - \min(x))}{\max(x) - \min(x)}$, where $-1 \le a \le b \le 1$, giving the normalized signal $\tilde{E}$. The normalization yields angle values in the range $[0, \pi]$, which helps preserve the granularity of information in the GAF. Next, to obtain the polar representation of the data, the cosine angle $\phi$ and radius $r$ are computed as $\phi_i = \arccos(\tilde{x}_i)$ and $r_i = \frac{t_i}{M}$, where $M$ is a constant regularizing the span of the polar coordinate space, $M = 1$, and $t_i$ is the timestamp. After the coordinate-system conversion, the GAF transform feeds the vector into a Gramian matrix, which may be the Gramian Angular Difference Field (GADF), the sine of the angle differences, $\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j)$, or the Gramian Angular Summation Field (GASF), the cosine of the angle sums, $\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j)$. The output of the Gramian matrix, i.e., the GADF or GASF computation result, is the GAF image.
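For illustration, the GAF transform described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the patent's implementation; the function name, the clipping safeguard, and the default normalization range a = -1, b = 1 are our own assumptions.

```python
import numpy as np

def gaf(x, a=-1.0, b=1.0, kind="gasf"):
    """Convert a 1-D ECG segment into a GAF image (GASF or GADF)."""
    x = np.asarray(x, dtype=float)
    # Min-max normalization into [a, b] so the values fall in the arccos domain.
    x_n = a + (b - a) * (x - x.min()) / (x.max() - x.min())
    # Polar encoding: the angle comes from the value (arccos);
    # the radius comes from the timestamp and is not needed for the Gramian matrix.
    phi = np.arccos(np.clip(x_n, -1.0, 1.0))
    if kind == "gasf":
        return np.cos(phi[:, None] + phi[None, :])  # GASF: cos(phi_i + phi_j)
    return np.sin(phi[:, None] - phi[None, :])      # GADF: sin(phi_i - phi_j)

# A 5-sample toy signal yields a 5x5 GAF image.
print(gaf([0.1, 0.5, -0.2, 0.8, 0.3], kind="gadf").shape)  # (5, 5)
```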
The ECG signal is converted into a GAF image and combined with the original ECG signal as the multi-modal input; that is, the original ECG signal and the GAF image are input together. The GAF image carries spatial-domain information between sampling points while the ECG signal carries time-domain information; using both as multi-modal input lets the model learn complementary information between the modalities, so that step S2 can obtain instance features of the different modalities for fusion, realizing high-quality information interaction and finally allowing the classification result to accurately reflect arrhythmia and ST-segment changes.
The data used in this embodiment may be directly acquired ECG signals or electrocardiogram datasets from existing databases. The embodiments of the application use the St. Petersburg INCART 12-lead multi-label arrhythmia electrocardiogram dataset and the MIT-BIH supraventricular arrhythmia dataset. Each dataset is evaluated in an intra-patient mode and an inter-patient mode. In the intra-patient mode, all data are randomly assigned, one part to a training set and the other to a test set; in the inter-patient mode, feature extraction and classification are performed on data from disjoint subjects. The two modes are run completely independently. The training set is used to train the model and estimate its parameters; the test set is used to evaluate the trained model and must not be used for training. Specifically, in the intra-patient mode, random sampling over the entire dataset divides the samples into multiple groups, e.g., 10 groups; the first 70% form the training set (e.g., groups 1 to 7) and the remainder form the test set (e.g., groups 8 to 10). In the inter-patient mode, patients are randomly divided into groups, 70% for training and 30% for testing; e.g., with 10 groups, the samples of patients in groups 1 to 7 form the training set and the remaining patients' samples form the test set. Samples are typically divided into positive and negative; in this example, arrhythmia ECG signals are positive and normal ECG signals are negative.
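As an illustration of these two protocols, the following sketch contrasts an intra-patient split with an inter-patient split. It is a simplified assumption of how the grouping could be coded; the helper names and the one-record-per-sample representation are ours, not the patent's.

```python
import random

def intra_patient_split(samples, train_frac=0.7, seed=0):
    # Intra-patient: shuffle all records together, ignoring patient identity.
    s = list(samples)
    random.Random(seed).shuffle(s)
    cut = int(len(s) * train_frac)
    return s[:cut], s[cut:]

def inter_patient_split(samples, patient_of, train_frac=0.7, seed=0):
    # Inter-patient: split at the patient level so no subject appears in both sets.
    patients = sorted({patient_of[s] for s in samples})
    random.Random(seed).shuffle(patients)
    train_p = set(patients[: int(len(patients) * train_frac)])
    train = [s for s in samples if patient_of[s] in train_p]
    test = [s for s in samples if patient_of[s] not in train_p]
    return train, test

# Toy usage: 6 records from 3 patients.
recs = ["r0", "r1", "r2", "r3", "r4", "r5"]
owner = {"r0": "p1", "r1": "p1", "r2": "p2", "r3": "p2", "r4": "p3", "r5": "p3"}
print(inter_patient_split(recs, owner))
```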
S2, performing multi-instance segmentation on the multi-modal input and extracting instance features;
to avoid information loss, multi-instance learning (Multiple Instance Learning, MILs) is introduced into the present model, MILs being performed on multi-modal inputs. The packet is defined in MILs as a collection of examples, i.e., groups of tagged packets are formed, and instead of receiving a set of individually tagged instances, a learner receives a set of tagged packets, each having multiple instances. In the case of multi-instance binary classification, a packet may be marked negative if all instances in the packet are negative samples. On the other hand, if at least one of the packets is a positive sample, the packet is marked as positive. In this embodiment, abnormal ECG signals are classified as positive samples, for example, arrhythmia, and normal ECG signals are regarded as negative samples. The ECG signals in this embodiment all refer to long-term ECG signals in which the number of beats is large, the proportion of abnormal (arrhythmia) beats is small as compared with the whole area, and the label of the packet depends on whether or not the inside contains abnormal beats. Meanwhile, the original long-term ECG signal is divided into a plurality of examples by introducing MIL, so that the length of the signal is shortened, and the divided examples are used as the input of the characteristic fusion network based on the attention mechanism, so that the problem that the long-term ECG signal cannot be input into the neural network due to the limitation of hardware resources is solved.
First, multi-instance segmentation is performed on the multi-modal input. In the binary case, a bag containing $K$ instances can be expressed as $X = \{x_1, x_2, \ldots, x_K\}$ with instance labels $y_k \in \{0, 1\}$, where $y_k = 1$ denotes a positive instance and $y_k = 0$ a negative one. This embodiment considers both an instance-based and a feature-based strategy, each realized by the formula $Y = g(f(x_1), f(x_2), \ldots, f(x_K))$. In the instance-based strategy, each instance is scored by an instance classifier $f$, and a maximum or average pooling operation $g$ aggregates the instance scores into the bag label $Y$. In the feature-based strategy, the label is derived from features extracted from all instances in the bag: $f$ is an instance feature extractor, and $g$ is a fusion operation that fuses the instance features to obtain the bag label. This embodiment preferably uses the feature-based strategy: the instances themselves are unlabeled, so classifying them directly is difficult and a high-precision instance classifier is hard to train, whereas bag-level labels are relatively easy to obtain and instance features are readily learned by a neural network, giving better classification performance.
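The two strategies can be contrasted in a short, concrete sketch. The callables are placeholders of our own, not components specified by the patent; the bag-label rule itself is the standard MIL rule stated above.

```python
def bag_label(instance_labels):
    # MIL rule: a bag is positive iff at least one instance is positive.
    return int(any(instance_labels))

def instance_based(bag, f_classify, g_pool=max):
    # Instance-based: score every instance, then pool the scores into a bag score.
    return g_pool(f_classify(x) for x in bag)

def feature_based(bag, f_extract, g_fuse):
    # Feature-based (the strategy used here): extract instance features,
    # then fuse them into a bag-level representation to be classified.
    return g_fuse([f_extract(x) for x in bag])

print(bag_label([0, 0, 1, 0]))  # 1: one abnormal beat makes the bag positive
```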
Specifically, in the time-series modality, the $N$ ECG training samples $\{X_i^t\}_{i=1}^{N}$ are regarded as bags, where each bag $X_i^t = \{x_1, x_2, \ldots, x_K\} \in \mathbb{R}^{K \times C \times Z}$ contains $K$ heart beats (instances); the superscript $t$ denotes the time-series modality, $C$ the number of channels, and $Z$ the length of each heart beat. Each heart beat $x_k \in \mathbb{R}^{C \times Z}$ is treated as one instance; the heart beats here are the segments obtained by dividing the long-term electrocardiogram beat by beat. In the visual modality, the $N$ GAF images $\{X_i^v\}_{i=1}^{N}$ are regarded as bags, where each GAF image is cut into $J$ non-overlapping image blocks as instances, expressed as $X_i^v = \{x_1, x_2, \ldots, x_J\} \in \mathbb{R}^{J \times C \times H \times W}$; $v$ denotes the visual modality, $C$ the number of channels, and $H$ and $W$ the size of the image. The subscripts $i$, $j$, $k$ in the application all refer to sample indices, and $\mathbb{R}$ denotes the dimension space of the instances. Because the visual modality maps each time point of the ECG signal into a polar coordinate system and exhibits more spatial-domain information, a GAF image cannot be segmented beat by beat, so its multi-instance segmentation differs from that of the time-series modality. Compared with using the whole GAF image directly as input, introducing MIL to divide the GAF image into instances promotes better interaction with the time-series instances and extracts complementary information across modality features, improving the classification accuracy of long-term ECG signals.
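A minimal sketch of the two segmentation routines follows. It assumes R-peak positions are already available for the beat segmentation; the beat half-width and patch size are illustrative values of our choosing, not figures from the patent.

```python
import torch

def segment_beats(ecg, r_peaks, half=128):
    """Cut a long ECG record (C, T) into K beat instances (K, C, Z), Z = 2*half."""
    beats = [ecg[:, r - half : r + half] for r in r_peaks
             if r - half >= 0 and r + half <= ecg.shape[-1]]
    return torch.stack(beats)

def segment_gaf(gaf_img, patch=64):
    """Cut a GAF image (C, H, W) into J non-overlapping patches (J, C, patch, patch)."""
    c = gaf_img.shape[0]
    p = gaf_img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    return p.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

ecg = torch.randn(12, 5000)                         # 12-lead record, 5000 samples
print(segment_beats(ecg, [400, 900, 1500]).shape)   # (3, 12, 256)
print(segment_gaf(torch.randn(1, 256, 256)).shape)  # (16, 1, 64, 64)
```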
After multi-instance segmentation, instance features are extracted. Each time-series instance $x_k$ is input into ResNet1d_Wang to extract the time-series instance feature $h_k \in \mathbb{R}^{L \times 1}$: $h_k = f_t(x_k)$, where $f_t$ denotes ResNet1d_Wang and $L$ is the length of each instance feature. Each visual instance $x_j$ is input into SENet to extract the visual instance feature $h_j \in \mathbb{R}^{L \times 1}$: $h_j = f_v(x_j)$, where $f_v$ denotes SENet. The resulting instance features of the time-series modality are expressed as $h^t = \{h_1, h_2, \ldots, h_K\} \in \mathbb{R}^{L \times K}$, and those of the visual modality as $h^v = \{h_1, h_2, \ldots, h_J\} \in \mathbb{R}^{L \times J}$, with $h_k, h_j \in \mathbb{R}^{L \times 1}$. ResNet1d_Wang and SENet denote a one-dimensional residual neural network and a squeeze-and-excitation neural network, respectively.
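Since ResNet1d_Wang and SENet are research backbones rather than off-the-shelf library calls, the sketch below substitutes minimal stand-in encoders that only reproduce the input/output contract: a beat (C, Z) maps to a feature vector of length L, and an image patch (C, H, W) does likewise.

```python
import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Stand-in for ResNet1d_Wang: beat instances (K, C, Z) -> features (K, L)."""
    def __init__(self, c_in, L=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(c_in, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, L))
    def forward(self, x):
        return self.net(x)

class VisualEncoder(nn.Module):
    """Stand-in for SENet: visual instances (J, C, H, W) -> features (J, L)."""
    def __init__(self, c_in, L=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, L))
    def forward(self, x):
        return self.net(x)

h_t = TimeEncoder(12)(torch.randn(40, 12, 256))     # 40 beat instances -> (40, 128)
h_v = VisualEncoder(1)(torch.randn(16, 1, 64, 64))  # 16 patches -> (16, 128)
```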
S3, inputting the instance features into a preset neural network to obtain bag features;
the preset neural network is a feature fusion network based on an attention mechanism, and a specific schematic diagram is shown in fig. 3. It should be noted that the overall structure of the neural network is a Multi-mode Multi-instance learning neural network (MAMIL), which belongs to a convolutional neural network, and is a Multi-mode Multi-instance learning neural network for long-term ECG signal classification, and is composed of three parts, namely a Multi-mode input, multi-instance learning and a feature fusion network based on an attention mechanism; wherein the multi-modal input part refers to step S1, the multi-instance learning part refers to step S2, and the attention mechanism-based feature fusion network part refers to step S3.
The multi-modal instance features obtained in step S2, i.e., the time-series instance features and the visual instance features, are input into the attention-mechanism-based feature fusion network for integration to obtain the final bag feature. Specifically, a max-pooling operation selects the top best-activated instance feature from the instance features of each modality; these activated instance features are then treated as queries to compute attention weights over all remaining instance features, realizing interaction between instances of different modalities. That is, the instance features of the two modalities, $K + J$ in total, are combined, and the best-activated instance feature of each modality is used as a query that interacts with all $K + J$ instance features to compute weights. The final bag feature therefore contains information from both modalities and highlights the importance of the top best-activated instances. The attention-mechanism-based feature fusion network effectively removes redundant information, achieves lower computational complexity, and yields higher prediction accuracy.
Specifically, the time-series instance features from step S2 are $h^t = \{h_1, h_2, \ldots, h_K\} \in \mathbb{R}^{L \times K}$ and the visual instance features are $h^v = \{h_1, h_2, \ldots, h_J\} \in \mathbb{R}^{L \times J}$, with $h_k, h_j \in \mathbb{R}^{L \times 1}$. In the max-pooling step, the instances of each modality are projected to scores in $\mathbb{R}^{K \times 1}$ and $\mathbb{R}^{J \times 1}$, which rank all instances and yield the top best-activated instance of each modality: for the time-series modality, $h_{k\max} = \mathrm{maxpool}(W_0 h_1, \ldots, W_0 h_K)$, and for the visual modality, $h_{j\max} = \mathrm{maxpool}(W_0 h_1, \ldots, W_0 h_J)$, where $W_0$ is a weight vector from a linear layer that linearly projects the instance features. Next, every instance feature and each modality's top best-activated instance feature are converted into query vectors $Q_i \in \mathbb{R}^{L \times 1}$, $Q_{k\max} \in \mathbb{R}^{L \times 1}$, $Q_{j\max} \in \mathbb{R}^{L \times 1}$: the time-series modality gives $Q_{k\max} = W_q h_{k\max}$ and the visual modality gives $Q_{j\max} = W_q h_{j\max}$, where $W_q$ is the weight matrix of a fully connected layer, $Q_{k\max}$ and $Q_{j\max}$ are the queries of the top best-activated instances of the two modalities, and $Q_i$ is the query of any instance feature. A correlation metric $S$ is defined to measure the correlation between each modality's top best-activated instance and all instances: for the time-series modality, $S_i^t = \frac{\exp(Q_i \odot Q_{k\max})}{\sum_{i'} \exp(Q_{i'} \odot Q_{k\max})}$, and for the visual modality, $S_i^v = \frac{\exp(Q_i \odot Q_{j\max})}{\sum_{i'} \exp(Q_{i'} \odot Q_{j\max})}$, where $\odot$ denotes the inner product.
To reduce the number of parameters, each query is not matched against separately learned key vectors; queries are matched with other queries, and no key vectors are learned. For each modality, the query vectors $Q_i$ are summed, weighted by the correlation metric $S$, to obtain a feature vector: $b_t = \sum_i S_i^t Q_i$ for the time-series modality and $b_v = \sum_i S_i^v Q_i$ for the visual modality. Here $b_t$ and $b_v$ contain information from the instances of both modalities, i.e., each holds both time-series and visual instance information. The top best-activated instance feature of each modality is then fused with its feature vector to obtain that modality's bag feature: $B_t = \lambda Q_{k\max} + (1-\lambda) b_t$ for the time-series modality and $B_v = \lambda Q_{j\max} + (1-\lambda) b_v$ for the visual modality, where $\lambda$ is a hyperparameter that can be set to 0.5. Finally, the bag features of the two modalities are fused and passed through a linear layer to obtain the final bag feature $C = W_C (B_t \oplus B_v)$, where $\oplus$ denotes the concatenation operation and $W_C$ is the weight matrix of the linear layer on the combined branch in FIG. 3. The linear layer outputs the features as a one-dimensional vector, enabling an end-to-end learning process; the linear layers in the two branches of FIG. 3 produce the queries, while the linear layer on the combined branch fuses the bag features of the two modality branches into the final bag feature.
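The fusion just described can be sketched as a compact PyTorch module. The softmax normalization of the correlation metric and the exact tensor shapes are our assumptions where the text leaves them implicit; the rest follows the equations above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnFusion(nn.Module):
    """Sketch of the attention-based dual-modality fusion (our reading of the text)."""
    def __init__(self, L=128, lam=0.5):
        super().__init__()
        self.w0 = nn.Linear(L, 1)      # W_0: scores instances for max pooling
        self.wq = nn.Linear(L, L)      # W_q: projects features to queries
        self.wc = nn.Linear(2 * L, L)  # W_C: fuses the two modality bag features
        self.lam = lam                 # lambda, set to 0.5 as in the text

    def bag_feature(self, h, q_all):
        # Top best-activated instance of this modality = argmax of projected scores.
        top = h[self.w0(h).squeeze(-1).argmax()]
        q_max = self.wq(top)                          # its query Q_max
        # Correlation metric S: softmax over inner products with all K+J queries.
        s = F.softmax(q_all @ q_max, dim=0)
        b = (s.unsqueeze(-1) * q_all).sum(dim=0)      # b = sum_i S_i * Q_i
        return self.lam * q_max + (1 - self.lam) * b  # B = lambda*Q_max + (1-lambda)*b

    def forward(self, h_t, h_v):
        # h_t: (K, L) time-series instance features; h_v: (J, L) visual ones.
        q_all = self.wq(torch.cat([h_t, h_v], dim=0))  # queries of all K+J instances
        b_t = self.bag_feature(h_t, q_all)
        b_v = self.bag_feature(h_v, q_all)
        return self.wc(torch.cat([b_t, b_v], dim=-1))  # final bag feature C

C = AttnFusion(L=128)(torch.randn(40, 128), torch.randn(16, 128))
print(C.shape)  # torch.Size([128])
```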
In the attention-mechanism-based fusion network, the top best-activated instance is selected for the classification task, and the correlation metric $S$ captures the correlation between the top best-activated instance and the other instances, so instances more similar to the top best-activated instance receive larger attention weights. Existing methods compute attention weights between all pairs of heart-beat instances, which is computationally very expensive, and the normal beats dilute the attention weights of the abnormal beats. The present method therefore requires less computation and focuses more on abnormal beats, yielding higher classification accuracy.
S4, obtaining a classification result according to the bag features.
A classification result is obtained from the bag feature of step S3. The bag feature is input into a classifier, which outputs the prediction; as shown in FIG. 4, the classifier consists of a linear layer, a rectifying (ReLU) layer, and an activation-function layer. Since the datasets used in this embodiment are multi-label, the classifier uses a sigmoid activation function instead of softmax.
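A minimal classifier head consistent with this description might look as follows; the hidden width and label count are our assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

n_classes = 5  # assumed number of diagnostic labels
classifier = nn.Sequential(
    nn.Linear(128, 64),        # linear layer on the bag feature
    nn.ReLU(),                 # rectifying layer
    nn.Linear(64, n_classes),
    nn.Sigmoid(),              # sigmoid, not softmax: labels are not mutually exclusive
)
print(classifier(torch.randn(1, 128)))  # one independent probability per label
```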
In the embodiments of the application, the area under the curve (AUC), F1 score, mean average precision (mAP), and recall serve as evaluation metrics for the electrocardiogram classification tasks on all datasets. Since all datasets are multi-label, mAP is used instead of accuracy. Recall is the proportion of all positive samples that are predicted positive, which is of practical significance for disease classification. These metrics are computed from the MAMIL outputs. In general, training can be considered complete when the classification accuracy on the test set exceeds that of existing models of the same type.
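With scikit-learn, these four metrics can be computed directly from multi-label outputs, as sketched below; the toy arrays are ours, included only to make the snippet runnable.

```python
import numpy as np
from sklearn.metrics import (average_precision_score, f1_score,
                             recall_score, roc_auc_score)

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])                    # ground truth
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])  # model scores
y_pred = (y_prob >= 0.5).astype(int)                                    # thresholded

print("AUC   :", roc_auc_score(y_true, y_prob, average="macro"))
print("F1    :", f1_score(y_true, y_pred, average="macro"))
print("mAP   :", average_precision_score(y_true, y_prob, average="macro"))
print("Recall:", recall_score(y_true, y_pred, average="macro"))
```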
Experiments were conducted on the St. Petersburg INCART 12-lead multi-label arrhythmia electrocardiogram dataset and the MIT-BIH supraventricular arrhythmia dataset described above. The classification performance of different feature fusion methods within MAMIL was tested in both intra-patient and inter-patient modes, and the feature fusion method provided in this embodiment gave the best model performance on almost all metrics, indicating that it fuses instance features effectively. The proposed MAMIL was also compared with other common existing models in the intra-patient and inter-patient modes of both datasets. "MIL-free" denotes MAMIL without MIL as described in the application, which uses only the complete ECG data and GAF images as multi-modal input without splitting them into instances. MAMIL achieved significantly better results on almost all evaluation metrics in the intra-patient mode: with the test-set data of step S1, F1, mAP, and Recall improved by more than 2% on the St. Petersburg INCART arrhythmia dataset and by more than 1% on the MIT-BIH supraventricular arrhythmia dataset. In the inter-patient mode, MAMIL likewise improved on almost all metrics, except that ResNet1d_Wang was slightly higher than MAMIL on AUC.
The results show that MAMIL delivers good performance gains and generalization in both intra-patient and inter-patient modes. Without MIL, however, the performance of MAMIL degrades significantly, which demonstrates the effectiveness of MIL. Compared with other MIL models, MAMIL markedly improves performance on long-term ECG classification tasks: all metrics improved to some extent in the intra-patient mode on both datasets, and in the inter-patient mode MAMIL gained more than 1% on F1 and Recall, with mAP also improving, except that DSMIL was 0.05% higher than MAMIL on AUC. The method finally obtains the classification result through multi-modal input, multi-instance segmentation, feature extraction, and fusion; it extracts not only time-domain information but also spatial features, realizing complementary information across modalities, which improves long-term ECG classification so that the result accurately reflects arrhythmia and ST-segment changes.
The multi-modal learning model provided by the embodiments can accept additional modality inputs, such as tabular data and audio data. The backbone of each modality can be replaced, and the proposed attention-mechanism-based fusion method can be applied to more modality inputs. The method of the application is not limited to the long-term ECG signals described above; for example, the model can classify various long-term bioelectrical signals of the human body, such as electroencephalogram, electromyogram, and electrogastrogram signals.
The above is only a preferred embodiment of the present application, and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. An electrocardiographic signal classification method based on attention mechanism feature fusion, characterized by comprising the following steps:
S1, acquiring multi-modal input;
S2, performing multi-instance segmentation on the multi-modal input and extracting instance features;
S3, inputting the instance features into a preset neural network to obtain bag features;
and S4, obtaining a classification result according to the bag features.
2. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 1, wherein the multi-modal input in step S1 comprises a time-series modality and a visual modality, namely an ECG signal and a GAF image, respectively.
3. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 2, wherein the GAF image is obtained from the ECG signal through normalization, polar coordinate transformation, and the Gramian matrix operation.
4. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 3, wherein the normalization is completed by the expression $\tilde{x}_i = a + \frac{(b-a)(x_i - \min(x))}{\max(x) - \min(x)}$, wherein $-1 \le a \le b \le 1$, $a$ and $b$ denote parameters, $n$ denotes the length of the ECG signal, $x_i$ denotes the $i$-th sample before normalization, $\tilde{x}_i$ denotes the normalized sample, and $\max(x)$ and $\min(x)$ denote the maximum and minimum values of the signal.
5. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 4, wherein the coordinate angle of the polar coordinate transformation is $\phi_i = \arccos(\tilde{x}_i)$, $-1 \le \tilde{x}_i \le 1$, wherein $\tilde{x}_i$ denotes the normalized sample; the coordinate radius is $r_i = \frac{t_i}{M}$, $t_i \le M$, wherein $M$ is a constant regularizing the span of the polar coordinate space, $M = 1$, and $t_i$ is the timestamp.
6. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 1 or 5, wherein the time-series modality segmentation in step S2 is expressed as $X_i^t = \{x_1, x_2, \ldots, x_K\} \in \mathbb{R}^{K \times C \times Z}$, wherein $t$ denotes the time-series modality, $C$ denotes the number of channels, $Z$ denotes the length of each heart beat, $K$ denotes the number of heart beats in the bag, and $x_k$ denotes the $k$-th heart-beat instance; the visual modality segmentation is expressed as $X_i^v = \{x_1, x_2, \ldots, x_J\} \in \mathbb{R}^{J \times C \times H \times W}$, wherein $v$ denotes the visual modality, $C$ denotes the number of channels, $H$ and $W$ denote the size of the image, $J$ denotes the number of non-overlapping image blocks each of the GAF images is cut into, and $x_j$ denotes the $j$-th visual instance.
7. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 6, wherein the time-series modality extracts features by $h_k = f_t(x_k)$, wherein $f_t$ denotes ResNet1d_Wang, $x_k$ is the input heart-beat instance whose feature is to be extracted, and $h_k$ is the corresponding extracted heart-beat instance feature; the visual modality extracts features by $h_j = f_v(x_j)$, wherein $f_v$ denotes SENet, $x_j$ is the input visual instance whose feature is to be extracted, and $h_j$ is the corresponding extracted visual instance feature.
8. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 1 or 7, wherein the preset neural network in step S3 is an attention-mechanism-based feature fusion network.
9. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 8, wherein the bag feature obtained through the attention-mechanism-based feature fusion network is expressed as $C = W_C(B_t \oplus B_v)$, wherein $\oplus$ denotes the concatenation operation, $W_C$ denotes a weight matrix, $C$ denotes the bag feature, $B_t$ denotes the bag feature of the time-series modality, and $B_v$ denotes the bag feature of the visual modality.
10. The electrocardiographic signal classification method based on attention mechanism feature fusion according to claim 9, wherein the bag feature of the time-series modality is expressed as $B_t = \lambda Q_{k\max} + (1-\lambda) b_t$ and the bag feature of the visual modality is expressed as $B_v = \lambda Q_{j\max} + (1-\lambda) b_v$, wherein $\lambda$ is a hyperparameter, $Q_{k\max}$ and $Q_{j\max}$ are the top best-activated instance features corresponding to the time-series and visual modalities, respectively, and $b_t$ and $b_v$ are the feature vectors corresponding to the time-series and visual modalities, respectively.
CN202310602673.3A (filed 2023-05-26) Electrocardiogram signal classification method based on attention mechanism feature fusion, status: Pending

Priority Applications (1)

CN202310602673.3A, priority date 2023-05-26, filing date 2023-05-26, CN116649899A (en)

Publications (1)

CN116649899A, published 2023-08-29

Family ID: 87721713
Country: CN
Cited By (1)

* Cited by examiner, † Cited by third party

CN116864140A * 2023-09-05 2023-10-10 天津市胸科医院 Method and system for processing postoperative cardiology care monitoring data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination