CN114764575A - Multi-modal data classification method based on deep learning and time sequence attention mechanism - Google Patents
- Publication number: CN114764575A (application CN202210376944.3A)
- Authority: CN (China)
- Prior art keywords: signal, layer, convolution, characteristic signal, inputting
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F2218/08: Aspects of pattern recognition specially adapted for signal processing; Feature extraction
- G06N3/045: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/08: Computing arrangements based on biological models; Neural networks; Learning methods
- G06F2218/12: Aspects of pattern recognition specially adapted for signal processing; Classification; Matching
Abstract
A multi-modal data classification method based on deep learning and a time sequence attention mechanism first uses the PC-TBG-ECG and PC-TBG-PCG models to extract features from electrocardiosignals and heart sound signals respectively, and then applies the XGBoost ensemble classification algorithm to select and classify the extracted features. XGBoost adds regularization, which improves computational efficiency while effectively preventing overfitting. The method is suitable for classifying data of different modalities and can analyze the signals from multiple angles, thereby improving classification accuracy.
Description
Technical Field
The invention relates to the field of multi-modal data classification, and in particular to a multi-modal data classification method based on deep learning and a time sequence attention mechanism.
Background
Electrocardiogram (ECG) and phonocardiogram (PCG) are non-invasive and cost-effective signal acquisition tools. Because the two signals are complementary, their latent features can be mined and analyzed from multiple angles to improve classification performance. Previous research has mainly classified signals using single-modality data or a single classifier; such approaches cannot examine the signals comprehensively, so the classification method fusing multi-modal data proposed here is well suited to practical requirements.
Disclosure of Invention
In order to overcome the above shortcomings, the invention provides a method that is suitable for classifying data of different modalities and can analyze signals from multiple angles, thereby improving classification accuracy.
The technical solution adopted by the invention to overcome this technical problem is as follows:
a multi-modal data classification method based on deep learning and time sequence attention mechanism comprises the following steps:
a) selecting training-a in PhysioNet/CinC Challenge 2016 as a data set, expanding the data set, and dividing the expanded data set into a training set and a test set;
b) establishing an electrocardiosignal model, wherein the electrocardiosignal model is sequentially composed of a PC module, a TBG module and a classification module;
c) resampling the electrocardiosignals in the training set and the test set to 2048 sampling points, then applying z-score normalization to obtain the normalized electrocardiosignal x'_ecg;
d) inputting the normalized electrocardiosignal x'_ecg from the training set into the PC module of the electrocardiosignal model and outputting the characteristic signal X_1, the PC module being composed, in sequence, of four convolution branches and a 1×1 convolution block;
e) inputting the characteristic signal X_1 into the TBG module of the electrocardiosignal model and outputting the characteristic signal X_2, the TBG module consisting of 3 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism;
f) inputting the characteristic signal X_2 into the classification module of the electrocardiosignal model and outputting the predicted category f_ecg, the classification module being composed, in sequence, of a fully connected layer and a Softmax activation layer;
g) repeating steps d) to f) N times and minimizing a cross-entropy loss function with an SGD optimizer to obtain the trained optimal electrocardiosignal model;
h) establishing a heart sound signal model composed, in sequence, of a PC module, a TBG module and a classification module;
i) resampling the heart sound signals in the training set and the test set to 8000 sampling points, then applying z-score normalization to obtain the normalized heart sound signal x'_pcg;
j) inputting the normalized heart sound signal x'_pcg from the training set into the PC module of the heart sound signal model and outputting the characteristic signal Y_1, the PC module being composed, in sequence, of four convolution branches and a 1×1 convolution block;
k) inputting the characteristic signal Y_1 into the TBG module of the heart sound signal model and outputting the characteristic signal Y_2, the TBG module consisting of 4 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism;
l) inputting the characteristic signal Y_2 into the classification module of the heart sound signal model and outputting the predicted category f_pcg, the classification module being composed, in sequence, of a fully connected layer and a Softmax activation layer;
m) repeating steps j) to l) M times and minimizing a cross-entropy loss function with an SGD optimizer to obtain the trained optimal heart sound signal model;
n) manually dividing the data set into a new training set and a new test set in a 4:1 ratio, inputting the new training set into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_3, inputting the new training set into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_3, and computing the concatenated 128-dimensional feature fusion signal PP_x by the formula PP_x = [X_3, Y_3];
o) inputting the feature fusion signal PP_x into an XGBoost classifier to obtain the importance-score ranking of PP_x, selecting the top 64 signals of that ranking as the characteristic signal PP1_x, selecting optimal hyperparameters by 5-fold cross-validation, and training the XGBoost classifier with the optimal hyperparameters to obtain the optimized XGBoost classifier;
p) inputting the new test set into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_4, inputting the new test set into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_4, and computing the concatenated 128-dimensional feature fusion signal PP_c by the formula PP_c = [X_4, Y_4];
q) inputting the feature fusion signal PP_c into the optimized XGBoost classifier to obtain the importance-score ranking of PP_c, and selecting the top 64 signals of that ranking as the characteristic signal PP1_c.
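To make steps n) to q) concrete, the sketch below fuses 64-dimensional TBG outputs, ranks the 128 fused features by XGBoost importance score, keeps the top 64, and tunes hyperparameters by 5-fold cross-validation. It is a minimal illustration, not the patented implementation: the random stand-in features, the hyperparameter grid, and all variable names are assumptions.

```python
# Sketch of steps n)-q): fuse ECG/PCG features, select the top 64 by
# XGBoost importance, then tune the classifier with 5-fold CV.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Stand-ins for the 64-dim TBG outputs X_3/Y_3 (train) and X_4/Y_4 (test).
X3, Y3 = rng.normal(size=(800, 64)), rng.normal(size=(800, 64))
X4, Y4 = rng.normal(size=(200, 64)), rng.normal(size=(200, 64))
y_train, y_test = rng.integers(0, 2, 800), rng.integers(0, 2, 200)

PP_x = np.concatenate([X3, Y3], axis=1)   # PP_x = [X_3, Y_3], 128-dim
PP_c = np.concatenate([X4, Y4], axis=1)   # PP_c = [X_4, Y_4], 128-dim

# Rank the fused features by importance score and keep the top 64.
ranker = XGBClassifier(n_estimators=100, eval_metric="logloss")
ranker.fit(PP_x, y_train)
top64 = np.argsort(ranker.feature_importances_)[::-1][:64]
PP1_x, PP1_c = PP_x[:, top64], PP_c[:, top64]   # selected characteristic signals

# 5-fold cross-validation over an assumed hyperparameter grid.
grid = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                    {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1]},
                    cv=5)
grid.fit(PP1_x, y_train)
print("test accuracy:", grid.score(PP1_c, y_test))
```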
Preferably, in step a) the data set is expanded using a sliding-window segmentation method, and the data set is divided into 5 different training and test sets using five-fold cross-validation.
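A minimal sketch of the sliding-window expansion described above; the window length and stride below are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def sliding_windows(signal, win, stride):
    """Cut a 1-D recording into overlapping segments to expand the data set."""
    starts = range(0, len(signal) - win + 1, stride)
    return np.stack([signal[s:s + win] for s in starts])

recording = np.sin(np.linspace(0, 20 * np.pi, 10000))         # stand-in signal
segments = sliding_windows(recording, win=2048, stride=1024)  # assumed sizes
print(segments.shape)  # (8, 2048): each row is one new training sample
```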
Further, in step c) the normalized electrocardiosignal is computed by the formula x'_ecg = (x_ecg - u_ecg) / σ_ecg, where x_ecg is the electrocardiosignal in the training and test sets, u_ecg is the mean of the electrocardiosignal, and σ_ecg is the standard deviation of the electrocardiosignal.
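The resampling and normalization of step c) might look as follows, assuming SciPy's FFT-based resampler; only the 2048-point target length comes from the text above.

```python
import numpy as np
from scipy.signal import resample

def preprocess_ecg(x_ecg):
    x = resample(x_ecg, 2048)          # resample to 2048 points (step c)
    return (x - x.mean()) / x.std()    # z-score: x' = (x - u) / sigma

raw = np.random.randn(3600)            # stand-in raw ECG segment
x_norm = preprocess_ecg(raw)
print(x_norm.shape, round(float(x_norm.mean()), 6), round(float(x_norm.std()), 6))
```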
Further, step d) comprises the following steps:
d-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal E_1;
d-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×13 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal E_2;
d-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal E_3;
d-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal E_4;
d-5) the characteristic signals E_1, E_2, E_3 and E_4 are concatenated to obtain the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4];
d-6) the 1×1 convolution block is composed of a convolution layer with 16 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4] is input into the 1×1 convolution block, which outputs the 16-dimensional characteristic signal X_1.
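A PyTorch sketch of the PC module described in steps d-1) to d-6), under two assumptions not stated in the text: the four branch outputs are concatenated along the channel dimension, and 'same' padding preserves the 2048-point length.

```python
import torch
import torch.nn as nn

class PCModule(nn.Module):
    """Four parallel conv branches (kernels 15/13/9/5) + a 1x1 conv block."""
    def __init__(self, out_channels=16):
        super().__init__()
        def branch(k):  # conv -> batch norm -> ReLU, 32 channels, stride 1
            return nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=k, stride=1, padding=k // 2),
                nn.BatchNorm1d(32), nn.ReLU())
        self.branches = nn.ModuleList([branch(k) for k in (15, 13, 9, 5)])
        self.fuse = nn.Sequential(            # 1x1 block: 128 -> 16 channels
            nn.Conv1d(128, out_channels, kernel_size=1), nn.ReLU())

    def forward(self, x):                     # x: (batch, 1, 2048)
        e = torch.cat([b(x) for b in self.branches], dim=1)  # E: (batch, 128, 2048)
        return self.fuse(e)                   # X_1: (batch, 16, 2048)

x = torch.randn(4, 1, 2048)                   # stand-in normalized ECG batch
print(PCModule()(x).shape)                    # torch.Size([4, 16, 2048])
```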
Further, step e) comprises the steps of:
e-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal X_1 is input into the first convolutional encoding module, which outputs the 32-dimensional characteristic signal E_5;
e-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_5 is input into the second convolutional encoding module, which outputs the 64-dimensional characteristic signal E_6;
e-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_6 is input into the third convolutional encoding module, which outputs the 128-dimensional characteristic signal E_7;
e-4) the characteristic signal E_7 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal X_2; in the bidirectional GRU layer with the TPA mechanism, X_2 is computed by the formula X_2 = Σ_{i=1}^{n} τ_i·G^C_i, where i = {1, 2, ..., n}, n = 128, T denotes transposition, τ_i is the attention weight of the ith row vector, τ_i = σ((G^C_i)^T·w_k·g_t), σ(·) is the sigmoid function, G^C_i is the ith row of the temporal pattern matrix G^C, G^C = Conv1d(G), Conv1d(·) is a one-dimensional convolution operation, G is the hidden state matrix, G = [g_1, g_2, ..., g_{t-1}], g_i is the hidden state vector of the ith bidirectional GRU step, i = {1, 2, ..., t-1}, t is time, w_k is a weight coefficient, and g_t is the hidden state vector of the bidirectional GRU at time t.
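The sketch below shows one plausible reading of the TPA-Bi-GRU of step e-4): full-length Conv1d filters turn the past hidden states G into a pattern matrix G^C, each row is weighted by τ_i = σ((G^C_i)^T w_k g_t), and the weighted rows are summed. The filter count k = 64 (chosen so that the weighted sum is 64-dimensional) and the input layout are assumptions, not patent values.

```python
import torch
import torch.nn as nn

class TPABiGRU(nn.Module):
    """Bi-GRU (32 units per direction) with temporal pattern attention (TPA)."""
    def __init__(self, in_channels, seq_len, hidden=32, k=64):
        super().__init__()
        self.gru = nn.GRU(in_channels, hidden, batch_first=True,
                          bidirectional=True)            # states g_i are 64-dim
        # k full-length filters: each row of G is convolved down to k values.
        self.filters = nn.Parameter(torch.randn(k, 1, seq_len - 1) * 0.02)
        self.w_k = nn.Parameter(torch.randn(k, 2 * hidden) * 0.02)  # scoring weight

    def forward(self, x):                 # x: (batch, seq_len, in_channels)
        g, _ = self.gru(x)                # G = [g_1 ... g_t]: (batch, seq_len, 64)
        g_t = g[:, -1]                    # last hidden state g_t: (batch, 64)
        past = g[:, :-1].transpose(1, 2)  # past states: (batch, 64, seq_len-1)
        b, d, L = past.shape
        gc = nn.functional.conv1d(past.reshape(b * d, 1, L), self.filters)
        gc = gc.reshape(b, d, -1)         # G^C: (batch, d rows, k)
        # tau_i = sigmoid((G^C_i)^T w_k g_t), one weight per row of G^C
        tau = torch.sigmoid(torch.einsum('bik,kh,bh->bi', gc, self.w_k, g_t))
        return torch.einsum('bi,bik->bk', tau, gc)   # X_2 = sum_i tau_i G^C_i

E7 = torch.randn(4, 128, 128)             # assumed E_7 layout: (batch, time, channels)
print(TPABiGRU(in_channels=128, seq_len=128)(E7).shape)  # torch.Size([4, 64])
```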
Further, in step g) N is 150, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 80 epochs, and the cross-entropy loss function is computed by the formula cc(x) = -Σ_{i=1}^{L} f̄_i(x)·log f_i(x), where L is the number of categories, L = 2, f_i(x) is the predicted label of the ith category of the prediction class f_ecg, and f̄_i(x) is the true label of the corresponding ith category; in step m) M is 180, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 90 epochs, and the cross-entropy loss function is computed by the formula cc(y) = -Σ_{i=1}^{L} f̄_i(y)·log f_i(y), where L is the number of categories, L = 2, f_i(y) is the predicted label of the ith category of the prediction class f_pcg, and f̄_i(y) is the true label of the ith category.
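A minimal training sketch matching the hyperparameters of step g) (N = 150 epochs, SGD with learning rate 0.001 decayed by 0.1 every 80 epochs, cross-entropy loss); the one-layer stand-in model and the single fixed batch are assumptions for brevity. Note that PyTorch's CrossEntropyLoss folds the Softmax of the classification module into the loss itself.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(2048, 2))  # stand-in for PC-TBG-ECG
opt = torch.optim.SGD(model.parameters(), lr=0.001)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=80, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()   # cc(x) = -sum_i fbar_i(x) * log f_i(x)

x = torch.randn(32, 1, 2048)      # stand-in batch of normalized ECG segments
y = torch.randint(0, 2, (32,))    # binary labels (L = 2 categories)
for epoch in range(150):          # N = 150
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    sched.step()                  # learning rate *= 0.1 every 80 epochs
```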
Further, in step i) the normalized heart sound signal is computed by the formula x'_pcg = (x_pcg - u_pcg) / σ_pcg, where x_pcg is the heart sound signal in the training and test sets, u_pcg is the mean of the heart sound signal, and σ_pcg is the standard deviation of the heart sound signal.
Further, step j) comprises the following steps:
j-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal P_1;
j-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×11 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal P_2;
j-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal P_3;
j-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal P_4;
j-5) the characteristic signals P_1, P_2, P_3 and P_4 are concatenated to obtain the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4];
j-6) the 1×1 convolution block is composed of a convolution layer with 32 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4] is input into the 1×1 convolution block, which outputs the 32-dimensional characteristic signal Y_1.
Further, step k) comprises the steps of:
k-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 16 channels and kernel size 1×1, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal Y_1 is input into the first convolutional encoding module, which outputs the 16-dimensional characteristic signal P_5;
k-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_5 is input into the second convolutional encoding module, which outputs the 32-dimensional characteristic signal P_6;
k-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_6 is input into the third convolutional encoding module, which outputs the 64-dimensional characteristic signal P_7;
k-4) the fourth convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_7 is input into the fourth convolutional encoding module, which outputs the 128-dimensional characteristic signal P_8;
k-5) the characteristic signal P_8 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal Y_2; in the bidirectional GRU layer with the TPA mechanism, Y_2 is computed by the formula Y_2 = Σ_{i=1}^{n} τ_i·G^C_i, with the symbols defined as in step e-4).
The invention has the beneficial effects that: the PC-TBG-ECG and PC-TBG-PCG models are first used to extract the features of the electrocardiosignals and heart sound signals respectively, and the XGBoost ensemble classification algorithm is then applied to select and classify the extracted features. XGBoost adds regularization, which improves computational efficiency while effectively preventing overfitting. The method is suitable for classifying data of different modalities and can analyze the signals from multiple angles, thereby improving classification accuracy.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a network configuration diagram of the PC module of the present invention.
Detailed Description
The present invention is further described with reference to fig. 1 and 2.
A multi-modal data classification method based on deep learning and time sequence attention mechanism comprises the following steps:
a) selecting training-a in PhysioNet/CinC Challenge 2016 as a data set, expanding the data set, and dividing the expanded data set into a training set and a test set.
b) An electrocardiosignal model (PC-TBG-ECG) is established, composed in sequence of a PC module, a TBG module and a classification module.
c) The electrocardiosignals in the training set and the test set are resampled to 2048 sampling points and then z-score normalized to obtain the normalized electrocardiosignal x'_ecg.
d) The normalized electrocardiosignal x'_ecg from the training set is input into the PC module of the electrocardiosignal model, which outputs the characteristic signal X_1; the PC module is composed, in sequence, of four convolution branches and a 1×1 convolution block.
e) The characteristic signal X_1 is input into the TBG module of the electrocardiosignal model, which outputs the characteristic signal X_2; the TBG module consists of 3 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism (TPA-Bi-GRU).
f) The characteristic signal X_2 is input into the classification module of the electrocardiosignal model, which outputs the prediction category f_ecg; the classification module is composed, in sequence, of a fully connected layer and a Softmax activation layer.
g) Steps d) to f) are repeated N times, and the trained optimal electrocardiosignal model is obtained by minimizing a cross-entropy loss function with an SGD optimizer.
h) A heart sound signal model (PC-TBG-PCG) is established, composed in sequence of a PC module, a TBG module and a classification module.
i) The heart sound signals in the training set and the test set are resampled to 8000 sampling points and then z-score normalized to obtain the normalized heart sound signal x'_pcg.
j) The normalized heart sound signal x'_pcg from the training set is input into the PC module of the heart sound signal model, which outputs the characteristic signal Y_1; the PC module is composed, in sequence, of four convolution branches and a 1×1 convolution block.
k) The characteristic signal Y_1 is input into the TBG module of the heart sound signal model, which outputs the characteristic signal Y_2; the TBG module consists of 4 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism (TPA-Bi-GRU).
l) The characteristic signal Y_2 is input into the classification module of the heart sound signal model, which outputs the prediction category f_pcg; the classification module is composed, in sequence, of a fully connected layer and a Softmax activation layer.
m) Steps j) to l) are repeated M times, and the trained optimal heart sound signal model is obtained by minimizing a cross-entropy loss function with an SGD optimizer.
n) The data set is manually divided into a new training set and a new test set in a 4:1 ratio. The new training set is input into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_3; the new training set is input into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_3; and the concatenated 128-dimensional feature fusion signal PP_x is computed by the formula PP_x = [X_3, Y_3].
o) The feature fusion signal PP_x is input into an XGBoost classifier to obtain the importance-score ranking of PP_x, and the top 64 signals of that ranking are selected as the characteristic signal PP1_x; optimal hyperparameters are selected by 5-fold cross-validation, and the XGBoost classifier is trained with the optimal hyperparameters to obtain the optimized XGBoost classifier.
p) The new test set is input into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_4; the new test set is input into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_4; and the concatenated 128-dimensional feature fusion signal PP_c is computed by the formula PP_c = [X_4, Y_4].
q) The feature fusion signal PP_c is input into the optimized XGBoost classifier to obtain the importance-score ranking of PP_c, and the top 64 signals of that ranking are selected as the characteristic signal PP1_c.
The signals require no denoising, filtering or other preprocessing, which avoids the low classification accuracy or poor practicality caused by unreasonable signal preprocessing in the prior art and ensures the robustness of the model. The PC-TBG-ECG and PC-TBG-PCG models are first used to extract the features of the electrocardiosignals and heart sound signals respectively, and the XGBoost ensemble classification algorithm is then applied to select and classify the extracted features. XGBoost adds regularization, which improves computational efficiency while effectively preventing overfitting. The method is suitable for classifying data of different modalities and can analyze the signals from multiple angles, thereby improving classification accuracy.
Example 1:
in the step a), the data set is expanded by using a sliding window segmentation method, and the data set is divided into 5 different training sets and test sets by using a five-fold cross validation method.
Example 2:
In step c) the normalized electrocardiosignal is computed by the formula x'_ecg = (x_ecg - u_ecg) / σ_ecg, where x_ecg is the electrocardiosignal in the training and test sets, u_ecg is the mean of the electrocardiosignal, and σ_ecg is the standard deviation of the electrocardiosignal.
Example 3:
the step d) comprises the following steps:
d-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal E_1;
d-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×13 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal E_2;
d-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal E_3;
d-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal E_4;
d-5) the characteristic signals E_1, E_2, E_3 and E_4 are concatenated to obtain the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4];
d-6) the 1×1 convolution block is composed of a convolution layer with 16 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4] is input into the 1×1 convolution block, which outputs the 16-dimensional characteristic signal X_1.
Example 4:
step e) comprises the following steps:
e-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal X_1 is input into the first convolutional encoding module, which outputs the 32-dimensional characteristic signal E_5;
e-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_5 is input into the second convolutional encoding module, which outputs the 64-dimensional characteristic signal E_6;
e-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_6 is input into the third convolutional encoding module, which outputs the 128-dimensional characteristic signal E_7;
e-4) the characteristic signal E_7 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal X_2; in the bidirectional GRU layer with the TPA mechanism, X_2 is computed by the formula X_2 = Σ_{i=1}^{n} τ_i·G^C_i, where i = {1, 2, ..., n}, n = 128, T denotes transposition, τ_i is the attention weight of the ith row vector, τ_i = σ((G^C_i)^T·w_k·g_t), σ(·) is the sigmoid function, G^C_i is the ith row of the temporal pattern matrix G^C, G^C = Conv1d(G), Conv1d(·) is a one-dimensional convolution operation, G is the hidden state matrix, G = [g_1, g_2, ..., g_{t-1}], g_i is the hidden state vector of the ith bidirectional GRU step, i = {1, 2, ..., t-1}, t is time, w_k is a weight coefficient, and g_t is the hidden state vector of the bidirectional GRU at time t.
Example 5:
In step g) N is 150, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 80 epochs, and the cross-entropy loss function is computed by the formula cc(x) = -Σ_{i=1}^{L} f̄_i(x)·log f_i(x), where L is the number of categories, L = 2, f_i(x) is the predicted label of the ith category of the prediction class f_ecg, and f̄_i(x) is the true label of the corresponding ith category; in step m) M is 180, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 90 epochs, and the cross-entropy loss function is computed by the formula cc(y) = -Σ_{i=1}^{L} f̄_i(y)·log f_i(y), where L is the number of categories, L = 2, f_i(y) is the predicted label of the ith category of the prediction class f_pcg, and f̄_i(y) is the true label of the ith category.
Example 6:
In step i) the normalized heart sound signal is computed by the formula x'_pcg = (x_pcg - u_pcg) / σ_pcg, where x_pcg is the heart sound signal in the training and test sets, u_pcg is the mean of the heart sound signal, and σ_pcg is the standard deviation of the heart sound signal.
Example 7:
the step j) comprises the following steps:
j-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal P_1;
j-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×11 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal P_2;
j-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal P_3;
j-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal P_4;
j-5) the characteristic signals P_1, P_2, P_3 and P_4 are concatenated to obtain the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4];
j-6) the 1×1 convolution block is composed of a convolution layer with 32 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4] is input into the 1×1 convolution block, which outputs the 32-dimensional characteristic signal Y_1.
Example 8:
step k) comprises the following steps:
k-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 16 channels and kernel size 1×1, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal Y_1 is input into the first convolutional encoding module, which outputs the 16-dimensional characteristic signal P_5;
k-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_5 is input into the second convolutional encoding module, which outputs the 32-dimensional characteristic signal P_6;
k-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_6 is input into the third convolutional encoding module, which outputs the 64-dimensional characteristic signal P_7;
k-4) the fourth convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_7 is input into the fourth convolutional encoding module, which outputs the 128-dimensional characteristic signal P_8;
k-5) the characteristic signal P_8 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal Y_2; in the bidirectional GRU layer with the TPA mechanism, Y_2 is computed by the formula Y_2 = Σ_{i=1}^{n} τ_i·G^C_i, with the symbols defined as in step e-4).
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A multi-modal data classification method based on deep learning and time-series attention mechanism is characterized by comprising the following steps:
a) selecting training-a in PhysioNet/CinC Challenge 2016 as a data set, expanding the data set, and dividing the expanded data set into a training set and a test set;
b) establishing an electrocardiosignal model, wherein the electrocardiosignal model consists of a PC module, a TBG module and a classification module in sequence;
c) resampling the electrocardiosignals in the training set and the test set to 2048 sampling points, then applying z-score normalization to obtain the normalized electrocardiosignal x'_ecg;
d) inputting the normalized electrocardiosignal x'_ecg from the training set into the PC module of the electrocardiosignal model and outputting the characteristic signal X_1, the PC module being composed, in sequence, of four convolution branches and a 1×1 convolution block;
e) inputting the characteristic signal X_1 into the TBG module of the electrocardiosignal model and outputting the characteristic signal X_2, the TBG module consisting of 3 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism;
f) inputting the characteristic signal X_2 into the classification module of the electrocardiosignal model and outputting the predicted category f_ecg, the classification module being composed, in sequence, of a fully connected layer and a Softmax activation layer;
g) repeating steps d) to f) N times and minimizing a cross-entropy loss function with an SGD optimizer to obtain the trained optimal electrocardiosignal model;
h) establishing a heart sound signal model composed, in sequence, of a PC module, a TBG module and a classification module;
i) resampling the heart sound signals in the training set and the test set to 8000 sampling points, then applying z-score normalization to obtain the normalized heart sound signal x'_pcg;
j) inputting the normalized heart sound signal x'_pcg from the training set into the PC module of the heart sound signal model and outputting the characteristic signal Y_1, the PC module being composed, in sequence, of four convolution branches and a 1×1 convolution block;
k) inputting the characteristic signal Y_1 into the TBG module of the heart sound signal model and outputting the characteristic signal Y_2, the TBG module consisting of 4 convolutional encoding modules and a bidirectional GRU layer with a TPA mechanism;
l) inputting the characteristic signal Y_2 into the classification module of the heart sound signal model and outputting the predicted category f_pcg, the classification module being composed, in sequence, of a fully connected layer and a Softmax activation layer;
m) repeating steps j) to l) M times and minimizing a cross-entropy loss function with an SGD optimizer to obtain the trained optimal heart sound signal model;
n) manually dividing the data set into a new training set and a new test set in a 4:1 ratio, inputting the new training set into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_3, inputting the new training set into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_3, and computing the concatenated 128-dimensional feature fusion signal PP_x by the formula PP_x = [X_3, Y_3];
o) inputting the feature fusion signal PP_x into an XGBoost classifier to obtain the importance-score ranking of PP_x, selecting the top 64 signals of that ranking as the characteristic signal PP1_x, selecting optimal hyperparameters by 5-fold cross-validation, and training the XGBoost classifier with the optimal hyperparameters to obtain the optimized XGBoost classifier;
p) inputting the new test set into the optimal electrocardiosignal model, whose TBG module outputs the 64-dimensional characteristic signal X_4, inputting the new test set into the optimal heart sound signal model, whose TBG module outputs the 64-dimensional characteristic signal Y_4, and computing the concatenated 128-dimensional feature fusion signal PP_c by the formula PP_c = [X_4, Y_4];
q) inputting the feature fusion signal PP_c into the optimized XGBoost classifier to obtain the importance-score ranking of PP_c, and selecting the top 64 signals of that ranking as the characteristic signal PP1_c.
2. The multi-modal data classification method based on deep learning and time series attention mechanism as claimed in claim 1, wherein: in the step a), a sliding window segmentation method is used for expanding the data set, and a five-fold cross validation method is used for dividing the data set into 5 different training sets and test sets.
3. The multi-modal data classification method based on deep learning and time series attention mechanism as claimed in claim 1, wherein: in step c) the normalized electrocardiosignal is computed by the formula x'_ecg = (x_ecg - u_ecg) / σ_ecg, where x_ecg is the electrocardiosignal in the training and test sets, u_ecg is the mean of the electrocardiosignal, and σ_ecg is the standard deviation of the electrocardiosignal.
4. The multi-modal data classification method based on deep learning and time series attention mechanism as claimed in claim 1, wherein the step d) comprises the following steps:
d-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal E_1;
d-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×13 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal E_2;
d-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal E_3;
d-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 1, a batch normalization layer and a ReLU activation layer; the normalized electrocardiosignal x'_ecg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal E_4;
d-5) the characteristic signals E_1, E_2, E_3 and E_4 are concatenated to obtain the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4];
d-6) the 1×1 convolution block is composed of a convolution layer with 16 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal E = [E_1, E_2, E_3, E_4] is input into the 1×1 convolution block, which outputs the 16-dimensional characteristic signal X_1.
5. The method for multi-modal data classification based on deep learning and temporal attention mechanism according to claim 1, wherein step e) comprises the following steps:
e-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal X_1 is input into the first convolutional encoding module, which outputs the 32-dimensional characteristic signal E_5;
e-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_5 is input into the second convolutional encoding module, which outputs the 64-dimensional characteristic signal E_6;
e-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal E_6 is input into the third convolutional encoding module, which outputs the 128-dimensional characteristic signal E_7;
e-4) the characteristic signal E_7 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal X_2; in the bidirectional GRU layer with the TPA mechanism, X_2 is computed by the formula X_2 = Σ_{i=1}^{n} τ_i·G^C_i, where i = {1, 2, ..., n}, n = 128, T denotes transposition, τ_i is the attention weight of the ith row vector, τ_i = σ((G^C_i)^T·w_k·g_t), σ(·) is the sigmoid function, G^C_i is the ith row of the temporal pattern matrix G^C, G^C = Conv1d(G), Conv1d(·) is a one-dimensional convolution operation, G is the hidden state matrix, G = [g_1, g_2, ..., g_{t-1}], g_i is the hidden state vector of the ith bidirectional GRU step, i = {1, 2, ..., t-1}, t is time, w_k is a weight coefficient, and g_t is the hidden state vector of the bidirectional GRU at time t.
6. The method for multi-modal data classification based on deep learning and temporal attention mechanism according to claim 1, characterized in that: in step g) N is 150, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 80 epochs, and the cross-entropy loss function is computed by the formula cc(x) = -Σ_{i=1}^{L} f̄_i(x)·log f_i(x), where L is the number of categories, L = 2, f_i(x) is the predicted label of the ith category of the prediction class f_ecg, and f̄_i(x) is the true label of the corresponding ith category; in step m) M is 180, the learning rate of the SGD optimizer is 0.001 and decays by a factor of 0.1 every 90 epochs, and the cross-entropy loss function is computed by the formula cc(y) = -Σ_{i=1}^{L} f̄_i(y)·log f_i(y), where L is the number of categories, L = 2, f_i(y) is the predicted label of the ith category of the prediction class f_pcg, and f̄_i(y) is the true label of the ith category.
7. The method for multi-modal data classification based on deep learning and temporal attention mechanism according to claim 1, characterized in that: in step i) the normalized heart sound signal is computed by the formula x'_pcg = (x_pcg - u_pcg) / σ_pcg, where x_pcg is the heart sound signal in the training and test sets, u_pcg is the mean of the heart sound signal, and σ_pcg is the standard deviation of the heart sound signal.
8. The method for multi-modal data classification based on deep learning and time series attention mechanism as claimed in claim 1, wherein step j) comprises the following steps:
j-1) the first convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×15 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the first convolution branch, which outputs the 32-dimensional characteristic signal P_1;
j-2) the second convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×11 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the second convolution branch, which outputs the 32-dimensional characteristic signal P_2;
j-3) the third convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×9 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the third convolution branch, which outputs the 32-dimensional characteristic signal P_3;
j-4) the fourth convolution branch is composed, in sequence, of a convolution layer with 32 channels, kernel size 1×5 and stride 2, a batch normalization layer and a ReLU activation layer; the normalized heart sound signal x'_pcg in the training set is input into the fourth convolution branch, which outputs the 32-dimensional characteristic signal P_4;
j-5) the characteristic signals P_1, P_2, P_3 and P_4 are concatenated to obtain the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4];
j-6) the 1×1 convolution block is composed of a convolution layer with 32 channels, kernel size 1×1 and stride 1, and a ReLU activation layer; the 128-dimensional characteristic signal P = [P_1, P_2, P_3, P_4] is input into the 1×1 convolution block, which outputs the 32-dimensional characteristic signal Y_1.
9. The multi-modal data classification method based on deep learning and time series attention mechanism as claimed in claim 1, wherein step k) comprises the following steps:
k-1) the first convolutional encoding module is composed, in sequence, of a convolution layer with 16 channels and kernel size 1×1, a batch normalization layer, a ReLU activation layer and a pooling layer of size 4; the characteristic signal Y_1 is input into the first convolutional encoding module, which outputs the 16-dimensional characteristic signal P_5;
k-2) the second convolutional encoding module is composed, in sequence, of a convolution layer with 32 channels and kernel size 1×11, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_5 is input into the second convolutional encoding module, which outputs the 32-dimensional characteristic signal P_6;
k-3) the third convolutional encoding module is composed, in sequence, of a convolution layer with 64 channels and kernel size 1×7, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_6 is input into the third convolutional encoding module, which outputs the 64-dimensional characteristic signal P_7;
k-4) the fourth convolutional encoding module is composed, in sequence, of a convolution layer with 128 channels and kernel size 1×3, a batch normalization layer, a ReLU activation layer and a pooling layer of size 2; the characteristic signal P_7 is input into the fourth convolutional encoding module, which outputs the 128-dimensional characteristic signal P_8;
k-5) the characteristic signal P_8 is input into a 32-unit bidirectional GRU layer with a TPA mechanism, which outputs the 64-dimensional characteristic signal Y_2, computed in the bidirectional GRU layer with the TPA mechanism by the formula Y_2 = Σ_{i=1}^{n} τ_i·G^C_i.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210376944.3A (granted as CN114764575B) | 2022-04-11 | 2022-04-11 | Multi-modal data classification method based on deep learning and time sequence attention mechanism
Publications (2)

Publication Number | Publication Date
---|---
CN114764575A | 2022-07-19
CN114764575B | 2023-02-28
Family

ID: 82364741

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210376944.3A (Active) | Multi-modal data classification method based on deep learning and time sequence attention mechanism | 2022-04-11 | 2022-04-11
Patent Citations (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
JP2018130541A | 2017-02-16 | 2018-08-23 | Tata Consultancy Services Limited | Method and system for detection of coronary artery disease in person using fusion approach
CN110236518A | 2019-04-02 | 2019-09-17 | 武汉大学 | Neural-network-based method and device for classifying combined electrocardio and seismocardiogram signals
CN110537910A | 2019-09-18 | 2019-12-06 | 山东大学 | Non-invasive coronary heart disease screening system based on joint analysis of electrocardio and heart sound signals
CN113288163A | 2021-06-04 | 2021-08-24 | 浙江理工大学 | Multi-feature-fusion electrocardiosignal classification model modeling method based on attention mechanism
CN113855063A | 2021-10-21 | 2021-12-31 | 华中科技大学 | Automatic heart sound diagnosis system based on deep learning
CN114190952A | 2021-12-01 | 2022-03-18 | 山东省人工智能研究院 | 12-lead electrocardiosignal multi-label classification method based on lead grouping

Non-Patent Citations (2)

Title
---
Han Li, et al.: "Integrating multi-domain deep features of electrocardiogram and phonocardiogram for coronary artery disease detection", Computers in Biology and Medicine
Li Junjie, et al.: "Heart failure analysis system based on heart sound and ECG signals", Software Engineering and Applications
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116186593A | 2023-03-10 | 2023-05-30 | 山东省人工智能研究院 | Electrocardiosignal detection method based on separable convolution and attention mechanism
CN116186593B | 2023-03-10 | 2023-10-03 | 山东省人工智能研究院 | Electrocardiosignal detection method based on separable convolution and attention mechanism
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |