Biomedical time-series signal application method for multi-scale and multi-modal contrastive learning
Technical Field
The invention relates to the technical field of biomedical time-series signal applications, and in particular to a biomedical time-series signal application method for multi-scale and multi-modal contrastive learning.
Background
Deep learning models have been proposed for a variety of biomedical time-series signal applications, including physiological index estimation and basic cardiovascular risk prediction. To improve performance on these tasks, researchers often exploit multivariate biological signals. This may involve signals of multiple modalities, such as jointly predicting OSAHS from photoplethysmography (PPG) and blood oxygen saturation (SpO2), or multiple derivative signals of a single modality, such as a spectrogram or the first derivative of the signal. Early studies used classical machine learning methods for biological applications; however, these methods rely heavily on prior knowledge to design hand-crafted features, and their accuracy is limited. Deep learning facilitates medical biological signal analysis by automatically extracting effective features and can achieve higher detection accuracy without prior knowledge. Researchers have mainly employed three kinds of deep neural networks for biological applications: (1) the recurrent neural network (RNN); (2) the Transformer; (3) the convolutional neural network (CNN). However, the RNN suffers from the gradient vanishing/exploding problem, which makes it difficult to train a high-accuracy biological application model, and the Transformer lacks the inductive bias of shift invariance, which makes it difficult to model biological time-series data. To learn complex temporal dynamics from biological time-series signals, researchers often employ multi-scale CNN algorithms; for example, three parallel convolution modules with different convolution kernel sizes have been used to independently learn repeating features of different frequencies in the PPG, improving the accuracy of respiratory rate detection.
Shen proposed an OSAHS detection method based on a multi-scale dilated convolutional neural network, which also adopts multiple parallel convolution modules; unlike other works, different dilation factors are used to provide different receptive fields, so that long-term dependencies in the electrocardiogram (ECG) signal are better captured. Although both methods can effectively extract multi-scale information from time series, the additional parallel convolution modules multiply the model parameters, and when processing longer sequences, training and deployment on limited computational resources may become infeasible because the data and model are too large. In addition, current algorithms take point-wise input, so each input point carries no semantics and the room for performance optimization is small.
In addition to time-domain features, it is also critical for biological applications to design algorithms specifically for multi-modal signals, so as to better extract multi-modal features and modal dependencies. First, some biological signals (e.g., PPG, ECG) are strongly periodic, while other biological signals (e.g., SpO2, acceleration signals) are strongly trending, so the features to be extracted from the two kinds are heterogeneous. Second, physicians often need to synthesize multi-modal signals to make a diagnosis: when OSAHS occurs, SpO2 typically drops, but SpO2 can also drop due to other causes of hypoxia, so the physician must additionally observe whether the respiratory component of the PPG signal is significantly reduced (i.e., whether the PPG envelope becomes flat) to confirm OSAHS. Therefore, multi-modal signal features can help the model achieve better performance in biological applications: a multi-modal algorithm should comprehensively consider the changes of all modal signals, extracting both the single-modal features and the dependencies among modalities to improve performance.
At present, multi-modal algorithms for biological applications are relatively under-researched. Some researchers decouple time-domain dependence from modal dependence and extract explicit modal dependencies on top of the time-domain features of the different modal signals; however, these methods use the same encoder for all modalities, making it difficult to extract heterogeneous single-modal features. Jia designed separate encoders for different signal modalities to extract heterogeneous single-modal features and fused the single-modal features with simple channel attention and linear layers, achieving better sleep stage detection performance. However, the heterogeneous single-modal features extracted in this way lie in different embedding spaces, and the model has difficulty fusing them.
Disclosure of Invention
The invention aims to provide a biomedical time-series signal application method for multi-scale and multi-modal contrastive learning, so as to solve the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a biomedical time-series signal application method for multi-scale and multi-modal contrastive learning, comprising the following steps:
S1, performing patch and mask operations of different scales on the original signal through a multi-scale time-domain dependency extraction module to obtain multi-scale sequences and mine multi-scale temporal patterns;
S2, dividing the multi-modal signals into different groups;
S3, respectively inputting the sequences of different groups into a multi-scale time-domain convolutional neural network;
S4, aligning the features of different modalities through cross-modal contrastive learning;
S5, concatenating the features of different modalities and forming the final multi-modal representation through average pooling;
S6, extracting the features relevant to the specific downstream biological signal application through a downstream task projection module;
S7, dividing biological signal applications into classification tasks and regression tasks, wherein classification tasks train the model with the cross-entropy function and regression tasks train the model with the mean squared error;
the multi-scale time-domain dependency extraction module adopts patches of different lengths to obtain input tokens of different semantic scales, reducing time and space complexity, and adopts masks of different proportions to generate different context views;
The downstream task projection module adopts two fully connected layers: it first maps the extracted high-dimensional representation to a low-dimensional space and uses the rectified linear unit (ReLU) as the activation function; for a classification task the number of output-layer neurons equals the number of classes, and for a regression task the number of output-layer neurons is 1.
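The projection module described above can be sketched in pure Python as a minimal two-layer fully connected network with a ReLU activation. The layer sizes and random weights below are illustrative assumptions, not the trained parameters of the invention.

```python
# Hypothetical sketch of the downstream task projection module: two fully
# connected layers, ReLU after the first, output size equal to the number of
# classes (classification) or 1 (regression). Weights are placeholders.
import random

def linear(x, weights, bias):
    """Fully connected layer: y = W x + b (W given as rows)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def projection_head(feature, w1, b1, w2, b2):
    """Map a high-dimensional representation to task outputs."""
    hidden = relu(linear(feature, w1, b1))   # high-dim -> low-dim
    return linear(hidden, w2, b2)            # low-dim -> classes (or 1)

random.seed(0)
dim_in, dim_hidden, n_classes = 8, 4, 3      # assumed sizes for illustration
w1 = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in range(dim_hidden)]
b1 = [0.0] * dim_hidden
w2 = [[random.uniform(-1, 1) for _ in range(dim_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes
logits = projection_head([0.5] * dim_in, w1, b1, w2, b2)
print(len(logits))  # one logit per class
```

For a regression task, the same structure would simply use a single output neuron in place of the class logits.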
As a preferred embodiment of the present invention: in S5, the multi-modal learning framework extracts heterogeneous single-modal features in a modality-independent, grouped manner, and fuses the single-modal features through cross-modal contrastive learning.
As a preferred embodiment of the present invention: the multi-scale time-domain dependency extraction module designs a multi-scale feature extraction algorithm based on different convolution kernel sizes, different patch lengths, and different mask ratios.
As a preferred embodiment of the present invention: the encoder of the multi-scale time-domain dependency extraction module is a time-domain convolutional neural network encoder, and several such encoders are used in parallel;
the parallel time-domain convolutional neural network encoders adopt different convolution kernel sizes and numbers of convolution layers, so that information of different scales can be extracted from the sequence;
the parallel time-domain convolutional neural network encoders perform patch operations of different lengths on the input sequence; patches of different lengths convert semantically empty time points into tokens carrying semantic information at different scales, providing multi-scale views;
the parallel time-domain convolutional neural network encoders perform mask operations of different proportions on the input sequence, encouraging the encoder to learn context dependencies from the visible sequence, while the different mask proportions provide context views of different scales.
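As an illustration of the parallel encoders in this embodiment, the following is a minimal pure-Python sketch in which several 1-D convolutions with different kernel sizes run in parallel over the same sequence and their outputs are summed into a multi-scale feature. The kernels and test signal are illustrative placeholders; the actual encoders are multi-layer time-domain convolutional networks.

```python
# Parallel branches with different kernel sizes (receptive fields), summed
# element-wise into one multi-scale feature vector.
def conv1d_same(signal, kernel):
    """1-D convolution, zero-padded so the output keeps the input length."""
    k = len(kernel)
    pad = [0.0] * (k // 2)
    x = pad + signal + pad
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(signal))]

def multi_scale(signal, kernels):
    branches = [conv1d_same(signal, k) for k in kernels]   # parallel branches
    return [sum(vals) for vals in zip(*branches)]          # element-wise add

sig = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
kernels = [[0.5, 0.5],                  # small receptive field
           [0.25, 0.25, 0.25, 0.25]]    # larger receptive field
feat = multi_scale(sig, kernels)
print(len(feat))   # same length as the input
```

Each branch sees a different receptive field, so the summed feature mixes local and longer-range information, which is the effect the parallel encoders are designed to achieve.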
As a preferred embodiment of the present invention: the multi-modal learning framework comprises a modality-independent module, a modal grouping module, and a cross-modal contrastive learning module;
The modality-independent module independently extracts features from the biological signals of different modalities, using encoders with different parameters to obtain heterogeneous features;
the modal grouping module groups biological signals of the same type together and extracts homogeneous features with the same encoder, avoiding the redundancy and overfitting caused by using different encoders; for biological signals of different types, a modality-independent method is adopted, modeling each with encoders of different parameters to obtain heterogeneous features;
the cross-modal contrastive learning module pulls the different modal characterizations of the same sample into the same hidden space and, in the process, learns the dependencies between the different modalities.
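The grouping rule above (same-type signals share one encoder, different types get their own) can be sketched as follows; the type labels assigned to each signal are assumptions for illustration, echoing the periodic/trending distinction drawn in the background.

```python
# Illustrative grouping of modalities by signal type: each group would share
# one multi-scale encoder, while different groups use separate encoders.
signal_types = {
    "PPG": "periodic", "ECG": "periodic",        # strongly periodic signals
    "SpO2": "trend", "acceleration": "trend",    # strongly trending signals
}

def group_by_type(modalities):
    groups = {}
    for name in modalities:
        groups.setdefault(signal_types[name], []).append(name)
    return groups

groups = group_by_type(["PPG", "ECG", "SpO2"])
print(sorted(groups))   # one shared encoder per group key
```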
As a preferred embodiment of the present invention: the multi-modal characterization selects (x i1,xi2) as the positive sample pair, (x i,xj) as the negative sample pair, (x i1,xi2) as a representation of the first and second modal signals of the ith sample, and x j as any modality in the jth sample. The cross-modal contrast loss of any two modalities is defined by:
as a preferred embodiment of the present invention: the regression tasks include respiratory rate and exercise heart rate assessment.
As a preferred embodiment of the present invention: the classification tasks include action recognition and sleep apnea and hypopnea syndrome detection.
Compared with the prior art, the invention has the following beneficial effects: first, a novel multi-scale algorithm is provided in which the original signal undergoes patch and mask operations of different scales to obtain multi-scale sequences, mining multi-scale temporal patterns while effectively reducing the computational complexity of the multi-scale algorithm; second, a multi-modal biological signal learning framework is provided that adopts modality independence, grouping, and cross-modal contrastive learning to learn richer heterogeneous representations from multi-modal signals, exploiting modal dependence to improve model performance.
Drawings
FIG. 1 is a diagram of an algorithm framework proposed by the present invention;
FIG. 2 shows the multi-scale time-domain dependency extraction module of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1 and 2, the present invention provides a technical solution: a biomedical time-series signal application method for multi-scale and multi-modal contrastive learning, comprising the following steps:
S1, performing patch and mask operations on the original signal through a multi-scale time-domain dependency extraction module to obtain multi-scale sequences and mine multi-scale temporal patterns. To extract features of different receptive fields, the multi-scale feature extraction algorithm uses parallel encoders of different scales, i.e., the input sequence passes through convolutional networks with different convolution kernel sizes, different patch lengths, and different mask ratios to extract features of different scales;
S2, dividing the multi-modal signals into different groups;
S3, respectively inputting the sequences of different groups into different multi-scale time-domain convolutional neural networks;
S4, aligning the features of different modalities through cross-modal contrastive learning;
S5, concatenating the features of different modalities and obtaining the final output through an average pooling layer;
S6, extracting the features relevant to the specific downstream biological signal application through a downstream task projection module;
S7, dividing biological signal applications into classification tasks and regression tasks, wherein classification tasks train the model with the cross-entropy function and regression tasks train the model with the mean squared error;
the multi-scale time-domain dependency extraction module adopts patches of different lengths to obtain input tokens of different semantic scales, reducing time and space complexity, and adopts masks of different proportions to generate different context views;
The downstream task projection module adopts two fully connected layers: it first maps the extracted high-dimensional representation to a low-dimensional space and uses the rectified linear unit as the activation function; for a classification task the number of output-layer neurons equals the number of classes, and for a regression task the number of output-layer neurons is 1.
Statistics of the biological application datasets used to verify the validity of the present invention:
Data set | Training data volume | Test data volume | Length | Number of modes
Respiration rate | 9090 | / | 512 | 2
Exercise heart rate | 1768 | 1428 | 1000 | 5
Motion recognition | 7352 | 2947 | 128 | 9
OSAHS | 267001 | 82031 | 1920 | 2
Wherein the respiration rate dataset is divided into training and test sets by patient and evaluated with cross-validation, so no separate test data volume is listed;
Performance of the invention in different biological applications:
In S5, the multi-modal characterization is obtained by extracting heterogeneous single-modal features in a modality-independent, grouped manner and fusing the single-modal features through cross-modal contrastive learning.
The multi-scale time-domain dependency extraction module designs a multi-scale feature extraction algorithm based on different convolution kernel sizes, different patch lengths, and different mask proportions.
The encoder of the multi-scale time-domain dependency extraction module is a time-domain convolutional neural network encoder; different time-domain convolutional neural network encoders adopt different convolution kernel sizes and depths to extract features of different scales;
Different time-domain convolutional neural network encoders perform patch operations of different lengths on the input sequence, converting semantically empty time points into tokens carrying semantic information at different scales and providing multi-scale views. Compared with point-wise convolution, this has the following advantages: the patch enhances sequence locality, enabling the model to learn latent patterns in the time series more efficiently by processing input tokens with richer semantics; and the patch reduces time and space complexity, enabling longer time series to be processed with limited computational resources. Applying a patch operation of length L to a sequence of length N reduces the sequence length to about N/L, thereby reducing the quadratic computational and storage complexity by a factor of about L^2;
Different time-domain convolutional neural network encoders adopt different mask ratios to generate different context views, thereby enhancing the model's ability to learn contextual time-domain dependencies at different scales. The masked input is the element-wise product m ⊙ x, where the mask m ∈ {0, 1}; because the model must learn features from the unmasked signal, it is encouraged to learn stronger context dependencies.
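The patch and mask operations described above can be sketched in a few lines of Python. The patch lengths and mask ratios below are illustrative assumptions rather than the invention's tuned hyperparameters; the window length 1920 is taken from the OSAHS row of the dataset table.

```python
# Patch a length-N sequence into ~N/L tokens, then zero out a fraction of
# tokens so the encoder must reconstruct context from the visible ones.
import random

def patchify(signal, patch_len):
    """Group consecutive points into patches: N points -> about N/L tokens."""
    n = len(signal) // patch_len * patch_len      # drop the ragged tail
    return [signal[i:i + patch_len] for i in range(0, n, patch_len)]

def apply_mask(patches, mask_ratio, rng):
    """Zero out a fraction of patches (mask m in {0,1}; visible input = m * x)."""
    masked = [list(p) for p in patches]
    for i in rng.sample(range(len(patches)), int(len(patches) * mask_ratio)):
        masked[i] = [0.0] * len(masked[i])
    return masked

N = 1920                                           # e.g. the OSAHS window length
signal = [float(i) for i in range(N)]
for L, ratio in [(8, 0.25), (16, 0.5)]:            # two assumed parallel scales
    patches = patchify(signal, L)
    masked = apply_mask(patches, ratio, random.Random(0))
    # a quadratic cost over tokens shrinks by about L^2 versus point-wise input
    print(L, len(patches), (N * N) // (len(patches) * len(patches)))
```

Note how the token count drops from 1920 points to 240 and 120 tokens at the two scales, which is the N/L reduction (and hence the roughly L^2 complexity saving) claimed above.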
The multi-modal learning framework comprises a modality-independent module, a modal grouping module, and a cross-modal contrastive learning module;
The modality-independent module independently extracts features from the biological signals of each modality, and the independently extracted single-modal features are concatenated to obtain a fused feature. Because biological signals of different modalities have unique characteristics, the latent patterns within each modality are highly heterogeneous; mixing signals of different modalities, or modeling each single-modal signal with parameter-sharing encoders, makes it difficult to extract heterogeneous multi-modal features. The modality-independent method therefore models the biological signals of each modality separately: first, several single-modal encoders with the same structure but different parameters extract features from the signals of each modality; then the different single-modal features are concatenated to obtain the fused feature, so that the heterogeneous characteristics within each modality are well captured;
the modal grouping module groups biological signals of the same type together and extracts homogeneous features with the same encoder, avoiding the redundancy and overfitting of using different encoders; for biological signals of different types, a modality-independent method is adopted, modeling each with encoders of different parameters to obtain heterogeneous features;
The cross-modal contrastive learning module pulls the different modal characterizations of the same sample into the same hidden space and, in the process, learns the dependencies between modalities. It serves two main purposes: (1) aligning heterogeneous single-modal characterizations into the same hidden space, so that the subsequent downstream task projection module can be trained better and better downstream task performance can be obtained; (2) extracting modal dependence: in the process of aligning the different modal characterizations, the model fully learns the correlations among the different modal signals, improving overall performance.
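The modality-independent extraction and concatenation fusion described above can be sketched as follows. The encoder internals here (a parameterized moving average) are a stand-in for illustration, not the invention's time-domain convolutional networks; the point is that the encoders share a structure but not parameters, and their outputs are concatenated.

```python
# Structurally identical encoders with separate (unshared) parameters per
# modality; outputs are concatenated into one fused feature vector.
def make_encoder(weight):
    """Same structure, different parameters per modality."""
    def encode(signal):
        return [weight * (a + b) / 2 for a, b in zip(signal, signal[1:])]
    return encode

ppg_encoder = make_encoder(weight=0.8)    # unshared parameters
spo2_encoder = make_encoder(weight=1.2)

ppg = [1.0, 2.0, 3.0, 4.0]
spo2 = [98.0, 97.0, 96.0, 95.0]
fused = ppg_encoder(ppg) + spo2_encoder(spo2)   # concatenation fusion
print(len(fused))   # 3 + 3 = 6 features
```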
Wherein, for the multi-modal characterization, (x_i1, x_i2) is selected as the positive sample pair and (x_i, x_j) as a negative sample pair, where x_i1 and x_i2 are the representations of the first and second modal signals of the i-th sample and x_j is any modality of the j-th sample; the modal contrastive loss pulls positive pairs together and pushes negative pairs apart in the shared embedding space.
Wherein the regression tasks include respiratory rate and exercise heart rate assessment.
Wherein the classification tasks include motion recognition and sleep apnea and hypopnea syndrome detection.
A biomedical time-series signal application method for multi-scale and multi-modal contrastive learning comprises the following steps:
S1, the multi-scale time-domain dependency extraction module mines multi-scale temporal patterns based on different convolution kernel sizes, different patch lengths, and different mask proportions. It uses several parallel time-domain convolutional neural networks as encoders to capture features of different scales, which are then added to form the multi-scale feature. The different time-domain convolutional neural networks adopt different convolution kernel sizes to capture context dependencies of different scales, use patches of different lengths to convert semantically empty time points into tokens carrying semantic information at different scales (providing multi-scale views and reducing time and space complexity), and adopt masks of different proportions to generate different context views;
S2, the multi-modal signals are divided into different groups: biological signals of the same type are grouped together and homogeneous features are extracted with the same multi-scale encoder; for biological signals of different types, a modality-independent method is adopted, modeling each with encoders of different parameters to obtain heterogeneous features;
S3, the features of different modalities are aligned through cross-modal contrastive learning: (x_i1, x_i2) is selected as the positive sample pair and (x_i, x_j) as a negative sample pair, where x_i1 and x_i2 are the representations of the first and second modal signals of the i-th sample and x_j is any modality of the j-th sample; the modal contrastive loss pulls positive pairs together and pushes negative pairs apart in the shared embedding space;
S4, the features relevant to the specific downstream biological signal application are extracted through the downstream task projection module, namely through average pooling and fully connected layers, with the rectified linear unit used as the activation function between the fully connected layers;
S5, biological signal applications are divided into classification tasks and regression tasks: classification tasks train the model with the cross-entropy function and regression tasks train the model with the mean squared error; the regression tasks include respiratory rate and exercise heart rate assessment, and the classification tasks include action recognition and OSAHS detection.
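The two training objectives named above can be written out as follows; these are standard pure-Python versions of the cross-entropy and mean squared error losses for illustration, with made-up logits and targets.

```python
# Cross-entropy for classification (e.g. OSAHS detection) and mean squared
# error for regression (e.g. respiratory rate estimation).
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilized)."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[label]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

ce = cross_entropy([2.0, 0.5, -1.0], label=0)    # confident, correct -> low loss
err = mse([15.2, 14.8], [15.0, 15.0])            # respiratory-rate style target
print(round(ce, 3), round(err, 3))
```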
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.