CN114881105A - Sleep staging method and system based on transformer model and contrast learning - Google Patents


Publication number
CN114881105A
Authority
CN
China
Prior art keywords
data
training
physiological signal
model
network
Prior art date
Legal status
Pending
Application number
CN202210309836.4A
Other languages
Chinese (zh)
Inventor
柏杨
赵亮
丁长兴
杨逸飞
刘冠洲
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210309836.4A priority Critical patent/CN114881105A/en
Publication of CN114881105A publication Critical patent/CN114881105A/en
Pending legal-status Critical Current

Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing


Abstract

The invention discloses a sleep staging method and system based on a transformer model and contrastive learning. The method comprises the following steps: preprocessing physiological signal data, merging data frames, dividing the data set, performing data enhancement, and initializing a transformer model; constructing a transformer-based sleep stage feature extraction neural network, establishing a loss function and back-propagation model with a self-supervised contrastive learning method, and pre-training the feature extraction network; adding a fully-connected network at the back end of the pre-trained feature extraction network for supervised training; then adding a bidirectional long short-term memory (BiLSTM) network at the back end of the feature extraction network for a further round of supervised training; finally, obtaining the trained sleep staging model and inputting the test data set into it to obtain the classification result. The invention improves the accuracy of sleep staging.

Description

Sleep staging method and system based on transformer model and contrast learning
Technical Field
The invention relates to the technical field of sleep signal processing, and in particular to a sleep staging method and system based on a transformer model and contrastive learning.
Background
Currently, the sleep staging method adopted in the prior art is based on polysomnography (PSG) monitoring, which requires patients to undergo overnight PSG monitoring in a professional sleep laboratory. The procedure is cumbersome, and the resulting medical data must be labeled by professionally trained technicians; even a skilled technician can label only about three patients' recordings per day. It therefore often happens that a hospital sleep department collects a large amount of patient PSG signals but lacks the manpower to label them. To automate electroencephalogram classification, the mainstream academic approach is to train a sleep staging neural network with supervised learning, but this is limited by the scarcity of high-quality labeled data, and a model with high accuracy and good robustness is currently difficult to train.
In the deep learning field, the quantity and quality of data strongly influence model performance. In physiological signal classification tasks such as sleep staging, accurately labeled data is very scarce, while unlabeled data is abundant. The problem, therefore, is how to develop a practical and feasible method that trains a high-quality sleep staging model from massive unlabeled data and a small amount of labeled data.
Disclosure of Invention
In order to overcome the defects and shortcomings in the prior art, the invention provides a sleep staging method and system based on a transformer model and contrastive learning. The encoder component of the transformer model is retained, and its self-attention mechanism can fully learn the feature representation of a physiological signal. By combining contrastive learning with two-step transfer learning, the accuracy of sleep staging is improved, in the setting where high-quality labeled training samples are scarce, compared with feeding the labeled data directly into a supervised network.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a sleep staging method based on a transformer model and contrast learning, which comprises the following steps of:
preprocessing physiological signal data, merging data frames according to a set length, and dividing the data into a label-free pre-training data set, a labeled training data set and a test data set;
performing data enhancement processing on the frame-merged physiological signal data, and initializing a transformer model;
constructing a transformer-based sleep stage feature extraction neural network comprising a backbone layer and a mapping layer, wherein the backbone layer is composed of the transformer model and extracts deep semantic information from physiological signal data, and two fully-connected network layers with a nonlinear activation layer are placed behind the transformer network as the mapping layer to extract the deep semantic information of the physiological signal again;
establishing a loss function and a back-propagation model using a self-supervised contrastive learning method, and pre-training the sleep stage feature extraction network with the label-free pre-training data set, the pre-training serving to capture the context dependence of the physiological signal data;
adding a fully-connected network at the back end of the pre-trained sleep stage feature extraction network, performing transfer learning based on supervised learning, and training with the labeled training data set;
adding a bidirectional long short-term memory (BiLSTM) network at the back end of the supervised-trained sleep stage feature extraction network, performing a further step of transfer learning, and training with the labeled training data set;
training to obtain the sleep staging model, determining the final model parameters, and inputting the test data set into the sleep staging model to obtain the classification result.
As a preferred technical solution, the preprocessing the physiological signal data and merging the data frames according to a set length includes:
setting a cutting length, cutting the physiological signal data into multiple physiological signal segments, deleting redundant channels, and assigning a label to each frame segment according to clinical diagnosis rules;
randomly combining the physiological signal data frames for the frame splicing operation, and taking the label value of the middle frame after combination as the new label;
and filtering and normalizing the physiological signal data.
As a preferred technical scheme, establishing the loss function and back-propagation model using the self-supervised contrastive learning method and pre-training the sleep stage feature extraction network with the label-free pre-training data set comprises the following specific steps:
different views of the original physiological signal data are obtained through data enhancement; the enhanced physiological signal data are input into the sleep stage feature extraction network to compute feature vectors; the similarity of the two extracted feature vectors is computed to form the loss function; and the value of the loss function is minimized by back propagation and gradient descent.
As a preferred technical scheme, the method for establishing the loss function and back-propagation model using the self-supervised contrastive learning method and pre-training the sleep stage feature extraction network with the label-free pre-training data set comprises the following specific steps:
the signal data forms original data $X$ and enhanced data; the two data enhancement functions are $t_i$ and $t_j$, and the enhanced data are $v_i$ and $v_j$:

$$v_i = t_i(X),\quad v_j = t_j(X)$$

the enhanced data $v_i$ and $v_j$ pass through two identical feature extraction networks, each consisting of a backbone layer and a mapping layer, denoted $f_\theta$ and $g_\theta$ respectively;
after inputting the enhanced data $v_i$, the computed values are $y_i$ and $z_i$; after inputting $v_j$, the computed values are $y_j$ and $z_j$, specifically expressed as:

$$y_i = f_\theta(v_i),\quad y_j = f_\theta(v_j)$$
$$z_i = g_\theta(y_i),\quad z_j = g_\theta(y_j)$$

the similarity calculation $s_{i,j}$ and the data-pair loss calculation $l(i,j)$ are performed:

$$s_{i,j} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert\, \lVert z_j \rVert}$$
$$l(i,j) = -\log \frac{\exp(s_{i,j}/\tau)}{\sum_{k=1,\,k\neq i}^{2N} \exp(s_{i,k}/\tau)}$$

wherein $N$ is the batch size input into the neural network each time, and $\tau$ is a scaling factor;
the similarity between every pair of samples is calculated, and the loss values are averaged to obtain $L$:

$$L = \frac{1}{2N} \sum_{k=1}^{N} \left[\, l(2k-1,\,2k) + l(2k,\,2k-1) \,\right]$$

the parameter update formula of the feature extraction neural network is expressed as:

$$\theta \leftarrow \mathrm{optimizer}(\theta,\, \eta,\, \nabla_\theta L)$$

wherein $\mathrm{optimizer}$ is the optimizer, $\theta$ the neural network parameters, $\eta$ the learning rate, and $\nabla_\theta$ denotes the gradient operation with respect to $\theta$.
As a preferred technical scheme, when the bidirectional long short-term memory network is added at the back end of the supervised-trained sleep stage feature extraction network, a data enhancement step is also provided, wherein the data enhancement adopts any one or more of time-domain translation, frequency-domain translation, Gaussian noise addition, pulse waveform addition, cropping, or speed change.
The invention also provides a sleep staging system based on a transformer model and contrast learning, which comprises: the system comprises a physiological signal data preprocessing module, a data frame merging module, a data dividing module, a data enhancing module, a transformer model initializing module, a sleep stage feature extraction neural network constructing module, a pre-training module, a first transfer learning module, a second transfer learning module, a training model output module and a classification result output module;
the physiological signal data preprocessing module is used for preprocessing physiological signal data;
the data frame merging module is used for merging the data frames according to a set length;
the data partitioning module is used for partitioning the data set into a label-free pre-training data set, a labeled training data set and a test data set,
the data enhancement module is used for carrying out data enhancement processing on the physiological signal data subjected to data frame combination;
the transformer model initialization module is used for initializing a transformer model;
the sleep stage feature extraction neural network construction module is used for constructing a transform-based sleep stage feature extraction neural network, and comprises a backbone layer and a mapping layer, wherein the backbone layer is formed by a transform model and is used for extracting deep semantic information of physiological signal data, and two fully-connected network layers and a nonlinear activation layer are arranged behind the transform model network and are used as the mapping layer for extracting the deep semantic information of the physiological signal again;
the pre-training module is used for establishing a loss function and a back propagation model by using a self-supervision contrast learning method, and pre-training a sleep stage feature extraction network by using a label-free pre-training data set, wherein the pre-training operation is used for extracting the context dependence of physiological signal data;
the first transfer learning module is used for adding a full-connection network at the rear end of the pre-trained sleep stage feature extraction network, performing transfer learning based on supervised learning, and performing supervised training by adopting a marked training data set;
the second transfer learning module is used for adding a bidirectional long-time memory network at the rear end of the supervised training sleep stage feature extraction network, performing one-step transfer learning and performing supervised training by adopting a marked training data set;
the training model output module is used for training to obtain a sleep staging model and determining final model parameters,
and the classification result output module is used for inputting the test data set into the sleep staging model and outputting a classification result.
As a preferred technical solution, the physiological signal data preprocessing module is configured to preprocess physiological signal data, and includes the specific steps of:
setting a cutting length, cutting the physiological signal data into multiple physiological signal segments, deleting redundant channels, and assigning a label to each frame segment according to clinical diagnosis rules;
randomly combining the physiological signal data frames for the frame splicing operation, and taking the label value of the middle frame after combination as the new label;
and filtering and normalizing the physiological signal data.
As a preferred technical scheme, the pre-training module is used for establishing the loss function and back-propagation model using the self-supervised contrastive learning method and pre-training the sleep stage feature extraction network with the label-free pre-training data set, comprising the following specific steps:
different views of the original physiological signal data are obtained through data enhancement; the enhanced physiological signal data are input into the sleep stage feature extraction network to compute feature vectors; the similarity of the two extracted feature vectors is computed to form the loss function; and the value of the loss function is minimized by back propagation and gradient descent.
As a preferred technical scheme, the pre-training module is used for establishing the loss function and back-propagation model using the self-supervised contrastive learning method and pre-training the sleep stage feature extraction network with the label-free pre-training data set, comprising the following specific steps:
the signal data forms original data $X$ and enhanced data; the two data enhancement functions are $t_i$ and $t_j$, and the enhanced data are $v_i$ and $v_j$:

$$v_i = t_i(X),\quad v_j = t_j(X)$$

the enhanced data $v_i$ and $v_j$ pass through two identical feature extraction networks, each consisting of a backbone layer and a mapping layer, denoted $f_\theta$ and $g_\theta$ respectively;
after inputting the enhanced data $v_i$, the computed values are $y_i$ and $z_i$; after inputting $v_j$, the computed values are $y_j$ and $z_j$, specifically expressed as:

$$y_i = f_\theta(v_i),\quad y_j = f_\theta(v_j)$$
$$z_i = g_\theta(y_i),\quad z_j = g_\theta(y_j)$$

the similarity calculation $s_{i,j}$ and the data-pair loss calculation $l(i,j)$ are performed:

$$s_{i,j} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert\, \lVert z_j \rVert}$$
$$l(i,j) = -\log \frac{\exp(s_{i,j}/\tau)}{\sum_{k=1,\,k\neq i}^{2N} \exp(s_{i,k}/\tau)}$$

wherein $N$ is the batch size input into the neural network each time, and $\tau$ is a scaling factor;
the similarity between every pair of samples is calculated, and the loss values are averaged to obtain $L$:

$$L = \frac{1}{2N} \sum_{k=1}^{N} \left[\, l(2k-1,\,2k) + l(2k,\,2k-1) \,\right]$$

the parameter update formula of the feature extraction neural network is expressed as:

$$\theta \leftarrow \mathrm{optimizer}(\theta,\, \eta,\, \nabla_\theta L)$$

wherein $\mathrm{optimizer}$ is the optimizer, $\theta$ the neural network parameters, $\eta$ the learning rate, and $\nabla_\theta$ denotes the gradient operation with respect to $\theta$.
As a preferred technical solution, the second transfer learning module is further provided with a data enhancement module, wherein the data enhancement adopts any one or more of time-domain translation, frequency-domain translation, Gaussian noise addition, pulse waveform addition, cropping, or speed change.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention retains the encoder component of the transformer model, whose self-attention mechanism can fully learn the feature representation of the physiological signal; two-step transfer learning with the bidirectional long short-term memory network allows the temporal correlation information of the physiological signal to be learned, improving the accuracy of sleep staging.
(2) The invention adopts contrastive learning, solving the problem of low model accuracy under supervised learning when high-quality labeled training samples are few. It achieves, with a small amount of labeled data, an effect comparable to supervised learning on a large amount of labeled data; and if a large amount of labeled data is used for the transfer learning, the effect can exceed that of supervised learning with an ordinary convolutional neural network.
Drawings
FIG. 1 is a schematic flow chart of the sleep staging method based on a transformer model and contrastive learning according to the present invention;
FIG. 2 is a flow chart of the sleep staging feature extraction neural network pre-training of the present invention;
FIG. 3 is a schematic diagram of a sleep stage feature extraction neural network pre-training framework principle according to the present invention;
FIG. 4 is a schematic diagram of a process of neural network training based on sleep stage feature extraction with supervised learning according to the present invention;
FIG. 5 is a schematic diagram of a neural network training framework principle of sleep stage feature extraction based on supervised learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
As shown in fig. 1, the present embodiment provides a sleep staging method based on a transformer model and contrastive learning, which includes the following specific steps:
the method comprises the following steps: acquiring physiological signal data to be processed, deleting redundant channels, reserving available signals for framing, and merging data frames according to a specified length;
in the present embodiment, the physiological signal includes, but is not limited to, a PSG signal, an electrocardiographic signal, or an electroencephalographic signal;
in the first step, acquiring physiological signals, deleting channels, performing framing processing, and performing data preprocessing operation of frame merging, the specific method comprises:
PSG physiological signals of sleep monitoring are obtained and cut into physiological signal segments of each frame for 30 seconds, and labels are given to the physiological signal segments of each frame according to clinical diagnosis rules.
Acquiring a PSG physiological signal of sleep monitoring, wherein two to five stable signal channels need to be ensured;
acquiring PSG physiological signals of sleep monitoring, randomly combining fragments of one array every 30 seconds according to a specified length to perform frame splicing operation, and taking a label value of an intermediate frame after combination as a new label.
The physiological signal is divided into a label-free pre-training dataset, a labeled training dataset, and a test dataset.
The signal is filtered with a band-pass filter, and the obtained physiological signal is then normalized before being input into the neural network.
In this embodiment, the PSG physiological signal to be processed is acquired and preprocessed; the preprocessing adopts any of the following: band-pass filtering, LMS adaptive filtering, Wiener filtering, filtering methods based on statistical models, filtering methods based on neural networks, max-min normalization, sigmoid normalization, and the like.
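The framing, filtering and normalization just described can be sketched in plain numpy (a minimal illustrative sketch; the FFT band-pass, the 100 Hz sampling rate and all function names are assumptions for illustration, not values or code from the patent):

```python
import numpy as np

def cut_into_frames(signal, fs=100, frame_sec=30):
    """Cut a 1-D physiological signal into non-overlapping 30-second frames."""
    frame_len = fs * frame_sec
    n = len(signal) // frame_len
    return signal[:n * frame_len].reshape(n, frame_len)

def bandpass_fft(x, fs, low=0.5, high=35.0):
    """Crude FFT band-pass, a stand-in for the band-pass filter in the text."""
    F = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    F[..., (freqs < low) | (freqs > high)] = 0
    return np.fft.irfft(F, n=x.shape[-1], axis=-1)

def min_max_normalize(x):
    """Max-min normalization of each frame to [0, 1], one of the listed options."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    return (x - lo) / (hi - lo + 1e-8)

sig = np.random.randn(5 * 60 * 100)      # 5 minutes of one channel at 100 Hz
frames = cut_into_frames(sig)            # (10, 3000): ten 30-second frames
frames = min_max_normalize(bandpass_fft(frames, fs=100))
```

A real pipeline would use a proper filter design (e.g. Butterworth), but the order of operations — frame, filter, normalize — matches the text.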
Step two: performing enhancement operations on the merged physiological signal data, and initializing the transformer model with the enhanced data. The specific steps include:
acquiring a complete segment of electroencephalogram physiological signal and copying it to generate an identical segment. Different enhancement operations are applied to the two segments; alternatively, one segment may be enhanced and the other left unenhanced, which strengthens the inherent consistency of the data before and after enhancement, and the transformer model is initialized. The enhancement adopts any of the following: crop-and-resize, crop-resize-flip, color perturbation, image rotation, cutout, Gaussian noise addition, Gaussian blur, Sobel filtering, and the like.
Two different signals are required in this embodiment. They can be obtained either by applying different enhancements to the two copies, or by enhancing one copy and leaving the other unenhanced; the two generated segments are used respectively in the subsequent data processing and feature extraction.
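A few signal-domain enhancement operations of the kind described above, sketched for a 1-D frame (numpy; function names and parameter values are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def add_gaussian_noise(x, sigma=0.05, rng=None):
    """Gaussian noise addition."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, size=x.shape)

def time_shift(x, shift):
    """Circular shift of the signal in time (time-domain translation)."""
    return np.roll(x, shift, axis=-1)

def random_crop_resize(x, crop_frac=0.8, rng=None):
    """Crop a random window and linearly resample back to the original length."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = x.shape[-1]
    crop_len = int(n * crop_frac)
    start = rng.integers(0, n - crop_len + 1)
    crop = x[start:start + crop_len]
    return np.interp(np.linspace(0, 1, n), np.linspace(0, 1, crop_len), crop)

frame = np.sin(np.linspace(0, 20 * np.pi, 3000))    # toy 30-second frame
v_i = add_gaussian_noise(frame)                      # first augmented view
v_j = random_crop_resize(time_shift(frame, 100))     # second augmented view
```

The two views `v_i` and `v_j` play the role of the two differently-enhanced segments fed to the twin feature extraction networks.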
Step three: constructing the transformer-based network model and extracting features with an encoding layer implementing multi-head self-attention, specifically:
the sleep stage feature extraction neural network is divided into two parts. The first part, the backbone layer, is composed of a transformer. The transformer is an encoding-layer network implementing a multi-head self-attention mechanism; its multi-layer stacked structure captures long-range dependence in data such as images, text and physiological signals well. Using attention layers, activation layers, dropout layers and batch normalization layers, an automatic sleep staging model is built for extracting high-level representation information of the physiological signal, so that after a segment of physiological signal is input, the network can extract its deep semantic information. The second part connects two fully-connected network layers and a nonlinear activation layer as a mapping layer at the back end of the transformer network; it extracts the deep semantic information of the physiological signal again, performs nonlinear feature mapping on the feature vector output by the transformer, removes information related to the data enhancement, and retains the essential semantic information of the input signal.
in the present embodiment, the attention layer: and calculating the weight of each channel of the input signal and carrying out different weighting on each channel, so that the network extracts characteristic information which is more helpful to classification.
An active layer: a nonlinear function is set, the nonlinear fitting capability of the network is increased, and a Rectified rectifying Unit (ReLU) function is used to achieve the nonlinear effect.
Discarding the layer: and setting a drop out function, and carrying out a certain number of signal values zero on the signals input to the discarding layer, wherein the number is determined by the probability p in the drop out function parameters, and the general p is 50%.
Batch standardization layer: setting the batch normaize function to normalize all data in one batch to a normal distribution enables the data to be distributed in areas where the gradient of the activation function is large, thus speeding up the training of the network model.
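As a rough illustration of the self-attention computation that the encoder layers build on, here is a single-head scaled dot-product sketch in numpy (a simplified assumption-laden sketch, not the patent's multi-head implementation; all names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a (seq_len, d) input.
    Each output position is a weighted mix of all positions, which is what
    lets the encoder capture long-range dependence in the signal."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 16))               # 10 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # (10, 16)
```

A multi-head version would run several such heads in parallel and concatenate their outputs before a final projection.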
Step four: establishing the loss function and back-propagation model using the self-supervised contrastive learning method, and pre-training the sleep stage feature extraction network; the pre-training operation extracts the context dependence of the physiological signals;
as shown in fig. 2, different views of the original physiological signal data are obtained through data enhancement; the enhanced physiological signal data are input into the sleep stage feature extraction network to compute feature vectors; the similarity of the two extracted feature vectors is computed to form the loss function; and the value of the loss function is minimized by back propagation and gradient descent.
In this embodiment, a large amount of unlabeled PSG physiological signal data is used as samples to train the feature extraction network, and the data set formed by data enhancement serves as the positive-sample data set. The neural network performs deep extraction of feature semantics and integration and refinement of high-level abstract information to form feature vectors, and the neural network mapping performs further feature extraction and dimension reduction. Then the similarity of the two extracted feature vectors is computed to form the loss function, the value of the loss function is minimized by back propagation and gradient descent, and the model is gradually optimized; the trained network is used for subsequent tasks such as supervised classification and feature analysis. The specific steps include:
as shown in fig. 3, the signal data is frame-spliced, i.e. a frame of data is spliced with the preceding and following data frames to form new data, so that timing information is reflected in each frame. Every three frames are spliced into the network input X(t), with the label y(t) of the second (middle) frame taken as the matching label Y(t), as shown in formulas (1) and (2).
X(t)=[x(t-1),x(t),x(t+1)] (1)
Y(t)=y(t) (2)
wherein x is the original data divided into 30-second frames, x(t) is the t-th frame of the original data, and y(t) is the label corresponding to the t-th frame of the original data.
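Formulas (1) and (2) can be sketched directly (numpy; function and variable names are illustrative, not from the patent):

```python
import numpy as np

def splice_three_frames(x, y):
    """Build X(t) = [x(t-1), x(t), x(t+1)] with label Y(t) = y(t),
    following formulas (1) and (2)."""
    X = [np.concatenate([x[t - 1], x[t], x[t + 1]]) for t in range(1, len(x) - 1)]
    Y = [y[t] for t in range(1, len(x) - 1)]
    return np.stack(X), np.array(Y)

x = np.random.randn(6, 3000)        # six 30-second frames
y = np.array([0, 1, 2, 3, 4, 0])    # their stage labels
X, Y = splice_three_frames(x, y)    # each spliced input keeps its middle label
```

The first and last frames have no full neighbourhood and are dropped here; in practice they could instead be padded.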
The original data and enhanced data are formed from the signal data. Taking one spliced frame of new data as an example, the original data is denoted $X$, the two data enhancement functions $t_i$ and $t_j$, and the enhanced data $v_i$ and $v_j$, as shown in formula (3):

$$v_i = t_i(X),\quad v_j = t_j(X) \tag{3}$$

The enhanced data $v_i$ and $v_j$ are passed through two identical feature extraction networks, each consisting of a backbone layer and a mapping layer, denoted $f_\theta$ and $g_\theta$ respectively. After inputting $v_i$, the computed values are $y_i$ and $z_i$; after inputting $v_j$, they are $y_j$ and $z_j$, as shown in formulas (4) and (5):

$$y_i = f_\theta(v_i),\quad y_j = f_\theta(v_j) \tag{4}$$
$$z_i = g_\theta(y_i),\quad z_j = g_\theta(y_j) \tag{5}$$

This is followed by the similarity calculation $s_{i,j}$ and the data-pair loss calculation $l(i,j)$, as shown in formulas (6) and (7):

$$s_{i,j} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert\, \lVert z_j \rVert} \tag{6}$$
$$l(i,j) = -\log \frac{\exp(s_{i,j}/\tau)}{\sum_{k=1,\,k\neq i}^{2N} \exp(s_{i,k}/\tau)} \tag{7}$$

wherein $N$ is the batch size input into the neural network each time, and $\tau$ is a scaling factor.

The similarity between every pair of samples is calculated, and the loss values are averaged to obtain $L$, as shown in formula (8):

$$L = \frac{1}{2N} \sum_{k=1}^{N} \left[\, l(2k-1,\,2k) + l(2k,\,2k-1) \,\right] \tag{8}$$

The parameters of the neural network are updated by formula (9):

$$\theta \leftarrow \mathrm{optimizer}(\theta,\, \eta,\, \nabla_\theta L) \tag{9}$$

wherein $\mathrm{optimizer}$ is the optimizer, $\theta$ the neural network parameters, $\eta$ the learning rate, and $\nabla_\theta$ denotes the gradient operation with respect to $\theta$.
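The loss of formulas (6)–(8) can be sketched in plain numpy (a simplified stand-alone sketch, not the patent's code; the positive pair for sample k is laid out as indices (k, k+N) rather than (2k−1, 2k), an equivalent batch layout):

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """Contrastive loss over a batch of paired projections z1, z2 (each N x d):
    cosine similarity as in formula (6), per-pair loss as in formula (7),
    averaged over the 2N views as in formula (8)."""
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T                                      # s_{i,j}
    exp_sim = np.exp(sim / tau)
    np.fill_diagonal(exp_sim, 0.0)                     # exclude k = i terms
    denom = exp_sim.sum(axis=1)
    N = z1.shape[0]
    total = 0.0
    for k in range(N):
        i, j = k, k + N                                # the positive pair
        total += -np.log(exp_sim[i, j] / denom[i])     # l(i, j)
        total += -np.log(exp_sim[j, i] / denom[j])     # l(j, i)
    return total / (2 * N)

rng = np.random.default_rng(0)
z_i = rng.standard_normal((8, 32))                     # first views' projections
z_j = z_i + 0.01 * rng.standard_normal((8, 32))        # near-identical second views
loss_close = nt_xent_loss(z_i, z_j)
loss_rand = nt_xent_loss(z_i, rng.standard_normal((8, 32)))
```

As expected for a contrastive loss, well-aligned positive pairs give a lower loss than unrelated pairs; training then minimizes this value by gradient descent, as in formula (9).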
in this way the model is updated promptly and converges effectively. After the update operation is complete, the mapping layer g_θ is discarded and only the backbone network f_θ is used for the subsequent parameter initialization; the initialized pre-trained model can then be used for the data classification work of the subsequent transfer learning.
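Equations (6)–(8) correspond to a cosine-similarity contrastive (NT-Xent-style) loss over the 2N augmented views in a batch. The following numpy sketch is illustrative only; the pairing convention (rows 2k and 2k+1 are the two views of sample k) and all names are assumptions, not the patent's code.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """Contrastive loss of equations (6)-(8) over 2N projection vectors.

    z: array of shape (2N, D); rows 2k and 2k+1 are the two augmented
    views of sample k. tau is the temperature/scaling factor.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> dot = cosine
    s = z @ z.T                                       # s[i, j], equation (6)
    n2 = len(z)
    mask = ~np.eye(n2, dtype=bool)                    # exclude k == i in denominator
    exp_s = np.exp(s / tau)
    denom = (exp_s * mask).sum(axis=1)
    losses = []
    for k in range(n2 // 2):
        i, j = 2 * k, 2 * k + 1
        losses.append(-np.log(exp_s[i, j] / denom[i]))  # l(i, j), equation (7)
        losses.append(-np.log(exp_s[j, i] / denom[j]))  # l(j, i)
    return float(np.mean(losses))                       # L, equation (8)
```

A sanity check on the design: perfectly aligned positive pairs should yield a lower loss than unrelated random vectors.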
Step five: transfer learning is performed on the pre-trained model, i.e. based on the supervised learning principle, a fully connected layer is added at the back end of the sleep staging feature extraction network and trained with a small amount of training set data.
In step five, the fully connected network is added at the back end of the sleep staging feature extraction network and the parameters are fine-tuned, specifically as follows:
as shown in figs. 4 and 5, a fully connected network is added after the pre-trained feature extraction network to form a network with classification capability: the fully connected network performs a nonlinear transformation of the feature vectors produced by the feature extraction network. The model is trained in a supervised manner on the labelled physiological signals in the training data, so that the trained classifier enables the whole network to complete the sleep staging task. All model parameters are set as trainable (gradients enabled), and appropriate hyper-parameters such as the optimizer, learning rate, batch size and number of training epochs are set. The model is trained, and metrics such as accuracy, recall and precision are computed on the test data set until the model converges and the test-set accuracy is stable.
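The step-five transfer learning can be sketched as fitting a fully connected softmax head on top of features produced by the pre-trained extractor. The numpy sketch below is illustrative only: the patent does not give this code, the extractor is treated as a fixed function whose outputs are `features`, and all names and hyper-parameters are assumptions.

```python
import numpy as np

def train_linear_head(features, labels, n_classes, lr=0.1, epochs=200, seed=0):
    """Fit a fully connected softmax classification head (W, b) by gradient
    descent on the cross-entropy loss, on top of pre-extracted features."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (p - onehot) / n                       # d(cross-entropy)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Predicted class per sample from the trained head."""
    return (features @ W + b).argmax(axis=1)
```

In the full method the extractor's parameters are also trainable in this step; freezing them here simply keeps the sketch short.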
Step six: a further step of transfer learning is performed on this model. Using the same labelled data, a bidirectional long short-term memory network (BiLSTM) is added at the back end of the network from step five (the sleep staging feature extraction network with the fully connected head) and trained. The BiLSTM splices the forward and backward feature vectors and feeds the spliced vectors into the classifier, combining forward and backward timing information to help the classifier classify better.
In step six, a bidirectional long short-term memory network (BiLSTM) is added at the back end of the sleep staging feature extraction network and transfer learning is then carried out, specifically as follows:

a BiLSTM is added after the feature extraction network and learning is performed with the labelled training data: the model is trained in a supervised manner on the same physiological signal data, the pre-trained model parameters are set as fixed (no gradient updates), and appropriate hyper-parameters such as the optimizer, learning rate, batch size and number of training epochs are set to train the model.
In this embodiment, a bidirectional long short-term memory network (BiLSTM) is added after the feature extraction network and trained in a supervised manner. In this supervised classification task, to increase the model's generalization and robustness and improve accuracy, data augmentation can be used, including but not limited to time-domain translation, frequency-domain translation, Gaussian noise, pulse waveform addition, clipping and speed change.
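The forward/backward splicing performed by the BiLSTM can be sketched as below. For brevity a plain tanh RNN cell stands in for the LSTM cell, so this illustrates only the bidirectional concatenation described in the text, not the patent's BiLSTM; all names and shapes are assumptions.

```python
import numpy as np

def simple_rnn(x_seq, Wx, Wh):
    """Minimal tanh recurrent cell used as a stand-in for one LSTM direction.
    x_seq: (T, D) sequence; Wx: (H, D); Wh: (H, H). Returns final state (H,)."""
    h = np.zeros(Wh.shape[0])
    for x in x_seq:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def bidirectional_features(x_seq, Wx, Wh):
    """Run the sequence forwards and backwards and splice the two final
    states, as the bidirectional network does before the classifier."""
    h_fwd = simple_rnn(x_seq, Wx, Wh)        # forward timing information
    h_bwd = simple_rnn(x_seq[::-1], Wx, Wh)  # backward timing information
    return np.concatenate([h_fwd, h_bwd])    # spliced feature vector
```

The spliced vector has twice the hidden size, and reversing the input sequence simply swaps the two halves.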
Step seven: the sleep staging model is obtained by training and the final model parameters are determined. Before use, the test data set is processed with the same data preprocessing method, then input into the sleep staging model to obtain the classification results; metrics such as accuracy, precision and recall are computed until the model converges and the test-set accuracy is stable. In this embodiment, a final accuracy of 80.5% is obtained on the test data set.
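The evaluation metrics mentioned throughout (accuracy, and precision/recall macro-averaged over the sleep stages) can be computed as in this plain-Python sketch, which is illustrative and not from the patent.

```python
def staging_metrics(y_true, y_pred, n_classes=5):
    """Accuracy plus macro-averaged precision and recall over the classes.
    Classes absent from the predictions (or the ground truth) contribute 0."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)   # predicted as class c
        true_c = sum(1 for t in y_true if t == c)   # truly class c
        precisions.append(tp / pred_c if pred_c else 0.0)
        recalls.append(tp / true_c if true_c else 0.0)
    return accuracy, sum(precisions) / n_classes, sum(recalls) / n_classes
```

The choice of macro averaging (weighting all stages equally) is an assumption; the patent only names the metrics.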
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A sleep staging method based on a transformer model and contrast learning is characterized by comprising the following steps:
preprocessing physiological signal data, merging data frames according to a set length, and dividing the data into a label-free pre-training data set, a labelled training data set and a test data set,
performing data enhancement processing on the physiological signal data subjected to data frame combination, and initializing a transformer model;
constructing a transformer-based sleep staging feature extraction neural network, which comprises a backbone layer and a mapping layer, wherein the backbone layer is composed of a transformer model and is used for extracting deep semantic information of the physiological signal data, and two fully connected network layers and a nonlinear activation layer are arranged behind the transformer model network as the mapping layer for extracting the deep semantic information of the physiological signal again;
establishing a loss function and a back propagation model by using a self-supervised contrastive learning method, and pre-training the sleep staging feature extraction network with the label-free pre-training data set, wherein the pre-training operation is used for extracting the context dependence of the physiological signal data;
adding a fully connected network at the back end of the pre-trained sleep staging feature extraction network, performing transfer learning based on supervised learning, and performing supervised training with the labelled training data set;
adding a bidirectional long short-term memory network at the back end of the supervised-trained sleep staging feature extraction network, performing a further step of transfer learning, and performing supervised training with the labelled training data set;
training to obtain a sleep staging model, determining final model parameters, and inputting the test data set into the sleep staging model to obtain a classification result.
2. The sleep staging method based on the transformer model and contrast learning of claim 1, wherein the preprocessing of the physiological signal data and the merging of the data frames according to a set length comprise:
setting a cutting length, cutting physiological signal data into a plurality of sections of physiological signal segments, deleting redundant channels, and endowing a label for each frame of physiological signal segment according to clinical diagnosis rules;
randomly combining the physiological signal data frames, performing frame splicing operation, and taking the tag value of the combined intermediate frame as a new tag;
and filtering and normalizing the physiological signal data.
3. The sleep staging method based on the transformer model and contrast learning according to claim 1, wherein the self-supervised contrastive learning method is used to establish the loss function and back propagation model, and the label-free pre-training data set is used to pre-train the sleep staging feature extraction network, the specific steps comprising:
different physiological signal data are obtained from the original physiological signal data by a data augmentation method; the augmented physiological signal data are input into the sleep staging feature extraction network to compute feature vectors; the similarity of the two extracted feature vectors is calculated to form the loss function, and the value of the loss function is minimized by back propagation and gradient descent.
4. The sleep staging method based on the transformer model and contrast learning of claim 3, wherein the self-supervised contrastive learning method is used to establish the loss function and back propagation model, and the label-free pre-training data set is used to pre-train the sleep staging feature extraction network, the specific steps comprising:
the signal data forms original data, denoted X, and augmented data; the two data augmentation functions are set as t_i and t_j, and the augmented data are v_i and v_j:

v_i = t_i(X), v_j = t_j(X)
the augmented data v_i and v_j are fed into two identical feature extraction neural networks, each consisting of a backbone layer and a mapping layer, denoted f_θ and g_θ respectively;

after input, the augmented data v_i yields the two values y_i and z_i, and the augmented data v_j yields the two values y_j and z_j, specifically expressed as:

y_i = f_θ(v_i), y_j = f_θ(v_j)
z_i = g_θ(y_i), z_j = g_θ(y_j)
performing the similarity calculation s_{i,j} and the data-pair loss calculation l(i,j):

s_{i,j} = (z_i · z_j) / (‖z_i‖ ‖z_j‖)

l(i,j) = −log( exp(s_{i,j}/τ) / Σ_{k=1}^{2N} 1_{[k≠i]} exp(s_{i,k}/τ) )
wherein N is the batch size input into the neural network each time, and τ is a scaling factor;
calculating the similarity between every pair of samples and accumulating the loss values to obtain L:

L = (1/(2N)) Σ_{k=1}^{N} [ l(2k−1, 2k) + l(2k, 2k−1) ]
the parameter update formula of the feature extraction neural network is expressed as:

θ ← optimizer(θ, η, ∇_θ L)

wherein optimizer is the optimizer, θ is the neural network parameters, η is the learning rate, and ∇_θ L denotes the gradient operation performed on θ.
5. The sleep staging method based on the transformer model and contrast learning according to claim 1, wherein a bidirectional long short-term memory network is added at the back end of the supervised-trained sleep staging feature extraction network, and a data augmentation step is further provided, the data augmentation adopting any one or more of time-domain translation, frequency-domain translation, Gaussian noise enhancement, pulse waveform enhancement, clipping or speed change.
6. A system for sleep staging based on a transformer model and contrast learning, comprising: a physiological signal data preprocessing module, a data frame merging module, a data partitioning module, a data augmentation module, a transformer model initialization module, a sleep staging feature extraction neural network construction module, a pre-training module, a first transfer learning module, a second transfer learning module, a training model output module and a classification result output module;
the physiological signal data preprocessing module is used for preprocessing physiological signal data;
the data frame merging module is used for merging the data frames according to a set length;
the data partitioning module is used for partitioning the data set into a label-free pre-training data set, a labeled training data set and a test data set,
the data enhancement module is used for carrying out data enhancement processing on the physiological signal data subjected to data frame combination;
the transformer model initialization module is used for initializing a transformer model;
the sleep staging feature extraction neural network construction module is used for constructing a transformer-based sleep staging feature extraction neural network, which comprises a backbone layer and a mapping layer, wherein the backbone layer is formed by a transformer model and is used for extracting deep semantic information of the physiological signal data, and two fully connected network layers and a nonlinear activation layer are arranged behind the transformer model network as the mapping layer for extracting the deep semantic information of the physiological signal again;
the pre-training module is used for establishing a loss function and a back propagation model by using a self-supervised contrastive learning method, and pre-training the sleep staging feature extraction network with the label-free pre-training data set, wherein the pre-training operation is used for extracting the context dependence of the physiological signal data;
the first transfer learning module is used for adding a fully connected network at the back end of the pre-trained sleep staging feature extraction network, performing transfer learning based on supervised learning, and performing supervised training with the labelled training data set;
the second transfer learning module is used for adding a bidirectional long short-term memory network at the back end of the supervised-trained sleep staging feature extraction network, performing a further step of transfer learning, and performing supervised training with the labelled training data set;
the training model output module is used for training to obtain a sleep staging model and determining final model parameters,
and the classification result output module is used for inputting the test data set into the sleep staging model and outputting a classification result.
7. The sleep staging system based on the transformer model and contrast learning of claim 6, wherein the physiological signal data preprocessing module is configured to preprocess the physiological signal data, the specific steps comprising:
setting a cutting length, cutting physiological signal data into a plurality of sections of physiological signal segments, deleting redundant channels, and endowing a label for each frame of physiological signal segment according to clinical diagnosis rules;
randomly combining the physiological signal data frames, performing frame splicing operation, and taking the tag value of the combined intermediate frame as a new tag;
and filtering and normalizing the physiological signal data.
8. The sleep staging system based on the transformer model and contrast learning according to claim 6, wherein the pre-training module is configured to establish the loss function and back propagation model using the self-supervised contrastive learning method and to pre-train the sleep staging feature extraction network using the label-free pre-training data set, the specific steps comprising:
different physiological signal data are obtained from the original physiological signal data by a data augmentation method; the augmented physiological signal data are input into the sleep staging feature extraction network to compute feature vectors; the similarity of the two extracted feature vectors is calculated to form the loss function, and the value of the loss function is minimized by back propagation and gradient descent.
9. The sleep staging system based on the transformer model and contrast learning according to claim 8, wherein the pre-training module is configured to establish the loss function and back propagation model using the self-supervised contrastive learning method and to pre-train the sleep staging feature extraction network using the label-free pre-training data set, the specific steps comprising:
the signal data forms original data, denoted X, and augmented data; the two data augmentation functions are set as t_i and t_j, and the augmented data are v_i and v_j:

v_i = t_i(X), v_j = t_j(X)
the augmented data v_i and v_j are fed into two identical feature extraction neural networks, each consisting of a backbone layer and a mapping layer, denoted f_θ and g_θ respectively;

after input, the augmented data v_i yields the two values y_i and z_i, and the augmented data v_j yields the two values y_j and z_j, specifically expressed as:

y_i = f_θ(v_i), y_j = f_θ(v_j)
z_i = g_θ(y_i), z_j = g_θ(y_j)
performing the similarity calculation s_{i,j} and the data-pair loss calculation l(i,j):

s_{i,j} = (z_i · z_j) / (‖z_i‖ ‖z_j‖)

l(i,j) = −log( exp(s_{i,j}/τ) / Σ_{k=1}^{2N} 1_{[k≠i]} exp(s_{i,k}/τ) )
wherein N is the batch size input into the neural network each time, and τ is a scaling factor;
calculating the similarity between every pair of samples and accumulating the loss values to obtain L:

L = (1/(2N)) Σ_{k=1}^{N} [ l(2k−1, 2k) + l(2k, 2k−1) ]
the parameter update formula of the feature extraction neural network is expressed as:

θ ← optimizer(θ, η, ∇_θ L)

wherein optimizer is the optimizer, θ is the neural network parameters, η is the learning rate, and ∇_θ L denotes the gradient operation performed on θ.
10. The sleep staging system based on the transformer model and contrast learning of claim 6, wherein the second transfer learning module is further configured with a data augmentation module, the data augmentation using any one or more of time-domain translation, frequency-domain translation, Gaussian noise, pulse waveform, clipping or speed change.
CN202210309836.4A 2022-03-28 2022-03-28 Sleep staging method and system based on transformer model and contrast learning Pending CN114881105A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210309836.4A CN114881105A (en) 2022-03-28 2022-03-28 Sleep staging method and system based on transformer model and contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210309836.4A CN114881105A (en) 2022-03-28 2022-03-28 Sleep staging method and system based on transformer model and contrast learning

Publications (1)

Publication Number Publication Date
CN114881105A true CN114881105A (en) 2022-08-09

Family

ID=82667369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210309836.4A Pending CN114881105A (en) 2022-03-28 2022-03-28 Sleep staging method and system based on transformer model and contrast learning

Country Status (1)

Country Link
CN (1) CN114881105A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186720A (en) * 2022-09-07 2022-10-14 中国科学技术大学 Pre-training model, training method of wireless perception model and electronic equipment
CN115429293A (en) * 2022-11-04 2022-12-06 之江实验室 Sleep type classification method and device based on impulse neural network


Similar Documents

Publication Publication Date Title
CN109886273B (en) CMR image segmentation and classification system
CN111700608B (en) Electrocardiosignal multi-classification method and device
CN109544518B (en) Method and system applied to bone maturity assessment
CN111832416A (en) Motor imagery electroencephalogram signal identification method based on enhanced convolutional neural network
CN114881105A (en) Sleep staging method and system based on transformer model and contrast learning
CN112766355B (en) Electroencephalogram signal emotion recognition method under label noise
CN111738302A (en) System for classifying and diagnosing Alzheimer disease based on multi-modal data
CN113749657B (en) Brain electricity emotion recognition method based on multi-task capsule
CN112884788B (en) Cup optic disk segmentation method and imaging method based on rich context network
CN114176607B (en) Electroencephalogram signal classification method based on vision transducer
CN112043260B (en) Electrocardiogram classification method based on local mode transformation
CN113392733B (en) Multi-source domain self-adaptive cross-tested EEG cognitive state evaluation method based on label alignment
CN112869717B (en) Pulse feature recognition and classification system and method based on BL-CNN
CN111920420A (en) Patient behavior multi-modal analysis and prediction system based on statistical learning
CN111028232A (en) Diabetes classification method and equipment based on fundus images
CN115661459A (en) 2D mean teacher model using difference information
CN114417836A (en) Deep learning-based Chinese electronic medical record text semantic segmentation method
CN116035598B (en) Sleep spindle wave intelligent recognition method and system
CN110992309B (en) Fundus image segmentation method based on deep information transfer network
CN117012370A (en) Multi-mode disease auxiliary reasoning system, method, terminal and storage medium
CN113940638B (en) Pulse wave signal identification and classification method based on frequency domain dual-feature fusion
CN115019367A (en) Genetic disease face recognition device and method
CN114129138A (en) Automatic sleep staging method based on time sequence multi-scale mixed attention model
CN114187632A (en) Facial expression recognition method and device based on graph convolution neural network
CN113080847A (en) Device for diagnosing mild cognitive impairment based on bidirectional long-short term memory model of graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination