CN113158964A - Sleep staging method based on residual learning and multi-granularity feature fusion - Google Patents

Sleep staging method based on residual learning and multi-granularity feature fusion

Info

Publication number
CN113158964A
CN113158964A (application CN202110494434.1A; granted as CN113158964B)
Authority
CN
China
Prior art keywords
granularity
hidden
data
fusion
network
Prior art date
Legal status
Granted
Application number
CN202110494434.1A
Other languages
Chinese (zh)
Other versions
CN113158964B (en)
Inventor
段立娟
李梦颖
乔元华
张文博
苗军
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110494434.1A priority Critical patent/CN113158964B/en
Publication of CN113158964A publication Critical patent/CN113158964A/en
Application granted granted Critical
Publication of CN113158964B publication Critical patent/CN113158964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • G06F2218/04Denoising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/15Biometric patterns based on physiological signals, e.g. heartbeat, blood flow
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention provides a sleep staging method based on residual learning and multi-granularity feature fusion, belonging to the field of signal processing and pattern recognition. First, the raw sleep EEG signal is preprocessed to obtain a set of sleep EEG data samples, each comprising data of N channels. Then, the Hilbert-Huang transform (HHT) is applied to each of the N channels of every sample to obtain N time-frequency matrix features. Finally, the time-frequency features of all samples are fed into a sleep staging network based on residual learning and multi-granularity feature fusion, which performs multi-granularity feature extraction and classification of the sleep EEG signals. The method efficiently fuses features of different granularities on top of adaptively extracted sleep EEG time-frequency features; experiments show that the proposed network trains quickly, depends little on expert prior knowledge, and effectively improves the classification performance and efficiency of existing automatic sleep staging methods.

Description

Sleep staging method based on residual learning and multi-granularity feature fusion
Technical Field
The invention relates to the field of signal processing and pattern recognition, and in particular to a method for sleep EEG signal processing and for multi-granularity feature extraction, fusion, and classification.
Background
Sleep staging is an important step in grading sleep quality and diagnosing sleep-related diseases: a specialist physician must observe a patient's overnight sleep data and classify and evaluate the sleep stages, which is undoubtedly time-consuming and labor-intensive diagnostic work. With the continuous development of artificial intelligence, research on using computer-aided diagnosis and treatment technology to improve physicians' efficiency has deepened. For heavy-workload tasks such as sleep staging, developing and applying automatic sleep EEG analysis methods is necessary to reduce physicians' workload. An electroencephalogram (EEG) is bioelectrical signal data that records brain activity and reflects the electrophysiological activity of human brain nerve cells. EEG research is currently one of the hot, leading directions in academia, mainly covering signal acquisition, signal preprocessing, feature extraction, and feature classification. However, traditional manually designed feature extraction and feature selection strongly influence the recognition results, so automatic feature extraction and classification have attracted attention in EEG research.
Existing automatic sleep staging research mainly uses two means of classifying sleep data: (i) directly using a deep learning network to learn from and classify the preprocessed time-series sample data, and (ii) using a signal processing algorithm to extract primary features from the preprocessed data and then a deep learning network to learn higher-order features before classification. Since sleep sample data is a time-series signal, the features learned by the network under the first means largely reflect the timing information of the signal. However, the salient characteristics of EEG signals in different sleep stages are the frequency-domain and amplitude information, so the classification effect of this means is often not accurate enough. The second means was proposed for this problem: time-frequency domain features are extracted from the preprocessed data, and a deep learning network then learns higher-order features. This better reflects the frequency-domain and amplitude information of the signal, but the time-frequency feature extraction requires many parameter values to be set, and these values strongly influence the final result. The parameter setting step usually requires the guidance of a specialist physician or repeated experimental analyses, leaving the network model with poor robustness.
To solve the above problems, the present invention uses the Hilbert-Huang Transform (HHT) to adaptively decompose the sleep sample data. The HHT, proposed by Huang et al., is an adaptive signal processing method for extracting the time-frequency information of a signal. First, the Empirical Mode Decomposition (EMD) algorithm decomposes a signal into several Intrinsic Mode Functions (IMFs) and a decomposition residue; the IMFs are then transformed and integrated into a time-frequency information matrix via the Hilbert transform. Compared with the short-time Fourier transform and the wavelet transform, this method analyzes entirely according to the characteristics of the sleep EEG signal itself and needs no manual parameter setting, greatly reducing the influence of parameter choices on feature extraction and classification. The necessity of this time-frequency conversion in the invention is twofold: on the one hand, compared with time-series sleep sample data, the time-frequency matrix features better imitate the physician's strategy of focusing on the spectrum and amplitude of the signal during sleep staging; on the other hand, the transformed time-frequency matrix features fit the usage scenario of a convolutional neural network better than time-series data.
In addition, among convolutional-neural-network studies of automatic sleep staging, most networks extract local feature expressions with a local learning strategy and fuse the local features into a global feature expression by deepening the network. Such methods, on the one hand, do not fully mine the association between global and local information; on the other hand, network learning is difficult, and they lack effectiveness and robustness for information with large local variance. To solve these problems, the invention efficiently extracts the multi-granularity feature representations in the data by constructing a deep fusion network based on residual learning and multi-granularity feature fusion, and fully represents the global and local information of the sleep data through feature fusion, thereby improving the sleep staging effect. Similarly, in the pedestrian re-identification task, Wang et al. combine global and local features via a multi-granularity network architecture to improve resolution and identification accuracy. The invention builds a multi-branch deep network module to extract multi-granularity features of different frequency-domain intervals: the output of each branch of the module is finally split horizontally into several sub-regions that represent the discriminative information of different frequency-domain intervals, and this local discriminative information cooperatively supplements the global discriminative information, realizing the multi-granularity feature learning strategy. Finally, fusing the features of the different channels of the sleep EEG data effectively improves the recognition performance of the network.
Disclosure of Invention
To address the problems of data parameter setting and of efficiently fusing the multi-granularity features of multi-channel sleep sample data, the invention provides a sleep staging method based on residual learning and multi-granularity feature fusion. After preprocessing, the method first uses the HHT to extract time-frequency features, obtaining a time-frequency matrix carrying characterization information. Next, the multi-granularity feature extraction and fusion part extracts and fuses the multi-granularity features of the time-frequency matrix of each channel's data: a three-branch residual learning network first learns the global and local information of the input time-frequency matrix, and an attention-aware fusion module then fuses the feature information of different granularities within a single channel. Finally, in the fusion classification part, an attention-aware fusion module fuses the feature information across the different channels. Experiments show that the method effectively improves the classification accuracy and performance of the sleep staging task.
The main idea for realizing the method is as follows: first, the data preprocessing step filters and segments the raw data; next, the HHT (Hilbert-Huang transform) performs time-frequency domain transformation on the segmented sleep EEG sample data and extracts time-frequency domain features; then, in the multi-granularity feature extraction and fusion stage, a multi-granularity feature extraction and fusion module extracts and fuses features of different granularities from the single-channel time-frequency features; finally, the fusion classification stage fuses the features of the multiple channels of the sleep EEG data for classification.
The sleep staging method based on residual learning and multi-granularity feature fusion comprises the following steps:
Step one, preprocessing of the raw sleep EEG signal. First, band-pass filter the raw data, which contains N EEG channels at sampling rate T, to remove noise; next, divide the EEG signal without overlap into a total of P sample segments using a sliding time window of length 30 s. Each sample comprises data of N channels, each channel of data length 30×T; S_{i,j} denotes channel j of sample i, where i ∈ [1,2,...,P] is the index of the current sample and j ∈ [1,2,...,N] is its channel.
Step two, time-frequency feature extraction of the sample data. Apply the Hilbert-Huang transform to each sample obtained in step one, channel by channel, to extract the time-frequency features of the data. In this step the time-frequency feature resolution M of the samples is set manually, while the remaining parameters are computed adaptively by the algorithm, so the matrix of each sample after time-frequency transformation has dimension N × M × M.
Step three, multi-granularity feature extraction and fusion. The time-frequency matrices are fed into the deep network as input, which learns and fuses the multi-granularity features of the time-frequency features of each data channel. The time-frequency feature data of different channels use the same network architecture but do not share network parameters; the network of each channel learns and fuses features of different granularities of that channel's time-frequency features as the salient features of the channel's data.
Step four, multi-channel feature fusion and classification. Step three yields the salient features of the N channels of sleep EEG data; the N channels are weighted and fused by the attention-aware fusion module, and a classifier then produces the classification result.
Compared with the prior art, the method provided by the invention has the following advantages:
the existing automatic sleep staging method is usually based on the characteristics of manual design, the classification performance has strong dependence on parameter value setting and strong dependence on priori knowledge, and sleep electroencephalogram time sequence signals are not suitable for input of a residual convolution neural network. In addition, the general sleep staging method based on deep learning is mostly based on the idea of local learning, and global information and local information are not comprehensively combined. The invention well overcomes the defects through a self-adaptive time-frequency conversion method and a multi-granularity fusion deep network.
Advantageous effects:
the invention creatively combines HHT with multi-granularity feature extraction and feature fusion deep network, utilizes self-adaptive time-frequency conversion and multi-granularity feature learning to simulate the diagnosis idea of distinguishing time-frequency information in data in the sleep stage process of a doctor, reduces the manual design features, simultaneously enables the extracted features to have more representativeness and identifiability, and further improves the classification precision.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the overall network framework of the present invention;
FIG. 3 is a schematic diagram of a multi-granular feature extraction and fusion module according to the present invention;
FIG. 4 is a schematic view of an attention-aware fusion module according to the present invention;
fig. 5 is a schematic diagram comparing the results of the classification method according to the present invention with the results of manual classification by a specialist.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
A sleep staging method based on residual learning and multi-granularity feature fusion is disclosed, wherein a flow chart is shown in figure 1, and an overall network framework diagram is shown in figure 2.
Step 1, data preprocessing.
Assume the method framework has a set of training data TrainData and a set of test data TestData. TrainData contains D_train records for training, and TestData contains D_test records for testing. Each record has sampling frequency T and contains N EEG channel signals.
Band-pass filter each EEG record to remove noise in the signal, with the pass band set to 0.5-32 Hz. For data with a higher sampling frequency, a downsampling operation also reduces the sampling frequency of the data to 100 Hz. Finally, segment the filtered data without overlap using a time window of 30 seconds, yielding P sub-segment samples. Since adjacent sub-segments do not overlap, the sliding-window segmentation of each record can be expressed as
$$S_i = X\left[(i-1)\cdot 30T + 1 \,:\, i\cdot 30T\right],\qquad i \in [1,2,\ldots,P]$$
where X denotes the filtered record.
For TrainData, preprocessing yields the training sample set
$$TrainData = \{S_1, S_2, \ldots, S_{P_{train}}\}$$
and for TestData, preprocessing yields the sub-segment sample set
$$TestData = \{S_1, S_2, \ldots, S_{P_{test}}\}$$
where P_train and P_test denote the total numbers of sub-segments obtained from the D_train training records and the D_test test records, respectively.
Each sub-segment sample has size N × (30 × T).
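As a concrete illustration of this preprocessing pipeline, the sketch below filters, downsamples, and segments one record. It assumes scipy; the 4th-order Butterworth design and all names (preprocess, record) are our own choices, since the patent does not specify the filter type.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess(record, fs):
    """record: array of shape (N_channels, n_samples) sampled at fs Hz."""
    # Band-pass filter 0.5-32 Hz (zero-phase; 4th-order Butterworth assumed)
    b, a = butter(4, [0.5, 32.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, record, axis=1)
    # Downsample to 100 Hz when the recording is sampled faster (e.g. 1000 Hz)
    if fs > 100:
        filtered = decimate(filtered, int(fs // 100), axis=1)
        fs = 100
    # Non-overlapping 30 s sliding-window segmentation -> (P, N, 30*fs)
    epoch_len = 30 * fs
    n_channels, n_samples = filtered.shape
    P = n_samples // epoch_len
    epochs = filtered[:, :P * epoch_len].reshape(n_channels, P, epoch_len)
    return np.transpose(epochs, (1, 0, 2))
```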
In the practice of the present invention, the experimental data involve two data sets. The first is the public SLEEP-EDF data set: two-channel sleep EEG data are selected for the experiments, with electrode positions Fpz-Cz and Pz-Oz and a sampling frequency of 100 Hz. The second is a real data set collected in a hospital: two-channel sleep EEG data are selected, with electrode positions F3-A2 and F4-A1 and a sampling frequency of 1000 Hz.
Step 2, time-frequency feature extraction.
Define any segmented sample containing N EEG data channels as S = [S_1, S_2, ..., S_N]. For each channel's data S_j, the HHT algorithm extracts the corresponding time-frequency matrix feature F_j, where j ∈ [1,2,...,N] denotes the channel of the current sample.
HHT transformation of single-channel signal data involves the following processing:
(1) The EMD algorithm adaptively decomposes the single-channel signal S_j into n intrinsic mode functions (IMFs) and a residual component R. During decomposition, each IMF must satisfy two conditions: first, the number of extrema and the number of zero crossings of the signal component must differ by at most one; second, the local mean of the upper envelope defined by the component's maxima and the lower envelope defined by its minima is 0. The decomposition can be expressed as
$$S_j(t) = \sum_{u=1}^{n} IMF_u(t) + R(t)$$
where S_j(t) is the single-channel signal to be decomposed, IMF_u(t) is the u-th intrinsic mode function, and R(t) is the decomposition residue of the signal.
(2) For each IMF, obtain its instantaneous frequency and instantaneous amplitude using the Hilbert transform. Denote the instantaneous amplitude by a_u(t) and the instantaneous frequency by ω_u(t); the two variables are computed as follows:
$$a_u(t) = \sqrt{c_u^2(t) + \hat{c}_u^2(t)}$$
$$\omega_u(t) = \frac{d}{dt}\left[\arctan\!\left(\frac{\hat{c}_u(t)}{c_u(t)}\right)\right]$$
where
$$\hat{c}_u(t) = H[c_u(t)] = \frac{1}{\pi}\,\mathrm{P}\!\int_{-\infty}^{+\infty}\frac{c_u(\tau)}{t-\tau}\,d\tau$$
is the result of performing the Hilbert transform on the u-th IMF signal c_u(t), with P the Cauchy principal value. Writing the Hilbert transform as H[c_u(t)], the reconstructed analytic signal z_u(t) is expressed using the instantaneous amplitude a_u(t) and instantaneous frequency ω_u(t):
$$z_u(t) = c_u(t) + i\,\hat{c}_u(t) = a_u(t)\,e^{\,i\int\omega_u(t)\,dt}$$
Finally, by the Hilbert-Huang transform, the time-series signal S_j(t) is expressed as a function H(ω, t) of time, frequency, and amplitude:
$$H(\omega, t) = \operatorname{Re}\sum_{u=1}^{n} a_u(t)\,e^{\,i\int\omega_u(t)\,dt}$$
In the present invention, the squared amplitude H²(ω, t) describes the energy of the EEG signal, and the computed results form the time-frequency matrix features F = [F_1, F_2, ..., F_N], with the time-frequency resolution of the features set manually to M. The time-frequency feature F_j of a single channel of a sample has size M × M, and the feature of the entire sample has size N × M × M.
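The per-channel HHT feature extraction above can be sketched as follows, assuming the PyEMD package for EMD and scipy's hilbert for the analytic signal. The binning of instantaneous frequency and energy onto an M × M grid is one plausible reading of the "time-frequency resolution M"; the patent does not spell out the gridding.

```python
import numpy as np
from PyEMD import EMD          # assumed dependency (pip install EMD-signal)
from scipy.signal import hilbert

def hht_features(x, fs=100, M=30, f_max=32.0):
    """x: one 30 s single-channel epoch; returns an M x M energy matrix H^2(w, t)."""
    imfs = EMD().emd(x.astype(float))            # adaptive EMD decomposition
    t_edges = np.linspace(0, len(x), M + 1)      # M time bins
    f_edges = np.linspace(0, f_max, M + 1)       # M frequency bins
    tf = np.zeros((M, M))
    for imf in imfs:
        z = hilbert(imf)                          # analytic signal z_u(t)
        amp = np.abs(z)                           # instantaneous amplitude a_u(t)
        phase = np.unwrap(np.angle(z))
        freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)
        ti = np.clip(np.digitize(np.arange(len(freq)), t_edges) - 1, 0, M - 1)
        fi = np.clip(np.digitize(freq, f_edges) - 1, 0, M - 1)  # edge energy clipped
        np.add.at(tf, (fi, ti), amp[:-1] ** 2)    # accumulate energy H^2(w, t)
    return tf
```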
Step 3, multi-granularity feature extraction and fusion.
As shown in fig. 2, the processed sample data F = [F_1, F_2, ..., F_N] is fed as input into a structure containing N network branches, which learn and integrate the multi-granularity sleep features of the time-frequency matrices F_j of the different channels. The structure and operation of each branch network are as follows:
(1) First, the single-channel time-frequency feature matrix F_j of size M × M is taken as input and passed through a 1×1 convolution layer, a ReLU activation layer, and a max-pooling layer to obtain the hidden-layer output feature hidden_1_j. This operation expands the number of channels, improves the nonlinear learning capability of the network, and reduces the size of the feature maps.
(2) Then the hidden-layer feature hidden_1_j output in (1) serves as input to the multi-granularity feature extraction and fusion module of fig. 2. The network diagram of this module is shown in fig. 3; it consists of a multi-granularity residual learning part and a multi-granularity feature fusion part. The multi-granularity residual learning part takes the residual block ResBlock as the basic unit of the network and, after the backbone residual block ResBlock1, splits into four branch networks. This part uses the branch networks to learn features of different granularities from the input data hidden_1_j, representing the global and local information of the sleep EEG time-frequency features. Branch branch1 comprises two residual blocks (ResBlock2, ResBlock4) and a pooling layer Maxpool2; it learns the global information of hidden_1_j, with ResBlock2 set to stride 2 to reduce the dimensionality of the data, producing the branch output hidden_branch1_j. Branch branch2 passes through two residual blocks (ResBlock3, ResBlock4) and then divides into a two-path network structure: one path passes through pooling layer Maxpool3 to give the hidden-layer output hidden_branch2_j, while the other path passes through pooling layer Maxpool4 and its output is split horizontally into two equal-sized fine-grained features, the hidden-layer outputs hidden_branch2_seg1_j and hidden_branch2_seg2_j. Branch branch3 shares a similar structure with branch2 and also divides into two paths after two residual blocks (ResBlock3, ResBlock4): one path passes through Maxpool3 to give hidden_branch3_j, while the other passes through pooling layer Maxpool5 and its output is split horizontally into three equal-sized fine-grained features, the hidden-layer outputs hidden_branch3_seg1_j, hidden_branch3_seg2_j and hidden_branch3_seg3_j. Note that ResBlock3 is set to stride 1 in branch2 and branch3, reserving a certain receptive field for the subsequent extraction of local information. In addition, as shown in fig. 3, the multi-granularity residual learning part also contains a shortcut branch consisting of a 1×1 convolution layer and pooling layer Maxpool1, which produces a reduced-dimension expression of the hidden-layer features after the backbone residual block ResBlock1; its output A is added to each of the 8 hidden-layer outputs of the three branch networks to give the final outputs hidden_{j,k} of the multi-granularity residual learning part, where j ∈ [1,2,...,N] is the channel of the current output data and k ∈ [1,2,...,8] indexes the 8 final outputs, e.g. hidden_{j,1} = hidden_branch1_j + A. The main role of this summation is to replenish the shallow salience information lost during multi-granularity feature learning. (A condensed implementation sketch follows below.)
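A condensed PyTorch sketch of the multi-granularity residual learning part, for orientation only: block depths, channel widths, and pooling sizes are illustrative assumptions, and only the branch topology follows the text (a global branch, a two-segment branch, a three-segment branch, and a 1×1 shortcut whose output A is added to all 8 hidden-layer outputs).

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1), nn.BatchNorm2d(cout), nn.ReLU(),
            nn.Conv2d(cout, cout, 3, 1, 1), nn.BatchNorm2d(cout))
        self.skip = (nn.Conv2d(cin, cout, 1, stride)
                     if stride != 1 or cin != cout else nn.Identity())

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class MultiGranularityResidual(nn.Module):
    """Produces the 8 outputs hidden_{j,k} for one channel's feature map."""
    def __init__(self, c=64):
        super().__init__()
        self.stem = ResBlock(c, c)                            # ResBlock1 (backbone)
        self.branch1 = nn.Sequential(ResBlock(c, c, 2), ResBlock(c, c),
                                     nn.AdaptiveMaxPool2d(1))  # global branch
        self.branch2 = nn.Sequential(ResBlock(c, c, 1), ResBlock(c, c))
        self.branch3 = nn.Sequential(ResBlock(c, c, 1), ResBlock(c, c))
        self.shortcut = nn.Sequential(nn.Conv2d(c, c, 1),
                                      nn.AdaptiveMaxPool2d(1))  # 1x1 conv + Maxpool1

    def forward(self, x):
        x = self.stem(x)
        a = self.shortcut(x)                                  # shallow feature A
        f2, f3 = self.branch2(x), self.branch3(x)
        outs = [self.branch1(x),
                nn.functional.adaptive_max_pool2d(f2, 1),     # hidden_branch2_j
                *torch.chunk(nn.functional.adaptive_max_pool2d(f2, (2, 1)),
                             2, dim=2),                       # 2 horizontal segments
                nn.functional.adaptive_max_pool2d(f3, 1),     # hidden_branch3_j
                *torch.chunk(nn.functional.adaptive_max_pool2d(f3, (3, 1)),
                             3, dim=2)]                       # 3 horizontal segments
        return [h + a for h in outs]                          # 8 outputs hidden_{j,k}
```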
(3) Further, the 8 high-order multi-granularity features hidden_{j,k} output in (2) serve as input to the multi-granularity feature fusion part of fig. 3 for attention-aware fusion. This operation is illustrated in figs. 3 and 4 and comprises two steps: convolutional dimension reduction and attention-aware fusion. In the convolutional dimension reduction step, a 1×1 convolution reduces each of the 8 hidden_{j,k} along the channel dimension, and a ReLU activation layer after the convolution activates the network neurons while adding a nonlinear factor to the network. The 8 reduced outputs serve as inputs to the attention-aware fusion module shown in fig. 4. The module first concatenates the 8 inputs into a feature map Concat, whose 8 granularity feature channels are the feature matrices Concat_k of the different granularities. As fig. 4 shows, the module then divides into two paths. The first path learns the weights of the 8 inputs, thereby allocating the attention of network convergence: global average pooling reduces each feature matrix to size 1 along the granularity channel dimension, giving the directional features GAP_k, k ∈ [1,2,...,8], that represent the different granularity features, where k indexes the granularity feature channel of each directional feature. This is expressed as follows:
$$GAP_k = \frac{1}{S \times T}\sum_{u=1}^{S}\sum_{v=1}^{T} Concat_k(u, v)$$
where S × T is the size of a feature matrix of a given granularity and Concat_k(u, v) is the value of each element of the k-th granularity feature matrix.
Further, the first path of the attention-aware fusion module contains two linear layers that learn the information weight value weight_k of the directional feature GAP_k on each granularity feature channel. The activation functions of the two linear layers are set to the ReLU function and the Sigmoid function respectively; this step is expressed as:
$$weight_k = \sigma\!\left(W_2\,\delta(W_1\,GAP_k)\right)$$
where W_1 and W_2 are the network parameters of the two linear layers, δ(·) and σ(·) denote the ReLU and Sigmoid functions respectively, and k ∈ [1,2,...,8] indexes the granularity feature channel of the directional feature and its learned weight.
(4) Finally, as fig. 4 shows, the second path of the attention-aware fusion module is an identity mapping. Combining the two paths of the module, the weight_k learned on the first path is multiplied with the original input data Concat_k of each granularity feature channel, assigning attention weights to the input data. The invention sums the attention-scaled data Weighted_k along the granularity channel dimension to obtain the final attention-aware fusion output Fusion_j, j ∈ [1,2,...,N]. Fusion_j is the final output of one channel of the original N-channel time-frequency features after multi-granularity feature extraction and fusion, and represents all time-frequency information of that channel:
$$Fusion_j = \sum_{k=1}^{8} Weighted_k = \sum_{k=1}^{8} weight_k \cdot Concat_k$$
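A PyTorch sketch of the attention-aware fusion described in (3) and (4): global average pooling per granularity channel gives GAP_k, two linear layers with ReLU and Sigmoid activations give weight_k, and the identity path is scaled by the weights and summed over the granularity dimension. The bottleneck width r is an illustrative assumption.

```python
import torch
import torch.nn as nn

class AttentionAwareFusion(nn.Module):
    def __init__(self, k, r=4):
        super().__init__()
        hidden = max(k // r, 1)          # bottleneck width (our assumption)
        self.fc1 = nn.Linear(k, hidden)  # W_1, followed by ReLU
        self.fc2 = nn.Linear(hidden, k)  # W_2, followed by Sigmoid

    def forward(self, feats):
        # feats: (B, K, S, T) -- the K stacked granularity channels Concat_k
        gap = feats.mean(dim=(2, 3))                              # GAP_k
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(gap))))    # weight_k
        weighted = feats * w[:, :, None, None]                    # Weighted_k
        return weighted.sum(dim=1)       # Fusion_j: sum over granularity channels
```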
Step 4, multi-channel feature fusion and classification.
From step 3 we obtain the N single-channel multi-granularity fusion features Fusion_j. Taking the features Fusion_j of the N channels as input, the attention-aware fusion module of step three (3) fuses the multi-channel features (in this process its parameter k ∈ [1,2,...,N]), and the fused features are flattened into a one-dimensional vector of length outdim. At the end of the whole model framework, following the international AASM sleep staging standard, the invention sets a linear classification layer with 5 neurons and classifies the extracted features into 5 sleep stages: wake (W), non-rapid eye movement (NREM, comprising N1, N2 and N3), and rapid eye movement (REM). Since the number and definition of sleep stages are not a uniform criterion, the number of classification neurons can be modified according to actual needs and specific situations. (A sketch of this classification head follows.)
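A minimal sketch of this fusion-and-classification head, reusing the AttentionAwareFusion module sketched earlier with k = N; the batch size, shapes, and outdim are illustrative:

```python
import torch
import torch.nn as nn

N, S, T = 2, 8, 8                            # e.g. 2 EEG channels, 8x8 fused maps
channel_features = torch.randn(4, N, S, T)   # one Fusion_j per channel, batch of 4
fuse = AttentionAwareFusion(k=N)             # attention-aware fusion across channels
head = nn.Sequential(nn.Flatten(), nn.Linear(S * T, 5))  # outdim = S*T -> 5 stages
logits = head(fuse(channel_features))        # (4, 5): scores for W, N1, N2, N3, REM
```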
Network training and optimization
In the experiments, the training and test sets are divided by leave-one-subject-out validation: the sleep data of one subject is selected as the test set, and the sleep data of all remaining subjects are combined into the training set. To verify model performance, the data of ten randomly selected subjects are used in turn as test data for ten rounds of cross-validation, verifying the stability of the model.
During training and optimization, several cross-entropy loss functions separately constrain the high-order single-channel features output by the N multi-granularity branch networks and the features after multi-channel attention-aware fusion, correcting and adjusting the weights and biases in the network. The loss function Loss_ce is defined as:
$$Loss_{ce} = -\sum_{i} y_i \log(p_i)$$
where p_i is the network's output prediction and y_i indicates whether the prediction equals the label, 1 if the same and 0 if different. In addition, in the experiments a momentum optimizer iteratively updates the network parameters, with the momentum value set to 0.4 and the learning rate set to 0.001.
The network is optimized with minibatch training: all sample data in the training or test set is divided into K minibatches, each containing 1/K of all the samples. During training, the network computes the gradients of the samples in each minibatch and updates the parameters with their average. When the network has cycled through all K minibatches, it is defined to have completed one full epoch. In the present invention, the minibatch size is set to 512 and the network is trained for 100 epochs.
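Taken together, the settings above amount to the following training-loop sketch; model and train_set are placeholders for the full network and the preprocessed dataset:

```python
import torch
from torch.utils.data import DataLoader

criterion = torch.nn.CrossEntropyLoss()      # the cross-entropy loss Loss_ce
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.4)
loader = DataLoader(train_set, batch_size=512, shuffle=True)  # K minibatches

for epoch in range(100):                     # one pass over all K minibatches = 1 epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)        # gradient averaged over the minibatch
        loss.backward()
        optimizer.step()
```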
The network parameter settings for the entire framework are shown in table 1. In the experiment process, the time-frequency resolution M of the HHT is set to be 30.
Table 1. Network framework parameter settings of the inventive method
Network performance evaluation
The model performance evaluation uses evaluation indicators common in machine learning, including classification accuracy (Accuracy), Cohen's kappa (Kappa), and the F1 value. These evaluation indices are calculated from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
Classification accuracy refers to the ratio of the number of correctly predicted samples to the total number of predicted samples.
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
Kappa is a consistency coefficient used to measure the level of agreement of the classification estimates on the data set.
$$Kappa = \frac{p_o - p_e}{1 - p_e}$$
where p_o is the observed agreement (the classification accuracy) and p_e is the agreement expected by chance.
The F1 value jointly considers the precision and recall indexes and comprehensively reflects the overall model.
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
where the precision and recall indexes are defined respectively as
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
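For reference, the three reported metrics can be computed with scikit-learn as below; the macro averaging of F1 over the five stages is our assumption, since the text does not state the averaging mode:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# y_true / y_pred: reference labels and network predictions (W=0, N1=1, N2=2, N3=3, REM=4)
y_true = [0, 1, 2, 2, 4]
y_pred = [0, 1, 2, 3, 4]
acc = accuracy_score(y_true, y_pred)               # classification Accuracy
kappa = cohen_kappa_score(y_true, y_pred)          # Cohen's kappa
f1 = f1_score(y_true, y_pred, average="macro")     # F1 (macro over 5 stages, assumed)
```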
Results of the experiment
The effectiveness of each module of the invention was verified by ablation experiments on the SLEEP-EDF data; the results are listed in table 2. branch1, branch2, and branch3 denote the three multi-granularity feature extraction branch networks of the multi-granularity feature extraction module, and the two fusion entries denote network models performing two-branch and three-branch multi-granularity feature fusion with the attention-aware fusion module proposed in the invention.
As table 2 shows, among the three branch networks the branch3 network achieves the best classification effect owing to its richer granularity information. Across the whole ablation experiment, the three-branch fusion network using the attention-aware fusion strategy attains the highest average classification accuracy, 85.40%; second, the two-branch multi-granularity fusion network using the attention-aware fusion strategy achieves an average accuracy of 84.93%. These results show that the multi-granularity feature extraction and fusion module effectively extracts the useful features in the input data to complete the sleep staging task, and that the attention-aware fusion strategy further improves the classification performance of the network.
Table 2. Cross-validation results of the ablation experiments of the inventive method
Table 3 compares the results of the inventive method with other existing model methods. Under experiments on the same data set with the same number of subjects, the method obtains better classification accuracy than the other methods. In addition, the classification effect of the model is verified on real data collected in a hospital; the results show that the model also maintains stable and efficient classification performance on the real data set.
Table 3. Comparison of the inventive method with other methods
In conclusion, by mining deep multi-granularity features of the sleep EEG time-frequency information, the proposed method fuses and classifies the global and local information of the sleep EEG signal, effectively improving the classification effect of the automatic sleep staging task. The method obtains ideal experimental results on both the public data set and the real data set collected in a hospital, further demonstrating the effectiveness and robustness of the model.

Claims (3)

1. A sleep staging method based on residual learning and multi-granularity feature fusion is characterized by comprising the following steps:
step one, preprocessing an original sleep electroencephalogram signal:
performing band-pass filtering on a raw sleep EEG signal containing N channels to remove noise, finally obtaining an N-lead sleep EEG signal with sampling rate T; further, dividing the preprocessed signal without overlap into P sub-signal segments S_{i,j} of N channels using a sliding time window of 30 seconds, where i ∈ [1,2,...,P] is the index of the current sample and j ∈ [1,2,...,N] is its channel, and the signal of each channel contains 30 × T sampling points;
step two, HHT extracts time-frequency characteristics:
for any segmented sample S_i = [S_{i,1}, S_{i,2}, ..., S_{i,N}] containing N sleep EEG data channels, where S_{i,j} denotes the data of the j-th lead of the i-th sample, extracting the corresponding time-frequency matrix feature from each channel's data using the HHT; the time-frequency resolution of the Hilbert-Huang transform is set manually to M, giving for any sample a time-frequency matrix F = [F_1, F_2, ..., F_N] of feature size N × M × M, where the time-frequency feature F_j of each EEG channel has size M × M;
step three, multi-granularity feature extraction and fusion:
learning and integrating the sleep multi-granularity features of the time-frequency matrix F_j of each channel by a multi-granularity feature extraction method, finally obtaining the output Fusion_j, j ∈ [1,2,...,N], representing all time-frequency information of each channel; the multi-granularity feature extraction method comprises three stages: data preprocessing, multi-granularity feature extraction on each single channel using a multi-granularity residual learning module, and obtaining the final attention-aware fusion output Fusion_j of each single channel using a multi-granularity feature fusion module;
Step four, multi-channel feature fusion and classification:
fusing the multi-granularity fusion features Fusion_j of the N channels using the multi-granularity feature fusion module of step three, and flattening the fused features into a one-dimensional vector that serves as the input of a linear classification layer for sleep stage classification.
2. The sleep staging method based on residual learning and multi-granularity feature fusion as claimed in claim 1, wherein the preprocessing of step one includes band-pass filtering and down-sampling, specifically including:
1) performing band-pass filtering on the original electroencephalogram signals at 0.5-32 Hz;
2) and if the sampling frequency of the original electroencephalogram signal is higher than 100Hz, using down-sampling operation to reduce the sampling frequency of the filtered data to 100 Hz.
3. The sleep staging method based on residual learning and multi-granularity feature fusion as claimed in claim 1, wherein the multi-granularity feature extraction method of step three includes the following steps:
1) data preprocessing: any M × M single-channel time-frequency feature matrix first passes through a 1 × 1 convolution layer, a ReLU activation layer, and a max-pooling layer to obtain the hidden-layer output feature hidden_1_j; this operation expands the number of channels, improves the nonlinearity of the deep network, and reduces the size of the feature map;
2) performing multi-granularity feature extraction on the hidden-layer feature hidden_1_j output in 1) using the multi-granularity residual learning module, which takes the residual block ResBlock as the basic unit of the network and works as follows: the single-channel hidden-layer feature hidden_1_j is input to a backbone network composed of a residual block ResBlock1 and then divided into four branch networks that learn features of different granularities from the input data hidden_1_j; the first branch branch1 comprises, in order, a residual block ResBlock2, a residual block ResBlock4, and a pooling layer Maxpool2, and learns and outputs the global information hidden_branch1_j of the input data hidden_1_j, where ResBlock2 uses a residual block of stride 2 to reduce the dimensionality of the data; the second branch branch2 passes, in order, through a residual block ResBlock3 and a residual block ResBlock4 and then divides into a two-path network structure, where one path passes through a pooling layer Maxpool3 to give the hidden-layer output hidden_branch2_j of that path, and the other path passes through a pooling layer Maxpool4 and its output is split horizontally into two equal-sized fine-grained features, the hidden-layer outputs hidden_branch2_seg1_j and hidden_branch2_seg2_j; the third branch branch3 is divided into two paths after the residual block ResBlock3 and the residual block ResBlock4, where one path passes through the pooling layer Maxpool3 to give the hidden-layer output hidden_branch3_j, and the other path passes through a pooling layer Maxpool5 and its output is split horizontally into three equal-sized fine-grained features, the hidden-layer outputs hidden_branch3_seg1_j, hidden_branch3_seg2_j, and hidden_branch3_seg3_j; the residual block ResBlock3 in branch2 and branch3 is set to stride 1, reserving a certain receptive field for the subsequent extraction of local information; the fourth branch structure comprises, in order, a 1 × 1 convolution layer and a max-pooling layer, and performs a reduced-dimension expression of the hidden-layer features after the backbone residual block ResBlock1; finally, the 8 output results of the first three branches are each added to the output of the fourth branch to obtain 8 outputs hidden_{j,k}, where j ∈ [1,2,...,N], k ∈ [1,2,...,8];
3) obtaining the final attention-aware fusion output Fusion_j of each single channel using the multi-granularity feature fusion module, which works as follows:
first, performing convolutional dimension reduction on the 8 outputs hidden_{j,k}: a 1 × 1 convolution reduces each of the 8 hidden_{j,k} along the channel dimension, and a ReLU activation layer after the convolution activates the network neurons while adding a nonlinear factor to the network;
then, completing the attention-aware fusion of the 8 convolution-reduced hidden_{j,k} using the attention-aware fusion module, the specific process being: concatenating the 8 convolution-reduced hidden_{j,k} to obtain a feature map Concat, the concatenation of 8 features of different granularities, where Concat_k denotes the data of the k-th channel of Concat, k ∈ [1,2,...,8]; then reducing the size of each feature matrix to 1 along the granularity channel dimension by global average pooling, obtaining the directional features GAP_k representing the different granularity features, expressed as follows:
$$GAP_k = \frac{1}{S \times T}\sum_{u=1}^{S}\sum_{v=1}^{T} Concat_k(u, v)$$
where S × T is the size of the input feature matrix of a given channel and Concat_k(u, v) is the value of each element of the k-th channel feature matrix;
next, setting two linear layers to learn the information weight value weight_k of the directional feature GAP_k on each granularity feature channel, the activation functions of the two linear layers being the ReLU function and the Sigmoid function respectively, with the information weight formula expressed as follows:
$$weight_k = \sigma\!\left(W_2\,\delta(W_1\,GAP_k)\right)$$
where W_1 and W_2 denote the weights of the two linear layers respectively, and δ(·) and σ(·) denote the ReLU function and the Sigmoid function respectively;
finally, assigning the attention weights to Concat_k to obtain the attention-scaled data Weighted_k, and summing along the granularity channel dimension to obtain the final attention-aware fusion output Fusion_j of each single channel:
$$Fusion_j = \sum_{k=1}^{8} Weighted_k = \sum_{k=1}^{8} weight_k \cdot Concat_k$$
CN202110494434.1A 2021-05-07 2021-05-07 Sleep stage method based on residual error learning and multi-granularity feature fusion Active CN113158964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110494434.1A CN113158964B (en) 2021-05-07 2021-05-07 Sleep stage method based on residual error learning and multi-granularity feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110494434.1A CN113158964B (en) 2021-05-07 2021-05-07 Sleep stage method based on residual error learning and multi-granularity feature fusion

Publications (2)

Publication Number Publication Date
CN113158964A true CN113158964A (en) 2021-07-23
CN113158964B CN113158964B (en) 2024-05-28

Family

ID=76873610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110494434.1A Active CN113158964B (en) 2021-05-07 2021-05-07 Sleep stage method based on residual error learning and multi-granularity feature fusion

Country Status (1)

Country Link
CN (1) CN113158964B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723340A (en) * 2021-09-08 2021-11-30 湖北理工学院 Multi-scale attention depth nonlinear factorization method
CN115349874A (en) * 2022-07-12 2022-11-18 西安电子科技大学 Multi-granularity information-based rapid sequence visual presentation electroencephalogram signal classification method
CN115429293A (en) * 2022-11-04 2022-12-06 之江实验室 Sleep type classification method and device based on impulse neural network
CN115607170A (en) * 2022-11-18 2023-01-17 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signal and application
CN117033916A (en) * 2023-07-10 2023-11-10 国网四川省电力公司营销服务中心 Power theft detection method based on neural network
CN117814756A (en) * 2024-01-10 2024-04-05 湖北大学 Feature processing method and system based on deep learning detection model
CN118436317A (en) * 2024-07-08 2024-08-06 山东锋士信息技术有限公司 Sleep stage classification method and system based on multi-granularity feature fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109846477A (en) * 2019-01-29 2019-06-07 北京工业大学 A kind of brain electricity classification method based on frequency band attention residual error network
CN110693493A (en) * 2019-10-12 2020-01-17 北京工业大学 Epilepsy electroencephalogram prediction method based on convolution and recurrent neural network combined time multiscale
CN111325155A (en) * 2020-02-21 2020-06-23 重庆邮电大学 Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
AU2020102907A4 (en) * 2020-10-21 2020-12-17 Ahamad, Danish MR Novel automated machine learning algorithms based system for sleep staging features analysis
CN112115849A (en) * 2020-09-16 2020-12-22 中国石油大学(华东) Video scene identification method based on multi-granularity video information and attention mechanism
CN112641451A (en) * 2020-12-18 2021-04-13 北方工业大学 Multi-scale residual error network sleep staging method and system based on single-channel electroencephalogram signal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109846477A (en) * 2019-01-29 2019-06-07 北京工业大学 A kind of brain electricity classification method based on frequency band attention residual error network
CN110693493A (en) * 2019-10-12 2020-01-17 北京工业大学 Epilepsy electroencephalogram prediction method based on convolution and recurrent neural network combined time multiscale
CN111325155A (en) * 2020-02-21 2020-06-23 重庆邮电大学 Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN112115849A (en) * 2020-09-16 2020-12-22 中国石油大学(华东) Video scene identification method based on multi-granularity video information and attention mechanism
AU2020102907A4 (en) * 2020-10-21 2020-12-17 Ahamad, Danish MR Novel automated machine learning algorithms based system for sleep staging features analysis
CN112641451A (en) * 2020-12-18 2021-04-13 北方工业大学 Multi-scale residual error network sleep staging method and system based on single-channel electroencephalogram signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吕甜甜; 王心醉; 俞乾; 贾朋飞; 陈骁; 吴成雄: "Automatic sleep staging method based on multi-parameter feature fusion", Computer Applications (计算机应用), no. 2, 20 December 2017 (2017-12-20) *
袁韶祖; 王雷全; 吴春雷: "Video scene recognition based on multi-granularity video information and attention mechanism", Computer Systems & Applications (计算机系统应用), no. 05, 15 May 2020 (2020-05-15) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723340A (en) * 2021-09-08 2021-11-30 湖北理工学院 Multi-scale attention depth nonlinear factorization method
CN113723340B (en) * 2021-09-08 2023-05-30 湖北理工学院 Depth nonlinear factorization method for multi-scale attention
CN115349874A (en) * 2022-07-12 2022-11-18 西安电子科技大学 Multi-granularity information-based rapid sequence visual presentation electroencephalogram signal classification method
CN115429293A (en) * 2022-11-04 2022-12-06 之江实验室 Sleep type classification method and device based on impulse neural network
CN115429293B (en) * 2022-11-04 2023-04-07 之江实验室 Sleep type classification method and device based on impulse neural network
CN115607170A (en) * 2022-11-18 2023-01-17 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signal and application
CN115607170B (en) * 2022-11-18 2023-04-25 中国科学技术大学 Lightweight sleep staging method based on single-channel electroencephalogram signals and application
CN117033916A (en) * 2023-07-10 2023-11-10 国网四川省电力公司营销服务中心 Power theft detection method based on neural network
CN117814756A (en) * 2024-01-10 2024-04-05 湖北大学 Feature processing method and system based on deep learning detection model
CN118436317A (en) * 2024-07-08 2024-08-06 山东锋士信息技术有限公司 Sleep stage classification method and system based on multi-granularity feature fusion

Also Published As

Publication number Publication date
CN113158964B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN113158964A (en) Sleep staging method based on residual learning and multi-granularity feature fusion
Aslan et al. Automatic Detection of Schizophrenia by Applying Deep Learning over Spectrogram Images of EEG Signals.
CN111012336B (en) Parallel convolutional network motor imagery electroencephalogram classification method based on spatio-temporal feature fusion
CN108491077B (en) Surface electromyographic signal gesture recognition method based on multi-stream divide-and-conquer convolutional neural network
CN114052735B (en) Deep field self-adaption-based electroencephalogram emotion recognition method and system
CN110353702A (en) A kind of emotion identification method and system based on shallow-layer convolutional neural networks
CN110163180A (en) Mental imagery eeg data classification method and system
CN109299751B (en) EMD data enhancement-based SSVEP electroencephalogram classification method of convolutional neural model
CN113128552B (en) Electroencephalogram emotion recognition method based on depth separable causal graph convolution network
CN104771163A (en) Electroencephalogram feature extraction method based on CSP and R-CSP algorithms
CN109598222B (en) EEMD data enhancement-based wavelet neural network motor imagery electroencephalogram classification method
CN101221554A (en) Brain wave characteristic extraction method based on wavelet translation and BP neural network
CN112465069B (en) Electroencephalogram emotion classification method based on multi-scale convolution kernel CNN
CN108280414A (en) A kind of recognition methods of the Mental imagery EEG signals based on energy feature
CN111931656B (en) User independent motor imagery classification model training method based on transfer learning
CN110522412A (en) Method based on multiple dimensioned brain function network class EEG signals
CN113180659A (en) Electroencephalogram emotion recognition system based on three-dimensional features and cavity full convolution network
CN117195099A (en) Electroencephalogram signal emotion recognition algorithm integrating multi-scale features
Ranjani et al. Classifying the autism and epilepsy disorder based on EEG signal using deep convolutional neural network (DCNN)
CN116211319A (en) Resting state multichannel electroencephalogram signal identification method based on graph attention network and sparse coding
CN115919330A (en) EEG Emotional State Classification Method Based on Multi-level SE Attention and Graph Convolution
CN114305452B (en) Cross-task cognitive load identification method based on electroencephalogram and field adaptation
CN111603158A (en) Fatigue driving warning method and system based on electrophysiological signal artificial intelligence analysis
CN115251909A (en) Electroencephalogram signal hearing assessment method and device based on space-time convolutional neural network
CN112259228B (en) Depression screening method by dynamic attention network non-negative matrix factorization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant