CN111631688A - Algorithm for automatic sleep staging

Info

Publication number: CN111631688A
Application number: CN202010591697.XA
Authority: CN
Inventors: 刘铁军; 王林; 吕彬; 范宇熊; 宋晓宇; 郜东瑞; 尧德中
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2020-09-08
Anticipated expiration: 2040-06-24
Also published as: CN111631688B

Abstract

The invention discloses an algorithm for automatic sleep staging, comprising the following steps: a. in the feature layer, abstract features are extracted with a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. finally, in the correction layer, a conditional random field (CRF) method is used for temporal continuity correction. The method addresses the poor stability, generality and practicability of existing automatic sleep staging algorithms.

Description

Algorithm for automatic sleep staging

Technical Field

The invention relates to the field of sleep algorithms, in particular to an algorithm for automatically staging sleep.

Background

Using physiological information recorded while a person sleeps, the sleep period can be divided, according to international sleep medicine standards, into five normal stages: WAKE, REM, N1, N2 and N3.

In recent years, methods that monitor a person's sleep state and classify it automatically with computer equipment and software have come into wide use. Although modern electronic information technology, machine learning theory and biomedical engineering are developing rapidly, automatic sleep staging based on machine learning still has no internationally recognized standard in scientific research, medical application or consumer electronics. The main reasons include the lack of a thorough and complete theory of the underlying mechanisms of human sleep, insufficient understanding of and trust in the international standards for manual sleep staging among clinicians and researchers, low consistency between sleep state classifications, the lack of expert experience among developers of sleep monitoring systems, and abnormal sleep patterns being intermingled with normal ones.

Whenever an automatic sleep staging algorithm involves feature engineering, developers find it hard to acquire the expert experience needed to understand the physiological process of sleep in depth; the expert knowledge comes mainly from the technical manual of the American Academy of Sleep Medicine and from manual staging criteria worked out in practice.

Whenever an automatic sleep staging algorithm involves a neural network, the model has difficulty fitting real data, which leads to underfitting, and the accuracy of the machine's decisions falls far below expectation.

In general, existing automatic sleep staging systems are not accurate enough to distinguish the normal sleep states. Existing machine learning algorithms, whether based on feature engineering or on neural networks, reach accuracies of 60% to 90% only on small data sets. Existing algorithms also treat the physiological signal segments of a whole night's sleep as independent and ignore their temporal continuity. The stability, generality and practicability of automatic staging technology as a whole urgently need to be improved.

Disclosure of Invention

The invention aims to provide an automatic sleep staging algorithm, which solves the problems of poor stability, universality and practicability of the conventional automatic sleep staging algorithm.

In order to solve the technical problems, the invention adopts the following technical scheme:

An algorithm for automatic sleep staging, comprising the following steps: a. in the feature layer, abstract features are extracted with a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. finally, in the correction layer, a conditional random field (CRF) method is used for temporal continuity correction.

As a further preferred aspect of the present invention, step a, in which abstract features are extracted with a multilayer perceptron and combined with traditional hand-crafted features based on expert experience as the information representation of sleep, includes the following steps:

S1, a data set C = {C_1, C_2, ..., C_N} is made from physiological data collected during sleep, including electroencephalogram, electrooculogram and mandibular electromyogram signals;

S2, the continuous overnight data of each person are divided into S segments in time order; each segment represents different sleep stage information and, as time passes, each segment needs to be learned and fitted by a different algorithm model, so each segment is made into a separate data set, giving S segment data sets; each segment data set contains N × M sample points, where M is the number of sample points in each segment of data; one sample point represents one sleep epoch, whose time span is usually 30 seconds, and each sample point contains L signal sampling points, so the sample size of data set C is S × N × M × L;

S3, feature engineering is performed on each of the S segment data sets, comprising abstract feature extraction with an autoencoder and traditional feature extraction of time-domain, frequency-domain and nonlinear dynamics features; all features are spliced into vectors, each segment of data yields M vectors, and these vectors are arranged in order to form a sequence, namely a segment sequence;

S4, in the machine model layer, the Bi-GRU network model is trained using the segment sequences as samples; the S segment data sets each contain N × M segment sequences; each segment data set is divided into a training set and a test set in a certain proportion, and the model is saved after training on each segment data set is finished;

S5, the sample points used for training are input into the trained Bi-GRU network model to obtain the classification labels of the sleep stages;

S6, the label sequences of the S segment samples of one night's sleep are spliced into one long sequence, so that each overnight recording corresponds to one complete overnight label sequence, a one-dimensional vector of dimension T = S × M; the set of N overnight label sequences is still divided into three sets according to the earlier split: the training set contains C_t sequences, the validation set contains C_v sequences, and the test set contains C sequences.

Step c, in which a conditional random field (CRF) method is used for temporal continuity correction in the correction layer, comprises step S7: the label training-set sequences obtained in step S6 are input into the correction layer; exploiting the strength of the CRF method in extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm, so that the overnight label sequence is corrected to agree, in its continuity, with the label sequence of the expert's manual sleep staging.

As a further preferred aspect of the present invention, step S1 further includes the following sub-steps: S11, while the subject sleeps, a polysomnography device with a physiological signal acquisition function monitors, records and stores the physiological signal data according to the American Academy of Sleep Medicine (AASM) standard; S12, after the raw data of the different kinds of physiological signals are sampled and digitized, zero-phase digital filtering is applied to each signal to avoid phase distortion of the non-stationary physiological signals, and the extremely low-frequency baseline, power-line noise and high-frequency noise are removed, completing signal preprocessing; S13, the electrooculogram, mandibular electromyogram and electroencephalogram signals are extracted from the physiological signals, and the original data set C = {C_1, C_2, ..., C_N} is made from these three-lead signals: N overnight sleep recordings from N individuals, with each subset divided by person.

As a further preferred aspect of the present invention, step S2 further includes the following sub-steps: S21, the segmentation scheme is set appropriately: a single night's sleep data are continuous in time, and the algorithm divides the whole continuous process into S stages whose physiological signals carry different information about the sleep process; in doing so the algorithm imitates the experience of experts who, during manual sleep staging, anticipate several trend stages in the overnight signal data; the number M of sample points per stage is set according to how long each stage lasts; S22, the specific data sets are prepared in advance for the algorithm model: S segment data sets are split off from the original data set, and the data dimensions must be exact; each data set contains N × M sample points from N individuals, each sample point is the minimum unit of sleep staging with a time span of 30 seconds, and each sample point corresponds to one category label, the five category labels being WAKE, REM, N1, N2 and N3; S23, the labels are aligned with the data, and the segment data sets are checked, built and saved to file. The above data set production is a key step of the invention and strongly influences the results and performance of the whole algorithm.

As a further preferred aspect of the present invention, step S3 further includes the following sub-steps:

S31, traditional features are extracted: a feature vector x_i of dimension k is computed from the signals. The m features of x_i include time-domain, frequency-domain and nonlinear dynamics feature quantities. The time-domain feature quantities comprise statistical and geometric feature quantities; the frequency-domain feature quantities comprise power spectral density and time-frequency feature quantities; the nonlinear dynamics feature quantities comprise fractal dimension and complexity feature quantities; each feature quantity is determined by its own parameters and calculation method;

S32, abstract features are extracted by an autoencoder from the field of artificial neural networks. Using the unsupervised learning property of artificial neural networks, a network is fitted that can represent the input data efficiently, without any extra manual work; the input signal data are represented efficiently by a fixed low-dimensional vector, the code. The output dimension of the code is generally smaller than the input signal dimension, which is the dimensionality-reduction property of the autoencoder. A stacked autoencoder with multiple encoding layers is adopted; the complexity of the encoding depends on the number of stacked layers, and moderately increasing the number of layers lets the input data be compressed and represented effectively;

S33, feature engineering is performed on each sample point and the features are spliced into a feature vector ξ_i (ξ_i ∈ R^k). The sample points in each segment must be arranged in time order into a segment sequence, and the corresponding feature vectors are likewise arranged into a segment sequence seq; the whole feature space of this segment sequence structure is in fact a three-dimensional tensor X ∈ R^(N×M×k).

As a further preferred embodiment of the present invention, step S4 further comprises the following sub-steps: S41, the feature-space tensor X generated from seq is divided into the final training, validation and test data sets; S42, the time-ordered segment sequences seq are input into the Bi-GRU network model; the feature-space tensor X ∈ R^(N×M×k) generated from seq should meet the requirements of the Bi-GRU input layer; the structure, training mode and initial parameters of the network model are set appropriately, and the data set is then loaded to start training; S43, the Bi-GRU network model is saved once it reaches a preset termination condition.

As a further preferred aspect of the present invention, step S5 further includes the following sub-steps: S51, all data are propagated forward through the network model to obtain the sleep stage label corresponding to each data sample, where the data samples are the training and validation sets generated from the data sets; S52, the classification labels produced by the machine model are compared with the experts' manual classification labels, evaluation indexes such as accuracy, recall and F1 score are recorded for each data set, and construction of the network model is complete.

As a further preferred aspect of the present invention, step S7 further includes the following sub-steps: S71, a CRF model is constructed: the label-sequence training set from step S6 and the corresponding expert manual staging label sequences are input into the CRF model, the number K of feature functions is set, and the optimal parameters are obtained by iterative training, giving the conditional probability P(y | x) of the conditional random field; in this way the model learns the contextual information of time-dependent transitions between sleep stages; stage transitions are closely related to sleep time, and exploiting the probabilistic transitions of the CRF model to match this characteristic of sleep is a key step of the invention; S72, the correction result of the CRF model is tested: using the conditional probability P(y | x) and a label sequence x_s from the validation set, the optimal label sequence y* is computed; finally, evaluation indexes such as accuracy are calculated, and the results of the CRF correction model and of the network model are tabulated and compared.

Compared with the prior art, the invention can at least achieve one of the following beneficial effects:

1. selecting a proper data segment for segmentation, so that the algorithm has stronger adaptability to each sleep stage;

2. traditional characteristics and abstract characteristics are fused, and the expression capability of the characteristics is amplified, so that the algorithm accuracy is higher;

3. the interpretation of the algorithm on the time continuity is effectively enhanced through the time-associated network and the probability correction.

Drawings

FIG. 1 is an overall block diagram of the algorithm of the present invention.

FIG. 2 is a schematic view of a data set generation process according to the present invention.

FIG. 3 is a schematic diagram of data set generation according to the present invention.

FIG. 4 is a sample structure of data according to the present invention.

Fig. 5 is a block diagram of a stacked self-encoder according to the present invention.

FIG. 6 is a schematic diagram of a first layer of a stacked self-encoder according to the present invention.

FIG. 7 is a diagram of a second layer of a stacked self-encoder according to the present invention.

FIG. 8 is a diagram of the output layer of a stacked self-encoder according to the present invention.

Fig. 9 is an overall schematic diagram of a stacked self-encoder according to the present invention.

FIG. 10 is a diagram of a Bi-GRU network model according to the present invention.

Fig. 11 shows the internal structure of a GRU node according to the present invention.

FIG. 12 is a schematic view of an overall machine model layer incorporating CRF corrections in accordance with the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Specific example 1:

Figs. 1 to 12 show an algorithm for automatic sleep staging. As shown in Fig. 1, the overall idea of the invention is to refine the construction of the automatic sleep staging model step by step through three processes: data set production, feature engineering and model layering.

As shown in Fig. 2, the data set production flow is divided into three parts: data acquisition, data preprocessing and overnight data segmentation. First, a data set is created from physiological data collected during human sleep, including conventional physiological signals such as the electroencephalogram (EEG), electrooculogram (EOG) and mandibular electromyogram (EMG). The data set is C = {C_1, C_2, ..., C_N}, where C represents the data of all people and N represents the number of people (or the number of completed overnight sleep recordings).

While the subject sleeps, a polysomnography device with a physiological signal acquisition function monitors, records and stores the physiological signal data according to the American Academy of Sleep Medicine (AASM) standard.

The raw data of the different kinds of physiological signals are sampled and digitized, and zero-phase digital filtering is then applied to each signal to avoid phase distortion of the non-stationary physiological signals. The extremely low-frequency baseline, power-line noise and high-frequency noise are removed from the signals, which completes signal preprocessing.
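The idea of zero-phase filtering can be illustrated with a minimal sketch. It assumes a simple first-order low-pass filter and a smoothing factor `alpha`, neither of which is specified in the patent; a practical pipeline would more likely use band-pass and notch filters applied forward and backward (for example with `scipy.signal.filtfilt`).

```python
def one_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y, prev = [], 0.0
    for v in x:
        prev = alpha * v + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_lowpass(x, alpha=0.3):
    """Filter forward, then filter the reversed result and reverse it again.

    The phase delay of the forward pass is cancelled by the backward pass.
    """
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


# An impulse in the middle of the record: the causal filter smears it to the
# right only, while the forward-backward result stays centred on it.
impulse = [0.0] * 101
impulse[50] = 1.0
causal = one_pole_lowpass(impulse, 0.3)
zero_phase = zero_phase_lowpass(impulse, 0.3)
```

The symmetric response around sample 50 is what "no phase distortion" means here: waveform features are not shifted in time relative to the expert's epoch boundaries.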

The electrooculogram, mandibular electromyogram and electroencephalogram signals are extracted from the physiological signals, and the original data set C = {C_1, C_2, ..., C_N} is made from these three-lead signals. It holds N overnight sleep recordings from N individuals, with each subset divided by person.

The structure of the data is visualized in Figs. 3 and 4. The continuous overnight data of each person are divided into S segments in time order; each segment represents different sleep stage information and, as time passes, each segment of sleep data needs to be fitted by a different algorithm model. Each segment is therefore made into a separate data set, giving S segment data sets, each containing N × M sample points, where M is the number of sample points in each segment of data. One sample point represents one sleep epoch, whose time span is typically 30 seconds. Each sample point contains L signal sampling points. The sample size of data set C is therefore S × N × M × L.
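The segmentation and the S × N × M × L count can be sketched as follows. The function names and the choice to drop trailing samples are assumptions of this sketch, and the toy sizes are arbitrary; with 30-second epochs and a hypothetical sampling rate of 100 Hz, L would be 3000.

```python
def split_night(signal, S, M, L):
    """Split one overnight recording into S segments of M epochs of L samples.

    Returns a nested list indexed as [segment][epoch][sample]. Trailing samples
    that do not fill a whole segment are dropped (an assumption of this sketch).
    """
    needed = S * M * L
    if len(signal) < needed:
        raise ValueError("recording too short for S * M * L samples")
    segments = []
    for s in range(S):
        epochs = []
        for m in range(M):
            start = (s * M + m) * L
            epochs.append(signal[start:start + L])
        segments.append(epochs)
    return segments


def build_segment_datasets(recordings, S, M, L):
    """Regroup N recordings into S segment data sets of N x M epochs each."""
    per_person = [split_night(r, S, M, L) for r in recordings]
    return [[person[s] for person in per_person] for s in range(S)]


# Toy sizes: N=2 people, S=3 segments, M=4 epochs, L=5 samples per epoch.
N, S, M, L = 2, 3, 4, 5
recordings = [list(range(S * M * L)) for _ in range(N)]
datasets = build_segment_datasets(recordings, S, M, L)
total_samples = sum(len(epoch) for ds in datasets for person in ds for epoch in person)
```

Each entry of `datasets` is one segment data set, so a separate model can be fitted per segment as the text describes.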

The segmentation scheme is set appropriately. A single night's sleep data are continuous in time; the algorithm divides the whole continuous process into S stages, and the physiological signals of the resulting segments carry different information about the sleep process. In doing so the algorithm imitates the experience of experts who, during manual sleep staging, anticipate several trend stages in the overnight signal data. The number M of sample points per stage is set according to how long each stage lasts.

The specific data sets are prepared in advance for the algorithm model: S segment data sets are split off from the original data set. The data dimensions must be exact: each data set contains N × M sample points from N people, each sample point is the minimum unit of sleep staging with a time span of 30 seconds, and each sample point corresponds to one category label. There are five category labels: WAKE, REM, N1, N2 and N3.

The labels are aligned with the data, and the segment data sets are checked, built and saved to file. The above data set production is a key step of the invention and strongly influences the results and performance of the whole algorithm.

Feature engineering is performed on each of the S segment data sets. Abstract features are extracted with an autoencoder; traditional features are extracted as time-domain, frequency-domain and nonlinear dynamics features. All features are spliced into vectors, each segment of data yields M vectors, and these vectors are arranged in order to form a sequence, namely a segment sequence.

Traditional feature extraction. A feature vector x_i of dimension k is computed from the signals. The m features of x_i include time-domain, frequency-domain and nonlinear dynamics feature quantities. The time-domain feature quantities comprise statistical and geometric feature quantities. The frequency-domain feature quantities comprise power spectral density and time-frequency feature quantities. The nonlinear dynamics feature quantities comprise fractal dimension and complexity feature quantities. Each feature quantity is determined by its own parameters and calculation method.
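As an illustration of the three feature families, the sketch below computes one example of each for a single epoch. The specific choices (mean and variance, relative power in an 8 to 13 Hz band, Petrosian fractal dimension) and the sampling rate are assumptions made for the example; the patent does not list the individual features.

```python
import cmath
import math


def statistical_features(x):
    """Time-domain statistics: mean and (population) variance."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [mean, var]


def band_power(x, fs, lo, hi):
    """Frequency-domain feature: fraction of spectral power in [lo, hi) Hz.

    Uses a plain O(n^2) DFT to stay dependency-free.
    """
    n = len(x)
    power = []
    for k in range(1, n // 2):  # skip the DC bin
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append((k * fs / n, abs(coeff) ** 2))
    total = sum(p for _, p in power)
    in_band = sum(p for f, p in power if lo <= f < hi)
    return in_band / total if total > 0 else 0.0


def petrosian_fd(x):
    """Nonlinear feature: Petrosian fractal dimension."""
    n = len(x)
    diff = [b - a for a, b in zip(x, x[1:])]
    sign_changes = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * sign_changes)))


def traditional_features(x, fs):
    """Concatenate the three feature families into one vector x_i."""
    return statistical_features(x) + [band_power(x, fs, 8.0, 13.0), petrosian_fd(x)]


fs = 64.0  # hypothetical sampling rate in Hz
epoch = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(128)]  # 10 Hz tone
x_i = traditional_features(epoch, fs)
```

For the pure 10 Hz test tone nearly all spectral power falls inside the chosen band, which makes the frequency-domain entry easy to sanity-check.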

Abstract feature extraction. Abstract features are extracted by an autoencoder from the field of artificial neural networks. Using the unsupervised learning property of artificial neural networks, a network is fitted that can represent the input data efficiently. Its advantage is that no extra manual work is needed and the input signal data are represented efficiently by a fixed low-dimensional vector, the code. The output dimension of the code is generally smaller than the input signal dimension, which is the dimensionality-reduction property of the autoencoder. The invention adopts a stacked autoencoder (SA) with multiple encoding layers; the complexity of the encoding depends on the number of stacked layers, and moderately increasing the number of layers lets the input data be compressed and represented effectively.
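The dimensionality reduction performed by stacked encoding layers can be shown with the encoder half alone. This is only a structural sketch: the weights are random and untrained, and the layer widths and sigmoid activation are assumptions; in practice each layer would be trained, together with a mirrored decoder, to reconstruct its input.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_sigmoid(x, layer):
    """One encoding layer: sigmoid(W x + b)."""
    weights, biases = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]


def encode(x, layers):
    """Pass the input through the stacked encoding layers in order."""
    for layer in layers:
        x = dense_sigmoid(x, layer)
    return x


rng = random.Random(0)
sizes = [16, 8, 4]  # hypothetical layer widths: 16 inputs -> 8 -> 4 code units
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
code = encode([rng.random() for _ in range(16)], layers)
```

The final `code` vector plays the role of the abstract feature vector that is later concatenated with the traditional features.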

The network structure principle of the stacked self-encoder is shown in fig. 5, 6, 7, 8 and 9, and abstract features are extracted.

Feature engineering is performed on each sample point and the features are spliced into a feature vector ξ_i (ξ_i ∈ R^k). The sample points in each segment must be arranged in time order into a segment sequence, and the corresponding feature vectors are likewise arranged into a segment sequence seq. The whole feature space of this segment sequence structure is in fact a three-dimensional tensor X ∈ R^(N×M×k).

After data set production and feature engineering, the data set structure realizes the design idea of the invention and fully reflects both the feature fusion and the temporal dependence of the data.
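The splicing of the two feature families into ξ_i and the resulting N × M × k layout can be sketched with nested lists. The helper names and toy values are hypothetical.

```python
def build_feature_tensor(traditional, abstract):
    """Concatenate per-epoch feature vectors into X with shape (N, M, k).

    traditional[n][m] and abstract[n][m] are the two feature vectors of epoch m
    of person n; epochs are assumed to be already in time order.
    """
    return [[t + a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(traditional, abstract)]


def shape(tensor):
    """Return (N, M, k) for a regular nested list."""
    return len(tensor), len(tensor[0]), len(tensor[0][0])


# N=2 people, M=3 epochs, 2 traditional + 1 abstract feature -> k=3.
traditional = [[[float(n), float(m)] for m in range(3)] for n in range(2)]
abstract = [[[0.5] for _ in range(3)] for _ in range(2)]
X = build_feature_tensor(traditional, abstract)
```

Keeping the epoch axis in time order is what lets the recurrent model in the next layer exploit temporal context.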

As shown in Fig. 10, in the machine model layer the bidirectional gated recurrent unit (Bi-GRU) network model is trained using the above segment sequences as samples; the S segment data sets each contain N × M segment sequences.

The tensor X of the feature space generated by the seq is divided into a final training data set, a verification data set and a test data set.
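One plausible way to make this split is by person, so that no individual's night appears in two sets. The 60/20/20 proportions and the fixed seed below are assumptions; the patent only says the split uses "a certain proportion".

```python
import random


def split_by_person(X, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the N per-person sequences and split them into three sets."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_train = round(fractions[0] * len(X))
    n_val = round(fractions[1] * len(X))
    train = [X[i] for i in order[:n_train]]
    val = [X[i] for i in order[n_train:n_train + n_val]]
    test = [X[i] for i in order[n_train + n_val:]]
    return train, val, test


# Ten placeholder "sequences", one per person.
train, val, test = split_by_person(list(range(10)))
```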

After cross validation, training of each section of data set is finished, and finally the model is stored.

The time-ordered segment sequences seq are input into the Bi-GRU network model. The feature-space tensor X ∈ R^(N×M×k) generated from seq should meet the requirements of the Bi-GRU input layer. The structure, training mode and initial parameters of the network model are set appropriately, and the data set is then loaded to start training.

The Bi-GRU network model is saved once it reaches a preset termination condition.
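To make the bidirectional structure concrete, here is a dependency-free forward pass of a Bi-GRU over one segment sequence. The weights are random and untrained, the sizes are hypothetical, the update convention h_t = (1 - z) * h_{t-1} + z * candidate is one of the two common ones, and the final five-class output layer is omitted; a real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [sum(w * xi for w, xi in zip(W[j], x))
            + sum(u * hi for u, hi in zip(U[j], h)) + b[j]
            for j in range(len(b))]


def init_gru(n_in, n_hid, rng):
    """Untrained (W, U, b) for the update (z), reset (r) and candidate (c) parts."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(n_hid, n_in), mat(n_hid, n_hid), [0.0] * n_hid) for g in "zrc"}


def gru_step(params, x, h):
    """One GRU update: h_t = (1 - z) * h_{t-1} + z * candidate."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wc, Uc, bc = params["c"]
    z = [sigmoid(v) for v in affine(Wz, x, Uz, h, bz)]
    r = [sigmoid(v) for v in affine(Wr, x, Ur, h, br)]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(Wc, x, Uc, gated, bc)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(params, seq, n_hid):
    """Run the GRU over a sequence from a zero initial state."""
    h, outputs = [0.0] * n_hid, []
    for x in seq:
        h = gru_step(params, x, h)
        outputs.append(h)
    return outputs


def bi_gru(fwd, bwd, seq, n_hid):
    """Concatenate forward-in-time and backward-in-time hidden states per epoch."""
    f = run_gru(fwd, seq, n_hid)
    b = run_gru(bwd, seq[::-1], n_hid)[::-1]
    return [hf + hb for hf, hb in zip(f, b)]


rng = random.Random(0)
k, n_hid, M = 3, 4, 5  # hypothetical feature size, hidden size, sequence length
seq = [[rng.random() for _ in range(k)] for _ in range(M)]
fwd, bwd = init_gru(k, n_hid, rng), init_gru(k, n_hid, rng)
states = bi_gru(fwd, bwd, seq, n_hid)
```

Each epoch's state combines context from earlier epochs (forward half) and later epochs (backward half), which is why a sequence of feature vectors, rather than isolated epochs, is the required input.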

All data are propagated forward through the network model to obtain the sleep stage label corresponding to each data sample, where the data samples are the training and validation sets generated from the data sets.

The classification labels produced by the machine model are compared with the experts' manual classification labels, evaluation indexes such as accuracy, recall and F1 score are recorded for each data set, and construction of the network model is complete.

The sample points used for training are input into the trained Bi-GRU network model to obtain the classification labels of the sleep stages.
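The named evaluation indexes can be computed directly from the two label sequences. The sketch below reports overall accuracy plus per-stage recall and F1; the function name and the toy labels are illustrative only.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def evaluate(expert, predicted):
    """Return overall accuracy and per-stage (recall, F1) against expert labels."""
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    accuracy = sum(e == p for e, p in zip(expert, predicted)) / len(expert)
    per_stage = {}
    for stage in STAGES:
        tp = sum(e == stage and p == stage for e, p in zip(expert, predicted))
        fn = sum(e == stage and p != stage for e, p in zip(expert, predicted))
        fp = sum(e != stage and p == stage for e, p in zip(expert, predicted))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_stage[stage] = (recall, f1)
    return accuracy, per_stage


expert = ["WAKE", "N1", "N2", "N2", "N3", "REM"]
predicted = ["WAKE", "N2", "N2", "N2", "N3", "REM"]
accuracy, per_stage = evaluate(expert, predicted)
```

Per-stage figures matter here because stages such as N1 are typically rare, so overall accuracy alone can hide poor performance on them.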

The label sequences of the S segment samples of one night's sleep are spliced into one long sequence, so that each overnight recording corresponds to one complete overnight label sequence. The overnight label sequence is a one-dimensional vector of dimension T = S × M. The set of N overnight label sequences is still divided into three sets according to the earlier split: the training set contains C_t sequences, the validation set contains C_v sequences, and the test set contains C sequences.
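Splicing is a plain concatenation in time order; a minimal sketch with toy labels (the helper name is hypothetical):

```python
def splice_labels(segment_labels):
    """Concatenate S per-segment label lists (each of length M) in time order."""
    lengths = {len(seg) for seg in segment_labels}
    if len(lengths) != 1:
        raise ValueError("all segments must contain the same number M of epochs")
    return [label for seg in segment_labels for label in seg]


# S=3 segments of M=4 epochs for one night -> T = S * M = 12 labels.
night = [["WAKE", "WAKE", "N1", "N2"],
         ["N2", "N3", "N3", "N2"],
         ["REM", "REM", "N1", "WAKE"]]
overnight = splice_labels(night)
```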

The label training-set sequences are input into the correction layer. Exploiting the strength of the conditional random field (CRF) method in extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm. The overnight label sequence is thereby corrected to agree, in its continuity, with the label sequence of the expert's manual sleep staging.
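The decoding step can be sketched with a small Viterbi implementation over additive log-domain scores. The emission and transition scores below are hand-picked assumptions standing in for what a trained CRF would provide; the helper `emissions_from_labels` and its `agree` parameter are hypothetical.

```python
import math

STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def viterbi(emission, transition):
    """Return the highest-scoring stage path under additive (log-domain) scores.

    emission[t][j]   : score of stage j at epoch t
    transition[i][j] : score of moving from stage i to stage j
    """
    n = len(STAGES)
    score = list(emission[0])
    back = []
    for t in range(1, len(emission)):
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transition[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transition[best_i][j] + emission[t][j])
        score = new_score
        back.append(pointers)
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return [STAGES[j] for j in reversed(path)]


def emissions_from_labels(labels, agree=0.8):
    """Hypothetical emission scores: trust the Bi-GRU label with probability `agree`."""
    other = (1.0 - agree) / (len(STAGES) - 1)
    return [[math.log(agree if s == lab else other) for s in STAGES] for lab in labels]


# Hypothetical transition scores that favour staying in the same stage.
stay, move = math.log(0.9), math.log(0.025)
transition = [[stay if i == j else move for j in range(5)] for i in range(5)]

raw = ["N2", "N2", "REM", "N2", "N2"]  # isolated one-epoch REM is implausible
corrected = viterbi(emissions_from_labels(raw), transition)
```

With these scores the isolated REM epoch is smoothed back to N2, while a sustained change of stage would be kept, which is the kind of continuity correction the text describes.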

Constructing the CRF model. The label-sequence training set from step S6 and the corresponding expert manual staging label sequences are input into the CRF model; the number K of feature functions is set, and the optimal parameters are obtained by iterative training, giving the conditional probability P(y | x) of the conditional random field. In this way the model learns the contextual information of time-dependent transitions between sleep stages. Stage transitions are closely related to sleep time, and exploiting the probabilistic transitions of the CRF model to match this characteristic of sleep is a key step of the invention.
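For reference, the textbook form of a linear-chain CRF with K feature functions is given below. The patent does not write out the formula, so the weights λ_k, the feature functions f_k and the partition function Z(x) are standard notation assumed here, with x the Bi-GRU label sequence of length T and y the corrected sequence.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Training fits the weights λ_k on pairs of model labels and expert labels, and decoding selects y* = argmax_y P(y | x).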

Testing the correction result of the CRF model. Using the conditional probability P(y | x) and a label sequence x_s from the validation set, the optimal label sequence y* is computed. Finally, evaluation indexes such as accuracy are calculated, and the results of the CRF correction model and of the network model are tabulated and compared.

Although the invention has been described herein with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More specifically, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.

Claims

1. An algorithm for automatic sleep staging, characterized by comprising the following steps: a. in the feature layer, abstract features are extracted with a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. finally, in the correction layer, a conditional random field (CRF) method is used for temporal continuity correction.

2. The algorithm for automatic sleep staging according to claim 1, characterized in that: step a, in which abstract features are extracted with a multilayer perceptron and combined with traditional hand-crafted features based on expert experience as the information representation of sleep, includes the following steps: S1, a data set C = {C_1, C_2, ..., C_N} is made from physiological data collected during sleep, including electroencephalogram, electrooculogram and mandibular electromyogram signals; S2, the continuous overnight data of each person are divided into S segments in time order; each segment represents different sleep stage information and, as time passes, each segment of sleep data needs to be learned and fitted by a different algorithm model, so each segment is made into a separate data set, giving S segment data sets; each segment data set contains N × M sample points, where M is the number of sample points in each segment of data; one sample point represents one sleep epoch, whose time span is usually 30 seconds, and each sample point contains L signal sampling points, so the sample size of data set C is S × N × M × L; S3, feature engineering is performed on each of the S segment data sets, comprising abstract feature extraction with an autoencoder and traditional feature extraction of time-domain, frequency-domain and nonlinear dynamics features; all features are spliced into vectors, each segment of data yields M vectors, and these vectors are arranged in order to form a sequence, namely a segment sequence; S4, in the machine model layer, the Bi-GRU network model is trained using the segment sequences as samples; the S segment data sets each contain N × M segment sequences; each segment data set is divided into a training set and a test set in a certain proportion, and the model is saved after training on each segment data set is finished; S5, the sample points used for training are input into the trained Bi-GRU network model to obtain the classification labels of the sleep stages; S6, the label sequences of the S segment samples of one night's sleep are spliced into one long sequence, so that each overnight recording corresponds to one complete overnight label sequence, a one-dimensional vector of dimension T = S × M; the set of N overnight label sequences is still divided into three sets according to the earlier split: the training set contains C_t sequences, the validation set contains C_v sequences, and the test set contains C sequences; step c, in which a conditional random field (CRF) method is used for temporal continuity correction in the correction layer, comprises step S7: the label training-set sequences obtained in step S6 are input into the correction layer; exploiting the strength of the CRF method in extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm, so that the overnight label sequence is corrected to agree, in its continuity, with the label sequence of the expert's manual sleep staging.

3. The algorithm for automatic sleep staging according to claim 2, characterized in that step S1 further comprises the following sub-steps:
S11, during human sleep, monitoring, recording, and storing human physiological signal data with a polysomnography device having a physiological signal acquisition function, in accordance with the American Academy of Sleep Medicine (AASM) standard;
S12, after sampling and digitizing the raw signal data of the different types of physiological signals, applying zero-phase digital filtering to each so that the non-stationary physiological signals do not suffer phase distortion, and removing the very-low-frequency baseline, power-line noise, and high-frequency noise from the signals, thereby completing signal preprocessing;
S13, extracting the electrooculogram, mandibular electromyogram, and electroencephalogram signals from the physiological signals and producing the original data set C = {C₁, C₂, ..., C_N} from these three-lead signals, comprising N overnight sleep recordings from N individuals, with one subset per person.
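
The zero-phase filtering named in S12 is commonly realized by running a filter forward and then backward over the signal so that the phase delays cancel. The sketch below shows that idea with a first-order low-pass filter; the filter type, the coefficient `alpha`, and the test pulse are assumptions, since the claim does not specify the filter design. In practice a band-pass design applied with a routine such as SciPy's `filtfilt` would be a typical choice.

```python
def one_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]."""
    y, prev = [], x[0]  # start from the first sample to limit the edge transient
    for value in x:
        prev = alpha * value + (1 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_lowpass(x, alpha):
    """Filter forward, filter the reversed result, then reverse again."""
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical symmetric pulse centred on index 25, padded with zeros.
pulse = [0.0] * 20 + [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0] + [0.0] * 20
causal = one_pole_lowpass(pulse, alpha=0.3)
zero_phase = zero_phase_lowpass(pulse, alpha=0.3)
print(argmax(pulse), argmax(causal), argmax(zero_phase))
```

The single forward pass delays the peak of the pulse, while the forward-backward pass keeps it at the original position, which is the property S12 relies on to avoid phase distortion.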

4. The algorithm for automatic sleep staging according to claim 2, characterized in that step S2 further comprises the following sub-steps:
S21, setting the segmentation scheme appropriately: one person's sleep data for one night is continuous in time, and the algorithm divides the whole continuous process into S stages, the physiological signals of each segment representing different information about the sleep process. This imitates the way experts, when staging sleep manually, first judge the whole-night signal data in terms of several trend stages. The number M of sample points in each stage is set according to how long the stage lasts;
S22, preparing in advance the specific data sets for the algorithm model: S data sets are divided out of the original data set, and their dimensions must be exact. Each data set contains N × M sample points from N individuals; each sample point is the minimum unit of sleep staging, spans 30 seconds, and corresponds to one category label; the five category labels are WAKE, REM, N1, N2, and N3;
S23, aligning the labels with the data, then creating each segment data set and saving it to a file.
The data set production described above is a key step of the invention and strongly influences the results and performance of the whole algorithm.
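
As a minimal sketch of S21 and S22, the function below cuts one person's continuous recording into S segments of M epochs, each epoch holding L samples. The function name and the tiny sizes are assumptions chosen for readability; real epochs span 30 seconds of signal.

```python
def split_night(signal, n_segments, epochs_per_segment, samples_per_epoch):
    """Return a nested list indexed as [segment][epoch][sample], in time order."""
    needed = n_segments * epochs_per_segment * samples_per_epoch
    if len(signal) < needed:
        raise ValueError("recording is shorter than S * M * L samples")
    segments = []
    for s in range(n_segments):
        epochs = []
        for m in range(epochs_per_segment):
            start = (s * epochs_per_segment + m) * samples_per_epoch
            epochs.append(signal[start:start + samples_per_epoch])
        segments.append(epochs)
    return segments


# Hypothetical toy recording: S = 2 segments, M = 3 epochs, L = 4 samples per epoch.
recording = list(range(2 * 3 * 4))
night = split_night(recording, n_segments=2, epochs_per_segment=3, samples_per_epoch=4)
print(len(night), len(night[0]), len(night[0][0]))  # S, M, L
```

Checking the three lengths after the split is one simple way to enforce the requirement in S22 that the data dimensions be exact before labels are aligned in S23.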

5. The algorithm for automatic sleep staging according to claim 2, characterized in that step S3 further comprises the following sub-steps:
S31, extracting traditional features: a feature vector x_i of dimension k is computed from the signals. The m features of the feature vector x_i include time-domain feature quantities, frequency-domain feature quantities, and nonlinear dynamics feature quantities. The time-domain feature quantities comprise statistical and geometric feature quantities; the frequency-domain feature quantities comprise power spectral density and time-frequency feature quantities; the nonlinear dynamics feature quantities comprise fractal dimension and complexity feature quantities. Each feature quantity is determined by its own parameters and calculation method;
S32, extracting abstract features: abstract features are extracted by an autoencoder from the field of artificial neural networks. Using unsupervised learning, an artificial neural network is fitted that represents the input data efficiently without any additional manual work. The input signal data is represented efficiently by a fixed low-dimensional vector, that is, it is self-encoded, and the output dimension of the autoencoder is generally smaller than its input dimension. The invention uses an autoencoder with multiple encoding layers; the complexity of the encoding depends on the number of stacked layers of the neural network, and suitably increasing the number of stacked layers allows the input data to be compressed and represented effectively;
S33, performing feature engineering on each sample point and concatenating the results into a feature vector ξ_i (ξ_i ∈ R^k). The sample points in each segment must be arranged in time order into a segment sequence, and the corresponding feature vectors are likewise arranged into a segment sequence seq; the whole feature space formed by the segment sequences is in fact a three-dimensional tensor X ∈ R^(N×M×k).
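
The concatenation in S33 can be sketched as follows. Only three simple time-domain statistics are computed here, and the abstract features are a placeholder list standing in for the output of a trained autoencoder; both choices, and all names, are assumptions rather than the feature set of the invention.

```python
import math


def time_domain_features(epoch):
    """Mean, standard deviation, and zero-crossing rate around the mean."""
    n = len(epoch)
    mean = sum(epoch) / n
    variance = sum((value - mean) ** 2 for value in epoch) / n
    crossings = sum(
        1 for a, b in zip(epoch, epoch[1:]) if (a - mean) * (b - mean) < 0
    )
    return [mean, math.sqrt(variance), crossings / (n - 1)]


def build_feature_vector(epoch, abstract_code):
    """Concatenate hand-crafted features with the abstract (encoded) features."""
    return time_domain_features(epoch) + list(abstract_code)


# Hypothetical epoch and a made-up two-dimensional autoencoder code.
epoch = [1.0, -1.0, 1.0, -1.0]
xi = build_feature_vector(epoch, abstract_code=[0.2, -0.5])
print(len(xi))  # dimension of the concatenated vector
```

Stacking one such vector per epoch, in time order, for every person gives the nested structure that the claim describes as the tensor X.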

6. The algorithm for automatic sleep staging according to claim 2, characterized in that step S4 further comprises the following sub-steps:
S41, dividing the feature-space tensor X ∈ R^(N×M×k) generated from seq into the final training data set, validation data set, and test data set;
S42, inputting the time-ordered segment sequence seq into the bidirectional gated recurrent unit (Bi-GRU) network model, setting the structure, training mode, and initial parameters of the network model appropriately, and then loading the data set to start training the model;
S43, saving the Bi-GRU network model once it reaches a preset termination condition.
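
To make the Bi-GRU of S42 concrete, the following is a minimal forward-pass sketch in plain Python. It uses one common form of the GRU update (update gate z, reset gate r, candidate state), random untrained weights, and toy sizes; the parameter layout, the helper names, and the omission of the final classification layer and of training are all assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, x, u, h, b):
    """Compute W x + U h + b for list-of-lists matrices."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(w, u, b)
    ]


def gru_step(p, x, h):
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p, seq, hidden):
    h, out = [0.0] * hidden, []
    for x in seq:
        h = gru_step(p, x, h)
        out.append(h)
    return out


def bi_gru(p_fwd, p_bwd, seq, hidden):
    """Concatenate forward states with time-aligned backward states."""
    fwd = run_gru(p_fwd, seq, hidden)
    bwd = run_gru(p_bwd, seq[::-1], hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def init_params(k, hidden, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for gate in "zrh":
        p["W" + gate] = mat(hidden, k)
        p["U" + gate] = mat(hidden, hidden)
        p["b" + gate] = [0.0] * hidden
    return p


rng = random.Random(0)
k, hidden, m = 5, 4, 6  # feature dimension, hidden size, epochs per sequence
seq = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
p_fwd, p_bwd = init_params(k, hidden, rng), init_params(k, hidden, rng)
states = bi_gru(p_fwd, p_bwd, seq, hidden)
print(len(states), len(states[0]))  # M epochs, 2 * hidden values per epoch
```

Because the backward pass reads the sequence from the end, the state at each epoch reflects both earlier and later epochs, which is why a bidirectional model suits whole-night sequences that are available in full before staging.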

7. The algorithm for automatic sleep staging according to claim 2, characterized in that step S5 further comprises the following sub-steps:
S51, forward-propagating all data through the network model to obtain the sleep stage label corresponding to each data sample, where the data samples are the training set and validation set generated from the data set;
S52, comparing the classification labels produced by the machine model with the experts' manual classification labels, recording evaluation indexes such as accuracy, recall, and F1 score for each data set, and completing construction of the network model.
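
The evaluation indexes named in S52 can be computed as in the sketch below, which reports overall accuracy plus per-stage recall and F1 score. The function name and the two toy label sequences are assumptions made for illustration.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def evaluate(expert, predicted):
    """Return overall accuracy and a per-stage dict of recall and F1 score."""
    if len(expert) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(expert, predicted))
    accuracy = sum(e == p for e, p in pairs) / len(pairs)
    per_stage = {}
    for stage in STAGES:
        tp = sum(e == stage and p == stage for e, p in pairs)
        fp = sum(e != stage and p == stage for e, p in pairs)
        fn = sum(e == stage and p != stage for e, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_stage[stage] = {"recall": recall, "f1": f1}
    return accuracy, per_stage


# Hypothetical expert labels and model output for six epochs.
expert = ["WAKE", "N1", "N2", "N2", "N3", "REM"]
predicted = ["WAKE", "N2", "N2", "N2", "N3", "REM"]
accuracy, per_stage = evaluate(expert, predicted)
print(round(accuracy, 3), per_stage["N2"])
```

Reporting per-stage values alongside accuracy matters here because a stage with few epochs, such as N1 in this toy example, can be missed entirely while overall accuracy stays high.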

8. The algorithm for automatic sleep staging according to claim 2, characterized in that step S7 further comprises the following sub-steps:
S71, constructing a conditional random field (CRF) model: the label sequence training set from step S6 and the corresponding expert manual staging label sequences are input into the CRF model, the number K of feature functions is set, and the optimal parameters are obtained by iterative training, giving the conditional probability P(y | x) of the conditional random field. In this way the model learns the context information of time-dependent sleep stage transitions. Because stage transitions are closely related to sleep time, using the CRF model for probabilistic transitions according to the characteristics of sleep is also a key step;
S72, testing the correction result of the CRF model: the conditional probability P(y | x) and a label sequence x_s from the validation set are used to compute the optimal label sequence y*; finally, evaluation indexes such as accuracy are computed, and the results of the CRF correction model and the network model are tallied and compared.
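
The decoding of y* in S72 can be sketched with a small Viterbi routine over a linear chain. The scores below are hand-set assumptions that stand in for trained CRF parameters: one point when the corrected label agrees with the network's label, and one point when the stage does not change between neighbouring epochs.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def emit(label, observed):
    """Hypothetical emission score: reward agreeing with the network's label."""
    return 1.0 if label == observed else 0.0


def trans(prev, cur):
    """Hypothetical transition score: reward staying in the same stage."""
    return 1.0 if prev == cur else 0.0


def viterbi(observed, states, emit, trans):
    """Return the highest-scoring label path for a linear-chain model."""
    score = {s: emit(s, observed[0]) for s in states}
    back = []
    for x in observed[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda prev: score[prev] + trans(prev, cur))
            new_score[cur] = score[best_prev] + trans(best_prev, cur) + emit(cur, x)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Network output with one isolated WAKE epoch inside a run of N2.
network_labels = ["N2", "N2", "WAKE", "N2", "N2"]
corrected = viterbi(network_labels, STAGES, emit, trans)
print(corrected)
```

With these toy scores the isolated WAKE epoch is relabelled as N2, since keeping it would forfeit two transition points to gain one emission point. A trained CRF would instead learn its K feature weights from the expert label sequences, so the trade-off would follow the observed stage transition statistics.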