CN115374815A - Automatic sleep staging method based on visual Transformer - Google Patents
- Publication number: CN115374815A (application CN202210965248.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- A61B5/4812 — Detecting sleep stages or cycles
- A61B5/7203 — Signal processing specially adapted for physiological signals, for noise prevention, reduction or removal
- A61B5/7257 — Details of waveform analysis using Fourier transforms
- A61B5/7264 — Classification of physiological signals or data, e.g. using neural networks
- A61B5/7267 — Classification involving training the classification device
- G06N3/08 — Neural networks; learning methods
- Y02D30/70 — Reducing energy consumption in wireless communication networks
Abstract
The invention discloses an automatic sleep staging method based on a visual Transformer. The method processes the original PSG signal with a sliding window to obtain PSG signal sequences; applies data augmentation to the PSG signal sequences to obtain augmented signal samples; builds a sleep staging network by cascading a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer; inputs each group of PSG signal samples into the sleep staging network to predict their sleep stage; initializes the network by cross-modal transfer learning; constructs a loss function from the true sleep stages of the PSG signal samples and trains with the Adam optimizer to obtain an optimized sleep staging network. Finally, PSG signals are acquired in real time and passed through the optimized network to predict the sleep stage. The data augmentation is designed around the noise and artifacts of the PSG signal, improving the network's robustness to them; an encoder based on a visual Transformer improves the network's feature representation capability; and transfer learning relieves the dependence on large amounts of PSG data.
Description
Technical Field
The invention belongs to the technical field of sleep quality assessment, and particularly relates to an automatic sleep staging method based on a visual Transformer.
Background
Polysomnography (PSG) is the standard clinical technology for sleep-state monitoring. It comprehensively records various physiological indices of the monitored subject during sleep, including neural signals such as the electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG), and respiratory monitoring data such as oro-nasal airflow, chest and abdominal pressure and blood oxygen saturation, and serves as an effective basis for evaluating sleep quality and diagnosing sleep disorders. However, sleep physiological signal analysis has long depended on sleep experts manually reviewing the polysomnography recordings, which is inefficient and labor-intensive, and subjective differences in expert knowledge can also introduce errors into the evaluation results. Therefore, there is a need to develop a robust, high-performance automated sleep staging tool to assist physicians in sleep staging and improve its efficiency and accuracy.
Automated sleep staging is the basis for scaling up sleep assessment and diagnosis, serving millions of people with sleep disorders and making sleep monitoring possible in the home environment. Although existing automatic sleep staging models achieve good staging performance and can exceed the accuracy of a single human expert, several problems remain to be solved:
Visual Transformers can extract effective feature representations, but their performance on PSG signals has not been explored. Position information on both the time and frequency axes is crucial for the Fourier-transformed PSG signal, yet recent sleep staging methods consider only the positional relationship on the time axis. A visual Transformer can capture position information on the time and frequency axes simultaneously, remedying this shortcoming of existing models.
Transformer-based deep learning approaches require a large amount of training data to surpass the performance of CNNs. Current Transformer-based automated sleep staging models perform well when pre-trained on large-scale PSG datasets, but their staging accuracy drops significantly on smaller datasets. However, large-scale, accurately labeled PSG datasets are difficult to obtain, and training a model from scratch on a large PSG dataset consumes substantial computational resources.
Existing automatic sleep staging models have low robustness to noise and artifacts. Owing to human factors and the influence of the acquisition environment, noise and artifacts are difficult to avoid in the PSG signal. However, research into designing data augmentation modules for PSG signals is still quite limited; most work directly adopts augmentation techniques from image and audio tasks without considering the characteristics of the PSG signal itself.
There is therefore a substantial need to develop a robust automatic sleep staging technique based on visual Transformers.
Disclosure of Invention
Aiming at the problems of limited feature representation performance in the sleep staging task, the scarcity of reliable PSG datasets, and the poor robustness of models to PSG signal noise and artifacts, the invention introduces a visual-Transformer-based encoder, relieves the dependence on large amounts of PSG data through transfer learning, and designs a data augmentation module targeted at the noise and artifacts of the PSG signal, thereby learning feature representations with high performance and high robustness.
The model comprises three key ideas: a frame-level encoder based on a visual Transformer, with a sliding window capturing short-term context and a GRU (gated recurrent unit) realizing long-term sequence-level modeling; cross-modal transfer learning, in which a model pre-trained on an out-of-domain dataset is fine-tuned on the sleep PSG dataset to reduce dependence on large-scale PSG data; and a dynamic data augmentation module for the EEG and EOG channels, enabling the model to learn more robust feature representations.
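As a concrete illustration of the cross-modal transfer idea, the main practical obstacle when initializing from an out-of-domain (e.g. image-pretrained) ViT checkpoint is that the time-frequency patch grid differs from the source patch grid, so the position embeddings must be resized. The sketch below shows one common way to do this with bilinear interpolation; the grid sizes, embedding dimension and function name are illustrative assumptions, not details stated in the patent.

```python
import numpy as np
from scipy.ndimage import zoom

def resize_pos_embed(pos_embed, src_grid, tgt_grid):
    """Interpolate ViT position embeddings from src_grid (h, w) to tgt_grid.

    pos_embed: (1 + h*w, D) array; row 0 is the [CLS] token embedding,
    which is carried over unchanged.
    """
    cls_tok, grid_tok = pos_embed[:1], pos_embed[1:]
    h, w = src_grid
    th, tw = tgt_grid
    grid = grid_tok.reshape(h, w, -1)
    # zoom the two spatial axes only; keep the embedding dimension fixed
    resized = zoom(grid, (th / h, tw / w, 1.0), order=1)
    return np.concatenate([cls_tok, resized.reshape(th * tw, -1)], axis=0)

# e.g. adapt a 14x14 ImageNet patch grid to an 8x30 time-frequency grid
src = np.random.randn(1 + 14 * 14, 768)
tgt = resize_pos_embed(src, (14, 14), (8, 30))
```

The [CLS] embedding is copied through unchanged, which is the usual convention when resizing ViT position tables.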
The method is an automatic sleep staging method based on a visual Transformer, and comprises the following specific steps:
Step 1: take the original PSG signals of multiple channels at multiple moments, and process each channel's signals with a sliding window to obtain multiple PSG signal sequences per channel;
Step 2: apply data augmentation to the multiple PSG signal sequences of each channel to obtain multiple groups of augmented PSG signal samples per channel; construct each group of PSG signal samples from the same-indexed augmented samples across the channels, and manually label the true sleep stage of each group of PSG signal samples;
Step 3: sequentially cascade a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct the sleep staging network; input each group of PSG signal samples into the network to predict its sleep stage; initialize the network through cross-modal transfer learning; construct the sleep staging network loss-function model from the true sleep stages of each group, and train with the Adam (adaptive moment estimation) optimizer to obtain the optimized sleep staging network;
Step 4: collect PSG signals at multiple moments in real time, obtain real-time PSG signal samples through the sliding-window processing of step 1, and predict the real-time sleep stage by passing the samples through the optimized sleep staging network.
Preferably, the original PSG signals of each channel at multiple time points in step 1 are defined as:
data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L}), c ∈ [1, C]
where data_c denotes the original PSG signal of the c-th channel over all moments, data_{c,n} denotes the original PSG signal of the c-th channel at the n-th moment, n ∈ [1, L], L is the number of original moments, and C is the number of channels;
the window coverage range of the sliding window processing in the step 1 is as follows: (n- (T) 0 -1)/2) to (n + (T) 0 -1)/2);
The window length of the sliding window processing in the step 1 is as follows: t is a unit of 0 ;
The multiple PSG signal sequences of each channel in step 1 are specifically:
Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1}), c ∈ [1, C]
where Sdata_c denotes the windowed PSG signals of the c-th channel over all moments, S_{c,i} denotes the PSG signal in the sliding window at the i-th moment of the c-th channel, i ∈ [1, T_1], T_1 is the number of PSG signal sequences, and C is the number of channels;
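The sliding-window grouping of step 1 can be sketched as follows; the edge-padding strategy (repeating the boundary epoch) is an assumption, since the patent does not specify its boundary handling.

```python
import numpy as np

def sliding_windows(epochs, T0=3):
    """Group per-channel PSG epochs into overlapping context windows.

    epochs: (L, n_samples) array of L consecutive epochs for one channel.
    Returns (L, T0, n_samples): window i covers epochs i-(T0-1)/2 .. i+(T0-1)/2,
    padded at the edges by repeating the boundary epoch (an assumption --
    the patent does not state how edges are treated).
    """
    assert T0 % 2 == 1, "window length T0 is assumed odd so the window is centered"
    half = (T0 - 1) // 2
    padded = np.concatenate(
        [epochs[:1].repeat(half, 0), epochs, epochs[-1:].repeat(half, 0)]
    )
    return np.stack([padded[i:i + T0] for i in range(len(epochs))])
```

For the embodiment values T_0 = 3, each window holds an epoch together with its immediate predecessor and successor.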
preferably, the data enhancement processing in step 2 specifically includes:
signal denoising processing, signal channel-interference processing, signal additive-noise processing and signal frequency-masking processing, each applied with a certain random probability;
Sdata′_c = (dS′_{c,1}, dS′_{c,2}, ..., dS′_{c,T_1}), c ∈ [1, C]
where Sdata′_c denotes the augmented PSG signal of the c-th channel over all moments, dS′_{c,m} denotes the m-th augmented PSG signal sample of the c-th channel, m ∈ [1, T_1], and T_1 is the number of augmented PSG signal samples;
S′_i = (dS′_{1,i}, dS′_{2,i}, ..., dS′_{C,i}), i ∈ [1, T_1]
where S′_i denotes the i-th group of PSG signal samples, T_1 is the number of PSG signal samples, and C is the number of channels;
preferably, the visual Transformer frame-level encoder in step 3 is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum partitioning layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time frequency conversion layer converts the ith group of PSG signal samples S' i Calculating a short-time Fourier transform time-frequency spectrum of the ith group of PSG signal samples through short-time Fourier transform, and expressing the short-time Fourier transform time-frequency spectrum as
F i =(dF 1,i ,dF 2,i ,...dF C,i )
i∈[1,T 1 ]
Wherein, F i Representing the short-time Fourier transform time-frequency spectrum, T, of the ith set of PSG signal samples 1 Representing the number of PSG signal samples, C representing the number of channels, dF c,i A short-time Fourier transform time-frequency spectrum representing the ith set of PSG signal samples of the c channel;
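A minimal sketch of the time-frequency transform step (STFT per channel, concatenation along the frequency axis, logarithmic transform and z-score normalization); the sampling rate and STFT window parameters below are illustrative choices, not values stated in the patent.

```python
import numpy as np
from scipy.signal import stft

def log_spectrogram(x, fs=100, nperseg=200, noverlap=100):
    """Short-time Fourier transform -> log magnitude -> z-score normalization.

    x: (n_channels, n_samples). Channel spectrograms are stacked along the
    frequency axis, mirroring the concatenation of dF_{1,i}..dF_{C,i}.
    fs / nperseg / noverlap are illustrative, not from the patent.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)  # (C, F, T)
    spec = np.concatenate(list(np.abs(Z)), axis=0)                # (C*F, T)
    log_spec = np.log(spec + 1e-8)                                # log transform
    return (log_spec - log_spec.mean()) / (log_spec.std() + 1e-8)
```

With nperseg = 200 each channel contributes 101 frequency bins, so a two-channel input yields a 202-row time-frequency image.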
will dF 1,i ,dF 2,i ,...dF G,i Splicing along a frequency axis to obtain a spliced time-frequency spectrum X of the ith group of PSG signal samples fft,i To X fft,i Carrying out logarithmic transformation to obtain spliced logarithmic-time frequency spectrum of the ith group of PSG signal samples, and normalizing the spliced logarithmic-time frequency spectrum of the ith group of PSG signal samples by a normal distribution method to obtain normalized time-frequency spectrum X 'of the ith group of PSG signal samples' fft,i ;
The time-frequency spectrum blocking layer divides the normalized time-frequency spectrum X′_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the blocked time-frequency spectrum of the i-th group:
X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i}), n ∈ [1, N]
where x_{n,i} denotes the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group;
The linear projection layer converts each patch of the blocked time-frequency spectrum of the i-th group, in order, into the patch-vector sequence of the i-th group via a linear projection, specifically defined as:
E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})
where E_{i,n} denotes the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group;
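The blocking and linear-projection steps correspond to standard ViT patch embedding. A minimal numpy sketch follows; the patch size, spectrogram shape and projection weights are illustrative stand-ins, not values from the patent.

```python
import numpy as np

def patchify(spec, p=8):
    """Split a (H, W) time-frequency image into non-overlapping p x p patches,
    flattened row-major -- the 'time-frequency spectrum blocking' step.
    H and W are assumed to be multiples of p (pad or crop beforehand otherwise).
    """
    H, W = spec.shape
    return (spec.reshape(H // p, p, W // p, p)
                .swapaxes(1, 2)
                .reshape(-1, p * p))  # (N, p*p) with N = (H/p) * (W/p)

# linear projection to the model dimension D (weights here are random stand-ins)
rng = np.random.default_rng(0)
spec = rng.standard_normal((32, 64))
patches = patchify(spec, p=8)          # N = 4 * 8 = 32 patches
W_proj = rng.standard_normal((64, 128))
E = patches @ W_proj                   # (N, D) patch-vector sequence
```

In a trained model, W_proj would be a learned weight matrix; here it only demonstrates the shapes involved.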
the position coding layer embeds the vector random superposition position of each patch to obtain the coded characteristic sequence of the ith group of PSG signal samples, and the specific definition is as follows:
wherein, P i,n Indicating that the nth patch of the ith group of PSG signal samples is embedded, N is the total number of patches in the frequency spectrum after the blocking of the ith group of PSG signal samples,representing encoded features of an nth patch vector of an ith set of PSG signal samples;
constructing an input Transformer input characteristic sequence of the ith group of PSG signal samples, which comprises the following specific steps:
wherein,CLS representing input of ith set of PSG signal samples]The position of the mark is embedded in the mark,learnable [ CLS ] for sequence start of ith set of PSG signal samples]The mark is marked on the surface of the substrate,representing the n-th patch coded feature in the blocked time spectrum of the ith group of PSG signal samples;
the multi-head attention layer and the full connection layer are toObtaining an output characteristic sequence of the ith group of PSG signal samples through the processing of a multilayer Transformer encoder
Wherein,an output signature sequence representing the i-th set of PSG signal samples,CLS representing the output of the ith set of PSG signal samples]The mark is marked on the surface of the substrate,representing the output characteristic of the nth patch of the ith set of PSG signal samples;
defining the length of a target sleep frame as N 0 And constructing an output characteristic vector sequence of the ith group of PSG signal samples, which is defined as follows:
the token connection layer is toAnd connecting with the average value of Di to obtain the characteristics of a single sleep frame of the ith group of PSG signal samples, which are defined as follows:
wherein,representing the output characteristic of the nth patch of the ith set of PSG signal samples, concat representing the splice;
π_i = (π_{i,1}, π_{i,2}, ..., π_{i,K})^T
where π_{i,k} denotes the probability that the i-th group of PSG signal samples is predicted as sleep stage k, K being the number of sleep stages;
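The softmax output π_i and the loss built from the true one-hot stage vector can be sketched as follows; the five-stage labeling (W, N1, N2, N3, REM) and the example logits are illustrative assumptions, not values from the patent.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(pi, y_onehot):
    """Mean of -sum_k y_k log pi_k: the predicted distribution pi_i scored
    against the true one-hot sleep-stage vector y_i."""
    return -(y_onehot * np.log(pi + 1e-12)).sum(axis=-1).mean()

# K = 5 sleep stages is a common convention (W, N1, N2, N3, REM)
logits = np.array([[2.0, 0.5, -1.0, 0.0, 0.1]])
pi = softmax(logits)
```

The predicted stage is the argmax of π_i; training minimizes the cross-entropy over all sample groups.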
the sleep staging network loss function model in the step 3 specifically comprises:
wherein, y i A true sleep stage one-hot encoded vector representing the ith set of PSG signal samples,
aiming at the problems of poor robustness, limited feature representation capability, scarce reliable PSG data set, and the like of PSG noise and artifacts in a sleep staging task, a data enhancement module is designed according to the noise and artifacts of a PSG signal, so that the robustness of the model to the noise and artifacts of the PSG signal is improved; a visual Transformer-based encoder is introduced to improve the capability of model feature representation; and the dependence on a large amount of PSG data is relieved through migration learning.
Drawings
FIG. 1: overall structure of the embodiment of the invention;
FIG. 2: denoised EEG (upper) and EOG (lower) waveform signals and time-frequency spectrograms, according to the embodiment of the invention;
FIG. 3: EEG (upper) and EOG (lower) waveform signals and time-frequency spectrograms after signal channel interference, according to the embodiment of the invention;
FIG. 4: EEG (upper) and EOG (lower) waveform signals and time-frequency spectrograms after adding high-frequency noise, according to the embodiment of the invention;
FIG. 5: EEG (upper) and EOG (lower) waveform signals and time-frequency spectrograms after adding low-frequency noise, according to the embodiment of the invention;
FIG. 6: original and masked time-frequency spectra after frequency masking, according to the embodiment of the invention;
FIG. 7: schematic of the cross-modal transfer learning implementation of the embodiment of the invention;
FIG. 8: schematic of the bidirectional GRU-based sequence encoder of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In specific implementation, a person skilled in the art can implement the automatic operation process by using a computer software technology, and a system device for implementing the method, such as a computer readable storage medium storing a corresponding computer program according to the technical solution of the present invention and a computer device including the corresponding computer program, should also be within the scope of the present invention.
An automatic sleep staging method based on a visual Transformer according to an embodiment of the present invention is described below with reference to fig. 1 to 8, which includes:
step 1: introducing original PSG signals of a plurality of channels at a plurality of moments, and processing the original PSG signals of each channel at the plurality of moments through a sliding window to obtain a plurality of PSG signal sequences of each channel;
data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L}), c ∈ [1, C]
where data_c denotes the original PSG signal of the c-th channel over all moments, data_{c,n} denotes the original PSG signal of the c-th channel at the n-th moment, and n ∈ [1, L];
The window coverage of the sliding-window processing in step 1 is from (n - (T_0 - 1)/2) to (n + (T_0 - 1)/2);
The window length of the sliding-window processing in step 1 is T_0;
The multiple PSG signal sequences of each channel in step 1 are specifically:
Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1}), c ∈ [1, C]
where Sdata_c denotes the windowed PSG signals of the c-th channel over all moments, S_{c,i} denotes the PSG signal in the sliding window at the i-th moment of the c-th channel, and i ∈ [1, T_1], with T_1 the number of PSG signal sequences; in this embodiment T_1 = 21, the sliding-window length is T_0 = 3, and the number of channels is C = 2;
Step 2: apply data augmentation to the multiple PSG signal sequences of each channel to obtain multiple groups of augmented PSG signal samples per channel; construct each group of PSG signal samples from the same-indexed augmented samples across the channels, and manually label the true sleep stage of each group of PSG signal samples;
The data augmentation processing in step 2 specifically comprises:
signal denoising, signal channel interference, signal additive noise and signal frequency masking, each applied with a certain random probability;
Sdata′_c = (dS′_{c,1}, dS′_{c,2}, ..., dS′_{c,T_1}), c ∈ [1, C]
where Sdata′_c denotes the augmented PSG signal of the c-th channel over all moments, dS′_{c,m} denotes the m-th augmented PSG signal sample of the c-th channel, m ∈ [1, T_1], and T_1 is the number of augmented PSG signal samples;
S′_i = (dS′_{1,i}, dS′_{2,i}, ..., dS′_{C,i}), i ∈ [1, T_1]
where S′_i denotes the i-th group of PSG signal samples, T_1 is the number of PSG signal samples, and C is the number of channels;
The signal denoising processing: low-pass and high-pass signal components both carry important value in sleep-related studies. This augmentation uses band-pass filtering to reduce the noise of the PSG signal: the signal is passed through a first-order Butterworth filter, retaining only the in-band frequencies. The probability that band-pass denoising is active during training is 0.5.
As shown in fig. 2, EEG (upper waveform) and EOG (lower waveform) time domain signals and time spectra after signal denoising processing;
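A minimal sketch of the band-pass denoising step; the 0.3-35 Hz pass band is a common EEG choice and the zero-phase `filtfilt` application is an assumption, as the patent specifies only a first-order Butterworth filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_denoise(x, fs=100.0, low=0.3, high=35.0, order=1):
    """First-order Butterworth band-pass, as in the denoising augmentation.

    The 0.3-35 Hz band and the sampling rate are illustrative values, not
    stated in the patent; filtfilt applies the filter forward and backward
    for zero phase distortion (the patent does not specify phase handling).
    """
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)
```

During training this transform would be applied with probability 0.5, per the text above.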
The signal channel interference processing: since the F3 and F4 electrodes are relatively close to the eyes, eye-movement artifacts are picked up by the frontal leads, and the corresponding deflections can be seen in the EEG signal; that is, deflections of the EOG leads also appear in the frontal-area leads. Similarly, the EOG channel sometimes picks up signal from the EEG channel. This artifact is simulated by superimposing the EEG and EOG signals at a particular scale. During training, the probability that signal interference is active is 0.4, with a 50% chance each that the EOG channel receives the EEG signal or the EEG channel receives the EOG signal.
As shown in fig. 3, EEG (upper waveform) and EOG (lower waveform) time domain signals and time spectra after signal channel interference processing;
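The channel-interference augmentation can be sketched as below; the mixing scale of 0.2 is an illustrative assumption (the patent says only "a particular scale"), while the 50/50 direction split follows the text above.

```python
import numpy as np

def channel_interference(eeg, eog, scale=0.2, rng=None):
    """Simulate cross-channel pickup by mixing a scaled copy of one channel
    into the other, with the direction chosen at random (50/50), echoing
    the patent's description. scale=0.2 is an illustrative value.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < 0.5:
        return eeg + scale * eog, eog   # EEG picks up ocular activity
    return eeg, eog + scale * eeg       # EOG picks up cortical activity
```

In the full pipeline this transform itself would only be activated with probability 0.4 per training sample.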
The signal additive-noise processing: possible slow-frequency artifacts and muscle artifacts are simulated by adding high-frequency low-amplitude or low-frequency high-amplitude noise on the EEG and EOG channels. Slow-frequency artifacts are typically caused by sweating or breathing-related body motion: sweat changes the electrode potential and dilutes the conductive medium between the electrode and the skin, creating an artifact resembling a delta wave. Muscle artifacts are typically produced by local muscle activity, which has a frequency of 20-200 Hz. These artifacts are simulated by adding independent, identically distributed high-frequency low-amplitude or low-frequency high-amplitude noise on the EEG and EOG channels. The probability that additive noise is active during training is 0.5, with a 50% chance each of adding high-frequency or low-frequency noise.
EEG (upper waveform) and EOG (lower waveform) time domain signals and time spectra after addition of high frequency noise and addition of low frequency noise, as shown in fig. 4, 5;
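One way to realize the additive-noise augmentation is sketched below using sinusoidal noise; the amplitudes (0.1 for high-frequency, 1.0 for low-frequency) and the low-frequency band are illustrative assumptions, while the 20-200 Hz muscle-artifact band and the 50/50 choice follow the text above.

```python
import numpy as np

def additive_noise(x, fs=100.0, rng=None):
    """Add either high-frequency low-amplitude noise (muscle-artifact-like,
    20-200 Hz per the patent) or low-frequency high-amplitude noise
    (sweat/respiration-like, delta-band). Amplitudes are illustrative.
    """
    rng = rng if rng is not None else np.random.default_rng()
    t = np.arange(len(x)) / fs
    if rng.random() < 0.5:
        # high-frequency, low-amplitude (capped below Nyquist)
        f = rng.uniform(20.0, min(200.0, fs / 2 - 1))
        return x + 0.1 * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    # low-frequency, high-amplitude
    f = rng.uniform(0.1, 1.0)
    return x + 1.0 * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
```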
The signal frequency-masking processing: masking techniques are widely used in audio and video research, but the effectiveness of masking strategies on the spectrum of PSG signals has not yet been explored. In this augmentation module, a set of consecutive frequency channels or time steps is masked using a frequency mask and a time mask: frequency masking is realized by filtering the time signal through a band-stop filter, while time masking is realized by setting consecutive sampling points to zero.
As shown in fig. 6, the original and masked time-frequency spectra of the EEG (upper waveform) and EOG (lower waveform) after signal frequency-masking processing;
and step 3: sequentially cascading a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct a sleep staging network, inputting each group of PSG signal samples into the sleep staging network to obtain a predicted sleep stage for each group of PSG signal samples, initializing the sleep staging network through cross-modal transfer learning, constructing a sleep-staging-network loss function model by combining the real sleep stages of each group of PSG signal samples, and training with the Adam (adaptive moment estimation) optimizer to obtain an optimized sleep staging network;
the visual Transformer frame-level encoder is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum blocking layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time-frequency conversion layer computes the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples S'_i, expressed as

F_i = (dF_{1,i}, dF_{2,i}, ..., dF_{C,i})

i ∈ [1, T_1]

wherein F_i represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples, T_1 represents the number of PSG signal samples, C represents the number of channels, and dF_{c,i} represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples in the c-th channel;

dF_{1,i}, dF_{2,i}, ..., dF_{C,i} are spliced along the frequency axis to obtain the spliced time-frequency spectrum X_{fft,i} of the i-th group of PSG signal samples; a logarithmic transformation is applied to X_{fft,i} to obtain the spliced log time-frequency spectrum of the i-th group of PSG signal samples, which is then normalized by a normal-distribution (standardization) method to obtain the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples;
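The STFT-splice-log-normalize pipeline can be sketched as below. The sampling rate, window length, and `eps` are assumed values (the disclosure does not fix the STFT parameters); only the sequence of operations follows the text.

```python
import numpy as np
from scipy.signal import stft

def psg_to_log_spectrogram(channels, fs=100, nperseg=200, eps=1e-8):
    """Per-channel STFT magnitude spectra (dF_{1,i}..dF_{C,i}), spliced along
    the frequency axis, log-transformed, then standardized to zero mean and
    unit variance (the normalized spectrum X'_fft,i)."""
    specs = []
    for x in channels:
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        specs.append(np.abs(Z))                  # magnitude time-frequency spectrum
    X_fft = np.concatenate(specs, axis=0)        # splice along frequency axis
    X_log = np.log(X_fft + eps)                  # logarithmic transformation
    return (X_log - X_log.mean()) / (X_log.std() + eps)
```

For C channels the spliced frequency axis has C × (nperseg // 2 + 1) bins.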
the time-frequency-spectrum blocking layer divides the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the blocked time-frequency spectrum of the i-th group of PSG signal samples:

X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i})

n ∈ [1, N]

wherein x_{n,i} represents the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the linear projection layer sequentially converts each patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples into the patch-vector sequence of the i-th group of PSG signal samples, specifically defined as:

E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})

wherein E_{i,n} represents the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
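The blocking and linear projection can be sketched as below. The projection matrix here is a plain placeholder for the learned weights; patch size and embedding dimension are illustrative.

```python
import numpy as np

def patchify(spec, p):
    """Split a (H, W) time-frequency spectrum into N = (H//p)*(W//p)
    flattened p*p patches, row-major order (X_i = (x_1,...,x_N))."""
    H, W = spec.shape
    spec = spec[: H - H % p, : W - W % p]           # drop any remainder
    patches = spec.reshape(H // p, p, W // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)

def project(patches, W_proj):
    """Linear projection of each flattened patch to a D-dim patch vector
    (E_i); in the network, W_proj is a learned weight matrix."""
    return patches @ W_proj
```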
the position-encoding layer superimposes a randomly initialized position embedding on each patch vector to obtain the encoded feature sequence of the i-th group of PSG signal samples, specifically defined as:

z_{i,n} = E_{i,n} + P_{i,n}, n ∈ [1, N]

wherein P_{i,n} denotes the position embedding of the n-th patch of the i-th group of PSG signal samples, N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch vector of the i-th group of PSG signal samples;
the Transformer input feature sequence of the i-th group of PSG signal samples is constructed as:

Z_{i,0} = (z_{i,cls}, z_{i,1}, ..., z_{i,N})

wherein z_{i,cls} = E_{cls} + P_{i,0} denotes the input [CLS] token of the i-th group of PSG signal samples with its position embedding, E_{cls} is the learnable [CLS] token prepended to the sequence of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the multi-head attention layer and the fully connected layer process the input feature sequence through a multi-layer Transformer encoder to obtain the output feature sequence of the i-th group of PSG signal samples:

Z'_i = (z'_{i,cls}, z'_{i,1}, ..., z'_{i,N})

wherein Z'_i represents the output feature sequence of the i-th group of PSG signal samples, z'_{i,cls} represents the output [CLS] token of the i-th group of PSG signal samples, and z'_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples;
the length of a target sleep frame is defined as N_0, and the output feature-vector sequence of the i-th group of PSG signal samples is constructed as:

D_i = (z'_{i,1}, z'_{i,2}, ..., z'_{i,N})

the token connection layer connects z'_{i,cls} with the mean of D_i to obtain the single-sleep-frame feature of the i-th group of PSG signal samples, defined as:

f_i = Concat(z'_{i,cls}, mean(D_i))

wherein z'_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples, and Concat denotes splicing;
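The token connection step (concatenating the output [CLS] feature with the mean of the patch output features) is a one-liner; the dimensions below are illustrative.

```python
import numpy as np

def frame_feature(z_cls, patch_feats):
    """Single-sleep-frame feature: concatenate the output [CLS] feature
    (shape (D,)) with the mean over the N patch output features
    (shape (N, D)), giving a (2*D,) vector."""
    return np.concatenate([z_cls, patch_feats.mean(axis=0)])
```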
Channel weight averaging: the weights corresponding to the three input channels of the linear projection layer of the pre-trained Transformer are averaged and used as the weights of the linear projection layer of the visual Transformer frame-level encoder.
Input-length adaptation: the input shape of the pre-trained Transformer is fixed (224 × 224 or 384 × 384), which differs from a typical PSG time-frequency map. To resolve the resulting mismatch in position-embedding length, the position embeddings are adapted by cropping and bilinear interpolation.
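Both transfer-learning adaptations can be sketched as below. The array shapes are illustrative, and `scipy.ndimage.zoom` with `order=1` stands in for the bilinear interpolation; the cropping variant is omitted.

```python
import numpy as np
from scipy.ndimage import zoom

def average_rgb_patch_weights(w_rgb):
    """Channel weight averaging: a pretrained ViT patch-embedding weight of
    shape (D, 3, p, p) is averaged over its 3 RGB input channels to
    initialize a single-channel projection of shape (D, 1, p, p)."""
    return w_rgb.mean(axis=1, keepdims=True)

def resize_pos_embed(pos_grid, new_hw):
    """Input-length adaptation: bilinearly resize an (h, w, D) grid of
    position embeddings to (new_h, new_w, D) so it matches the PSG
    time-frequency map's patch grid."""
    h, w, _ = pos_grid.shape
    return zoom(pos_grid, (new_hw[0] / h, new_hw[1] / w, 1), order=1)
```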
FIG. 7 is a schematic diagram of the cross-modal transfer learning implementation;
Fig. 8 is a schematic diagram of a bidirectional GRU based sequence encoder;
the softmax layer maps the sequence-level feature-vector sequence to the corresponding predicted sleep-stage probability sequence:

π_i = (π_{i,1}, π_{i,2}, ..., π_{i,K})^T

wherein π_{i,k} represents the probability that the i-th group of PSG signal samples is predicted as sleep stage k;
the sleep staging network loss function model in step 3 specifically comprises:

L = - Σ_{i=1}^{T_1} y_i^T log π_i

wherein y_i represents the one-hot encoded vector of the real sleep stage of the i-th group of PSG signal samples;
and 4, step 4: the polysomnography monitoring device is worn on the human body according to sleep-medicine standards; PSG signals at multiple moments are collected in real time and transmitted to a computer; the computer processes the collected PSG signals with the sliding window of step 1 to obtain real-time PSG signal samples, and the real-time PSG signal samples are predicted by the optimized sleep staging network to obtain the real-time sleep stages.
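The centered sliding window of step 1 (covering n − (T_0 − 1)/2 to n + (T_0 − 1)/2 for an odd window length T_0) can be sketched as follows; handling of the boundary positions, where a full window does not fit, is an assumption (here they are simply skipped).

```python
import numpy as np

def sliding_windows(data, T0):
    """Centered sliding windows of odd length T0: the window at time n
    covers samples n-(T0-1)/2 .. n+(T0-1)/2 (valid center positions only)."""
    assert T0 % 2 == 1, "window length T0 is assumed odd"
    half = (T0 - 1) // 2
    return np.stack([data[n - half: n + half + 1]
                     for n in range(half, len(data) - half)])
```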
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A visual Transformer-based automatic sleep staging method, characterized by comprising the following steps:
step 1: introducing original PSG signals of a plurality of channels at a plurality of moments, and processing the original PSG signals of each channel at the plurality of moments through a sliding window to obtain a plurality of PSG signal sequences of each channel;
step 2: obtaining multiple groups of PSG signal samples after data enhancement processing of each channel by carrying out data enhancement processing on the multiple PSG signal sequences of each channel, constructing each group of PSG signal samples by using the same group of data-enhanced PSG signal samples of the multiple channels, and manually marking the real sleep stage of each group of PSG signal samples;
and 3, step 3: sequentially cascading a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct a sleep staging network, inputting each group of PSG signal samples into the sleep staging network to obtain a predicted sleep stage for each group of PSG signal samples, initializing the sleep staging network through cross-modal transfer learning, constructing a sleep-staging-network loss function model by combining the real sleep stages of each group of PSG signal samples, and training with the Adam (adaptive moment estimation) optimizer to obtain an optimized sleep staging network;
and 4, step 4: PSG signals at multiple moments are collected in real time, real-time PSG signal samples are obtained through sliding window processing in the step 1, and the real-time PSG signal samples are predicted through an optimized sleep staging network to obtain real-time sleep stages.
2. The visual Transformer-based automatic sleep staging method according to claim 1, characterized in that:
in step 1, the original PSG signals of each channel at multiple times are defined as:

data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L})

c ∈ [1, C]

wherein data_c represents the original PSG signals of the c-th channel at multiple times, data_{c,n} represents the original PSG signal of the c-th channel at the n-th time, n ∈ [1, L], L represents the number of original times, and C represents the number of channels;
the window coverage of the sliding-window processing in step 1 is from (n − (T_0 − 1)/2) to (n + (T_0 − 1)/2);

the window length of the sliding-window processing in step 1 is T_0;
the multiple PSG signal sequences of each channel in step 1 are specifically:

Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1})

c ∈ [1, C]

wherein Sdata_c represents the PSG signals in the sliding windows at multiple times of the c-th channel, S_{c,i} represents the PSG signal in the sliding window at the i-th time of the c-th channel, i ∈ [1, T_1], T_1 represents the number of PSG signal sequences, and C represents the number of channels.
3. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
the data enhancement processing in the step 2 specifically comprises the following steps:
signal denoising processing, signal channel-interference processing, signal additive-noise processing and signal frequency-masking processing are each applied as data enhancement with a certain random probability;
the multiple groups of data-enhanced PSG signal samples of each channel in step 2 are specifically:

Sdata'_c = (dS'_{c,1}, dS'_{c,2}, ..., dS'_{c,T_1})

c ∈ [1, C]

wherein Sdata'_c represents the data-enhanced PSG signals at multiple times of the c-th channel, dS'_{c,m} represents the m-th group of data-enhanced PSG signal samples of the c-th channel, m ∈ [1, T_1], and T_1 represents the number of PSG signal samples after data-enhancement processing;
in step 2, each group of PSG signal samples is constructed from the same group of data-enhanced PSG signal samples of the multiple channels, specifically:

S'_i = (dS'_{1,i}, dS'_{2,i}, ..., dS'_{C,i})

i ∈ [1, T_1]

wherein S'_i represents the i-th group of PSG signal samples, T_1 represents the number of PSG signal samples, and C represents the number of channels.
4. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
the visual Transformer frame-level encoder is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum blocking layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time frequency conversion layer converts the ith group of PSG signal samplesCalculating a short-time Fourier transform time-frequency spectrum of the ith group of PSG signal samples through short-time Fourier transform, and expressing the short-time Fourier transform time-frequency spectrum as
F i =(dF 1,i ,dF 2,i ,...dF G,i )
i∈[1,T 1 ]
Wherein, F i Representing the short-time Fourier transform time-frequency spectrum, T, of the ith set of PSG signal samples 1 Representing the number of PSG signal samples, C representing the number of channels, dF c,i A short-time Fourier transform time-frequency spectrum representing the ith group of PSG signal samples of the c channel;
will dF 1,i ,dF 2,i ,...dF G,i Splicing along a frequency axis to obtain a spliced time-frequency spectrum X of the ith group of PSG signal samples fft,i To X fft,i Carrying out logarithmic transformation to obtain spliced logarithmic time spectrum of the ith group of PSG signal samples, and normalizing the spliced logarithmic time spectrum of the ith group of PSG signal samples by a normal distribution method to obtain normalized time spectrum of the ith group of PSG signal samples
the time-frequency-spectrum blocking layer divides the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the blocked time-frequency spectrum of the i-th group of PSG signal samples:

X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i})

n ∈ [1, N]

wherein x_{n,i} represents the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the linear projection layer sequentially converts each patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples into the patch-vector sequence of the i-th group of PSG signal samples, specifically defined as:

E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})

wherein E_{i,n} represents the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the position-encoding layer superimposes a randomly initialized position embedding on each patch vector to obtain the encoded feature sequence of the i-th group of PSG signal samples, specifically defined as:

z_{i,n} = E_{i,n} + P_{i,n}, n ∈ [1, N]

wherein P_{i,n} denotes the position embedding of the n-th patch of the i-th group of PSG signal samples, N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch vector of the i-th group of PSG signal samples;
the Transformer input feature sequence of the i-th group of PSG signal samples is constructed as:

Z_{i,0} = (z_{i,cls}, z_{i,1}, ..., z_{i,N})

wherein z_{i,cls} = E_{cls} + P_{i,0} denotes the input [CLS] token of the i-th group of PSG signal samples with its position embedding, E_{cls} is the learnable [CLS] token prepended to the sequence of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the multi-head attention layer and the fully connected layer process the input feature sequence through a multi-layer Transformer encoder to obtain the output feature sequence of the i-th group of PSG signal samples:

Z'_i = (z'_{i,cls}, z'_{i,1}, ..., z'_{i,N})

wherein Z'_i represents the output feature sequence of the i-th group of PSG signal samples, z'_{i,cls} represents the output [CLS] token of the i-th group of PSG signal samples, and z'_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples;
the length of a target sleep frame is defined as N_0, and the output feature-vector sequence of the i-th group of PSG signal samples is constructed as:

D_i = (z'_{i,1}, z'_{i,2}, ..., z'_{i,N})

the token connection layer connects z'_{i,cls} with the mean of D_i to obtain the single-sleep-frame feature of the i-th group of PSG signal samples, defined as:

f_i = Concat(z'_{i,cls}, mean(D_i))

wherein z'_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples, and Concat denotes splicing;
5. The visual Transformer-based automatic sleep staging method according to claim 1, characterized in that:
in step 3, the bidirectional GRU sequence-level encoder converts the single-sleep-frame feature sequence into a sequence-level feature-vector sequence;

in step 3, the softmax layer maps the sequence-level feature-vector sequence to the corresponding predicted sleep-stage probability sequence, defined as:
π_i = (π_{i,1}, π_{i,2}, ..., π_{i,K})^T

wherein π_{i,k} represents the probability that the i-th group of PSG signal samples is predicted as sleep stage k.
6. The visual Transformer-based automatic sleep staging method according to claim 1, characterized in that:
the sleep staging network loss function model in step 3 specifically comprises:

L = - Σ_{i=1}^{T_1} y_i^T log π_i

wherein y_i represents the one-hot encoded vector of the real sleep stage of the i-th group of PSG signal samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210965248.6A CN115374815A (en) | 2022-08-12 | 2022-08-12 | Automatic sleep staging method based on visual Transformer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115374815A true CN115374815A (en) | 2022-11-22 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117316369A (en) * | 2023-08-24 | 2023-12-29 | 兰州交通大学 | Chest image diagnosis report automatic generation method for balancing cross-mode information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||