CN115374815A - Automatic sleep staging method based on visual Transformer

Automatic sleep staging method based on visual Transformer

Info

Publication number
CN115374815A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210965248.6A
Other languages
Chinese (zh)
Inventor
任延珍 (Ren Yanzhen)
彭荔 (Peng Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210965248.6A priority Critical patent/CN115374815A/en
Publication of CN115374815A publication Critical patent/CN115374815A/en
Pending legal-status Critical Current

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/4806: Sleep evaluation
    • A61B 5/4812: Detecting sleep stages or cycles
    • A61B 5/7203: Signal processing specially adapted for physiological signals, for noise prevention, reduction or removal
    • A61B 5/7257: Details of waveform analysis characterised by using Fourier transforms
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7267: Classification involving training the classification device
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y02D 30/70: Reducing energy consumption in wireless communication networks


Abstract

The invention discloses an automatic sleep staging method based on a visual Transformer. The method processes the original PSG signal with a sliding window to obtain PSG signal sequences; applies data enhancement to the PSG signal sequences to obtain enhanced signal samples; builds a sleep staging network by cascading a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder, and a softmax layer; inputs each group of PSG signal samples into the network to predict its sleep stage; initializes the network by cross-modal transfer learning; constructs a loss function from the true sleep stages of the PSG signal samples; and trains with the Adam optimizer to obtain the optimized sleep staging network. PSG signals are then acquired in real time, and the sleep stage is predicted by passing the PSG signal samples through the optimized sleep staging network. Data enhancement is designed around the noise and artifacts of the PSG signal, improving the network's robustness to them; an encoder based on a visual Transformer is introduced to improve the network's feature representation capability; and transfer learning relieves the dependence on large amounts of PSG data.

Description

Automatic sleep staging method based on visual Transformer
Technical Field
The invention belongs to the technical field of sleep quality assessment, and particularly relates to an automatic sleep staging method based on a visual Transformer.
Background
Polysomnography (PSG) is the clinical standard technology for sleep-state monitoring. It comprehensively records physiological indicators of the monitored subject during sleep, including neural signals such as the electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG), as well as respiratory monitoring data such as oronasal airflow, thoracic and abdominal pressure, and blood oxygen saturation, and serves as an effective basis for evaluating sleep quality and diagnosing sleep disorders. However, sleep physiological signal analysis has long depended on sleep experts manually inspecting the polysomnogram; this is inefficient and labor-intensive, and subjective differences in expert judgment introduce errors into the assessment. There is therefore a need for a robust, high-performance automated sleep staging tool to assist physicians, improving both the efficiency and the accuracy of sleep staging.
Automated sleep staging is the basis for scaling up sleep assessment and diagnosis, serving the millions of people with sleep disorders and making sleep monitoring feasible in the home environment. Although existing automatic sleep staging models achieve good performance and can exceed the staging accuracy of a single human expert, several problems remain to be solved:
Visual Transformers can extract effective feature representations, but their performance on PSG signals has not been explored. For the Fourier-transformed PSG signal, positional information along both the time and frequency axes is crucial, yet recent sleep staging methods consider only the positional relationship along the time axis. A visual Transformer can capture positional information on the time and frequency axes simultaneously, remedying this shortcoming of existing models.
Transformer-based deep learning methods require large amounts of training data to surpass the performance of CNNs. Current Transformer-based automated sleep staging models perform well when pre-trained on large-scale PSG datasets, but their staging accuracy drops significantly on smaller datasets. Large-scale, accurately labeled PSG datasets are difficult to obtain, and training a model from scratch on them consumes substantial computational resources.
Existing automatic sleep staging models have low robustness to noise and artifacts. Owing to human factors and the acquisition environment, noise and artifacts are difficult to avoid in the PSG signal. Research into data enhancement modules designed for PSG signals remains quite limited; most work directly adopts enhancement techniques from image and audio tasks without considering the characteristics of the PSG signal itself.
There is therefore a substantial need to develop a robust automatic sleep staging technique based on visual Transformers.
Disclosure of Invention
To address the limited performance of feature representations in the sleep staging task, the scarcity of reliable PSG datasets, and the poor robustness of models to PSG signal noise and artifacts, the invention introduces a visual Transformer-based encoder, relieves the dependence on large amounts of PSG data through transfer learning, and designs a data enhancement module targeted at the noise and artifacts of the PSG signal, thereby learning high-performance, highly robust feature representations.
The model rests on three key ideas: a frame-level encoder based on a visual Transformer, with a sliding window to capture short-term context and a GRU (Gated Recurrent Unit) for long-term sequence-level modeling; cross-modal transfer learning, fine-tuning a model pre-trained on an out-of-domain dataset on the sleep PSG dataset to reduce the dependence on large-scale PSG data; and a dynamic data enhancement module for the EEG and EOG channels, enabling the model to learn more robust feature representations.
The method is an automatic sleep staging method based on a visual Transformer, with the following specific steps:
step 1: introducing original PSG signals of a plurality of channels at a plurality of moments, and processing the original PSG signals of each channel at the plurality of moments through a sliding window to obtain a plurality of PSG signal sequences of each channel;
step 2: obtaining multiple groups of PSG signal samples after data enhancement processing of each channel by carrying out data enhancement processing on multiple PSG signal sequences of each channel, constructing each group of PSG signal samples through the same group of data-enhanced PSG signal samples of the multiple channels, and manually marking the real sleep stage of each group of PSG signal samples;
Step 3: sequentially cascade a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct a sleep staging network; input each group of PSG signal samples into the network to obtain its predicted sleep stage; initialize the network through cross-modal transfer learning; construct the network's loss function from the true sleep stages of each group of samples; and train with the Adam optimizer to obtain the optimized sleep staging network;
Step 4: collect PSG signals at multiple moments in real time, obtain real-time PSG signal samples through the sliding window processing of step 1, and predict the real-time sleep stage by passing the real-time samples through the optimized sleep staging network.
Preferably, the original PSG signals of each channel at multiple moments in step 1 are defined as:
data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L})
c ∈ [1, C]
where data_c represents the original PSG signals of the c-th channel at multiple moments, data_{c,n} represents the original PSG signal of the c-th channel at the n-th moment, n ∈ [1, L], L is the number of original moments, and C is the number of channels;
the window coverage range of the sliding window processing in the step 1 is as follows: (n- (T) 0 -1)/2) to (n + (T) 0 -1)/2);
The window length of the sliding window processing in the step 1 is as follows: t is a unit of 0
The multiple PSG signal sequences of each channel in step 1 specifically include:
Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1})
c ∈ [1, C]
where Sdata_c represents the PSG signals in the sliding windows at multiple moments of the c-th channel, S_{c,i} represents the PSG signal in the sliding window at the i-th moment of the c-th channel, i ∈ [1, T_1], T_1 is the number of PSG signal sequences, and C is the number of channels;
preferably, the data enhancement processing in step 2 specifically includes:
signal denoising, signal channel interference, signal additive noise, and signal frequency masking, each applied independently with a certain random probability;
The multiple groups of data-enhanced PSG signal samples of each channel in step 2 are specifically:
Sdata'_c = (dS'_{c,1}, dS'_{c,2}, ..., dS'_{c,T_1})
c ∈ [1, C]
where Sdata'_c represents the data-enhanced PSG signals at multiple moments of the c-th channel, dS'_{c,m} represents the m-th data-enhanced PSG signal sample of the c-th channel, m ∈ [1, T_1], and T_1 is the number of data-enhanced PSG signal samples;
Step 2 constructs each group of PSG signal samples from the same group of data-enhanced PSG signal samples across the channels, specifically:
S'_i = (dS'_{1,i}, dS'_{2,i}, ..., dS'_{C,i})
i ∈ [1, T_1]
where S'_i represents the i-th group of PSG signal samples, T_1 is the number of PSG signal samples, and C is the number of channels;
preferably, the visual Transformer frame-level encoder in step 3 is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum partitioning layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time frequency conversion layer converts the ith group of PSG signal samples S' i Calculating a short-time Fourier transform time-frequency spectrum of the ith group of PSG signal samples through short-time Fourier transform, and expressing the short-time Fourier transform time-frequency spectrum as
F i =(dF 1,i ,dF 2,i ,...dF C,i )
i∈[1,T 1 ]
Wherein, F i Representing the short-time Fourier transform time-frequency spectrum, T, of the ith set of PSG signal samples 1 Representing the number of PSG signal samples, C representing the number of channels, dF c,i A short-time Fourier transform time-frequency spectrum representing the ith set of PSG signal samples of the c channel;
dF_{1,i}, dF_{2,i}, ..., dF_{C,i} are spliced along the frequency axis to obtain the spliced time-frequency spectrum X_{fft,i} of the i-th group of PSG signal samples; X_{fft,i} is log-transformed to obtain the spliced log time-frequency spectrum of the i-th group of PSG signal samples, which is then normalized by a normal-distribution (z-score) method to obtain the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples;
The time-frequency spectrum partitioning layer divides the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the partitioned time-frequency spectrum of the i-th group of PSG signal samples:
X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i})
n ∈ [1, N]
where x_{n,i} represents the n-th patch in the partitioned time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches;
The linear projection layer converts each patch of the partitioned time-frequency spectrum of the i-th group of PSG signal samples, in order, into the patch vector sequence of the i-th group of PSG signal samples, defined as:
E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})
where E_{i,n} represents the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches;
The position encoding layer adds a randomly initialized position embedding to each patch vector to obtain the encoded feature sequence of the i-th group of PSG signal samples, defined as:
z_{i,n} = E_{i,n} + P_{i,n}
n ∈ [1, N]
where P_{i,n} is the position embedding of the n-th patch of the i-th group of PSG signal samples, N is the total number of patches in the partitioned time-frequency spectrum, and z_{i,n} represents the encoded feature of the n-th patch vector of the i-th group of PSG signal samples;
The Transformer input feature sequence of the i-th group of PSG signal samples is constructed as:
Z_i^0 = (x_cls + P_{i,0}, z_{i,1}, z_{i,2}, ..., z_{i,N})
where P_{i,0} is the position embedding of the [CLS] token of the i-th group of PSG signal samples, x_cls is the learnable [CLS] token prepended to the sequence, and z_{i,n} represents the encoded feature of the n-th patch in the partitioned time-frequency spectrum of the i-th group of PSG signal samples;
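As a rough illustration of the linear projection, [CLS] token, and position encoding steps above, the following numpy sketch uses hypothetical dimensions (N, p, d are illustrative; the patent does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_patches(patches, W_proj, pos_embed, cls_token):
    """Linearly project flattened patches, prepend a [CLS] token,
    and add position embeddings (shapes are illustrative)."""
    E = patches @ W_proj                  # (N, d) patch vectors E_{i,n}
    z = np.vstack([cls_token, E])         # prepend learnable [CLS] token
    return z + pos_embed                  # add position embeddings P_{i,n}

N, p, d = 8, 4, 16                        # illustrative sizes
patches = rng.normal(size=(N, p * p))     # flattened p x p patches
W_proj = rng.normal(size=(p * p, d))      # linear projection weights
pos_embed = rng.normal(size=(N + 1, d))   # one embedding per token incl. [CLS]
cls_token = rng.normal(size=(1, d))

Z0 = embed_patches(patches, W_proj, pos_embed, cls_token)
print(Z0.shape)  # (9, 16): N patches plus the [CLS] token
```

In a trained model, W_proj, pos_embed, and cls_token would be learned parameters rather than random draws.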
The multi-head attention and fully connected layers process Z_i^0 through a multilayer Transformer encoder to obtain the output feature sequence of the i-th group of PSG signal samples:
O_i = (o_{cls,i}, o_{i,1}, o_{i,2}, ..., o_{i,N})
where O_i represents the output feature sequence of the i-th group of PSG signal samples, o_{cls,i} represents the output [CLS] token of the i-th group of PSG signal samples, and o_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples;
Defining the length of a target sleep frame as N_0, the output feature vector sequence of the i-th group of PSG signal samples is constructed as:
D_i = (o_{i,1}, o_{i,2}, ..., o_{i,N_0})
The token connection layer concatenates o_{cls,i} with the mean of D_i to obtain the single-sleep-frame feature of the i-th group of PSG signal samples, defined as:
f_i = Concat(o_{cls,i}, mean(D_i))
where o_{i,n} represents the output feature of the n-th patch of the i-th group of PSG signal samples and Concat denotes splicing;
(f_1, f_2, ..., f_{T_1}) is defined as the feature sequence of single sleep frames;
The bidirectional GRU sequence-level encoder of step 3 converts the feature sequence of single sleep frames (f_1, f_2, ..., f_{T_1}) into the sequence-level feature vector sequence (g_1, g_2, ..., g_{T_1});
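A minimal numpy sketch of the bidirectional GRU pass is given below. The cell follows the standard GRU equations (update gate z, reset gate r, candidate state); dimensions and the random parameters are illustrative, not from the patent:

```python
import numpy as np

def gru_cell(x, h, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state h~."""
    z = 1 / (1 + np.exp(-(W[0] @ x + U[0] @ h + b[0])))   # update gate
    r = 1 / (1 + np.exp(-(W[1] @ x + U[1] @ h + b[1])))   # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])   # candidate state
    return (1 - z) * h + z * h_tilde

def bi_gru(seq, params_f, params_b, d_h):
    """Run a forward and a backward GRU over the frame features and
    concatenate their hidden states at each time step."""
    hf, hb = np.zeros(d_h), np.zeros(d_h)
    fwd, bwd = [], []
    for x in seq:                       # forward pass f_1 .. f_T
        hf = gru_cell(x, hf, *params_f)
        fwd.append(hf)
    for x in reversed(seq):             # backward pass f_T .. f_1
        hb = gru_cell(x, hb, *params_b)
        bwd.append(hb)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

rng = np.random.default_rng(1)
d_in, d_h, T = 6, 4, 5                  # illustrative sizes
mk = lambda: (rng.normal(size=(3, d_h, d_in)),
              rng.normal(size=(3, d_h, d_h)),
              np.zeros((3, d_h)))
frames = [rng.normal(size=d_in) for _ in range(T)]   # f_1 .. f_T
g = bi_gru(frames, mk(), mk(), d_h)     # g_1 .. g_T, each 2*d_h dims
print(len(g), g[0].shape)
```

In practice a deep-learning framework's bidirectional GRU layer would replace this hand-rolled loop; the sketch only shows the data flow from (f_1, ..., f_T) to (g_1, ..., g_T).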
Step 3, the softmax layer sequences the sequence-level feature vector sequence
Figure BDA00037944139600000613
Mapping to a corresponding predicted sleep stage probability sequence, the predicted sleep stage probability sequence being defined as:
π i =(π i,1 ,π i,2 ,...,π i,K ) T
wherein, pi i,k Representing the probability of being predicted as sleep stage k for the ith set of PSG signal samples;
The sleep staging network loss function model in step 3 is specifically the cross-entropy loss:
Loss = -Σ_{i=1}^{T_1} y_i^T log(π_i)
where y_i is the one-hot encoded vector of the true sleep stage of the i-th group of PSG signal samples, and π_i is the predicted sleep stage probability sequence;
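A small numpy sketch of the softmax mapping and the cross-entropy loss above (K = 5 stages and the sample logits are illustrative):

```python
import numpy as np

def softmax(v):
    """Map a score vector to a probability sequence pi_i."""
    e = np.exp(v - v.max())     # shift for numerical stability
    return e / e.sum()

def staging_loss(logits, labels, K=5):
    """Cross-entropy between predicted stage probabilities pi_i and
    one-hot true stages y_i, summed over the samples."""
    loss = 0.0
    for v, k in zip(logits, labels):
        pi = softmax(v)                  # pi_i = (pi_{i,1}, ..., pi_{i,K})
        y = np.eye(K)[k]                 # one-hot true sleep stage y_i
        loss -= float(y @ np.log(pi))    # -y_i^T log(pi_i)
    return loss

rng = np.random.default_rng(2)
logits = rng.normal(size=(4, 5))         # 4 samples, K = 5 sleep stages
labels = [0, 2, 1, 4]
print(staging_loss(logits, labels))
```

A confident, correct prediction drives its term toward zero, while a wrong confident prediction is penalized heavily, which is the behavior the training step relies on.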
To address poor robustness to PSG noise and artifacts, limited feature representation capability, and the scarcity of reliable PSG datasets in the sleep staging task, a data enhancement module is designed around the noise and artifacts of the PSG signal, improving the model's robustness to them; a visual Transformer-based encoder is introduced to improve the model's feature representation capability; and transfer learning relieves the dependence on large amounts of PSG data.
Drawings
FIG. 1: overall structure of an embodiment of the invention;
FIG. 2: denoised EEG (upper waveform) and EOG (lower waveform) signals and their time-frequency spectrograms, according to an embodiment of the invention;
FIG. 3: EEG (upper waveform) and EOG (lower waveform) signals and time-frequency spectrograms after signal channel interference, according to an embodiment of the invention;
FIG. 4: EEG (upper waveform) and EOG (lower waveform) signals and time-frequency spectrograms after adding high-frequency noise, according to an embodiment of the invention;
FIG. 5: EEG (upper waveform) and EOG (lower waveform) signals and time-frequency spectrograms after adding low-frequency noise, according to an embodiment of the invention;
FIG. 6: original and masked time-frequency spectra of the EEG (upper waveform) and EOG (lower waveform) after frequency masking, according to an embodiment of the invention;
FIG. 7: schematic of the cross-modal transfer learning implementation of an embodiment of the invention;
FIG. 8: schematic of the bidirectional GRU-based sequence encoder of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In specific implementation, a person skilled in the art can implement the automatic operation process by using a computer software technology, and a system device for implementing the method, such as a computer readable storage medium storing a corresponding computer program according to the technical solution of the present invention and a computer device including the corresponding computer program, should also be within the scope of the present invention.
An automatic sleep staging method based on a visual Transformer according to an embodiment of the present invention is described below with reference to fig. 1 to 8, which includes:
step 1: introducing original PSG signals of a plurality of channels at a plurality of moments, and processing the original PSG signals of each channel at the plurality of moments through a sliding window to obtain a plurality of PSG signal sequences of each channel;
step 1, the original PSG signals of each channel at multiple times are defined as:
data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L})
c ∈ [1, C]
where data_c represents the original PSG signals of the c-th channel at multiple moments, data_{c,n} represents the original PSG signal of the c-th channel at the n-th moment, n ∈ [1, L], L is the number of original moments, and C is the number of channels;
The window coverage of the sliding window processing in step 1 is: (n - (T_0 - 1)/2) to (n + (T_0 - 1)/2);
The window length of the sliding window processing in step 1 is T_0.
The multiple PSG signal sequences of each channel in step 1 specifically include:
Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1})
c ∈ [1, C]
where Sdata_c represents the PSG signals in the sliding windows at multiple moments of the c-th channel, and S_{c,i} represents the PSG signal in the sliding window at the i-th moment of the c-th channel, i ∈ [1, T_1]. In this embodiment, T_1 = 21 is the number of PSG signal sequences, T_0 = 3 is the sliding-window length, and C = 2 is the number of channels;
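The sliding-window step can be sketched as follows; the edge handling (repeating the first and last epoch so every epoch gets a full window) is an assumption, since the patent does not specify it:

```python
import numpy as np

def sliding_windows(data_c, T0):
    """Center a window of T0 epochs on each epoch n, covering
    n-(T0-1)/2 .. n+(T0-1)/2; edges are padded by repetition
    (padding scheme is an assumption, not from the patent)."""
    half = (T0 - 1) // 2
    padded = np.concatenate([data_c[:1].repeat(half, 0), data_c,
                             data_c[-1:].repeat(half, 0)])
    return np.stack([padded[n:n + T0] for n in range(len(data_c))])

# illustrative numbers matching the embodiment: window length T0 = 3
L, T0 = 21, 3
data_c = np.arange(L).reshape(L, 1)   # stand-in for one channel's epochs
S_c = sliding_windows(data_c, T0)
print(S_c.shape)  # (21, 3, 1): 21 windows of T0 = 3 epochs each
```

Each row S_c[n] corresponds to the window S_{c,n} around epoch n.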
step 2: obtaining multiple groups of PSG signal samples after data enhancement processing of each channel by carrying out data enhancement processing on the multiple PSG signal sequences of each channel, constructing each group of PSG signal samples by using the same group of data-enhanced PSG signal samples of the multiple channels, and manually marking the real sleep stage of each group of PSG signal samples;
the data enhancement processing in the step 2 specifically comprises the following steps:
signal denoising, signal channel interference, signal additive noise, and signal frequency masking, each applied independently with a certain random probability;
The multiple groups of data-enhanced PSG signal samples of each channel in step 2 are specifically:
Sdata'_c = (dS'_{c,1}, dS'_{c,2}, ..., dS'_{c,T_1})
c ∈ [1, C]
where Sdata'_c represents the data-enhanced PSG signals at multiple moments of the c-th channel, dS'_{c,m} represents the m-th data-enhanced PSG signal sample of the c-th channel, m ∈ [1, T_1], and T_1 is the number of data-enhanced PSG signal samples;
Step 2 constructs each group of PSG signal samples from the same group of data-enhanced PSG signal samples across the channels, specifically:
S'_i = (dS'_{1,i}, dS'_{2,i}, ..., dS'_{C,i})
i ∈ [1, T_1]
where S'_i represents the i-th group of PSG signal samples, T_1 is the number of PSG signal samples, and C is the number of channels;
The signal denoising processing: low-pass and high-pass signal content both carry value in sleep-related studies. This enhancement uses band-pass filtering to reduce noise in the PSG signal: the signal passes through a first-order Butterworth filter, retaining only in-band frequencies. During training, band-pass denoising is active with probability 0.5.
FIG. 2 shows the EEG (upper waveform) and EOG (lower waveform) time-domain signals and time-frequency spectra after signal denoising;
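A sketch of this band-pass denoising step using scipy; the cutoff frequencies (0.5-30 Hz, typical for EEG) are illustrative choices, not values given by the patent:

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass_denoise(x, fs, lo=0.5, hi=30.0, p=0.5, rng=None):
    """With probability p, pass the signal through a first-order
    Butterworth band-pass filter, keeping only in-band frequencies.
    Cutoffs lo/hi are illustrative, not specified by the patent."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return x                    # augmentation inactive this draw
    b, a = butter(1, [lo, hi], btype="bandpass", fs=fs)
    return lfilter(b, a, x)

fs = 100.0
t = np.arange(0, 2, 1 / fs)
# 10 Hz in-band component plus a 45 Hz out-of-band component
x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 45 * t)
y = bandpass_denoise(x, fs, p=1.0)  # force the filter on
print(y.shape)
```

The 45 Hz component is attenuated while the 10 Hz component largely survives, which is the intended denoising effect.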
The signal channel interference processing: since the F3 and F4 electrodes are relatively close to the eyes, eye-movement artifacts are picked up by the frontal leads, and the associated deflections can be seen in the EEG signal; that is, deflections in the EOG leads also appear in the frontal-area leads. Similarly, the EOG channel sometimes picks up signal from the EEG channel. This artifact is simulated by superimposing the EEG and EOG signals at a particular scale. During training, signal interference is active with probability 0.4; the EOG channel receiving EEG signal and the EEG channel receiving EOG signal are each chosen with probability 50%.
FIG. 3 shows the EEG (upper waveform) and EOG (lower waveform) time-domain signals and time-frequency spectra after signal channel interference;
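The channel-interference augmentation can be sketched as a probabilistic cross-channel mix; the mixing scale of 0.2 is an illustrative value, as the patent does not state the superposition scale:

```python
import numpy as np

def channel_interference(eeg, eog, p=0.4, scale=0.2, rng=None):
    """With probability p, superimpose one channel onto the other to
    mimic eye-movement artifacts picked up by frontal EEG leads (and
    EEG activity leaking into the EOG). `scale` is illustrative."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return eeg, eog                  # augmentation inactive
    if rng.random() < 0.5:               # EEG picks up the EOG deflection
        return eeg + scale * eog, eog
    return eeg, eog + scale * eeg        # EOG picks up EEG activity

rng = np.random.default_rng(3)
eeg = rng.normal(size=100)
eog = rng.normal(size=100)
eeg2, eog2 = channel_interference(eeg, eog, p=1.0,
                                  rng=np.random.default_rng(0))
print(eeg2.shape, eog2.shape)
```

Exactly one of the two channels is perturbed per activation, matching the 50/50 direction choice described above.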
The signal additive noise processing: slow-frequency artifacts and muscle artifacts are simulated by adding high-frequency low-amplitude or low-frequency high-amplitude noise to the EEG and EOG channels. Slow-frequency artifacts are typically due to sweating or breathing-related body motion: sweat changes the electrode potential and dilutes the conductive medium between electrode and skin, creating an artifact resembling a delta wave. Muscle artifacts are typically produced by local muscle activity, with frequencies of 20-200 Hz. These artifacts are simulated by adding independent, identically distributed high-frequency low-amplitude or low-frequency high-amplitude noise to the EEG and EOG channels. During training, additive noise is active with probability 0.5, with high-frequency and low-frequency noise each chosen with probability 50%.
FIGS. 4 and 5 show the EEG (upper waveform) and EOG (lower waveform) time-domain signals and time-frequency spectra after adding high-frequency noise and low-frequency noise, respectively;
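A sketch of the additive-noise augmentation; the corner frequencies (20 Hz high-pass for muscle-like noise, 2 Hz low-pass for sweat/motion-like noise) and the amplitudes are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, lfilter

def additive_noise(x, fs, p=0.5, rng=None):
    """With probability p, add either high-frequency low-amplitude noise
    (muscle-artifact-like) or low-frequency high-amplitude noise
    (sweat/motion-artifact-like). Cutoffs and amplitudes are illustrative."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return x                                    # inactive this draw
    noise = rng.normal(size=x.shape)
    if rng.random() < 0.5:                          # high-freq, low amplitude
        b, a = butter(2, 20.0, btype="highpass", fs=fs)
        return x + 0.1 * lfilter(b, a, noise)
    b, a = butter(2, 2.0, btype="lowpass", fs=fs)   # low-freq, high amplitude
    return x + 1.0 * lfilter(b, a, noise)

fs = 100.0
x = np.zeros(200)                                   # clean stand-in signal
y = additive_noise(x, fs, p=1.0, rng=np.random.default_rng(0))
print(y.shape)
```

Filtering white noise through a high-pass or low-pass filter is one simple way to obtain band-limited noise of the kinds described above.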
The signal frequency masking processing: masking techniques are widely used in audio and video research, but the effectiveness of masking strategies on the spectrum of PSG signals has not yet been explored. In this enhancement module, a set of consecutive frequency channels or time steps is masked using a frequency mask and a time mask: frequency masking is achieved by passing the time signal through a band-stop filter, while time masking is achieved by setting consecutive sampling points to zero.
FIG. 6 shows the original and masked time-frequency spectra of the EEG (upper waveform) and EOG (lower waveform) after frequency masking;
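The two masking operations can be sketched as follows; the band edges, mask position, and mask width are illustrative values, not taken from the patent:

```python
import numpy as np
from scipy.signal import butter, lfilter

def frequency_mask(x, fs, f_lo, f_hi):
    """Mask a band of frequencies by band-stop filtering the time signal."""
    b, a = butter(2, [f_lo, f_hi], btype="bandstop", fs=fs)
    return lfilter(b, a, x)

def time_mask(x, start, width):
    """Mask consecutive sampling points by setting them to zero."""
    y = x.copy()
    y[start:start + width] = 0.0
    return y

fs = 100.0
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)           # a 10 Hz test tone
x_f = frequency_mask(x, fs, 8.0, 12.0)   # notch out the 10 Hz content
x_t = time_mask(x, 50, 20)               # zero 20 consecutive samples
print(x_f.shape, x_t.shape)
```

The band-stop filter suppresses the masked frequency band in the spectrum, while the time mask produces a flat gap in the waveform, mirroring the two masking modes described above.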
Step 3: sequentially cascade a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct a sleep staging network; input each group of PSG signal samples into the network to obtain its predicted sleep stage; initialize the network through cross-modal transfer learning; construct the network's loss function from the true sleep stages of each group of samples; and train with the Adam optimizer to obtain the optimized sleep staging network;
the visual Transformer frame-level encoder is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum blocking layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time-frequency transform layer computes the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples S'_i through short-time Fourier transform, expressed as

F_i = (dF_{1,i}, dF_{2,i}, ..., dF_{C,i}), i ∈ [1, T_1]

wherein F_i represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples, T_1 represents the number of PSG signal samples, C represents the number of channels, and dF_{c,i} represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples on the c-th channel;
dF_{1,i}, dF_{2,i}, ..., dF_{C,i} are spliced along the frequency axis to obtain the spliced time-frequency spectrum X_{fft,i} of the i-th group of PSG signal samples; a logarithmic transformation is applied to X_{fft,i} to obtain the spliced log time-frequency spectrum of the i-th group of PSG signal samples, which is then normalized to a standard normal distribution (zero mean, unit variance) to obtain the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples;
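The transform-splice-log-normalize pipeline can be sketched as follows; the Hann window, window length, hop size, and eps constants are assumptions, not values from the description:

```python
import numpy as np

def stft_spec(x, win_len=256, hop=128):
    """Minimal magnitude STFT time-frequency spectrum with a Hann window."""
    win = np.hanning(win_len)
    frames = [x[s:s + win_len] * win
              for s in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T   # (freq, time)

def channel_spectrogram(channels):
    """Per-channel STFT spectra spliced along the frequency axis,
    log-transformed, then normalized to zero mean / unit variance
    (the normal-distribution normalization described above)."""
    spec = np.concatenate([stft_spec(ch) for ch in channels], axis=0)
    logspec = np.log(spec + 1e-8)        # log transform; eps for stability
    return (logspec - logspec.mean()) / (logspec.std() + 1e-8)
```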
the time-frequency spectrum blocking layer partitions the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the blocked time-frequency spectrum of the i-th group of PSG signal samples:

X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i}), n ∈ [1, N]

wherein x_{n,i} represents the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
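The p × p blocking reduces to a reshape; a sketch, assuming the spectrum dimensions are exact multiples of p:

```python
import numpy as np

def to_patches(spec, p):
    """Split a (F, T) time-frequency spectrum into non-overlapping
    p x p patches, ordered row by row; F and T must be multiples of p."""
    F, T = spec.shape
    assert F % p == 0 and T % p == 0
    return (spec.reshape(F // p, p, T // p, p)
                .transpose(0, 2, 1, 3)     # (F/p, T/p, p, p)
                .reshape(-1, p, p))        # N = (F/p)*(T/p) patches
```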
the linear projection layer sequentially converts each patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples into the patch vector sequence of the i-th group of PSG signal samples through linear projection, specifically defined as:

E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})

wherein E_{i,n} represents the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the position encoding layer superimposes a randomly initialized position embedding on each patch vector to obtain the encoded feature sequence of the i-th group of PSG signal samples, specifically defined as:

z_{i,n} = E_{i,n} + P_{i,n}, n ∈ [1, N]

wherein P_{i,n} represents the position embedding of the n-th patch of the i-th group of PSG signal samples, N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch vector of the i-th group of PSG signal samples;
the Transformer input feature sequence of the i-th group of PSG signal samples is constructed as follows:

Z_i = (x_{CLS} + P_{i,0}, z_{i,1}, ..., z_{i,N})

wherein P_{i,0} represents the position embedding of the input [CLS] token of the i-th group of PSG signal samples, x_{CLS} is the learnable [CLS] token at the start of the sequence of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
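The projection, [CLS] prepending, and position-embedding steps can be sketched as follows; the random arrays stand in for learned parameters:

```python
import numpy as np

def build_transformer_input(patches, W, cls_token, pos_embed):
    """Flatten each p x p patch, project it to d dimensions with W,
    prepend a learnable [CLS] token, and add position embeddings."""
    N = patches.shape[0]
    flat = patches.reshape(N, -1)        # (N, p*p) flattened patches
    E = flat @ W                         # (N, d) patch vectors
    tokens = np.vstack([cls_token, E])   # (N+1, d), [CLS] token first
    return tokens + pos_embed            # position embeddings superimposed

# Placeholder parameters (random stand-ins for learned weights)
rng = np.random.default_rng(0)
p, d, N = 4, 8, 6
patches = rng.standard_normal((N, p, p))
W = rng.standard_normal((p * p, d)) * 0.02
cls_token = rng.standard_normal((1, d))
pos_embed = rng.standard_normal((N + 1, d)) * 0.02
Z = build_transformer_input(patches, W, cls_token, pos_embed)
```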
the multi-head attention layer and the fully connected layer process Z_i through a multi-layer Transformer encoder to obtain the output feature sequence of the i-th group of PSG signal samples:

Z_i^out = (z_{CLS,i}^out, z_{i,1}^out, ..., z_{i,N}^out)

wherein Z_i^out represents the output feature sequence of the i-th group of PSG signal samples, z_{CLS,i}^out represents the output [CLS] token of the i-th group of PSG signal samples, and z_{i,n}^out represents the output feature of the n-th patch of the i-th group of PSG signal samples;
the target sleep frame length is defined as N_0, and the output feature vector sequence of the i-th group of PSG signal samples is constructed from the patch output features, defined as follows:

D_i = (z_{i,1}^out, z_{i,2}^out, ..., z_{i,N}^out)
the token connection layer connects z_{CLS,i}^out with the mean of D_i to obtain the single-sleep-frame feature of the i-th group of PSG signal samples, defined as follows:

f_i = Concat(z_{CLS,i}^out, mean(D_i))

wherein z_{i,n}^out represents the output feature of the n-th patch of the i-th group of PSG signal samples, and Concat represents concatenation;
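The token connection step then reduces to a concatenation; a sketch:

```python
import numpy as np

def frame_feature(cls_out, patch_outs):
    """Single-sleep-frame feature: the output [CLS] token concatenated
    with the mean of the patch output features D_i."""
    return np.concatenate([cls_out, patch_outs.mean(axis=0)])
```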
the cross-modal transfer learning in step 3 initializes the sleep staging network through two operations: channel weight averaging and input length adaptation.
The channel weight averaging: the weights corresponding to the three input channels of the linear projection layer of the pre-trained Transformer are averaged and used as the weight of the linear projection layer of the visual Transformer frame-level encoder.
The input length adaptation: the input shape of the pre-trained Transformer is fixed (224 × 224 or 384 × 384), which differs from a typical PSG time-frequency spectrogram. To resolve the resulting mismatch in position embedding length, the position embeddings are adapted by cropping and bilinear interpolation.
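Both initialization operations can be sketched as follows; the 1-D linear interpolation is a simplified stand-in for the bilinear interpolation applied to 2-D position-embedding grids:

```python
import numpy as np

def average_channel_weights(W_rgb):
    """Average a pretrained patch-projection weight over its 3 RGB
    input channels to obtain a single-channel weight."""
    return W_rgb.mean(axis=1, keepdims=True)   # (d, 3, p, p) -> (d, 1, p, p)

def adapt_pos_embed(pos, new_len):
    """Adapt a (L, d) position embedding to new_len positions by
    cropping when shorter, or linear interpolation when longer."""
    L, d = pos.shape
    if new_len <= L:
        return pos[:new_len]                   # cropping method
    old_x = np.linspace(0.0, 1.0, L)
    new_x = np.linspace(0.0, 1.0, new_len)     # interpolation method
    return np.stack([np.interp(new_x, old_x, pos[:, j]) for j in range(d)],
                    axis=1)
```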
FIG. 7 is a schematic diagram of a cross-modal migration learning implementation;
(f_1, f_2, ..., f_{T_1}) is defined as the feature sequence of single sleep frames;
step 3, the bidirectional GRU sequence-level encoder converts the feature sequence of single sleep frames (f_1, f_2, ..., f_{T_1}) into the sequence-level feature vector sequence (g_1, g_2, ..., g_{T_1});
Fig. 8 is a schematic diagram of a bidirectional GRU based sequence encoder;
step 3, the softmax layer maps the sequence-level feature vector sequence to the corresponding predicted sleep stage probability sequence, defined as:

π_i = (π_{i,1}, π_{i,2}, ..., π_{i,K})^T

wherein π_{i,k} represents the probability that the i-th group of PSG signal samples is predicted as sleep stage k, and K is the number of sleep stages;
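A numerically stable softmax producing the K-stage probability vector might look like:

```python
import numpy as np

def softmax(logits):
    """Map a K-dimensional logit vector to sleep-stage probabilities."""
    z = logits - logits.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```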
the sleep staging network loss function model in step 3 is specifically the cross-entropy loss:

L = −Σ_i y_i^T log π_i

wherein y_i represents the one-hot encoded vector of the true sleep stage of the i-th group of PSG signal samples, and π_i is the predicted sleep stage probability sequence;
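Assuming the standard cross-entropy form with a mean reduction over samples (the reduction is not spelled out in the description), the loss can be sketched as:

```python
import numpy as np

def cross_entropy(y_onehot, pi, eps=1e-12):
    """Cross-entropy between one-hot true stages y_i and predicted
    stage probabilities pi_i, averaged over samples."""
    return -np.mean(np.sum(y_onehot * np.log(pi + eps), axis=1))
```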
and step 4: the polysomnography monitoring device is worn on a human body according to sleep medicine standards; PSG signals at multiple times are collected in real time and transmitted to a computer; the computer processes the PSG signals collected in real time at multiple times through the sliding window of step 1 to obtain real-time PSG signal samples, and the real-time PSG signal samples are predicted through the optimized sleep staging network to obtain real-time sleep stages.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A visual Transformer-based automatic sleep staging method is characterized by comprising the following steps:
step 1: introducing original PSG signals of a plurality of channels at a plurality of moments, and processing the original PSG signals of each channel at the plurality of moments through a sliding window to obtain a plurality of PSG signal sequences of each channel;
step 2: obtaining multiple groups of PSG signal samples after data enhancement processing of each channel by carrying out data enhancement processing on the multiple PSG signal sequences of each channel, constructing each group of PSG signal samples by using the same group of data-enhanced PSG signal samples of the multiple channels, and manually marking the real sleep stage of each group of PSG signal samples;
and step 3: sequentially cascading a visual Transformer frame-level encoder, a bidirectional GRU sequence-level encoder and a softmax layer to construct a sleep staging network, inputting each group of PSG signal samples into the sleep staging network to predict the sleep stage of each group of PSG signal samples, initializing the sleep staging network through cross-modal transfer learning, constructing a sleep staging network loss function model by combining the real sleep stages of each group of PSG signal samples, and training through the Adam (adaptive moment estimation) optimizer to obtain an optimized sleep staging network;
and 4, step 4: PSG signals at multiple moments are collected in real time, real-time PSG signal samples are obtained through sliding window processing in the step 1, and the real-time PSG signal samples are predicted through an optimized sleep staging network to obtain real-time sleep stages.
2. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
step 1, the original PSG signals of each channel at multiple times are defined as:

data_c = (data_{c,1}, data_{c,2}, ..., data_{c,L}), c ∈ [1, C]

wherein data_c represents the original PSG signals at multiple times of the c-th channel, data_{c,n} represents the original PSG signal at the n-th time of the c-th channel, n ∈ [1, L], L represents the number of original times, and C represents the number of channels;
the window coverage of the sliding window processing in step 1 is from (n − (T_0 − 1)/2) to (n + (T_0 − 1)/2);
the window length of the sliding window processing in step 1 is T_0;
the multiple PSG signal sequences of each channel in step 1 are specifically:

Sdata_c = (S_{c,1}, S_{c,2}, ..., S_{c,T_1}), c ∈ [1, C]

wherein Sdata_c represents the PSG signals in the sliding windows at multiple times of the c-th channel, S_{c,i} represents the PSG signal in the sliding window at the i-th time of the c-th channel, i ∈ [1, T_1], T_1 represents the number of PSG signal sequences, and C represents the number of channels.
3. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
the data enhancement processing in step 2 is specifically as follows:
signal denoising processing, signal channel interference processing, signal additive noise processing, and signal masking frequency processing are each applied with a certain random probability;
step 2, the multiple groups of data-enhanced PSG signal samples of each channel are specifically:

Sdata'_c = (dS'_{c,1}, dS'_{c,2}, ..., dS'_{c,T_1}), c ∈ [1, C]

wherein Sdata'_c represents the data-enhanced PSG signals at multiple times of the c-th channel, dS'_{c,m} represents the m-th group of data-enhanced PSG signal samples of the c-th channel, m ∈ [1, T_1], and T_1 represents the number of PSG signal samples after data enhancement processing;
step 2, each group of PSG signal samples is constructed from the same group of data-enhanced PSG signal samples of the multiple channels, specifically as follows:

S'_i = (dS'_{1,i}, dS'_{2,i}, ..., dS'_{C,i}), i ∈ [1, T_1]

wherein S'_i represents the i-th group of PSG signal samples, T_1 represents the number of PSG signal samples, and C represents the number of channels.
4. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
the visual Transformer frame-level encoder is formed by sequentially cascading a time-frequency transform layer, a time-frequency spectrum blocking layer, a linear projection layer, a position encoding layer, a multi-head attention layer, a full connection layer and a token connection layer;
the time-frequency transform layer computes the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples S'_i through short-time Fourier transform, expressed as

F_i = (dF_{1,i}, dF_{2,i}, ..., dF_{C,i}), i ∈ [1, T_1]

wherein F_i represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples, T_1 represents the number of PSG signal samples, C represents the number of channels, and dF_{c,i} represents the short-time Fourier transform time-frequency spectrum of the i-th group of PSG signal samples on the c-th channel;

dF_{1,i}, dF_{2,i}, ..., dF_{C,i} are spliced along the frequency axis to obtain the spliced time-frequency spectrum X_{fft,i} of the i-th group of PSG signal samples; a logarithmic transformation is applied to X_{fft,i} to obtain the spliced log time-frequency spectrum of the i-th group of PSG signal samples, which is then normalized to a standard normal distribution to obtain the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples;
the time-frequency spectrum blocking layer partitions the normalized time-frequency spectrum X'_{fft,i} of the i-th group of PSG signal samples into a sequence of N patches of size p × p, expressed as the blocked time-frequency spectrum of the i-th group of PSG signal samples:

X_i = (x_{1,i}, x_{2,i}, ..., x_{n,i}, ..., x_{N,i}), n ∈ [1, N]

wherein x_{n,i} represents the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the linear projection layer sequentially converts each patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples into the patch vector sequence of the i-th group of PSG signal samples through linear projection, specifically defined as:

E_i = (E_{i,1}, E_{i,2}, ..., E_{i,N})

wherein E_{i,n} represents the n-th patch vector of the i-th group of PSG signal samples, and N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the position encoding layer superimposes a randomly initialized position embedding on each patch vector to obtain the encoded feature sequence of the i-th group of PSG signal samples, specifically defined as:

z_{i,n} = E_{i,n} + P_{i,n}, n ∈ [1, N]

wherein P_{i,n} represents the position embedding of the n-th patch of the i-th group of PSG signal samples, N is the total number of patches in the blocked time-frequency spectrum of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch vector of the i-th group of PSG signal samples;
the Transformer input feature sequence of the i-th group of PSG signal samples is constructed as follows:

Z_i = (x_{CLS} + P_{i,0}, z_{i,1}, ..., z_{i,N})

wherein P_{i,0} represents the position embedding of the input [CLS] token of the i-th group of PSG signal samples, x_{CLS} is the learnable [CLS] token at the start of the sequence of the i-th group of PSG signal samples, and z_{i,n} represents the encoded feature of the n-th patch in the blocked time-frequency spectrum of the i-th group of PSG signal samples;
the multi-head attention layer and the fully connected layer process Z_i through a multi-layer Transformer encoder to obtain the output feature sequence of the i-th group of PSG signal samples:

Z_i^out = (z_{CLS,i}^out, z_{i,1}^out, ..., z_{i,N}^out)

wherein Z_i^out represents the output feature sequence of the i-th group of PSG signal samples, z_{CLS,i}^out represents the output [CLS] token of the i-th group of PSG signal samples, and z_{i,n}^out represents the output feature of the n-th patch of the i-th group of PSG signal samples;
the target sleep frame length is defined as N_0, and the output feature vector sequence of the i-th group of PSG signal samples is constructed from the patch output features, defined as follows:

D_i = (z_{i,1}^out, z_{i,2}^out, ..., z_{i,N}^out)
the token connection layer connects z_{CLS,i}^out with the mean of D_i to obtain the single-sleep-frame feature of the i-th group of PSG signal samples, defined as follows:

f_i = Concat(z_{CLS,i}^out, mean(D_i))

wherein z_{i,n}^out represents the output feature of the n-th patch of the i-th group of PSG signal samples, and Concat represents concatenation;
(f_1, f_2, ..., f_{T_1}) is defined as the feature sequence of single sleep frames.
5. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
step 3, the bidirectional GRU sequence-level encoder converts the feature sequence of single sleep frames (f_1, f_2, ..., f_{T_1}) into the sequence-level feature vector sequence (g_1, g_2, ..., g_{T_1});

step 3, the softmax layer maps the sequence-level feature vector sequence to the corresponding predicted sleep stage probability sequence, defined as:

π_i = (π_{i,1}, π_{i,2}, ..., π_{i,K})^T

wherein π_{i,k} represents the probability that the i-th group of PSG signal samples is predicted as sleep stage k.
6. The visual Transformer-based automated sleep staging method according to claim 1, characterized in that:
the sleep staging network loss function model in step 3 is specifically:

L = −Σ_i y_i^T log π_i

wherein y_i represents the one-hot encoded vector of the true sleep stage of the i-th group of PSG signal samples, and π_i is the predicted sleep stage probability sequence.
CN202210965248.6A 2022-08-12 2022-08-12 Automatic sleep staging method based on visual Transformer Pending CN115374815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210965248.6A CN115374815A (en) 2022-08-12 2022-08-12 Automatic sleep staging method based on visual Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210965248.6A CN115374815A (en) 2022-08-12 2022-08-12 Automatic sleep staging method based on visual Transformer

Publications (1)

Publication Number Publication Date
CN115374815A true CN115374815A (en) 2022-11-22

Family

ID=84066240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210965248.6A Pending CN115374815A (en) 2022-08-12 2022-08-12 Automatic sleep staging method based on visual Transformer

Country Status (1)

Country Link
CN (1) CN115374815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316369A (en) * 2023-08-24 2023-12-29 兰州交通大学 Chest image diagnosis report automatic generation method for balancing cross-mode information
CN117316369B (en) * 2023-08-24 2024-05-07 兰州交通大学 Chest image diagnosis report automatic generation method for balancing cross-mode information

Similar Documents

Publication Publication Date Title
CN114376564B (en) Sleep staging method, system, device and medium based on ballistocardiogram signals
CN107736894A (en) A kind of electrocardiosignal Emotion identification method based on deep learning
CN107657868A (en) A kind of teaching tracking accessory system based on brain wave
WO2021114761A1 (en) Lung rale artificial intelligence real-time classification method, system and device of electronic stethoscope, and readable storage medium
CN110946576A (en) Visual evoked potential emotion recognition method based on width learning
CN110600053A (en) Cerebral stroke dysarthria risk prediction method based on ResNet and LSTM network
CN110731778B (en) Method and system for recognizing breathing sound signal based on visualization
CN114190944B (en) Robust emotion recognition method based on electroencephalogram signals
CN114469124A (en) Method for identifying abnormal electrocardiosignals in motion process
Tang et al. ECG de-noising based on empirical mode decomposition
CN115374815A (en) Automatic sleep staging method based on visual Transformer
CN111772669A (en) Elbow joint contraction muscle force estimation method based on adaptive long-time and short-time memory network
CN113609975A (en) Modeling method for tremor detection, hand tremor detection device and method
CN113796889A (en) Auxiliary electronic stethoscope signal discrimination method based on deep learning
CN113576472B (en) Blood oxygen signal segmentation method based on full convolution neural network
CN113974607B (en) Sleep snore detecting system based on pulse neural network
KR20220158462A (en) EMG signal-based recognition information extraction system and EMG signal-based recognition information extraction method using the same
He et al. HMT: An EEG Signal Classification Method Based on CNN Architecture
CN116196015A (en) Electroencephalogram classification model based on rhythm feature fusion convolutional neural network
CN112617761B (en) Sleep stage staging method for self-adaptive focalization generation
CN115270847A (en) Design decision electroencephalogram recognition method based on wavelet packet decomposition and convolutional neural network
CN114569116A (en) Three-channel image and transfer learning-based ballistocardiogram ventricular fibrillation auxiliary diagnosis system
Murthy et al. Design and implementation of hybrid techniques and DA-based reconfigurable FIR filter design for noise removal in EEG signals on FPGA
Liu et al. SDEMG: Score-Based Diffusion Model for Surface Electromyographic Signal Denoising
CN115956925B (en) QRS wave detection method and system based on multistage smooth envelope

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination