CN115089123A - OSA detection method based on attention and Transformer - Google Patents

OSA detection method based on attention and Transformer

Info

Publication number
CN115089123A
CN115089123A (application CN202210788498.7A)
Authority
CN
China
Prior art keywords
formula
attention
osa
matrix
osa detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210788498.7A
Other languages
Chinese (zh)
Inventor
石争浩
张治军
周亮
李成建
任晓勇
黑新宏
张一彤
刘海琴
罗靖
尤珍臻
陈敬国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202210788498.7A
Publication of CN115089123A
Legal status: Withdrawn

Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/48Other medical applications
    • A61B5/4806Sleep evaluation
    • A61B5/4818Sleep apnoea
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/24Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316Modalities, i.e. specific diagnostic methods
    • A61B5/369Electroencephalography [EEG]
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Surgery (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Veterinary Medicine (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Signal Processing (AREA)
  • Psychology (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an attention- and Transformer-based OSA detection method, which comprises establishing a data set, constructing an OSA detection network, preprocessing the data set, inputting the preprocessed data set into the OSA detection network for training to obtain a trained OSA detection network, and then inputting the preprocessed data set into the trained OSA detection network for classification to obtain the classification detection result. The detection method improves the class-aware loss function to effectively address the class-imbalance problem without additional computation, completes the detection of OSA in the patient's whole-night recording with a classifier consisting of an MLP and Softmax, and improves the accuracy of OSA detection.

Description

OSA detection method based on attention and Transformer
Technical Field
The invention belongs to the technical field of medical signal processing, and particularly relates to an OSA detection method based on attention and a Transformer.
Background
Sleep Apnea Syndrome (SAS) is a common sleep disorder that causes sleep fragmentation and affects human health and quality of life. There are three basic types of sleep apnea: obstructive sleep apnea (OSA), central sleep apnea (CSA) and mixed sleep apnea (MSA), often accompanied by hypopneas (HYP). OSA is one of the most common, most frequent and most severe sleep disorders. In OSA, relaxation of the throat muscles causes complete obstruction of the upper airway, impeding respiratory airflow during sleep. A recent literature-based study estimates that 936 million adults worldwide have undiagnosed OSA. The disease can lead to serious adverse physiological conditions and an increased risk of heart disease, stroke, neurodegenerative diseases such as Alzheimer's disease, and cancer.
According to the AASM, polysomnography (PSG) is considered the "gold standard" for OSA detection, based on a comprehensive assessment of sleep signals. PSG records the patient overnight, measuring signals such as the electroencephalogram (EEG) and electrocardiogram (ECG) with sensors attached to the body; afterwards, sleep-state analysis is performed by sleep-medicine professionals and sleep apnea events are annotated manually. With the growing number of OSA patients in recent years, the shortage of sleep centers and sleep-medicine professionals, together with inter-scorer differences and other human errors, has become a major obstacle to the correct diagnosis and timely treatment of OSA. There is therefore an urgent need to automate the detection of OSA events and help sleep-medicine experts achieve fast and accurate event annotation, providing powerful technical support that circumvents these human errors and the lack of infrastructure.
To achieve this goal, a great deal of research has used various physiological signals, such as oxygen saturation, changes in heart rate and respiration, EEG and ECG. EEG-based analysis in particular has received attention from researchers in recent years, because acquiring multiple physiological signals is time-consuming, labor-intensive and costly, while wearable wireless EEG detection and acquisition systems are developing rapidly. Meanwhile, with the continuous development of deep learning and its application in different fields, it has shown advantages over traditional machine-learning models without requiring domain knowledge. Among deep-learning models, the convolutional neural network (CNN) is popular; owing to its excellent feature-extraction and classification capability in tasks such as visual imaging, speech recognition and text recognition, the CNN has also been applied to biological-signal classification. Some proposals divide the EEG signal into individual subframes, extract features from each subframe with a designed FCNN, and finally perform classification with a fully connected layer. A model built on CNNs alone, however, cannot model the time dependence between EEG data or perform adaptive feature refinement. Currently, recurrent neural networks (RNNs) are often employed to capture the time dependence in EEG data. It has been proposed to divide the EEG signal into subframes of fixed 10 s length, extract features with a CNN, and learn the transition rules with a long short-term memory network (LSTM) for sleep apnea classification. However, due to their recurrent nature, RNNs typically have high model complexity and are therefore difficult to train in parallel. In addition, the class imbalance of the data is one of the important problems affecting OSA detection accuracy.
Disclosure of Invention
The invention aims to provide an OSA detection method based on attention and a Transformer, which is beneficial to improving the accuracy of the detection of obstructive sleep apnea syndrome.
The technical scheme adopted by the invention is an OSA detection method based on attention and a Transformer: a data set is established and an OSA detection network is constructed; the data set is preprocessed and input into the OSA detection network for training to obtain a trained OSA detection network; the preprocessed data set is then input into the trained OSA detection network for classification to obtain the classification detection result.
The present invention is also characterized in that,
the method specifically comprises the following steps:
step 1, establishing an EEG signal data set and constructing an OSA detection network comprising a feature extraction module and a classification module;
step 2, carrying out data preprocessing on the EEG signal data set in the step 1 to obtain an original EEG signal;
step 3, inputting the original EEG signal obtained in the step 2 into an OSA detection network for feature extraction and classification to obtain a classification result;
step 4, constraining the classification result obtained in the step 3 by using a loss function, and then performing iterative training to obtain a trained OSA detection network model;
step 5, the EEG signal to be processed is put into the OSA detection network model trained in step 4, and the classification detection result is finally output.
The feature extraction module in step 1 consists of a two-way convolutional neural network, a convolutional attention module and a Transformer; the convolutional attention module consists of a spatial attention module and a channel attention module; and the classification module in step 1 consists of an MLP (multilayer perceptron) and Softmax.
Step 3, the feature extraction is specifically: the original signals obtained in step 2 are fed into the two-way convolutional neural network, the features extracted by each branch are concatenated, the concatenated features are input into the convolutional attention module to complete adaptive feature refinement, and the dependencies among the refined features are modeled to obtain shallow semantics.
The two-way convolutional neural network uses convolution kernels of two different sizes for the preliminary extraction of features, followed by multi-layer convolution and pooling in the two branches, with a Dropout layer used to prevent model overfitting; the large convolution kernel size in the two-way convolutional neural network is set to 400 and the small convolution kernel size to 50.
During feature extraction, residual blocks are employed to enrich feature details and enhance the feature-extraction capability.
The formula of the residual block is:

x_{l+1} = x_l + F(x_l, W_l)   (10)

In equation (10), x_{l+1} is the output of the (l+1)-th convolutional layer, x_l is the output of the l-th convolutional layer, W_l is the weight of the l-th convolutional layer, and F(x_l, W_l) is the residual mapping.
The preprocessing in step 2 is specifically to decompose, denoise and reconstruct the EEG signal with the FastICA algorithm.
The decomposition, denoising and reconstruction of the EEG signal by the FastICA algorithm specifically comprises the following steps:

S1, centering: calculate the mean of the mixed signal X and subtract it from X, as shown in equation (1):

X = X − E(X)   (1)

In equation (1), X is the mixed signal and E(X) is the signal mean;

S2, whitening: the specific process is shown in equation (2):

E[XX^T] = C_X = UΛU^T   (2)

In equation (2), C_X = E[XX^T] is the covariance matrix of X, Λ = diag(λ_1, λ_2, …, λ_n) is the diagonal matrix whose diagonal elements are the eigenvalues of C_X, U = [u_1, u_2, …, u_n] is the matrix of eigenvectors of C_X, and UΛU^T is the eigendecomposition of the covariance matrix;

Z = K × X   (3)

In equation (3), Z is the whitened matrix, K is the whitening matrix, and X is the de-meaned signal;

K = Λ^{−1/2} U^T   (4)

In equation (4), K is the whitening matrix, and Λ and U are as defined above;

The covariance of the whitened matrix Z is shown in equation (5):

E[ZZ^T] = K C_X K^T = Λ^{−1/2} U^T (UΛU^T) U Λ^{−1/2} = I   (5)

In equation (5), I is the identity matrix, Z is the whitened matrix, K is the whitening matrix, and Λ, U and UΛU^T are as defined above;

S3, initialize W_i;

S4, update W_i as shown in equation (6):

W_i = E{Z g(W_i^T Z)} − E{g′(W_i^T Z)} W_i   (6)

In equation (6), W is the unmixing matrix being estimated and W_i is its i-th column;

S5, orthogonalize W_i as shown in equation (7):

W_i = W_i − Σ_{j<i} (W_i^T W_j) W_j   (7)

In equation (7), W_i is the i-th column of W and W_j is the j-th column of W;

S6, normalize W_i as shown in equation (8):

W_i = W_i / ‖W_i‖   (8)

In equation (8), ‖W_i‖ is the norm of W_i;

S7, check whether the iteration has converged; if not, return to S4; once converged, return to S3 and initialize the next W_i (i++);

S8, reconstruct the separated signals to obtain the source signal S, as shown in equation (9):

S = WKX   (9)

In equation (9), S is the reconstructed source signal, W is the estimated unmixing matrix, K is the whitening matrix, and X is the de-meaned signal.
the loss function in step 4 is shown in equations (11), (12) and (13):
Figure BDA0003732647040000061
ω k =μ k max(1,log(μ k M/M k )) (12)
Figure BDA0003732647040000062
in equations (11), (12) and (13), λ represents a penalty factor representing a degree of penalty, ω, for the entire network weight k Denotes the weight, μ, assigned to the k class k Is an adjustable parameter, M k Is the number of samples for the class k.
The invention has the beneficial effects that:
By decomposing, denoising and reconstructing the signal with FastICA to obtain an interference-free original signal, the invention completes feature extraction through the feature extraction module, refines the features adaptively and models the dependencies between them to extract high-quality information; at the same time, a loss function is designed to solve the class-imbalance problem, and classification is finally completed by the MLP, thereby obtaining a better OSA detection effect and higher evaluation indices.
Drawings
FIG. 1 is a schematic flow diagram of the attention and Transformer based OSA detection method of the present invention;
FIG. 2 is a schematic flow chart of the preprocessing in the attention and Transformer based OSA detection method of the present invention;
FIG. 3 is a schematic structural diagram of a two-way convolutional neural network in the OSA detection method based on attention and Transformer according to the present invention;
FIG. 4 is a schematic diagram of the structure of the convolution attention module in the attention and Transformer based OSA detection method of the present invention;
FIG. 5 is a schematic diagram of the structure of a Transformer in the attention and Transformer based OSA detection method of the present invention;
FIG. 6 is a diagram of a trained OSA detection network model in the attention and Transformer based OSA detection method of the present invention;
FIG. 7 is a graph showing the result of OSA detection in the attention and Transformer based OSA detection method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and detailed description.
As shown in fig. 1, in the OSA detection method based on attention and Transformer, a data set is established and an OSA detection network is constructed, the data set is preprocessed and then input to the OSA detection network for training to obtain a trained OSA detection network, and then the preprocessed data set is input to the trained OSA detection network for classification to obtain a result of classification detection.
The method specifically comprises the following steps:
Step 1, an EEG signal data set is established and an OSA detection network comprising a feature extraction module and a classification module is constructed. The data set is the UCD data set provided in the St. Vincent's University Hospital / University College Dublin Sleep Apnea Database; a total of 25 subjects participated in EEG signal collection, each subject's whole-night PSG signals were recorded at a sampling frequency of 128 Hz, and the subjects' overnight recordings were annotated by experts. The feature extraction module consists of a two-way convolutional neural network, a convolutional attention module and a Transformer, and the classification module consists of an MLP (multilayer perceptron) and Softmax;
Step 2, data preprocessing is carried out on the EEG signal data set of step 1 to obtain the original EEG signals. Each subject's whole-night EEG signal is divided into fixed-length 30 s segments, and the segmented signals are decomposed, denoised and reconstructed by FastICA to obtain original signal data free of the inter-lead interference introduced during signal acquisition, as illustrated by the sketch below.
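For illustration (this sketch is not part of the original disclosure), a 30 s epoch at the 128 Hz sampling rate contains 30 × 128 = 3840 samples; a minimal NumPy segmentation routine, with assumed function and variable names, could read:

```python
import numpy as np

FS = 128          # sampling frequency of the recordings (Hz)
EPOCH_SEC = 30    # fixed epoch length used for segmentation (s)

def segment_epochs(eeg: np.ndarray, fs: int = FS, epoch_sec: int = EPOCH_SEC) -> np.ndarray:
    """Split a whole-night single-channel EEG trace into non-overlapping
    30 s epochs (128 Hz x 30 s = 3840 samples each); a trailing partial
    epoch is dropped."""
    samples = fs * epoch_sec
    n_epochs = len(eeg) // samples
    return eeg[: n_epochs * samples].reshape(n_epochs, samples)
```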
As shown in FIG. 2, the preprocessing specifically consists of decomposing, denoising and reconstructing the original EEG signal with the FastICA algorithm, finally obtaining the denoised signal S;
The decomposition, denoising and reconstruction of the EEG signal by the FastICA algorithm specifically comprises the following steps:

S1, centering: this is the first step of the preprocessing. The mean of the mixed signal X is computed and then subtracted from X, as shown in equation (1):

X = X − E(X)   (1)

In equation (1), X is the mixed signal and E(X) is the signal mean;

S2, whitening: in this step the matrix X is transformed into Z so that its components are uncorrelated and possess unit variance. This involves an eigenvalue decomposition, and the specific process is shown in equation (2):

E[XX^T] = C_X = UΛU^T   (2)

In equation (2), C_X = E[XX^T] is the covariance matrix of X, Λ = diag(λ_1, λ_2, …, λ_n) is the diagonal matrix whose diagonal elements are the eigenvalues of C_X, U = [u_1, u_2, …, u_n] is the matrix of eigenvectors of C_X, and UΛU^T is the eigendecomposition of the covariance matrix;

The whitening is realized as shown in equation (3):

Z = K × X   (3)

In equation (3), Z is the whitened matrix, K is the whitening matrix, and X is the de-meaned signal;

The matrix K is defined as:

K = Λ^{−1/2} U^T   (4)

In equation (4), K is the whitening matrix, and Λ and U are as defined above;

It can further be concluded that the components of Z are uncorrelated after the whitening process. The covariance of the whitened matrix Z is shown in equation (5):

E[ZZ^T] = K C_X K^T = Λ^{−1/2} U^T (UΛU^T) U Λ^{−1/2} = I   (5)

In equation (5), I is the identity matrix, and Z, K, Λ, U and UΛU^T are as defined above;

S3, initialize W_i, the i-th column of W;

S4, update W_i as shown in equation (6):

W_i = E{Z g(W_i^T Z)} − E{g′(W_i^T Z)} W_i   (6)

In equation (6), W is the unmixing matrix being estimated and W_i is its i-th column. In the experiments g(·) = tanh(·), for which the FastICA iteration is fast and robust;

S5, orthogonalize W_i as shown in equation (7):

W_i = W_i − Σ_{j<i} (W_i^T W_j) W_j   (7)

In equation (7), W_i is the i-th column of W and W_j is the j-th column of W;

S6, normalize W_i as shown in equation (8):

W_i = W_i / ‖W_i‖   (8)

In equation (8), ‖W_i‖ is the norm of W_i;

S7, check whether the iteration has converged; if not, return to S4; once converged, return to S3 and initialize the next W_i (i++);

S8, reconstruct the separated signals to obtain the source signal S, as shown in equation (9):

S = WKX   (9)

In equation (9), S is the reconstructed source signal, W is the estimated unmixing matrix, K is the whitening matrix, and X is the de-meaned signal.
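A minimal NumPy sketch of steps S1-S8 follows, for illustration only; it is not part of the original disclosure. The deflation scheme follows equations (6)-(8) with g(·) = tanh(·); the random initialization of each W_i, the component count, the iteration cap and the tolerance are assumptions.

```python
import numpy as np

def fastica_denoise(X, n_components=None, max_iter=200, tol=1e-5):
    """Decompose a multi-channel EEG segment X (channels x samples) with
    FastICA, following steps S1-S8, and return the estimated sources S."""
    n, m = X.shape
    n_components = n_components or n

    # S1: centering -- subtract the per-channel mean (eq. 1)
    X = X - X.mean(axis=1, keepdims=True)

    # S2: whitening via eigendecomposition of the covariance (eqs. 2-5)
    C = X @ X.T / m                       # C_X = E[X X^T]
    lam, U = np.linalg.eigh(C)            # C_X = U diag(lam) U^T
    K = np.diag(lam ** -0.5) @ U.T        # K = Lambda^{-1/2} U^T (eq. 4)
    Z = K @ X                             # whitened signal (eq. 3)

    g = np.tanh                           # g(.) = tanh(.) as in the text
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2

    W = np.zeros((n_components, n))
    for i in range(n_components):
        w = np.random.randn(n)            # S3: initialize W_i (random here)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            w_old = w
            # S4: fixed-point update (eq. 6)
            w = (Z * g(w @ Z)).mean(axis=1) - g_prime(w @ Z).mean() * w
            # S5: deflationary orthogonalization against earlier rows (eq. 7)
            w -= W[:i].T @ (W[:i] @ w)
            # S6: normalization (eq. 8)
            w /= np.linalg.norm(w)
            # S7: convergence check on the direction of w
            if abs(abs(w @ w_old) - 1.0) < tol:
                break
        W[i] = w

    # S8: reconstruct the source signals (eq. 9): S = W K X
    return W @ K @ X
```

In a practical denoising pipeline the artifactual components would be zeroed or excluded before reconstruction; the sketch simply returns all estimated sources.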
Step 3, the original EEG signals obtained in step 2 are input into the OSA detection network for feature extraction and classification to obtain classification results. The obtained original signals are fed into the two-way convolutional neural network, and the features extracted by each branch are concatenated and then input into the convolutional attention module, which completes adaptive feature refinement to enhance feature learning. At the same time, context encoding is realized with a Transformer to better capture the contextual dependencies within the features.
To extract features better and enrich their semantic information, the invention adopts a two-way convolutional neural network as the primary feature extractor, in which the two convolutional branches with different kernel sizes are designed according to the sampling rate of the signal. To better capture features across the frequency bands, the large convolution kernel size is set to 400 and the small convolution kernel size to 50 in the experiments. As shown in FIG. 3, each branch consists of three convolutional layers and two max-pooling layers, where each convolutional layer includes a batch-normalization layer and uses GELU as the activation function. To prevent overfitting, a Dropout layer is also applied after the first max pooling in each of the two branches and after the connection of the two branches. The reconstructed signal S is input into the two-way convolutional neural network as shown in equations (11), (12) and (13):

B1 = Maxpool(f_{400×1}(S))   (11)

B2 = Maxpool(f_{50×1}(S))   (12)

F = Dropout(concat(B1, B2))   (13)

In equations (11) and (12), f_{n×n} denotes the convolution operation with kernel size n×n (the full branch of three convolutions and two poolings is abbreviated) and Maxpool denotes max pooling.

In equation (13), concat denotes the concatenation of the two feature maps.
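The following PyTorch sketch illustrates this two-way extractor, for illustration only. Only the 400/50 first-layer kernel sizes, the three-convolution/two-pooling branch layout with BatchNorm and GELU, and the Dropout placement come from the description; the channel widths, strides, remaining kernel sizes and the length-alignment step are assumptions.

```python
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    """One branch: three conv layers (BatchNorm + GELU) and two max-pool
    layers, with Dropout after the first pooling, per FIG. 3."""
    def __init__(self, first_kernel, channels=64, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=first_kernel,
                      stride=first_kernel // 8, padding=first_kernel // 2),
            nn.BatchNorm1d(channels), nn.GELU(),
            nn.MaxPool1d(8), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size=8, padding=4),
            nn.BatchNorm1d(channels), nn.GELU(),
            nn.Conv1d(channels, channels, kernel_size=8, padding=4),
            nn.BatchNorm1d(channels), nn.GELU(),
            nn.MaxPool1d(4),
        )

    def forward(self, x):                 # x: (batch, 1, 3840)
        return self.net(x)

class TwoBranchCNN(nn.Module):
    """Two-way extractor with large (400) and small (50) first kernels;
    branch outputs are concatenated and passed through Dropout (eq. 13)."""
    def __init__(self, dropout=0.5):
        super().__init__()
        self.big = ConvBranch(first_kernel=400)
        self.small = ConvBranch(first_kernel=50)
        self.dropout = nn.Dropout(dropout)

    def forward(self, s):
        b1, b2 = self.big(s), self.small(s)
        # the branches produce different time lengths; pool to a common one
        t = min(b1.shape[-1], b2.shape[-1])
        b1 = nn.functional.adaptive_max_pool1d(b1, t)
        b2 = nn.functional.adaptive_max_pool1d(b2, t)
        return self.dropout(torch.cat([b1, b2], dim=1))   # feature map F
```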
The convolutional attention module is composed of a spatial attention module and a channel attention module. To extract features better and enhance the feature-extraction capability, a convolutional attention mechanism is introduced to realize adaptive feature refinement. The convolutional attention mechanism is a simple and effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, the convolutional attention module infers attention maps in turn along two independent dimensions, channel and space, and then multiplies each attention map with the input feature for adaptive feature refinement, as shown in FIG. 4.

The channel attention mechanism is as follows: the feature map F first undergoes global average pooling and global max pooling to aggregate its spatial information, generating two different spatial context descriptors, F^c_avg and F^c_max, which represent the global-average-pooled and global-max-pooled features respectively. These two descriptors are then forwarded through a shared network to generate the channel attention map M_c. The channel attention is calculated as shown in equations (14)-(16):

F^c_avg = AdaptAvgPool(F)   (14)

F^c_max = AdaptMaxPool(F)   (15)

M_c(F) = σ(f^{1×1}(F^c_avg) + f^{1×1}(F^c_max))   (16)

In equations (14)-(16), AdaptAvgPool denotes adaptive average pooling, AdaptMaxPool denotes adaptive max pooling, σ denotes the sigmoid function, and f^{1×1} denotes a convolution with kernel size 1×1.

The spatial attention mechanism is as follows: the channel information of the feature map is aggregated by two pooling operations, generating two feature maps, F^s_avg and F^s_max, which represent the mean-pooled and max-pooled features across channels respectively. These are then concatenated and passed through a standard convolutional layer to generate the spatial attention map M_s. In short, the spatial attention is calculated as shown in equation (17):

M_s(F) = σ(f^{n×n}([F^s_avg; F^s_max]))   (17)

In equation (17), σ denotes the sigmoid function, f^{n×n} denotes a convolution with kernel size n×n, and [·;·] denotes the concatenation of the channel-wise average-pooled and max-pooled features produced by AvgPool and MaxPool.

Given the feature map F as input, CBAM derives the channel attention map M_c and the spatial attention map M_s in order, as shown in FIG. 4. The overall attention process is shown in equation (18):

F′ = M_c(F) ⊗ F,  F″ = M_s(F′) ⊗ F′   (18)

In equation (18), ⊗ denotes element-wise multiplication, during which the attention values are broadcast: channel attention values are broadcast along the spatial dimension and vice versa. F is the input feature map, M_c denotes channel attention, M_s denotes spatial attention, and F″ is the final output.
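A minimal PyTorch sketch of a 1-D CBAM consistent with equations (14)-(18) follows, for illustration only; the reduction ratio, the GELU in the shared network and the 7-tap spatial kernel are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention (eqs. 14-16): shared 1x1 convolutions over the
    adaptively average- and max-pooled descriptors, then sigmoid."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, kernel_size=1),
            nn.GELU(),
            nn.Conv1d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, f):                            # f: (batch, C, T)
        avg = self.shared(nn.functional.adaptive_avg_pool1d(f, 1))
        mx = self.shared(nn.functional.adaptive_max_pool1d(f, 1))
        return torch.sigmoid(avg + mx)               # M_c: (batch, C, 1)

class SpatialAttention(nn.Module):
    """Spatial attention (eq. 17): concatenate channel-wise mean and max
    maps, convolve, then sigmoid."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f):
        avg = f.mean(dim=1, keepdim=True)            # F^s_avg: (batch, 1, T)
        mx, _ = f.max(dim=1, keepdim=True)           # F^s_max: (batch, 1, T)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequential channel-then-spatial refinement (eq. 18)."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, f):
        f = self.ca(f) * f                           # F' = M_c(F) (x) F
        return self.sa(f) * f                        # F'' = M_s(F') (x) F'
```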
To better extract the contextual dependencies between features, a Transformer is used as the context encoding (CE) module. The architecture of the CE module is the Transformer Encoder shown in FIG. 5. It mainly comprises LayerNorm layers, a multi-head attention module (MHA) and an MLP module. The MLP module consists of two fully connected layers with a non-linearity between them, using ReLU as the activation function and a Dropout layer to prevent overfitting. L identical layers are stacked to produce the final feature. Inspired by the BERT model, a learnable token x_class is appended to the input, and its state at the output serves as the context vector. The Transformer first passes F″ through a linear mapping layer Ω_Tran that maps the features to the hidden dimension, i.e. F_Tran = Ω_Tran(F″). The context token is then appended to the feature sequence, so the input features become ψ_0 = [x_class; F_Tran], where the subscript 0 denotes the input to the first layer. ψ_0 is then processed by the Transformer layers, as shown in equations (19) and (20):

ψ̂_l = MHA(Norm(ψ_{l−1})) + ψ_{l−1}   (19)

ψ_l = MLP(Norm(ψ̂_l)) + ψ̂_l   (20)

In equation (19), MHA denotes the multi-head attention module, Norm denotes a LayerNorm layer, ψ_{l−1} denotes the features of the (l−1)-th layer, ψ̂_l denotes the features after multi-head attention, and + denotes a residual connection.

In equation (20), MLP denotes the fully connected network, Norm denotes a LayerNorm layer, ψ_l denotes the output features of the l-th layer, and + denotes a residual connection.
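For illustration, a minimal PyTorch sketch of the CE module under equations (19) and (20) is given below; the hidden dimension, depth, head count and MLP width are assumptions, and nn.MultiheadAttention stands in for the MHA block.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Pre-norm Transformer encoder layer implementing eqs. (19)-(20)."""
    def __init__(self, dim, heads=8, mlp_dim=256, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mha = nn.MultiheadAttention(dim, heads, dropout=dropout,
                                         batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(            # two FC layers with ReLU
            nn.Linear(dim, mlp_dim), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(mlp_dim, dim),
        )

    def forward(self, psi):
        h = self.norm1(psi)
        psi = self.mha(h, h, h, need_weights=False)[0] + psi   # eq. (19)
        return self.mlp(self.norm2(psi)) + psi                 # eq. (20)

class ContextEncoder(nn.Module):
    """CE module: linear mapping Omega_Tran, prepended x_class token,
    L stacked encoder layers; returns the context vector."""
    def __init__(self, in_dim, dim=128, depth=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)               # Omega_Tran
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # x_class token
        self.layers = nn.ModuleList(EncoderLayer(dim) for _ in range(depth))

    def forward(self, f):          # f: (batch, tokens, in_dim), e.g. F'' transposed
        x = self.proj(f)
        cls = self.cls.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)       # psi_0 = [x_class; F_Tran]
        for layer in self.layers:
            x = layer(x)
        return x[:, 0]                       # context vector (class-token state)
```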
The feature extraction is specifically as follows: the original signals obtained in step 2 are fed into the two-way convolutional neural network, the features extracted by each branch are concatenated, the concatenated features are input into the convolutional attention module to complete adaptive feature refinement, and the dependencies among the refined features are modeled to obtain shallow semantics;

During feature extraction, residual blocks are employed to enrich feature details and enhance the feature-extraction capability.

The formula of the residual block is:

x_{l+1} = x_l + F(x_l, W_l)   (21)

In equation (21), x_{l+1} is the output of the (l+1)-th convolutional layer, x_l is the output of the l-th convolutional layer, W_l is the weight of the l-th convolutional layer, and F(x_l, W_l) is the residual mapping.

The two-way convolutional neural network uses convolution kernels of two different sizes for the initial extraction of features, followed by multi-layer convolution and pooling in the two branches, with Dropout layers used to prevent model overfitting. The large convolution kernel size in the two-way convolutional neural network is set to 400 and the small convolution kernel size to 50.
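A minimal 1-D residual block realizing equation (21) might look as follows; the channel count, kernel size and activation placement are assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1-D residual block: x_{l+1} = x_l + F(x_l, W_l), per eq. (21)."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.f = nn.Sequential(                 # residual mapping F
            nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm1d(channels), nn.GELU(),
            nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(x + self.f(x))          # identity shortcut + F(x)
```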
The context vector is then taken from the final output, i.e. ψ_L^0, the state of the token x_class after the L-th layer. An MLP combined with Softmax is used as the classifier; to achieve the classification effect, the context vector is input into this classifier, yielding the classification results Normal and OSA, as shown in FIG. 6:

ŷ = Softmax(MLP(ψ_L^0))
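A minimal sketch of the MLP-plus-Softmax classifier head, assuming a 128-dimensional context vector and a hidden width of 64 (both illustrative):

```python
import torch.nn as nn

class Classifier(nn.Module):
    """MLP + Softmax head mapping the context vector to {Normal, OSA}."""
    def __init__(self, dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes),
        )
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, ctx):           # ctx: (batch, dim) context vector
        return self.softmax(self.mlp(ctx))
```

During training the Softmax is typically folded into the cross-entropy loss, which consumes the raw logits produced by the MLP.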
Step 4, the results obtained by training the network are constrained with a loss function, and the parameters are then updated by backpropagation. Training is iterated for 100 epochs, where one epoch means training once over the preprocessed signals, finally yielding the trained network model, as shown in FIG. 6.

In conventional multi-classification tasks, the standard multi-class cross entropy is used as the model's loss function:

L_CE = −(1/M) Σ_{i=1}^{M} Σ_{k=1}^{K} y_i^k log(ŷ_i^k)   (22)

In equation (22), y_i^k is the true label of the i-th sample, ŷ_i^k is the predicted probability that the i-th sample belongs to class k, M is the total number of samples, and K is the number of classes. Because the number of samples varies greatly from class to class, the data set as a whole is class-imbalanced. The loss function in equation (22) penalizes the misclassifications of all classes equally, so the trained model may be biased toward classes with a large sample size.
To mitigate the influence of the class-imbalance problem on the classification result, the method reconstructs the whole loss function on the basis of the standard multi-class cross entropy and previous work, as shown in equations (23), (24) and (25):

L = −(1/M) Σ_{i=1}^{M} Σ_{k=1}^{K} ω_k y_i^k log(ŷ_i^k) + λ‖θ‖₂²   (23)

ω_k = μ_k max(1, log(μ_k M / M_k))   (24)

ŷ_i^k = exp(z_i^k) / Σ_{j=1}^{K} exp(z_i^j)   (25)

In equations (23), (24) and (25), λ is a penalty factor controlling the degree of penalty on the network weights θ, ω_k denotes the weight assigned to class k, μ_k is an adjustable parameter, M_k is the number of samples of class k, and z_i^k is the logit of the i-th sample for class k. The choice of the class weights depends on two factors: the sample count of the class, through M/M_k, and the distinction between classes, through μ_k. According to the distribution of sample counts between the Normal and OSA classes in the data set, the highest μ_k is assigned to OSA and the lowest to Normal.
Step 5: the EEG signal to be processed is put into the trained model, and the classification detection result is finally output. As shown in FIG. 7, graphs a, b and c are the EEG signals to be detected, graph d is the classification result of graph a, graph e is the classification result of graph b, and graph f is the classification result of graph c.

Claims (10)

1. The OSA detection method based on attention and a Transformer is characterized in that a data set is established, an OSA detection network is constructed, the data set is preprocessed and then input into the OSA detection network for training, the trained OSA detection network is obtained, then the preprocessed data set is input into the trained OSA detection network for classification, and the classified detection result is obtained.
2. The OSA detection method based on attention and Transformer according to claim 1, characterized by comprising the following steps:
step 1, establishing an EEG signal data set and constructing an OSA detection network comprising a feature extraction module and a classification module;
step 2, carrying out data preprocessing on the EEG signal data set in the step 1 to obtain an original EEG signal;
step 3, inputting the original EEG signal obtained in the step 2 into an OSA detection network for feature extraction and classification to obtain a classification result;
step 4, constraining the classification result obtained in the step 3 by using a loss function, and then performing iterative training to obtain a trained OSA detection network model;
step 5, the EEG signal to be processed is put into the trained OSA detection network model of step 4, and the classification detection result is finally output.
3. The attention and Transformer based OSA detection method as claimed in claim 2, wherein the feature extraction module of step 1 is composed of a two-way convolutional neural network, a convolutional attention module and a Transformer, the convolutional attention module is composed of a spatial attention module and a channel attention module, and the classification module of step 1 is composed of an MLP and Softmax.
4. The attention and Transformer based OSA detection method according to claim 2, wherein the feature extraction in step 3 is specifically: the original signals obtained in step 2 are fed into the two-way convolutional neural network, the features extracted from each branch are concatenated, the concatenated features are input into the convolutional attention module to complete adaptive feature refinement, and the dependencies among the refined features are modeled to obtain shallow semantics.
5. The attention and Transformer based OSA detection method according to claim 3 or 4, wherein the two-way convolutional neural network uses convolution kernels of two different sizes for the preliminary extraction of features, followed by multi-layer convolution and pooling in the two branches, with a Dropout layer used to prevent model overfitting; the large convolution kernel size in the two-way convolutional neural network is set to 400 and the small convolution kernel size to 50.
6. The attention and Transformer based OSA detection method of claim 2, wherein the feature extraction process employs residual blocks to enrich feature details and enhance the feature-extraction capability.
7. The attention and Transformer based OSA detection method of claim 6, wherein the formula of the residual block is:

x_{l+1} = x_l + F(x_l, W_l)   (1)

in equation (1), x_{l+1} is the output of the (l+1)-th convolutional layer, x_l is the output of the l-th convolutional layer, W_l is the weight of the l-th convolutional layer, and F(x_l, W_l) is the residual mapping.
8. The attention and Transformer based OSA detection method according to claim 2, wherein the preprocessing of step 2 is to decompose, denoise and reconstruct the EEG signal with the FastICA algorithm.
9. The attention and Transformer based OSA detection method of claim 8, wherein the FastICA decomposition, denoising and reconstruction of the EEG signal specifically comprises the following steps:

S1, centering: calculate the mean of the mixed signal X and subtract it from X, as shown in equation (2):

X = X − E(X)   (2)

in equation (2), X is the mixed signal and E(X) is the signal mean;

S2, whitening: the specific process is shown in equation (3):

E[XX^T] = C_X = UΛU^T   (3)

in equation (3), C_X = E[XX^T] is the covariance matrix of X, Λ = diag(λ_1, λ_2, …, λ_n) is the diagonal matrix whose diagonal elements are the eigenvalues of C_X, U = [u_1, u_2, …, u_n] is the matrix of eigenvectors of C_X, and UΛU^T is the eigendecomposition of the covariance matrix;

Z = K × X   (4)

in equation (4), Z is the whitened matrix, K is the whitening matrix, and X is the de-meaned signal;

K = Λ^{−1/2} U^T   (5)

in equation (5), K is the whitening matrix, and Λ and U are as defined above;

the covariance of the whitened matrix Z is shown in equation (6):

E[ZZ^T] = K C_X K^T = Λ^{−1/2} U^T (UΛU^T) U Λ^{−1/2} = I   (6)

in equation (6), I is the identity matrix, and Z, K, Λ, U and UΛU^T are as defined above;

S3, initialize W_i;

S4, update W_i as shown in equation (7):

W_i = E{Z g(W_i^T Z)} − E{g′(W_i^T Z)} W_i   (7)

in equation (7), W is the unmixing matrix being estimated and W_i is its i-th column;

S5, orthogonalize W_i as shown in equation (8):

W_i = W_i − Σ_{j<i} (W_i^T W_j) W_j   (8)

in equation (8), W_i is the i-th column of W and W_j is the j-th column of W;

S6, normalize W_i as shown in equation (9):

W_i = W_i / ‖W_i‖   (9)

in equation (9), ‖W_i‖ is the norm of W_i;

S7, check whether the iteration has converged; if not, return to S4; once converged, return to S3 and initialize the next W_i (i++);

S8, reconstruct the separated signals to obtain the source signal S, as shown in equation (10):

S = WKX   (10)

in equation (10), S is the reconstructed source signal, W is the estimated unmixing matrix, K is the whitening matrix, and X is the de-meaned signal.
10. The attention and Transformer based OSA detection method according to claim 2, wherein the loss function in step 4 is shown in equations (11), (12) and (13):

L = −(1/M) Σ_{i=1}^{M} Σ_{k=1}^{K} ω_k y_i^k log(ŷ_i^k) + λ‖θ‖₂²   (11)

ω_k = μ_k max(1, log(μ_k M / M_k))   (12)

ŷ_i^k = exp(z_i^k) / Σ_{j=1}^{K} exp(z_i^j)   (13)

in equations (11), (12) and (13), λ is a penalty factor controlling the degree of penalty on the network weights θ, ω_k denotes the weight assigned to class k, μ_k is an adjustable parameter, M_k is the number of samples of class k, M is the total number of samples, K is the number of classes, y_i^k is the true label of the i-th sample, ŷ_i^k is the predicted probability that the i-th sample belongs to class k, and z_i^k is the corresponding logit.
CN202210788498.7A 2022-07-06 2022-07-06 OSA detection method based on attention and Transformer Withdrawn CN115089123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210788498.7A CN115089123A (en) 2022-07-06 2022-07-06 OSA detection method based on attention and Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210788498.7A CN115089123A (en) 2022-07-06 2022-07-06 OSA detection method based on attention and Transformer

Publications (1)

Publication Number Publication Date
CN115089123A true CN115089123A (en) 2022-09-23

Family

ID=83296302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210788498.7A Withdrawn CN115089123A (en) 2022-07-06 2022-07-06 OSA detection method based on attention and Transformer

Country Status (1)

Country Link
CN (1) CN115089123A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116421152A (en) * 2023-06-13 2023-07-14 长春理工大学 Sleep stage result determining method, device, equipment and medium
CN116421152B (en) * 2023-06-13 2023-08-22 长春理工大学 Sleep stage result determining method, device, equipment and medium
CN117338253A (en) * 2023-12-05 2024-01-05 华南师范大学 Sleep apnea detection method and device based on physiological signals
CN117338253B (en) * 2023-12-05 2024-03-26 华南师范大学 Sleep apnea detection method and device based on physiological signals

Similar Documents

Publication Publication Date Title
Jiang et al. A snapshot research and implementation of multimodal information fusion for data-driven emotion recognition
Qu et al. A residual based attention model for EEG based sleep staging
Rubin et al. Recognizing abnormal heart sounds using deep learning
CN110801221B (en) Sleep apnea fragment detection equipment based on unsupervised feature learning
Gupta et al. OSACN-Net: automated classification of sleep apnea using deep learning model and smoothed Gabor spectrograms of ECG signal
CN115089123A (en) OSA detection method based on attention and Transformer
Salhi et al. Voice disorders identification using multilayer neural network
CN111920420B (en) Patient behavior multi-modal analysis and prediction system based on statistical learning
Deperlioglu Heart sound classification with signal instant energy and stacked autoencoder network
CN116072265B (en) Sleep stage analysis system and method based on convolution of time self-attention and dynamic diagram
Kang et al. 1D convolutional autoencoder-based PPG and GSR signals for real-time emotion classification
Aggarwal et al. A structured learning approach with neural conditional random fields for sleep staging
Dar et al. Spectral features and optimal hierarchical attention networks for pulmonary abnormality detection from the respiratory sound signals
Asatani et al. Classification of respiratory sounds using improved convolutional recurrent neural network
CN113925459A (en) Sleep staging method based on electroencephalogram feature fusion
CN115804602A (en) Electroencephalogram emotion signal detection method, equipment and medium based on attention mechanism and with multi-channel feature fusion
Simply et al. Diagnosis of obstructive sleep apnea using speech signals from awake subjects
Liang et al. Obstructive sleep apnea detection using combination of CNN and LSTM techniques
Lu et al. Speech depression recognition based on attentional residual network
Wang et al. Deep learning for sleep stage classification
Pravin et al. Regularized deep LSTM autoencoder for phonological deviation assessment
Zhang et al. A noninvasive method to detect diabetes mellitus and lung cancer using the stacked sparse autoencoder
Rezaee et al. Can you understand why i am crying? a decision-making system for classifying infants’ cry languages based on deepsvm model
Srivastava et al. ApneaNet: A hybrid 1DCNN-LSTM architecture for detection of Obstructive Sleep Apnea using digitized ECG signals
Liu et al. Respiratory sounds feature learning with deep convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220923