CN116756657A - CNN and Transformer-based fNIRS brain load detection method - Google Patents

CNN and Transformer-based fNIRS brain load detection method

Info

Publication number
CN116756657A
CN116756657A (application CN202311031625.XA)
Authority
CN
China
Prior art keywords
matrix
layer
load detection
cnn
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311031625.XA
Other languages
Chinese (zh)
Other versions
CN116756657B (en)
Inventor
汪曼青
廖凌翔
郜东瑞
王录涛
张永清
余海翔
李小鱼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN202311031625.XA
Publication of CN116756657A
Application granted
Publication of CN116756657B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00Data types
    • G06F2123/02Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Control Of Indicators Other Than Cathode Ray Tubes (AREA)

Abstract

The invention discloses a CNN and Transformer-based fNIRS brain load detection method, which comprises the steps of: acquiring original data collected by fNIRS acquisition equipment, and preprocessing the original data to obtain the oxyhemoglobin and deoxyhemoglobin concentration signals ΔC_HbO and ΔC_HbR; performing a one-dimensional convolution operation on the signals ΔC_HbO and ΔC_HbR, and combining the two convolved signals in the channel dimension to obtain a combined signal Hb; extracting local fine-grained temporal features of the combined signal Hb by using a convolutional neural network to obtain a feature matrix; performing feature enhancement extraction on the feature matrix by using a Transformer module to obtain state features; and inputting the state features into a multi-layer perceptron classification layer to obtain the classification result of mental load detection.

Description

CNN and Transformer-based fNIRS brain load detection method
Technical Field
The invention relates to the technical field of brain signal classification, in particular to an fNIRS mental load detection method based on CNN and a Transformer.
Background
Mental load refers to the cognitive resources and mental effort required to perform a cognitive task; the level of mental load varies from task to task and context to context, as well as from individual to individual and population to population. fNIRS, an optical neuroimaging technology for detecting brain function and activity, is widely applied to brain load detection because of its portability, non-invasiveness and other advantages. Monitoring, analyzing and identifying the activity of brain regions through fNIRS is expected to provide an objective basis for brain load assessment and to reduce the dependence on subjective reports.
Most current brain load detection methods based on functional near-infrared spectroscopy (fNIRS) model a single signal (such as oxyhemoglobin) as input, which wastes the acquired data and ignores the correlation between the two signals. The recognition models are mainly based on traditional machine learning and deep learning methods. Traditional machine learning methods, such as support vector machines and random forests, require corresponding features to be extracted manually according to prior knowledge before being fed into the model for classification. Deep learning methods such as convolutional neural networks (Convolutional Neural Networks, CNN) are widely used for their strong local feature learning capability; others, such as long short-term memory networks, capture the temporal correlation by filtering and retaining the time-series signal through gating units. However, the fNIRS signal is essentially a time-series signal with strong time dependence, and the existing models cannot simultaneously provide local feature learning and global correlation modeling capabilities.
Disclosure of Invention
Aiming at the above defects in the prior art, the CNN and Transformer-based fNIRS brain load detection method provided by the invention solves the problem of limited feature extraction by combining the CNN and Transformer modules, thereby improving the model's ability to extract both local and global features.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
provided is a CNN and Transformer-based fNIRS brain load detection method, comprising the steps of:
S1, acquiring original data collected by an fNIRS acquisition device, and preprocessing the original data to obtain the oxyhemoglobin and deoxyhemoglobin concentration signals ΔC_HbO and ΔC_HbR;
S2, performing a one-dimensional convolution operation on the signals ΔC_HbO and ΔC_HbR, and combining the two convolved signals in the channel dimension to obtain a combined signal Hb;
S3, extracting local fine-grained temporal features of the combined signal Hb by adopting a convolutional neural network to obtain a feature matrix;
S4, performing feature enhancement extraction on the feature matrix by adopting a Transformer module to obtain state features;
S5, inputting the state features into a multi-layer perceptron classification layer to obtain a classification result of mental load detection.
Further, the step S1 further includes:
S11, continuously acquiring the change in infrared light intensity before and after emission and reception by adopting the fNIRS acquisition device;
S12, converting the change in infrared light intensity into the relative changes in oxyhemoglobin and deoxyhemoglobin concentration by using the modified Beer-Lambert law:

$$\begin{bmatrix} \Delta HbO \\ \Delta HbR \end{bmatrix} = \frac{1}{d} \begin{bmatrix} \varepsilon_{HbO}^{\lambda_1} DPF^{\lambda_1} & \varepsilon_{HbR}^{\lambda_1} DPF^{\lambda_1} \\ \varepsilon_{HbO}^{\lambda_2} DPF^{\lambda_2} & \varepsilon_{HbR}^{\lambda_2} DPF^{\lambda_2} \end{bmatrix}^{-1} \begin{bmatrix} \Delta OD^{\lambda_1}(\Delta t) \\ \Delta OD^{\lambda_2}(\Delta t) \end{bmatrix}$$

wherein d is the distance between the light source and the detector; ε_HbO^λ1 is the extinction coefficient of oxyhemoglobin at wavelength λ1; ε_HbR^λ1 is the extinction coefficient of deoxyhemoglobin at wavelength λ1; ε_HbO^λ2 is the extinction coefficient of oxyhemoglobin at wavelength λ2; ε_HbR^λ2 is the extinction coefficient of deoxyhemoglobin at wavelength λ2; ΔOD^λ1(Δt) is the optical density change at wavelength λ1 after time Δt, and ΔOD^λ2(Δt) is the optical density change at wavelength λ2 after time Δt; DPF^λ1 is the differential path length factor at wavelength λ1; DPF^λ2 is the differential path length factor at wavelength λ2; ΔHbR and ΔHbO are the relative changes in deoxyhemoglobin and oxyhemoglobin concentration, respectively; Δt is the time interval between two adjacent samples;
S13, filtering, segmenting, baseline-correcting and normalizing the relative changes in oxyhemoglobin and deoxyhemoglobin concentration to obtain the signals ΔC_HbO and ΔC_HbR.
The beneficial effects of the technical scheme are as follows: normalized mental load data with a high signal-to-noise ratio that meet the requirements of the classification task are obtained, thereby improving the accuracy of subsequent classification.
Further, the calculation formula for performing the one-dimensional convolution operation on the signals ΔC_HbO and ΔC_HbR is:

$$y_i = \sum_{k=1}^{K} w_k \, x_{i+k-1}$$

wherein y_i is the convolution result of the i-th element of the signal ΔC_HbO / ΔC_HbR; x_i is the i-th element of the signal ΔC_HbO / ΔC_HbR; w_k is the k-th element of the convolution kernel, and K is the length of the convolution kernel;
the formula for combining the two convolved signals in the channel dimension is:

$$Hb = C_s\big(\mathrm{Conv}(\Delta C_{HbO}),\ \mathrm{Conv}(\Delta C_{HbR})\big)$$

wherein Hb is the combined signal; C_s is the signal combination operation; Conv(ΔC_HbO) is the result of convolving the signal ΔC_HbO; Conv(ΔC_HbR) is the result of convolving the signal ΔC_HbR.
The beneficial effects of the technical scheme are as follows: the two chromophore signals are fully utilized and their correlation is characterized, so that the accuracy of the network model in fNIRS brain load detection can be improved.
Further, the step S3 further includes:
S31, inputting the combined signal Hb into a bottleneck layer for a dimension reduction operation, and then inputting the result into three one-dimensional convolutions with different kernel lengths to extract pattern information at different time scales;
S32, inputting the combined signal Hb into a max-pooling layer with a kernel length of 3 whose padding keeps the input and output scales unchanged, and then obtaining a pooled value through a bottleneck layer with a kernel length of 1;
S33, splicing the pattern information of the different time scales with the pooled value, and reducing the parameters of the model by adopting a bottleneck layer;
S34, carrying out batch normalization on the data output by the bottleneck layer in the step S33:

$$\hat{x}_l = \frac{x_l - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

wherein x_l is the feature of the l-th dimension; μ is the mean of x_l in the current batch; σ is the standard deviation of x_l in the current batch; ε is a constant; $\hat{x}_l$ is x_l after the batch normalization operation;
S35, performing scaling and shifting operations on the batch-normalized feature $\hat{x}_l$ to obtain the feature matrix:

$$z_l = \gamma \hat{x}_l + \beta$$

wherein γ is a scaling parameter; β is an offset parameter; z_l is the batch normalization result of the l-th dimension, and the results of all dimensions form the feature matrix.
The beneficial effects of the technical scheme are as follows: the time characteristics of different resolutions of mental load data are extracted, multi-scale time information is effectively fused, and the characteristic extraction capacity of a network is enhanced.
Further, the Transformer module comprises two sub-layers, a multi-head self-attention layer and a feed-forward neural network, with residual connection and layer normalization arranged after each sub-layer. At its input, the Transformer module adds a position embedding PE and a CLS token to the feature matrix, wherein the position embedding PE serves as the position code and the CLS token is a learnable classification embedding that passes through all layers of the Transformer encoder.
Further, the method for processing the feature matrix by the multi-head self-attention comprises the following steps:
A1, respectively adopting three linear layers to perform linear transformations on the feature matrix to obtain the query vector matrix Q, the key vector matrix K and the value vector matrix V:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$

wherein W^Q, W^K and W^V are trainable linear transformation parameter matrices, and X is the feature matrix;
A2, calculating the correlation weights between the query vector matrix Q and the key vector matrix K;
A3, performing scaling and a softmax operation on the correlation weights, and multiplying the weights obtained by softmax with the value vector matrix V to obtain the weighted value matrix:

$$Z_m = \mathrm{softmax}\!\left(\frac{Q_m K_m^{T}}{\sqrt{d_k}}\right) V_m, \quad m = 1, \dots, h$$

wherein Q_m, K_m and V_m are the query vector matrix, key vector matrix and value vector matrix of the m-th self-attention head, respectively; h is the total number of self-attention heads in the multi-head self-attention; d_k is the feature dimension of the trainable linear transformation parameter matrices; T denotes the matrix transpose operation; Z_m is the output of the m-th self-attention head; softmax(·) is the softmax operation;
A4, splicing the outputs of the multiple self-attention heads:

$$\mathrm{MHSA} = \mathrm{Concat}(Z_1, \dots, Z_h)\, W^{O}$$

wherein MHSA is the concatenation of the multiple self-attention outputs; Z_h is the output of the h-th self-attention head; W^O is a trainable parameter matrix; Concat(·) is the matrix concatenation operation;
A5, inputting the splicing result into a linear layer for a linear transformation to obtain the output of the multi-head self-attention module.
The beneficial effects of the technical scheme are as follows: different characteristics and modes in input data can be captured in parallel, the expression of the two signal correlation characteristics is effectively enhanced, and the problem of insufficient data quantity is solved.
Further, the formula of the residual block in the residual connection and layer normalization is:

$$R = M + F(M)$$

wherein M is the output of the multi-head self-attention module; F(·) is the residual mapping; R is the output of the residual block;
the calculation formula of the layer normalization is:

$$\mathrm{LN}(R) = \gamma \, \frac{R - \mu_R}{\sqrt{\sigma_R^{2} + \epsilon}} + \beta$$

wherein μ_R is the mean of the output of the residual block; σ_R² is the variance of the output of the residual block; ε is a constant; γ and β are a learnable scaling factor and a learnable offset parameter, respectively; LN(R) is the result of the layer normalization.
The beneficial effects of the technical scheme are as follows: residual connection is used for solving the problems of gradient disappearance and gradient explosion in deep network training, and layer normalization is used for improving the stability and convergence speed of the network.
Further, the formula of the activation function of the feed-forward neural network is:

$$\mathrm{GELU}(x) = 0.5x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(x + 0.044715x^{3}\big)\right)\right)$$

wherein tanh(·) is the hyperbolic tangent function and GELU(·) is the activation function;
the feed-forward neural network is:

$$\mathrm{FFN}(x) = W_2\,\mathrm{GELU}(W_1 x + b_1) + b_2$$

wherein FFN(x) is the output of the feed-forward neural network; W_1 and W_2 are the weight matrices of the two linear layers; b_1 and b_2 are the biases of the two linear layers.
The beneficial effects of the technical scheme are as follows: the global perception capability of the CNN network is enhanced, so that the model has both local and global feature learning capability.
Further, the multi-layer perceptron classification layer is expressed as:

$$y = \mathrm{softmax}(W z + b)$$

wherein y is the mental load detection probability output by the multi-layer perceptron classification layer; W is the weight matrix; b is the bias; z is the output of the Transformer module.
Further, the objective function of the fNIRS brain load detection method is the cross-entropy loss function:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]$$

wherein y_i is the true label value of sample i, $\hat{y}_i$ is the predicted label value of sample i, and n is the total number of samples.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) In the brain load representation extraction of the fNIRS signals, a deep learning network model is used, so that the complicated operation of manually extracting features in the traditional machine learning method is overcome.
(2) The invention provides a convolution-based signal combination method. Since the concentration of deoxyhemoglobin decreases if the concentration of oxyhemoglobin increases at the time of brain loading, the relative change of the two signals helps to classify the brain loading state of a human. The method provided by the invention can effectively enhance the expression of the correlation characteristics of two signals and solve the problem of insufficient data quantity.
(3) According to the invention, the CNN and Transformer networks are effectively combined: the CNN focuses on learning local temporal features, while the Transformer focuses on improving the global perception capability of the learned features, which enriches the signal feature expression and improves the recognition performance.
Drawings
FIG. 1 is a schematic flow chart of the CNN and Transformer-based fNIRS brain load detection method provided by the invention;
FIG. 2 is a schematic diagram of a method for combining two chromophore signals provided by the present invention;
FIG. 3 is a block diagram of a convolutional network in an embodiment of the present invention;
FIG. 4 is a functional block diagram of the Transformer module in an embodiment of the present invention;
FIG. 5 is a block diagram of the multi-head self-attention in the Transformer module;
fig. 6 is a block diagram of self-attention among multi-headed self-attention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art. It should be understood, however, that the present invention is not limited to the scope of these embodiments; for those skilled in the art, any invention that makes use of the inventive concept falls within the protection scope of the present invention as defined by the appended claims.
Aiming at the difficulty that the amount of collected fNIRS data is small while deep learning requires large amounts of data, the scheme explores joint input modeling of the two chromophore signals: the two signals are processed separately by one-dimensional convolutions and then combined, which better extracts their correlation features. Finally, a combination of CNN and Transformer is adopted to realize multi-scale local feature extraction and global feature perception, and recognition of the brain mental load state is then realized through a multi-layer perceptron classification layer; the overall flow chart is shown in FIG. 1.
The method for detecting the fNIRS brain load based on the CNN and the Transformer comprises the following steps S1 to S5:
In step S1, raw data acquired by the fNIRS acquisition equipment are obtained, and the raw data are preprocessed to obtain the oxyhemoglobin and deoxyhemoglobin concentration signals ΔC_HbO and ΔC_HbR.
In implementation, the preferred step S1 of the present embodiment further includes:
S11, continuously acquiring the change in infrared light intensity before and after emission and reception by adopting the fNIRS acquisition device;
S12, converting the change in infrared light intensity into the relative changes in oxyhemoglobin and deoxyhemoglobin concentration by using the modified Beer-Lambert law:

$$\begin{bmatrix} \Delta HbO \\ \Delta HbR \end{bmatrix} = \frac{1}{d} \begin{bmatrix} \varepsilon_{HbO}^{\lambda_1} DPF^{\lambda_1} & \varepsilon_{HbR}^{\lambda_1} DPF^{\lambda_1} \\ \varepsilon_{HbO}^{\lambda_2} DPF^{\lambda_2} & \varepsilon_{HbR}^{\lambda_2} DPF^{\lambda_2} \end{bmatrix}^{-1} \begin{bmatrix} \Delta OD^{\lambda_1}(\Delta t) \\ \Delta OD^{\lambda_2}(\Delta t) \end{bmatrix}$$

wherein d is the distance between the light source and the detector; ε_HbO^λ1 is the extinction coefficient of oxyhemoglobin at wavelength λ1; ε_HbR^λ1 is the extinction coefficient of deoxyhemoglobin at wavelength λ1; ε_HbO^λ2 is the extinction coefficient of oxyhemoglobin at wavelength λ2; ε_HbR^λ2 is the extinction coefficient of deoxyhemoglobin at wavelength λ2; ΔOD^λ1(Δt) is the optical density change at wavelength λ1 after time Δt, and ΔOD^λ2(Δt) is the optical density change at wavelength λ2 after time Δt; DPF^λ1 is the differential path length factor at wavelength λ1; DPF^λ2 is the differential path length factor at wavelength λ2; ΔHbR and ΔHbO are the relative changes in deoxyhemoglobin and oxyhemoglobin concentration, respectively; Δt is the time interval between two adjacent samples;
S13, filtering, segmenting, baseline-correcting and normalizing the relative changes in oxyhemoglobin and deoxyhemoglobin concentration to obtain the signals ΔC_HbO and ΔC_HbR.
The converted ΔHbO and ΔHbR are filtered, with the filter parameters set to a fourth-order, 0.01-0.1 Hz Butterworth band-pass filter, so as to remove artifact signals caused by respiration, blood pressure, heartbeat and the like; after filtering, the fNIRS signal from 5 s before stimulation to 20 s after stimulation is segmented as one cycle, called a stimulus response.
Baseline correction is performed using the 5 s of data before the stimulus as the baseline; the Baseline Correction (BC) method subtracts the mean value of the reference interval from the fNIRS signal. Finally, all channel signals are normalized using Z-Score normalization to accelerate gradient descent and reach convergence faster. The normalization operation is expressed as:

$$x' = \frac{x - \mu_x}{\sigma_x}$$

wherein x is the input signal of a single channel, μ_x is the mean of the input signal, σ_x is the standard deviation of the input signal, and x' is the output signal. The whole of the preprocessed data is recorded as ΔC_HbO and ΔC_HbR; each of them has M channel signals, and each channel signal has N sampling points, where M and N are positive integers.
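A minimal preprocessing sketch of the above steps follows, not the patented implementation: a fourth-order 0.01-0.1 Hz Butterworth band-pass filter, epoching from 5 s before to 20 s after each stimulus, baseline correction with the pre-stimulus mean, and per-channel Z-Score normalization. The sampling rate `fs`, the stimulus onset indices and the channel counts are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(dhb, onsets, fs=10.0):
    """dhb: (M, T) concentration changes of one chromophore; onsets: stimulus onsets in samples."""
    # Fourth-order Butterworth band-pass filter, 0.01-0.1 Hz, applied channel-wise.
    b, a = butter(4, [0.01, 0.1], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, dhb, axis=1)

    pre, post = int(5 * fs), int(20 * fs)
    epochs = []
    for t0 in onsets:
        seg = filtered[:, t0 - pre : t0 + post]                # one stimulus response (-5 s .. +20 s)
        seg = seg - seg[:, :pre].mean(axis=1, keepdims=True)   # baseline correction with pre-stimulus mean
        # Z-Score normalization per channel.
        seg = (seg - seg.mean(axis=1, keepdims=True)) / (seg.std(axis=1, keepdims=True) + 1e-8)
        epochs.append(seg)
    return np.stack(epochs)  # (trials, M, N)

raw = np.random.randn(22, 3000)                    # e.g. 22 channels, 5 minutes at 10 Hz (illustrative)
trials = preprocess(raw, onsets=[600, 1200, 1800]) # -> (3, 22, 250)
```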
In step S2, a one-dimensional convolution operation is performed on the signals ΔC_HbO and ΔC_HbR:

$$y_i = \sum_{k=1}^{K} w_k \, x_{i+k-1}$$

wherein y_i is the convolution result of the i-th element of the signal ΔC_HbO / ΔC_HbR; x_i is the i-th element of the signal ΔC_HbO / ΔC_HbR; w_k is the k-th element of the convolution kernel, and K is the length of the convolution kernel.
Then, the two convolved signals are combined in the channel dimension to obtain the combined signal Hb:

$$Hb = C_s\big(\mathrm{Conv}(\Delta C_{HbO}),\ \mathrm{Conv}(\Delta C_{HbR})\big)$$

wherein Hb is the combined signal; C_s is the signal combination operation; Conv(ΔC_HbO) is the result of convolving the signal ΔC_HbO; Conv(ΔC_HbR) is the result of convolving the signal ΔC_HbR.
Step S2 aims to make full use of the collected chromophore signals and to correlate the two chromophore signals so that the subsequent recognition model can better learn their correlation features; the combination process of the two chromophore signals is shown in FIG. 2. In this scheme, ΔC_HbO and ΔC_HbR each have dimension M×N, and the combined signal Hb has dimension M×2N. The one-dimensional convolution parameters specify a kernel length of 3, and the padding mode keeps the input and output the same size, so the dimensions of ΔC_HbO and ΔC_HbR remain unchanged.
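A sketch of step S2 under the assumption of PyTorch: each chromophore signal passes through its own 1-D convolution (kernel length 3, padding that preserves the N sampling points), and the two convolved signals are then concatenated so the result matches the M×2N combined-signal shape described above (the concatenation axis chosen here is the last axis; this is an implementation assumption). Channel and sample counts are illustrative.

```python
import torch
import torch.nn as nn

class SignalCombiner(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One Conv1d per chromophore; padding=1 keeps the temporal length N unchanged.
        self.conv_hbo = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv_hbr = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, d_hbo: torch.Tensor, d_hbr: torch.Tensor) -> torch.Tensor:
        # d_hbo, d_hbr: (batch, M, N)
        hbo = self.conv_hbo(d_hbo)
        hbr = self.conv_hbr(d_hbr)
        return torch.cat([hbo, hbr], dim=-1)  # combined signal Hb of shape (batch, M, 2N)

x_hbo, x_hbr = torch.randn(8, 22, 250), torch.randn(8, 22, 250)
hb = SignalCombiner(22)(x_hbo, x_hbr)  # -> (8, 22, 500)
```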
In step S3, the local fine-grained temporal features of the combined signal Hb are extracted by a convolutional neural network to obtain the feature matrix.
As shown in FIG. 3, in implementation, the preferred step S3 further includes:
S31, inputting the combined signal Hb into a bottleneck layer for a dimension reduction operation, wherein the bottleneck layer is formed by a one-dimensional convolution with a kernel length of 1, so that the dimension of the input time series can be reduced, the network model is kept at a reasonable scale, and the problem of overfitting is alleviated.
The result is then input into three one-dimensional convolutions with different kernel lengths to extract pattern information at different time scales; the convolution padding mode keeps the input and output scales unchanged, and the lengths of the three convolution kernels are N//4, N//8 and N//16, respectively.
S32, inputting the combined signal Hb into a max-pooling layer with a kernel length of 3 whose padding keeps the input and output scales unchanged, and then obtaining a pooled value through a bottleneck layer with a kernel length of 1;
S33, splicing the pattern information of the different time scales with the pooled value, and reducing the parameters of the model by adopting a bottleneck layer;
S34, carrying out batch normalization on the data output by the bottleneck layer in step S33:

$$\hat{x}_l = \frac{x_l - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

wherein x_l is the feature of the l-th dimension; μ is the mean of x_l in the current batch; σ is the standard deviation of x_l in the current batch; ε is a constant; $\hat{x}_l$ is x_l after the batch normalization operation;
S35, performing scaling and shifting operations on the batch-normalized feature $\hat{x}_l$ to obtain the feature matrix:

$$z_l = \gamma \hat{x}_l + \beta$$

wherein γ is a scaling parameter; β is an offset parameter; z_l is the batch normalization result of the l-th dimension, and the results of all dimensions form the feature matrix.
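A sketch of the multi-scale convolutional block of step S3, assuming PyTorch: a bottleneck (kernel length 1) reduces the channel dimension, three parallel 1-D convolutions with kernel lengths N//4, N//8 and N//16 extract patterns at different time scales, a max-pooling branch (kernel length 3) followed by a bottleneck provides the pooled value, the branches are concatenated, reduced by another bottleneck and batch-normalized (the BatchNorm1d layer carries the learnable γ and β of S34-S35). Channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int, n_points: int):
        super().__init__()
        kernels = [max(n_points // 4, 3), max(n_points // 8, 3), max(n_points // 16, 3)]
        self.bottleneck_in = nn.Conv1d(in_ch, mid_ch, kernel_size=1)        # dimension reduction
        self.branches = nn.ModuleList(
            nn.Conv1d(mid_ch, mid_ch, kernel_size=k, padding="same") for k in kernels
        )
        self.pool = nn.MaxPool1d(kernel_size=3, stride=1, padding=1)        # length preserved
        self.bottleneck_pool = nn.Conv1d(in_ch, mid_ch, kernel_size=1)
        self.bottleneck_out = nn.Conv1d(4 * mid_ch, mid_ch, kernel_size=1)  # parameter reduction
        self.bn = nn.BatchNorm1d(mid_ch)                                    # batch norm with learnable gamma/beta

    def forward(self, hb: torch.Tensor) -> torch.Tensor:
        x = self.bottleneck_in(hb)
        feats = [branch(x) for branch in self.branches]          # three time scales
        feats.append(self.bottleneck_pool(self.pool(hb)))        # pooled branch
        out = self.bottleneck_out(torch.cat(feats, dim=1))       # splice along channels
        return self.bn(out)                                      # feature matrix

feat = MultiScaleBlock(in_ch=22, mid_ch=32, n_points=500)(torch.randn(8, 22, 500))  # -> (8, 32, 500)
```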
In step S4, a Transformer module is adopted to perform feature enhancement extraction on the feature matrix to obtain the state features. As shown in FIG. 4, the Transformer module includes two sub-layers, a multi-head self-attention layer and a feed-forward neural network, with residual connection and layer normalization arranged after each sub-layer so as to avoid network degradation and accelerate convergence.
The formula of the activation function of the feed-forward neural network is:

$$\mathrm{GELU}(x) = 0.5x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(x + 0.044715x^{3}\big)\right)\right)$$

wherein tanh(·) is the hyperbolic tangent function and GELU(·) is the activation function;
the feed-forward neural network is:

$$\mathrm{FFN}(x) = W_2\,\mathrm{GELU}(W_1 x + b_1) + b_2$$

wherein FFN(x) is the output of the feed-forward neural network; W_1 and W_2 are the weight matrices of the two linear layers; b_1 and b_2 are the biases of the two linear layers.
At its input, the Transformer module adds a position embedding PE and a CLS token to the feature matrix. The position embedding PE serves as the position code, so that the Transformer model can understand the order relationships within the sequence; the CLS token is a learnable classification embedding, which passes through all layers of the Transformer encoder and is used for the final classification task.
In this scheme, the multi-head self-attention is used to establish global associations within the sequence, capture the dependency relationships between sequence positions, and extract global features. Referring to FIG. 5, the multi-head self-attention consists of h self-attention components, and the input and output dimensions are kept consistent by a linear layer whose scaling factor is the number of self-attention heads h.
In one embodiment of the present invention, the method for processing the feature matrix by the multi-head self-attention may refer to FIG. 5 and FIG. 6, and specifically includes:
A1, respectively adopting three linear layers to perform linear transformations on the feature matrix to obtain the query vector matrix Q, the key vector matrix K and the value vector matrix V:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$

wherein W^Q, W^K and W^V are trainable linear transformation parameter matrices, and X is the feature matrix;
A2, calculating the correlation weights between the query vector matrix Q and the key vector matrix K;
A3, performing scaling and a softmax operation on the correlation weights, and multiplying the weights obtained by softmax with the value vector matrix V to obtain the weighted value matrix:

$$Z_m = \mathrm{softmax}\!\left(\frac{Q_m K_m^{T}}{\sqrt{d_k}}\right) V_m, \quad m = 1, \dots, h$$

wherein Q_m, K_m and V_m are the query vector matrix, key vector matrix and value vector matrix of the m-th self-attention head, respectively; h is the total number of self-attention heads in the multi-head self-attention; d_k is the feature dimension of the trainable linear transformation parameter matrices; T denotes the matrix transpose operation; Z_m is the output of the m-th self-attention head; softmax(·) is the softmax operation.
The scheme scales the correlation weights to avoid excessively large dot-product results, and adopts the square root of the feature dimension as the scaling factor.
A4, splicing the outputs of the multiple self-attention heads:

$$\mathrm{MHSA} = \mathrm{Concat}(Z_1, \dots, Z_h)\, W^{O}$$

wherein MHSA is the concatenation of the multiple self-attention outputs; Z_h is the output of the h-th self-attention head; W^O is a trainable parameter matrix; Concat(·) is the matrix concatenation operation;
A5, inputting the splicing result into a linear layer for a linear transformation to obtain the output of the multi-head self-attention module.
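A from-scratch sketch of the multi-head self-attention of steps A1-A5, assuming PyTorch: linear layers produce Q, K and V (corresponding to W^Q, W^K and W^V above), the scaled dot-product weights pass through softmax, the weighted values of the h heads are concatenated, and a final linear layer (W^O) restores the model dimension. All dimensions are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)   # W^Q
        self.w_k = nn.Linear(d_model, d_model)   # W^K
        self.w_v = nn.Linear(d_model, d_model)   # W^V
        self.w_o = nn.Linear(d_model, d_model)   # W^O, keeps input and output dimensions consistent

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        def split(t):  # -> (batch, heads, seq_len, d_k)
            return t.view(b, n, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        weights = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        z = weights @ v                                          # weighted value matrices Z_m
        z = z.transpose(1, 2).reshape(b, n, self.h * self.d_k)   # concatenation of the h heads
        return self.w_o(z)

out = MultiHeadSelfAttention(d_model=64, n_heads=4)(torch.randn(8, 33, 64))  # -> (8, 33, 64)
```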
The formula of the residual block in the residual connection and layer normalization is:

$$R = M + F(M)$$

wherein M is the output of the multi-head self-attention module; F(·) is the residual mapping; R is the output of the residual block;
the calculation formula of the layer normalization is:

$$\mathrm{LN}(R) = \gamma \, \frac{R - \mu_R}{\sqrt{\sigma_R^{2} + \epsilon}} + \beta$$

wherein μ_R is the mean of the output of the residual block; σ_R² is the variance of the output of the residual block; ε is a constant; γ and β are a learnable scaling factor and a learnable offset parameter, respectively; LN(R) is the result of the layer normalization.
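A sketch of one encoder sub-layer stack as described above, assuming PyTorch: multi-head self-attention (here PyTorch's built-in nn.MultiheadAttention stands in for the from-scratch sketch shown earlier) and a GELU feed-forward network, each wrapped in a residual connection followed by layer normalization. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)      # layer normalization after the attention sub-layer
        self.ffn = nn.Sequential(               # FFN(x) = W2 * GELU(W1 x + b1) + b2
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)      # layer normalization after the feed-forward sub-layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)        # multi-head self-attention
        x = self.norm1(x + attn_out)            # residual connection: R = M + F(M), then LN
        x = self.norm2(x + self.ffn(x))
        return x

z = EncoderBlock(d_model=64, n_heads=4, d_ff=128)(torch.randn(8, 33, 64))  # -> (8, 33, 64)
```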
In step S5, the state features are input into the multi-layer perceptron classification layer to obtain the classification result of mental load detection. The multi-layer perceptron classification layer is expressed as:

$$y = \mathrm{softmax}(W z + b)$$

wherein y is the mental load detection probability output by the multi-layer perceptron classification layer; W is the weight matrix; b is the bias; z is the output of the Transformer module.
The objective function of the fNIRS brain load detection method is the cross-entropy loss function:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]$$

wherein y_i is the true label value of sample i, $\hat{y}_i$ is the predicted label value of sample i, and n is the total number of samples.
The multi-layer perceptron of the scheme receives the output of step S4 and produces the corresponding output, a one-dimensional feature vector of length 64. The input features are then linearly transformed by a linear layer; in this embodiment, the linear layer has an input dimension of 64 and an output dimension of 2, and the output of the linear layer is converted into a probability distribution using the softmax operation for classification.
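A sketch of step S5 under the assumption of PyTorch: the 64-dimensional feature from the Transformer module is mapped by a linear layer to 2 classes and trained with the cross-entropy objective (PyTorch's nn.CrossEntropyLoss applies the softmax internally). Batch size, labels and optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(64, 2)                 # input dimension 64, output dimension 2
criterion = nn.CrossEntropyLoss()             # cross-entropy loss over the two load classes
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

cls_features = torch.randn(8, 64)             # output of the Transformer module for a batch of 8 trials
labels = torch.randint(0, 2, (8,))            # e.g. 0 = low load, 1 = high load (illustrative)

optimizer.zero_grad()
logits = classifier(cls_features)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

probs = torch.softmax(logits, dim=-1)         # mental-load detection probabilities
```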
In summary, the method starts from the raw data collected by fNIRS and, after a series of preprocessing operations, exploits the negative correlation between the two chromophores under brain load to jointly input the different signals, which addresses the problems of the small amount of collected fNIRS data and its low utilization rate. By combining the local perception capability of the CNN with the global adaptive association of the Transformer, the limitation of fNIRS feature extraction is overcome, the efficiency of brain mental load detection is improved, and the practical application of fNIRS in brain-computer interfaces is facilitated.
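To tie the steps together, an end-to-end assembly sketch follows. It is an assumption-laden simplification rather than the patented network: the multi-scale CNN front end is approximated by a single Conv1d, PyTorch's nn.TransformerEncoder stands in for the Transformer module, the position embedding is omitted for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class FNIRSNet(nn.Module):
    def __init__(self, m_channels=22, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.conv_hbo = nn.Conv1d(m_channels, m_channels, 3, padding=1)   # per-chromophore conv (S2)
        self.conv_hbr = nn.Conv1d(m_channels, m_channels, 3, padding=1)
        self.local = nn.Conv1d(m_channels, d_model, kernel_size=7, padding=3)  # stand-in for the CNN of S3
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))               # CLS token (position embedding omitted)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128,
                                           activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # Transformer module (S4)
        self.head = nn.Linear(d_model, 2)                                 # MLP classification layer (S5)

    def forward(self, d_hbo, d_hbr):                                      # each (batch, M, N)
        hb = torch.cat([self.conv_hbo(d_hbo), self.conv_hbr(d_hbr)], dim=-1)   # combined signal Hb
        feats = self.local(hb).transpose(1, 2)                            # (batch, 2N, d_model) tokens
        cls = self.cls.expand(feats.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, feats], dim=1))                  # prepend CLS token
        return self.head(z[:, 0])                                         # logits taken at the CLS position

logits = FNIRSNet()(torch.randn(4, 22, 250), torch.randn(4, 22, 250))     # -> (4, 2)
```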

Claims (10)

1. The method for detecting the fNIRS brain load based on the CNN and the Transformer is characterized by comprising the following steps of:
S1, acquiring original data collected by an fNIRS acquisition device, and preprocessing the original data to obtain the oxyhemoglobin and deoxyhemoglobin concentration signals ΔC_HbO and ΔC_HbR;
S2, performing a one-dimensional convolution operation on the signals ΔC_HbO and ΔC_HbR, and combining the two convolved signals in the channel dimension to obtain a combined signal Hb;
S3, extracting local fine-grained temporal features of the combined signal Hb by adopting a convolutional neural network to obtain a feature matrix;
S4, performing feature enhancement extraction on the feature matrix by adopting a Transformer module to obtain state features;
S5, inputting the state features into a multi-layer perceptron classification layer to obtain a classification result of mental load detection.
2. The CNN and Transformer-based fNIRS brain load detection method according to claim 1, wherein the step S1 further comprises:
S11, continuously acquiring the change in infrared light intensity before and after emission and reception by adopting the fNIRS acquisition device;
S12, converting the change in infrared light intensity into the relative changes in oxyhemoglobin and deoxyhemoglobin concentration by using the modified Beer-Lambert law:

$$\begin{bmatrix} \Delta HbO \\ \Delta HbR \end{bmatrix} = \frac{1}{d} \begin{bmatrix} \varepsilon_{HbO}^{\lambda_1} DPF^{\lambda_1} & \varepsilon_{HbR}^{\lambda_1} DPF^{\lambda_1} \\ \varepsilon_{HbO}^{\lambda_2} DPF^{\lambda_2} & \varepsilon_{HbR}^{\lambda_2} DPF^{\lambda_2} \end{bmatrix}^{-1} \begin{bmatrix} \Delta OD^{\lambda_1}(\Delta t) \\ \Delta OD^{\lambda_2}(\Delta t) \end{bmatrix}$$

wherein d is the distance between the light source and the detector; ε_HbO^λ1 is the extinction coefficient of oxyhemoglobin at wavelength λ1; ε_HbR^λ1 is the extinction coefficient of deoxyhemoglobin at wavelength λ1; ε_HbO^λ2 is the extinction coefficient of oxyhemoglobin at wavelength λ2; ε_HbR^λ2 is the extinction coefficient of deoxyhemoglobin at wavelength λ2; ΔOD^λ1(Δt) is the optical density change at wavelength λ1 after time Δt, and ΔOD^λ2(Δt) is the optical density change at wavelength λ2 after time Δt; DPF^λ1 is the differential path length factor at wavelength λ1; DPF^λ2 is the differential path length factor at wavelength λ2; ΔHbR and ΔHbO are the relative changes in deoxyhemoglobin and oxyhemoglobin concentration, respectively; Δt is the time interval between two adjacent samples;
S13, filtering, segmenting, baseline-correcting and normalizing the relative changes in oxyhemoglobin and deoxyhemoglobin concentration to obtain the signals ΔC_HbO and ΔC_HbR.
3. The CNN and Transformer-based fNIRS brain load detection method according to claim 1, wherein the calculation formula for performing the one-dimensional convolution operation on the signals ΔC_HbO and ΔC_HbR is:

$$y_i = \sum_{k=1}^{K} w_k \, x_{i+k-1}$$

wherein y_i is the convolution result of the i-th element of the signal ΔC_HbO / ΔC_HbR; x_i is the i-th element of the signal ΔC_HbO / ΔC_HbR; w_k is the k-th element of the convolution kernel, and K is the length of the convolution kernel;
the formula for combining the two convolved signals in the channel dimension is:

$$Hb = C_s\big(\mathrm{Conv}(\Delta C_{HbO}),\ \mathrm{Conv}(\Delta C_{HbR})\big)$$

wherein Hb is the combined signal; C_s is the signal combination operation; Conv(ΔC_HbO) is the result of convolving the signal ΔC_HbO; Conv(ΔC_HbR) is the result of convolving the signal ΔC_HbR.
4. The CNN and Transformer-based fNIRS brain load detection method according to claim 1, wherein the step S3 further comprises:
S31, inputting the combined signal Hb into a bottleneck layer for a dimension reduction operation, and then inputting the result into three one-dimensional convolutions with different kernel lengths to extract pattern information at different time scales;
S32, inputting the combined signal Hb into a max-pooling layer with a kernel length of 3 whose padding keeps the input and output scales unchanged, and then obtaining a pooled value through a bottleneck layer with a kernel length of 1;
S33, splicing the pattern information of the different time scales with the pooled value, and reducing the parameters of the model by adopting a bottleneck layer;
S34, carrying out batch normalization on the data output by the bottleneck layer in the step S33:

$$\hat{x}_l = \frac{x_l - \mu}{\sqrt{\sigma^2 + \epsilon}}$$

wherein x_l is the feature of the l-th dimension; μ is the mean of x_l in the current batch; σ is the standard deviation of x_l in the current batch; ε is a constant; $\hat{x}_l$ is x_l after the batch normalization operation;
S35, performing scaling and shifting operations on the batch-normalized feature $\hat{x}_l$ to obtain the feature matrix:

$$z_l = \gamma \hat{x}_l + \beta$$

wherein γ is a scaling parameter; β is an offset parameter; z_l is the batch normalization result of the l-th dimension, and the results of all dimensions form the feature matrix.
5. The CNN and Transformer-based fNIRS brain load detection method of claim 1, wherein the Transformer module comprises two sub-layers, a multi-head self-attention layer and a feed-forward neural network, each sub-layer is followed by residual connection and layer normalization, and the Transformer module initially adds to the feature matrix a position embedding PE serving as the position code and a CLS token serving as a learnable classification embedding, which passes through all layers of the Transformer encoder.
6. The CNN and Transformer-based fNIRS brain load detection method of claim 5, wherein the method for processing the feature matrix by the multi-head self-attention comprises:
A1, respectively adopting three linear layers to perform linear transformations on the feature matrix to obtain the query vector matrix Q, the key vector matrix K and the value vector matrix V:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$

wherein W^Q, W^K and W^V are trainable linear transformation parameter matrices, and X is the feature matrix;
A2, calculating the correlation weights between the query vector matrix Q and the key vector matrix K;
A3, performing scaling and a softmax operation on the correlation weights, and multiplying the weights obtained by softmax with the value vector matrix V to obtain the weighted value matrix:

$$Z_m = \mathrm{softmax}\!\left(\frac{Q_m K_m^{T}}{\sqrt{d_k}}\right) V_m, \quad m = 1, \dots, h$$

wherein Q_m, K_m and V_m are the query vector matrix, key vector matrix and value vector matrix of the m-th self-attention head, respectively; h is the total number of self-attention heads in the multi-head self-attention; d_k is the feature dimension of the trainable linear transformation parameter matrices; T denotes the matrix transpose operation; Z_m is the output of the m-th self-attention head; softmax(·) is the softmax operation;
A4, splicing the outputs of the multiple self-attention heads:

$$\mathrm{MHSA} = \mathrm{Concat}(Z_1, \dots, Z_h)\, W^{O}$$

wherein MHSA is the concatenation of the multiple self-attention outputs; Z_h is the output of the h-th self-attention head; W^O is a trainable parameter matrix; Concat(·) is the matrix concatenation operation;
A5, inputting the splicing result into a linear layer for a linear transformation to obtain the output of the multi-head self-attention module.
7. The CNN and Transformer-based fNIRS brain load detection method of claim 5, wherein the formula of the residual block in the residual connection and layer normalization is:

$$R = M + F(M)$$

wherein M is the output of the multi-head self-attention module; F(·) is the residual mapping; R is the output of the residual block;
the calculation formula of the layer normalization is:

$$\mathrm{LN}(R) = \gamma \, \frac{R - \mu_R}{\sqrt{\sigma_R^{2} + \epsilon}} + \beta$$

wherein μ_R is the mean of the output of the residual block; σ_R² is the variance of the output of the residual block; ε is a constant; γ and β are a learnable scaling factor and a learnable offset parameter, respectively; LN(R) is the result of the layer normalization.
8. The CNN and Transformer-based fNIRS brain load detection method of claim 5, wherein the formula of the activation function of the feed-forward neural network is:

$$\mathrm{GELU}(x) = 0.5x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\big(x + 0.044715x^{3}\big)\right)\right)$$

wherein tanh(·) is the hyperbolic tangent function and GELU(·) is the activation function;
the feed-forward neural network is:

$$\mathrm{FFN}(x) = W_2\,\mathrm{GELU}(W_1 x + b_1) + b_2$$

wherein FFN(x) is the output of the feed-forward neural network; W_1 and W_2 are the weight matrices of the two linear layers; b_1 and b_2 are the biases of the two linear layers.
9. The CNN and Transformer-based fNIRS brain load detection method of claim 1, wherein the multi-layer perceptron classification layer is expressed as:

$$y = \mathrm{softmax}(W z + b)$$

wherein y is the mental load detection probability output by the multi-layer perceptron classification layer; W is the weight matrix; b is the bias; z is the output of the Transformer module.
10. The CNN and Transformer-based fNIRS brain load detection method of claim 1, wherein the objective function of the fNIRS brain load detection method is the cross-entropy loss function:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]$$

wherein y_i is the true label value of mental load data sample i, $\hat{y}_i$ is the predicted label value of mental load data sample i, and n is the total number of samples.
CN202311031625.XA 2023-08-16 2023-08-16 CNN and Transformer-based fNIRS brain load detection method Active CN116756657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311031625.XA CN116756657B (en) 2023-08-16 2023-08-16 CNN and Transformer-based fNIRS brain load detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311031625.XA CN116756657B (en) 2023-08-16 2023-08-16 CNN and Transformer-based fNIRS brain load detection method

Publications (2)

Publication Number Publication Date
CN116756657A true CN116756657A (en) 2023-09-15
CN116756657B CN116756657B (en) 2023-11-17

Family

ID=87948144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311031625.XA Active CN116756657B (en) 2023-08-16 2023-08-16 CNN and Transformer-based fNIRS brain load detection method

Country Status (1)

Country Link
CN (1) CN116756657B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102288267B1 (en) * 2020-07-22 2021-08-11 액티브레인바이오(주) AI(Artificial Intelligence) BASED METHOD OF PROVIDING BRAIN INFORMATION
CN113907706A (en) * 2021-08-29 2022-01-11 北京工业大学 Electroencephalogram seizure prediction method based on multi-scale convolution and self-attention network
US20230248317A1 (en) * 2022-02-10 2023-08-10 Foundation For Research And Business, Seoul National University Of Science And Technology Method of determining brain activity and electronic device performing the method
CN114533085A (en) * 2022-02-18 2022-05-27 北京工业大学 EEG-fNIRS multi-mode space-time fusion classification method based on attention mechanism
CN115969369A (en) * 2022-12-12 2023-04-18 广东工业大学 Brain task load identification method, application and equipment
CN116269366A (en) * 2023-03-17 2023-06-23 燕山大学 ROI brain region channel optimization screening method based on fNIRS analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONGRUI GAO等: "SFT-Net: A Network for Detecting Fatigue From EEG Signals by Combining 4D Feature Flow and Attention Mechanism", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS》, pages 1 - 12 *
ZENGHUI WANG等: "A General and Scalable Vision Framework for Functional Near-Infrared Spectroscopy Classification", 《IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING》, vol. 30, pages 1982 - 1991, XP011914933, DOI: 10.1109/TNSRE.2022.3190431 *
ZENGHUI WANG等: "Transformer Model for Functional Near-Infrared Spectroscopy Classification", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS》, vol. 26, no. 6, pages 2559 - 2569 *
ZHANG Hong: "Research on abnormal brain dynamics in mild cognitive impairment based on co-activation patterns", China Master's Theses Full-text Database (Medicine & Health Sciences), no. 01, pages 071 - 104 *

Also Published As

Publication number Publication date
CN116756657B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
Amara et al. A deep learning-based approach for banana leaf diseases classification
Al-Hiary et al. Fast and accurate detection and classification of plant diseases
CN108364281B (en) Ribbon edge flaw defect detection method based on convolutional neural network
CN112189877B (en) On-line detection method for tobacco shred impurities in tobacco production line
CN107330889A (en) A kind of traditional Chinese medical science tongue color coating colour automatic analysis method based on convolutional neural networks
CN107451565B (en) Semi-supervised small sample deep learning image mode classification and identification method
Kaziha et al. A convolutional neural network for seizure detection
CN107563389A (en) A kind of corps diseases recognition methods based on deep learning
CN109902623B (en) Gait recognition method based on perception compression
Surya et al. Cassava leaf disease detection using convolutional neural networks
EP3564857A1 (en) Pattern recognition method of autoantibody immunofluorescence image
CN112294341A (en) Sleep electroencephalogram spindle wave identification method and system based on light convolutional neural network
CN116849612B (en) Multispectral tongue picture image acquisition and analysis system
CN112836589A (en) Method for recognizing facial expressions in video based on feature fusion
Septiarini et al. Maturity grading of oil palm fresh fruit bunches based on a machine learning approach
Lee et al. Emotion recognition with short-period physiological signals using bimodal sparse autoencoders.
CN116756657B (en) CNN and transducer-based fNIRS brain load detection method
CN113974627A (en) Emotion recognition method based on brain-computer generated confrontation
Jayasundara et al. Multispectral imaging for automated fish quality grading
CN113116363A (en) Method for judging hand fatigue degree based on surface electromyographic signals
CN117158997A (en) Deep learning-based epileptic electroencephalogram signal classification model building method and classification method
CN117607120A (en) Food additive Raman spectrum detection method and device based on improved Resnext model
CN113160115A (en) Crop disease identification method and system based on improved depth residual error network
CN117137451A (en) Non-contact stress detection method and system based on remote pulse wave signals
CN116824171A (en) Method for selecting hyperspectral tongue picture image wave band of traditional Chinese medicine and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant