CN115836868A - Driver fatigue state identification method based on multi-scale convolution kernel size CNN - Google Patents

Driver fatigue state identification method based on multi-scale convolution kernel size CNN

Info

Publication number: CN115836868A
Application number: CN202211488681.1A
Authority: CN (China)
Prior art keywords: data, sample, convolution kernel, layer, kernel size
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 付荣荣, 侯启恩
Current Assignee: Yanshan University
Original Assignee: Yanshan University
Application filed by Yanshan University; priority to CN202211488681.1A; publication of CN115836868A

Landscapes

  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The invention discloses a driver fatigue state identification method based on a multi-scale convolution kernel size CNN, which comprises the following steps: S1, data preparation: the acquired electroencephalogram (EEG) signals are preprocessed to obtain data in a standard format; S2, data enhancement: the original data are augmented with frequency-masking and frequency-domain noise-addition algorithms; S3, model training: a data set composed of the original data and the enhanced data is trained with a CNN model that mixes multiple convolution kernel sizes, yielding a classifier; S4, state identification: the preprocessed EEG data are fed into the classifier to obtain the state label of each sample together with an interpretable basis for the model's classification. The method improves the classification performance of the model and achieves higher accuracy in the fatigue state identification task on the sustained-attention driving-task data set; furthermore, the two data enhancement methods, frequency-domain noise addition and frequency masking, are integrated with the CNN model, further improving the generalization ability of the model.

Description

Driver fatigue state identification method based on multi-scale convolution kernel size CNN
Technical Field
The invention relates to a driver fatigue state identification method based on a multi-scale convolution kernel size CNN, and belongs to the technical field of electroencephalogram signal processing.
Background
With the frequent occurrence of serious traffic accidents, driving safety has attracted increasing attention. Driving fatigue is a direct cause of traffic accidents and poses a serious threat to people's lives and property. It refers to the decline of a driver's physical and mental functions caused by insufficient rest or prolonged driving, and is usually manifested as mental fatigue. Accurately identifying the fatigue state of a driver can therefore effectively reduce the occurrence of traffic accidents.
Research has shown that the electroencephalogram (EEG) of a driver in a fatigued state differs significantly from that in a non-fatigued state, and the identification of fatigue states from EEG signals with deep learning methods has been widely studied. However, because EEG signals differ greatly across subjects and across recording sessions, a convolutional neural network with a single convolution scale cannot fit the EEG signals of different subjects at the same time, and the fatigue-state recognition accuracy of such models still needs to be improved. In cross-subject fatigue state identification, fully extracting the temporal, frequency and spatial characteristics of the EEG signal is essential for accurate recognition.
For deep learning algorithms, the accuracy of EEG classification depends not only on the performance of the designed network structure but also, to a large extent, on the amount of data available for training. When the amount of training data is limited, the neural network model easily overfits during training, leading to low classification accuracy on the test set. The first solution that comes to mind is to collect as much additional training data as possible; in most cases, however, acquiring more EEG signals is difficult or even impossible. Making full use of the EEG data already acquired by generating more data through data enhancement is therefore of great importance.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a driver fatigue state identification method based on a multi-scale convolution kernel size CNN. A convolutional neural network that mixes multiple convolution kernel sizes is designed to identify the driver's fatigue state, improving the classification performance of the fatigue state identification model; at the same time, data enhancement methods based on frequency masking and frequency-domain noise addition are integrated with the multi-scale convolution kernel size CNN model, further improving the classification performance and generalization ability of the model.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
the driver fatigue state identification method based on the multi-scale convolution kernel size CNN comprises the following steps:
s1, data preparation: preprocessing the acquired electroencephalogram signals to obtain standard format data;
s2, data enhancement: performing data enhancement on the original data by adopting a frequency masking or frequency domain noise adding algorithm;
s3: training a model, namely training a data set formed by original data and enhanced data by adopting a multi-scale convolution kernel size mixed CNN model to obtain a classifier;
s4, state identification: and inputting the preprocessed electroencephalogram data into a classifier model to obtain a state label of the sample and an interpretable model classification basis.
The technical scheme of the invention is further improved as follows: the specific operation of step S1 is as follows:
s11, down-sampling the original data set to 128 Hz, and extracting a 3 s electroencephalogram sample before each lane-deviation event;
s12, calculating the local reaction time R_t of each sample:
R_t = t_d - t_r,
where t_d is the time at which the vehicle starts to drift and t_r is the time at which the subject steers the vehicle back to the original lane;
s13, calculating the global reaction time GR_t of each sample:
GR_t = (1/N)·Σ_{i=1}^{N} R_t(i),
where N is the number of samples falling within the 90 s window before the vehicle-drift event of each sample, so that GR_t is the mean of the local reaction times R_t within that 90 s window;
s14, defining the baseline alert reaction time R_t^alert:
the fifth percentile of the local reaction times in each session is taken as the baseline alert reaction time R_t^alert and serves as the basis for labeling the samples in the next step;
s15, labeling each sample:
a sample is labeled as alert when both its local and global reaction times are less than 1.5 times the baseline alert reaction time; a sample is labeled as fatigued when both its local and global reaction times are greater than 2.5 times the baseline alert reaction time.
The technical scheme of the invention is further improved as follows: after the labeling in step S15 is completed, 2022 samples are obtained, each containing 3 s of electroencephalogram data.
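For illustration, a minimal Python sketch of the labeling rule in steps s11-s15 is given below; the function name, the NumPy usage and the treatment of samples that satisfy neither criterion are assumptions of this sketch, not part of the patent.

```python
import numpy as np

def label_samples(local_rt, event_times, window_s=90.0):
    """Label each lane-deviation sample: 0 = alert, 1 = fatigued, -1 = discarded.

    local_rt    : local reaction times R_t, one per deviation event (seconds)
    event_times : onset time of each deviation event, same order (seconds)
    """
    local_rt = np.asarray(local_rt, dtype=float)
    event_times = np.asarray(event_times, dtype=float)

    # Baseline alert reaction time: 5th percentile of the local RTs in the session
    rt_alert = np.percentile(local_rt, 5)

    labels = np.full(len(local_rt), -1, dtype=int)
    for k, t0 in enumerate(event_times):
        # Global RT: mean local RT over the 90 s window preceding this event
        in_window = (event_times >= t0 - window_s) & (event_times < t0)
        gr = local_rt[in_window].mean() if in_window.any() else local_rt[k]

        if local_rt[k] < 1.5 * rt_alert and gr < 1.5 * rt_alert:
            labels[k] = 0      # alert
        elif local_rt[k] > 2.5 * rt_alert and gr > 2.5 * rt_alert:
            labels[k] = 1      # fatigued
    return labels
```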
The technical scheme of the invention is further improved as follows: the specific operation of the frequency-masking data enhancement in step S2 is as follows:
s21, performing a fast Fourier transform on the original signal x(t) to obtain the frequency-domain signal X(jω):
X(jω) = F(x(t)),
where F(·) denotes the fast Fourier transform;
s22, determining the hyper-parameters s and t, where s denotes the number of masked frequency points and t denotes the number of masked regions; the 20 frequency points randomly selected within a region are set to zero to obtain the enhanced data.
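A minimal sketch of this frequency-masking augmentation is shown below (Python/NumPy); the use of a real FFT, the masking of s consecutive bins per region, and the function name are assumptions made for the sketch.

```python
import numpy as np

def frequency_mask(x, s=20, t=1, rng=None):
    """Frequency-masking augmentation of a 1-D EEG channel.

    x : time-domain signal;  s : masked points per region;  t : number of regions
    """
    rng = np.random.default_rng(rng)
    X = np.fft.rfft(x)                       # frequency-domain signal X(jw)
    for _ in range(t):
        start = rng.integers(0, len(X) - s)  # pick a region of s frequency bins
        X[start:start + s] = 0.0             # set the selected bins to zero
    return np.fft.irfft(X, n=len(x))         # inverse transform -> enhanced data
```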
The technical scheme of the invention is further improved as follows: the specific operation of the frequency-domain noise-addition data enhancement in step S2 is as follows:
the original signal x(t) is transformed into the frequency domain,
X(jω) = F[x(t)] = H(jω)·e^{jθ(ω)},
where H(jω) = |X(jω)| is the amplitude and θ(ω) = Arg[X(jω)] is the phase of the frequency-domain signal; additive Gaussian noise G_i(λ) ~ N(0, σ_i^2) (i = 0, 1) is applied to the amplitude and to the phase, respectively, giving the noisy spectrum
X_noise(jω) = [H(jω) + G_0(ω)]·e^{j[θ(ω) + G_1(ω)]};
the enhanced time-domain signal is obtained by the inverse Fourier transform:
x_noise(t) = F^{-1}(X_noise).
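A minimal sketch of this amplitude/phase noise augmentation follows (Python/NumPy); the noise standard deviations and the use of a real FFT are illustrative assumptions rather than values prescribed by the patent.

```python
import numpy as np

def add_frequency_domain_noise(x, sigma_amp=0.1, sigma_phase=0.05, rng=None):
    """Add Gaussian noise to the amplitude and phase of the spectrum of a 1-D signal."""
    rng = np.random.default_rng(rng)
    X = np.fft.rfft(x)
    amp, phase = np.abs(X), np.angle(X)                              # H(jw) and Arg[X(jw)]
    amp = amp + rng.normal(0.0, sigma_amp * amp.std(), amp.shape)    # G_0 on the amplitude
    phase = phase + rng.normal(0.0, sigma_phase, phase.shape)        # G_1 on the phase
    X_noise = amp * np.exp(1j * phase)
    return np.fft.irfft(X_noise, n=len(x))                           # enhanced time-domain signal
```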
the technical scheme of the invention is further improved as follows: the specific operation of the step S3 is:
s31, data filtering: assume the original EEG signals recorded by m electrodes are X = {X_i}, i = 1, 2, ..., m; band-pass filtering X yields EEG signals in three frequency bands, X_1 (4-7 Hz), X_2 (8-13 Hz) and X_3 (13-32 Hz);
s32, determining the convolution kernel sizes:
let the convolution kernel sizes be K = {K_1, K_2, K_3}, where K_i (i = 1, 2, 3) is the kernel size of the Depthwise convolution of the i-th branch;
s33, network structure design:
the first branch of the network takes as input the first frequency band X_1(m, n) of the raw EEG signal, where m is the number of EEG channels (30) and n is the number of sampling points per sample (384);
in the following, a superscript in parentheses, e.g. w^(x), denotes a parameter of network layer x;
the output of the first-layer Pointwise convolution is
h_i^(1)(j) = Σ_{p=1}^{m} w_{i,p}^(1)·x_p(j) + b_i^(1),
where i = 1, 2, ..., N_1 and N_1 = 16 is the number of Pointwise convolutions, w_{i,p}^(1) is the weight of the p-th channel of the i-th Pointwise convolution, x_p(j) is the p-th channel of the j-th sampling point of the input EEG sample, and b_i^(1) is the bias of the i-th Pointwise convolution; after the Pointwise layer an output h^(1) of dimension (16, 384) is obtained;
in the first branch of the second-layer Depthwise convolution, the signal from the first layer has dimension (16, 384); each of its 16 channels is convolved with two Depthwise convolutions, so this branch outputs 32 channels, and the number of output sampling points j^(2) is calculated as
j^(2) = (j^(1) - K_1 + 2·padding)/stride + 1,
with kernel size K_1 = 36, stride = 1 and padding = 0, giving j^(2) = 349;
when i is odd, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{(i+1)/2}^(1)(j + l - 1) + b_i^(2);
when i is even, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{i/2}^(1)(j + l - 1) + b_i^(2),
where K_1 is the Depthwise kernel size, i is the output channel index and j is the sampling-point index;
the third layer is an activation layer, h^(3) = φ(h^(2)), with φ(·) the nonlinear activation function applied element-wise;
the fourth layer is a batch normalization layer, h^(4) = BN(h^(3));
the fifth layer is a global average pooling layer, h_i^(5) = (1/(n - K_1 + 1))·Σ_j h_i^(4)(j);
the second Depthwise branch uses kernel size K_2 = 51; the output-length calculation and the subsequent steps are repeated, giving j^(2) = 334 for this branch;
the third Depthwise branch uses kernel size K_3 = 80; the output-length calculation and the subsequent steps are repeated, giving j^(3) = 305 for this branch;
the above describes the first branch of the network, which processes the EEG signal X_1 of the 4-7 Hz band; the operations of the first branch are repeated for the 8-13 Hz EEG signal X_2 and for the 13-32 Hz EEG signal to obtain the outputs of the second and third branches of the network; finally, the outputs of the three branches are fully connected and processed as follows:
the sixth layer is a hidden layer,
h_c^(6) = Σ_i w_{c,i}^(6)·h_i^(5) + b_c^(6),
where c = 0 or c = 1; c = 0 represents the alert state and c = 1 represents the fatigue state;
the seventh layer applies the softmax function,
P(c) = exp(h_c^(6)) / Σ_{c'} exp(h_{c'}^(6)),
and yields the classification result.
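To make the branch structure of step s33 concrete, a minimal PyTorch sketch of one possible implementation is given below; the channel counts (30 -> 16 -> 32) and the kernel sizes 36/51/80 follow the description, while the use of ELU, the ordering of activation and batch normalization, and the class names are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class BandBranch(nn.Module):
    """One frequency-band branch: Pointwise conv, then three Depthwise branches
    with kernel sizes 36/51/80, each followed by activation, batch norm and GAP."""
    def __init__(self, in_ch=30, n1=16, kernel_sizes=(36, 51, 80)):
        super().__init__()
        self.pointwise = nn.Conv1d(in_ch, n1, kernel_size=1)       # (30, 384) -> (16, 384)
        self.depthwise = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(n1, 2 * n1, kernel_size=k, groups=n1),   # 2 depthwise filters per channel
                nn.ELU(),                                          # activation (assumed ELU)
                nn.BatchNorm1d(2 * n1),
                nn.AdaptiveAvgPool1d(1),                           # global average pooling
            )
            for k in kernel_sizes
        ])

    def forward(self, x):                                  # x: (batch, 30, 384)
        h = self.pointwise(x)
        feats = [d(h).squeeze(-1) for d in self.depthwise] # each (batch, 32)
        return torch.cat(feats, dim=1)                     # (batch, 96)

class MultiScaleCNN(nn.Module):
    """Three band branches (theta/alpha/beta) followed by a linear classifier;
    the softmax is applied by the loss function or at inference time."""
    def __init__(self, n_bands=3, n_classes=2):
        super().__init__()
        self.branches = nn.ModuleList([BandBranch() for _ in range(n_bands)])
        self.classifier = nn.Linear(n_bands * 96, n_classes)

    def forward(self, bands):                              # list of 3 tensors (batch, 30, 384)
        feats = torch.cat([b(x) for b, x in zip(self.branches, bands)], dim=1)
        return self.classifier(feats)
```

The grouped 1-D convolution with groups=n1 and 2·n1 output channels reproduces the two Depthwise convolutions applied to each of the 16 channels.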
The technical scheme of the invention is further improved as follows: the specific operation of the step S4 is:
s41, adopting 11-fold cross validation
The first subject is selected as the test set, the electroencephalogram data of all other subjects are used as the training set, and the recognition accuracy ACC_1 of the model on the first subject is calculated; the same procedure is applied to the remaining subjects, finally yielding the recognition accuracies ACC_i (i = 1, 2, ..., 11) of the 11 subjects;
s42, calculating the average accuracy:
ACC_mean = (1/n)·Σ_{i=1}^{n} ACC_i,
where n = 11 is the number of subjects;
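A minimal sketch of the leave-one-subject-out (11-fold) evaluation of steps s41-s42 is given below; build_model, train and evaluate are hypothetical placeholders standing in for the multi-scale CNN training code.

```python
import numpy as np

def leave_one_subject_out(data_by_subject, build_model, train, evaluate):
    """data_by_subject: list of (X_i, y_i) pairs, one per subject (11 in total).
    Returns the per-subject accuracies ACC_i and their mean."""
    accs = []
    for i, (x_test, y_test) in enumerate(data_by_subject):
        # training set: all subjects except subject i
        x_train = np.concatenate([x for j, (x, _) in enumerate(data_by_subject) if j != i])
        y_train = np.concatenate([y for j, (_, y) in enumerate(data_by_subject) if j != i])

        model = build_model()
        train(model, x_train, y_train)
        accs.append(evaluate(model, x_test, y_test))    # ACC_i
    return accs, float(np.mean(accs))                    # mean accuracy over n = 11 subjects
```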
s43, interpretability analysis
A class activation mapping method is used to locate, for each input sample, the discriminative regions on which the CNN model bases its classification;
assume a given EEG sample X(m, n) is classified with label c, where c = 0 denotes the alert state and c = 1 the fatigue state; the input sample produces an activation h_c^(6) at the sixth layer of the network, and substituting the global-average-pooling output into the hidden-layer formula gives
h_c^(6) = Σ_i w_{c,i}^(6)·(1/(n - K + 1))·Σ_j h_i^(4)(j) + b_c^(6);
further neglecting the constant (n - K + 1), the quantity
M_{i,j} = w_{c,i}^(6)·h_i^(4)(j)
can be regarded as a map of size 2N_1 × (n - K + 1) describing how the final activation layer of class c is distributed over channels and sampling points;
each discriminative point of M is then projected back onto the input signal through a smoothing kernel, in which σ is a constant that determines the radius of the region of influence of each discriminative point in the input signal; the resulting map is further normalized to the range (-1, 1) for visualization;
according to the Pointwise and Depthwise output formulas, each point h_{i_k}^(2)(j_k) of the Depthwise output results from a local set of input signals from time point j_k to j_k + K - 1, i.e. a weighted sum over the convolution signals of the m = 30 channels, the weight of the p-th channel being w_{(i_k+1)/2,p}^(1) when i_k is odd, or w_{i_k/2,p}^(1) when i_k is even;
accordingly, a position (i_k, j_k) in the map can be traced back to the center (p_k, q_k) of the most strongly contributing set in the input signal: when i_k is odd, p_k is determined from the Pointwise weights w_{(i_k+1)/2,p}^(1); when i_k is even, p_k is determined from the Pointwise weights w_{i_k/2,p}^(1); and q_k = j_k + (l - 1)/2;
in this way, the entire set of the most strongly contributing signals is highlighted at the discriminative position of the input signal.
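As a rough illustration of the class-activation-mapping idea in s43, the sketch below forms the map M_{i,j} = w_{c,i}·h_i(j) from the final feature maps and the classifier weights, smooths it along time and normalizes it to (-1, 1); the Gaussian smoothing and all variable names are assumptions, not the exact back-projection procedure of the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def class_activation_map(feature_maps, class_weights, c, sigma=5.0):
    """feature_maps  : array (channels, time) from the final activation layer h^(4)
    class_weights    : array (n_classes, channels) of hidden-layer weights w^(6)
    c                : class index (0 = alert, 1 = fatigued)
    Returns a (channels, time) map normalized to (-1, 1)."""
    M = class_weights[c][:, None] * feature_maps      # M_{i,j} = w_{c,i} * h_i(j)
    M = gaussian_filter1d(M, sigma=sigma, axis=1)     # smooth along time (radius ~ sigma)
    return M / (np.abs(M).max() + 1e-12)              # normalize to (-1, 1)
```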
Due to the adoption of the above technical solution, the invention achieves the following technical progress:
the invention designs a convolutional neural network mixing multiple convolution kernel sizes to identify the driver's fatigue state, obtains a 2% improvement in classification performance on the sustained-attention fatigue-driving data set, and improves the classification performance of the fatigue state identification model; by integrating the frequency-masking and frequency-domain noise-addition data enhancement methods with the multi-scale convolution kernel size CNN model, the classification performance and generalization ability of the model are further improved.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the frequency masking of the present invention;
FIG. 3 is a diagram of a CNN model architecture for multi-scale convolution kernel sizes in accordance with the present invention;
FIG. 4 is an overall schematic view of the present invention;
FIG. 5 is a t-SNE visualization of the output-layer classification results of the EEG data of subject 1 obtained with the multi-scale convolution kernel size CNN model in embodiment 2 of the present invention;
FIG. 6 is a t-SNE visualization of the output-layer classification results of the EEG data of subject 1 obtained with the interpretable CNN model in embodiment 2 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples:
example 1:
a driver fatigue state identification method based on a multi-scale convolution kernel size CNN, fig. 1 is a schematic flow diagram of the method, and fig. 4 is a schematic overall diagram of the method.
The specific operation steps are as follows:
s1, data preparation: preprocessing the acquired electroencephalogram signals to obtain eleven subjects and 2022 samples, wherein the samples are shown in table 1:
TABLE 1
This step is divided into the following 5 sub-steps:
S11, down-sampling the original data set to 128 Hz and extracting a 3 s electroencephalogram sample before each lane-deviation event.
S12, calculating the local reaction time R_t of each sample:
R_t = t_d - t_r,
where t_d is the time at which the vehicle starts to drift and t_r is the time at which the subject steers the vehicle back to the original lane.
S13, calculating the global reaction time GR_t of each sample:
GR_t = (1/N)·Σ_{i=1}^{N} R_t(i),
where N is the number of samples falling within the 90 s window before the vehicle-drift event of each sample, so that GR_t is the mean of the local reaction times R_t within that 90 s window.
S14, defining the baseline alert reaction time R_t^alert:
the fifth percentile of the local reaction times in each session is taken as the baseline alert reaction time R_t^alert and serves as the basis for labeling the samples in the next step.
S15, labeling each sample:
a sample is labeled as alert when both its local and global reaction times are less than 1.5 times the baseline alert reaction time; a sample is labeled as fatigued when both its local and global reaction times are greater than 2.5 times the baseline alert reaction time. After labeling was completed, 2022 samples were obtained, each containing 3 s of electroencephalogram data.
S2, data enhancement: the example uses frequency masking for data enhancement of raw data
A fast Fourier transform is applied to the original signal x(t) to obtain the frequency-domain signal X(jω):
X(jω) = F(x(t)),
where F(·) denotes the fast Fourier transform.
A schematic diagram of frequency masking is shown in FIG. 2.
The hyper-parameters s and t are then determined, where s denotes the number of masked frequency points and t the number of masked regions; in this example t = 1 and s = 20. The 20 randomly selected frequency points of a region are set to zero, and an inverse Fourier transform yields the time-domain signal, i.e. the enhanced data.
S3, model training: training a data set formed by original data and enhanced data by adopting a multi-scale convolution kernel size mixed CNN model to obtain a classifier
S31, data filtering: assume the original EEG signals recorded by m electrodes are X = {X_i}, i = 1, 2, ..., m; band-pass filtering X yields EEG signals in three frequency bands, X_1 (4-7 Hz), X_2 (8-13 Hz) and X_3 (13-32 Hz).
S32, determining the convolution kernel sizes:
Let the convolution kernel sizes be K = {K_1, K_2, K_3}, where K_i (i = 1, 2, 3) is the kernel size of the Depthwise convolution of the i-th branch; in this example K_1 = 36, K_2 = 51 and K_3 = 80 are adopted.
S33, network structure design:
the first branch of the network takes as input the first frequency band X_1(m, n) of the raw EEG signal, where m = 30 is the number of EEG channels and n = 384 is the number of sampling points per sample. All superscript numbers in parentheses below indicate the network layer, e.g. w^(1) denotes a parameter of the first layer of the network.
The output of the first-layer Pointwise convolution is
h_i^(1)(j) = Σ_{p=1}^{m} w_{i,p}^(1)·x_p(j) + b_i^(1),
where i = 1, 2, ..., N_1 and N_1 = 16 is the number of Pointwise convolutions, w_{i,p}^(1) is the weight of the p-th channel of the i-th Pointwise convolution, x_p(j) is the p-th channel of the j-th sampling point of the input EEG sample, and b_i^(1) is the bias of the i-th Pointwise convolution. After the Pointwise layer an output h^(1) of dimension (16, 384) is obtained.
In the first branch of the second-layer Depthwise convolution, the signal from the first layer has dimension (16, 384); each of its 16 channels is convolved with two Depthwise convolutions, so this branch outputs 32 channels, and the number of output sampling points j^(2) is calculated as
j^(2) = (j^(1) - K_1 + 2·padding)/stride + 1,
with kernel size K_1 = 36, stride = 1 and padding = 0, giving j^(2) = 349.
When i is odd, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{(i+1)/2}^(1)(j + l - 1) + b_i^(2);
when i is even, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{i/2}^(1)(j + l - 1) + b_i^(2),
where K_1 is the Depthwise kernel size, i is the output channel index and j is the sampling-point index.
The third layer is an activation layer, h^(3) = φ(h^(2)), with φ(·) the nonlinear activation function applied element-wise.
The fourth layer is a batch normalization layer, h^(4) = BN(h^(3)).
The fifth layer is a global average pooling layer, h_i^(5) = (1/(n - K_1 + 1))·Σ_j h_i^(4)(j).
The second Depthwise branch uses kernel size K_2 = 51; the output-length calculation and the subsequent steps are repeated, giving j^(2) = 334 for this branch.
The third Depthwise branch uses kernel size K_3 = 80; the output-length calculation and the subsequent steps are repeated, giving j^(3) = 305 for this branch.
As shown in FIG. 3, the outputs of the three Depthwise branches are all concatenated.
The above describes the first branch of the network, which processes the EEG signal X_1 of the 4-7 Hz band. The operations of the first branch are repeated for the 8-13 Hz EEG signal X_2 and for the 13-32 Hz EEG signal to obtain the outputs of the second and third branches of the network; finally, the outputs of the three branches are fully connected and processed as follows:
the sixth layer is a hidden layer,
h_c^(6) = Σ_i w_{c,i}^(6)·h_i^(5) + b_c^(6),
where c = 0 or c = 1; c = 0 represents the alert state and c = 1 represents the fatigue state.
The seventh layer applies the softmax function,
P(c) = exp(h_c^(6)) / Σ_{c'} exp(h_{c'}^(6)),
and yields the classification result.
S4, state identification: inputting the preprocessed electroencephalogram data into a classifier model to obtain a state label of a sample and an interpretable model classification basis;
s41, adopting 11-fold cross validation
The first subject is selected as the test set, the electroencephalogram data of all other subjects are used as the training set, and the recognition accuracy ACC_1 of the model on the first subject is calculated; the same procedure is applied to the remaining subjects, finally yielding the recognition accuracies ACC_i (i = 1, 2, ..., 11) of the 11 subjects.
S42, calculating the average accuracy:
ACC_mean = (1/n)·Σ_{i=1}^{n} ACC_i,
where n = 11 is the number of subjects.
S43, interpretability analysis
A class activation mapping method is used to locate, for each input sample, the discriminative regions on which the CNN model bases its classification.
Assume a given EEG sample X(30, 384) is classified with label c, where c = 0 denotes the alert state and c = 1 the fatigue state; the input sample produces an activation h_c^(6) at the sixth layer of the network, and substituting the global-average-pooling output into the hidden-layer formula gives
h_c^(6) = Σ_i w_{c,i}^(6)·(1/(n - K + 1))·Σ_j h_i^(4)(j) + b_c^(6).
Further neglecting the constant (n - K + 1), the quantity
M_{i,j} = w_{c,i}^(6)·h_i^(4)(j)
can be regarded as a map of size 2N_1 × (n - K + 1) describing how the final activation layer of class c is distributed over channels and sampling points.
Each discriminative point of M is then projected back onto the input signal through a smoothing kernel, in which σ is a constant that determines the radius of the region of influence of each discriminative point in the input signal; the resulting map is further normalized to the range (-1, 1) for visualization.
According to the Pointwise and Depthwise output formulas, each point h_{i_k}^(2)(j_k) of the Depthwise output results from a local set of input signals from time point j_k to j_k + K - 1, i.e. a weighted sum over the convolution signals of the m = 30 channels, the weight of the p-th channel being w_{(i_k+1)/2,p}^(1) when i_k is odd, or w_{i_k/2,p}^(1) when i_k is even.
Accordingly, a position (i_k, j_k) in the map can be traced back to the center (p_k, q_k) of the most strongly contributing set in the input signal: when i_k is odd, p_k is determined from the Pointwise weights w_{(i_k+1)/2,p}^(1); when i_k is even, p_k is determined from the Pointwise weights w_{i_k/2,p}^(1); and q_k = j_k + (l - 1)/2.
In this way, the entire set of the most strongly contributing signals is highlighted at the discriminative position of the input signal.
Example 2:
this example introduces the frequency-domain noise-addition data enhancement method into the fatigue state recognition task; the processing of steps S1, S3 and S4 is the same as in embodiment 1:
S2, data enhancement: in this embodiment the original data are enhanced by adding frequency-domain noise.
A fast Fourier transform is applied to the original signal x(t) to obtain the frequency-domain signal
X(jω) = F[x(t)] = H(jω)·e^{jθ(ω)},
where H(jω) = |X(jω)| is the amplitude and θ(ω) = Arg[X(jω)] is the phase of the frequency-domain signal; additive Gaussian noise G_i(λ) ~ N(0, σ_i^2) (i = 0, 1) is applied to the amplitude and to the phase, respectively, giving the noisy spectrum
X_noise(jω) = [H(jω) + G_0(ω)]·e^{j[θ(ω) + G_1(ω)]}.
The enhanced time-domain signal is obtained by the inverse Fourier transform:
x_noise(t) = F^{-1}(X_noise).
In order to demonstrate the performance of the algorithm of the present invention, the experimental results of the multi-scale convolution kernel size CNN of the present invention are compared with the results of the Conv-bellownet, EEGNet and InterpretableCNN network structures, as shown in Table 2, where the bold numbers mark the best recognition accuracy for each subject.
The t-SNE visualizations of the output-layer classification results of the EEG data of subject 1 obtained with the multi-scale convolution kernel size CNN and with the interpretable CNN model are shown in FIG. 5 and FIG. 6, from which the clear advantage of the multi-scale convolution kernel size CNN model in classification effect can be seen.
TABLE 2
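For reference, a minimal sketch of the kind of t-SNE projection shown in FIG. 5 and FIG. 6 is given below (Python with scikit-learn and matplotlib); the perplexity value and the way the output-layer features are collected are assumptions, not the settings of this embodiment.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(output_features, labels, title="t-SNE of output-layer features"):
    """output_features : array (n_samples, n_features) taken from the model's output layer
    labels             : array (n_samples,) with 0 = alert, 1 = fatigued"""
    emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(output_features)
    for c, name in [(0, "alert"), (1, "fatigued")]:
        pts = emb[labels == c]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=name)
    plt.legend()
    plt.title(title)
    plt.show()
```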
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN is characterized by comprising the following steps of:
s1, data preparation: preprocessing the acquired electroencephalogram signals to obtain standard format data;
s2, data enhancement: performing data enhancement on the original data by adopting a frequency masking or frequency domain noise adding algorithm;
s3: training a model, namely training a data set formed by original data and enhanced data by adopting a multi-scale convolution kernel size mixed CNN model to obtain a classifier;
s4, state identification: and inputting the preprocessed electroencephalogram data into a classifier model to obtain a state label of the sample and an interpretable model classification basis.
2. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 1, wherein the specific operation of step S1 is as follows:
s11, down-sampling the original data set to 128 Hz, and extracting a 3 s electroencephalogram sample before each lane-deviation event;
s12, calculating the local reaction time R_t of each sample:
R_t = t_d - t_r,
where t_d is the time at which the vehicle starts to drift and t_r is the time at which the subject steers the vehicle back to the original lane;
s13, calculating the global reaction time GR_t of each sample:
GR_t = (1/N)·Σ_{i=1}^{N} R_t(i),
where N is the number of samples falling within the 90 s window before the vehicle-drift event of each sample, so that GR_t is the mean of the local reaction times R_t within that 90 s window;
s14, defining the baseline alert reaction time R_t^alert:
the fifth percentile of the local reaction times in each session is taken as the baseline alert reaction time R_t^alert and serves as the basis for labeling the samples in the next step;
s15, labeling each sample:
a sample is labeled as alert when both its local and global reaction times are less than 1.5 times the baseline alert reaction time; a sample is labeled as fatigued when both its local and global reaction times are greater than 2.5 times the baseline alert reaction time.
3. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 2, wherein 2022 samples are obtained after the labeling in step S15 is completed, each sample containing 3 s of electroencephalogram data.
4. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 1, wherein the specific operation of the frequency-masking data enhancement in step S2 is as follows:
s21, performing a fast Fourier transform on the original signal x(t) to obtain the frequency-domain signal X(jω):
X(jω) = F(x(t)),
where F(·) denotes the fast Fourier transform;
s22, determining the hyper-parameters s and t, where s denotes the number of masked frequency points and t denotes the number of masked regions; the 20 frequency points randomly selected within a region are set to zero to obtain the enhanced data.
5. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 1, wherein the specific operation of the frequency-domain noise-addition data enhancement in step S2 is as follows:
the original signal x(t) is transformed into the frequency domain,
X(jω) = F[x(t)] = H(jω)·e^{jθ(ω)},
where H(jω) = |X(jω)| is the amplitude and θ(ω) = Arg[X(jω)] is the phase of the frequency-domain signal; additive Gaussian noise G_i(λ) ~ N(0, σ_i^2) (i = 0, 1) is applied to the amplitude and to the phase, respectively, giving the noisy spectrum
X_noise(jω) = [H(jω) + G_0(ω)]·e^{j[θ(ω) + G_1(ω)]};
the enhanced time-domain signal is obtained by the inverse Fourier transform:
x_noise(t) = F^{-1}(X_noise).
6. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 1, wherein the specific operation of step S3 is as follows:
s31, data filtering: assume the original EEG signals recorded by m electrodes are X = {X_i}, i = 1, 2, ..., m; band-pass filtering X yields EEG signals in three frequency bands, X_1 (4-7 Hz), X_2 (8-13 Hz) and X_3 (13-32 Hz);
s32, determining the convolution kernel sizes:
let the convolution kernel sizes be K = {K_1, K_2, K_3}, where K_i (i = 1, 2, 3) is the kernel size of the Depthwise convolution of the i-th branch;
s33, network structure design:
the first branch of the network takes as input the first frequency band X_1(m, n) of the raw EEG signal, where m is the number of EEG channels (30) and n is the number of sampling points per sample (384);
in the following, a superscript in parentheses, e.g. w^(x), denotes a parameter of network layer x;
the output of the first-layer Pointwise convolution is
h_i^(1)(j) = Σ_{p=1}^{m} w_{i,p}^(1)·x_p(j) + b_i^(1),
where i = 1, 2, ..., N_1 and N_1 = 16 is the number of Pointwise convolutions, w_{i,p}^(1) is the weight of the p-th channel of the i-th Pointwise convolution, x_p(j) is the p-th channel of the j-th sampling point of the input EEG sample, and b_i^(1) is the bias of the i-th Pointwise convolution; after the Pointwise layer an output h^(1) of dimension (16, 384) is obtained;
in the first branch of the second-layer Depthwise convolution, the signal from the first layer has dimension (16, 384); each of its 16 channels is convolved with two Depthwise convolutions, so this branch outputs 32 channels, and the number of output sampling points j^(2) is calculated as
j^(2) = (j^(1) - K_1 + 2·padding)/stride + 1,
with kernel size K_1 = 36, stride = 1 and padding = 0, giving j^(2) = 349;
when i is odd, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{(i+1)/2}^(1)(j + l - 1) + b_i^(2);
when i is even, the output of the second-layer Depthwise convolution is
h_i^(2)(j) = Σ_{l=1}^{K_1} w_{i,l}^(2)·h_{i/2}^(1)(j + l - 1) + b_i^(2),
where K_1 is the Depthwise kernel size, i is the output channel index and j is the sampling-point index;
the third layer is an activation layer, h^(3) = φ(h^(2)), with φ(·) the nonlinear activation function applied element-wise;
the fourth layer is a batch normalization layer, h^(4) = BN(h^(3));
the fifth layer is a global average pooling layer, h_i^(5) = (1/(n - K_1 + 1))·Σ_j h_i^(4)(j);
the second Depthwise branch uses kernel size K_2 = 51; the output-length calculation and the subsequent steps are repeated, giving j^(2) = 334 for this branch;
the third Depthwise branch uses kernel size K_3 = 80; the output-length calculation and the subsequent steps are repeated, giving j^(3) = 305 for this branch;
the above describes the first branch of the network, which processes the EEG signal X_1 of the 4-7 Hz band; the operations of the first branch are repeated for the 8-13 Hz EEG signal X_2 and for the 13-32 Hz EEG signal to obtain the outputs of the second and third branches of the network; finally, the outputs of the three branches are fully connected and processed as follows:
the sixth layer is a hidden layer,
h_c^(6) = Σ_i w_{c,i}^(6)·h_i^(5) + b_c^(6),
where c = 0 or c = 1; c = 0 represents the alert state and c = 1 represents the fatigue state;
the seventh layer applies the softmax function,
P(c) = exp(h_c^(6)) / Σ_{c'} exp(h_{c'}^(6)),
and yields the classification result.
7. The method for identifying the fatigue state of the driver based on the multi-scale convolution kernel size CNN as claimed in claim 1, wherein the specific operation of step S4 is as follows:
s41, adopting 11-fold cross-validation:
the first subject is selected as the test set, the electroencephalogram data of all other subjects are used as the training set, and the recognition accuracy ACC_1 of the model on the first subject is calculated; the same procedure is applied to the remaining subjects, finally yielding the recognition accuracies ACC_i (i = 1, 2, ..., 11) of the 11 subjects;
s42, calculating the average accuracy:
ACC_mean = (1/n)·Σ_{i=1}^{n} ACC_i,
where n = 11 is the number of subjects;
s43, interpretability analysis:
a class activation mapping method is used to locate, for each input sample, the discriminative regions on which the CNN model bases its classification;
assume a given electroencephalogram sample X(m, n) is classified with label c, where c = 0 denotes the alert state and c = 1 the fatigue state; the input sample produces an activation h_c^(6) at the sixth layer of the network, and substituting the global-average-pooling output into the hidden-layer formula gives
h_c^(6) = Σ_i w_{c,i}^(6)·(1/(n - K + 1))·Σ_j h_i^(4)(j) + b_c^(6);
further neglecting the constant (n - K + 1), the quantity
M_{i,j} = w_{c,i}^(6)·h_i^(4)(j)
can be regarded as a map of size 2N_1 × (n - K + 1) describing how the final activation layer of class c is distributed over channels and sampling points;
each discriminative point of M is then projected back onto the input signal through a smoothing kernel, in which σ is a constant that determines the radius of the region of influence of each discriminative point in the input signal; the resulting map is further normalized to the range (-1, 1) for visualization;
according to the Pointwise and Depthwise output formulas, each point h_{i_k}^(2)(j_k) of the Depthwise output results from a local set of input signals from time point j_k to j_k + K - 1, i.e. a weighted sum over the convolution signals of the m = 30 channels, the weight of the p-th channel being w_{(i_k+1)/2,p}^(1) when i_k is odd, or w_{i_k/2,p}^(1) when i_k is even;
accordingly, a position (i_k, j_k) in the map can be traced back to the center (p_k, q_k) of the most strongly contributing set in the input signal: when i_k is odd, p_k is determined from the Pointwise weights w_{(i_k+1)/2,p}^(1); when i_k is even, p_k is determined from the Pointwise weights w_{i_k/2,p}^(1); and q_k = j_k + (l - 1)/2;
in this way, the entire set of the most strongly contributing signals is highlighted at the discriminative position of the input signal.
CN202211488681.1A 2022-11-25 2022-11-25 Driver fatigue state identification method based on multi-scale convolution kernel size CNN Pending CN115836868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211488681.1A CN115836868A (en) 2022-11-25 2022-11-25 Driver fatigue state identification method based on multi-scale convolution kernel size CNN

Publications (1)

Publication Number Publication Date
CN115836868A true CN115836868A (en) 2023-03-24

Family

ID=85576091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211488681.1A Pending CN115836868A (en) 2022-11-25 2022-11-25 Driver fatigue state identification method based on multi-scale convolution kernel size CNN

Country Status (1)

Country Link
CN (1) CN115836868A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309813A (en) * 2019-07-10 2019-10-08 南京行者易智能交通科技有限公司 A kind of model training method, detection method, device, mobile end equipment and the server of the human eye state detection based on deep learning
CN111460892A (en) * 2020-03-02 2020-07-28 五邑大学 Electroencephalogram mode classification model training method, classification method and system
US20200367800A1 (en) * 2019-01-23 2020-11-26 Wuyi University Method for identifying driving fatigue based on cnn-lstm deep learning model
CN113180692A (en) * 2021-02-11 2021-07-30 北京工业大学 Electroencephalogram signal classification and identification method based on feature fusion and attention mechanism
CN113673442A (en) * 2021-08-24 2021-11-19 燕山大学 Variable working condition fault detection method based on semi-supervised single classification network
US20210365741A1 (en) * 2019-05-08 2021-11-25 Tencent Technology (Shenzhen) Company Limited Image classification method, computer-readable storage medium, and computer device
CN113934302A (en) * 2021-10-21 2022-01-14 燕山大学 Myoelectric gesture recognition method based on SeNet and gating time sequence convolution network
CN114399642A (en) * 2021-12-29 2022-04-26 燕山大学 Convolutional neural network fluorescence spectrum feature extraction method
CN115357113A (en) * 2022-07-08 2022-11-18 西安电子科技大学 SSVEP brain-computer interface stimulation modulation and decoding method under dynamic background


Similar Documents

Publication Publication Date Title
Supriya et al. Automated epilepsy detection techniques from electroencephalogram signals: a review study
Lu et al. Classification of single-channel EEG signals for epileptic seizures detection based on hybrid features
Chen et al. Driving safety risk prediction using cost-sensitive with nonnegativity-constrained autoencoders based on imbalanced naturalistic driving data
Khare et al. Optimized tunable Q wavelet transform based drowsiness detection from electroencephalogram signals
WO2021017329A1 (en) Method and device for detecting when driver is distracted
Mehla et al. A novel approach for automated alcoholism detection using Fourier decomposition method
Yildiz et al. Classification and analysis of epileptic EEG recordings using convolutional neural network and class activation mapping
CN113180696A (en) Intracranial electroencephalogram detection method and device, electronic equipment and storage medium
Babaeian et al. Driver drowsiness detection algorithms using electrocardiogram data analysis
Dash et al. Hidden Markov model based epileptic seizure detection using tunable Q wavelet transform
Li et al. FuzzyEn-based features in FrFT-WPT domain for epileptic seizure detection
Wijayanto et al. Comparison of empirical mode decomposition and coarse-grained procedure for detecting pre-ictal and ictal condition in electroencephalography signal
Wang et al. Automated recognition of epilepsy from EEG signals using a combining space–time algorithm of CNN-LSTM
Lian et al. Spatial enhanced pattern through graph convolutional neural network for epileptic EEG identification
Thilagaraj et al. Identification of drivers drowsiness based on features extracted from EEG signal using SVM classifier
Gao et al. Automatic epileptic seizure classification in multichannel EEG time series with linear discriminant analysis
CN113343869A (en) Electroencephalogram signal automatic classification and identification method based on NTFT and CNN
CN115836868A (en) Driver fatigue state identification method based on multi-scale convolution kernel size CNN
Kumar et al. Classification of driver cognitive load based on physiological data: Exploring recurrent neural networks
Ding et al. EEG-Fest: Few-shot based Attention Network for Driver's Vigilance Estimation with EEG Signals
Yu et al. SQNN: a spike-wave index quantification neural network with a pre-labeling algorithm for epileptiform activity identification and quantification in children
Wang et al. Combining STFT and random forest algorithm for epileptic detection
Ding et al. EEG-fest: few-shot based attention network for driver's drowsiness estimation with EEG signals
Chandra et al. Neuromuscular disease detection employing 1D-local binary pattern of electromyography signals
Xie et al. An SVM parameter learning algorithm scalable on large data size for driver fatigue detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination