CN111310783B - Speech state detection method based on electroencephalogram micro-state features and neural network model - Google Patents
- Publication number
- CN111310783B (application CN202010007821.3A)
- Authority
- CN
- China
- Prior art keywords
- layer
- micro
- state
- input
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
A speech state detection method based on electroencephalogram micro-state features and a neural network model comprises the following steps: constructing an improved GoogLeNet neural network model; acquiring multi-channel electroencephalogram signals of normal subjects in the listening, speaking and imagined-speaking states, extracting the micro-state time-series features within a set time window, and attaching corresponding labels to the micro-state time-series features; training the improved GoogLeNet neural network model with the labelled micro-state time-series features; then acquiring multi-channel electroencephalogram signals of a normal subject in a speech state in real time, extracting the micro-state time-series features within the set time window, and feeding them into the trained improved GoogLeNet neural network model, thereby realizing speech state detection. The invention innovatively applies electroencephalogram micro-state time-series features to a neural network model for speech detection, and can effectively improve the accuracy of speech state classification.
Description
Technical Field
The invention relates to speech state detection methods, and in particular to a speech state detection method based on electroencephalogram micro-state features and a neural network model.
Background
The human brain is the most complex system in the human body. Its many neural networks regulate the body's physiological activities, and accomplishing a task generally requires correlation and coordination among multiple brain areas. Research so far has explored only the tip of the iceberg, and much of brain function remains beyond our present understanding. With the development of science and technology, exploring brain function is therefore undoubtedly a very meaningful undertaking.
Many studies have shown that the features of the electroencephalographic micro-state time series differ across behavioral states. Micro-states, also known as the "atoms of thought", are discussed in the document EEG microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: A review, which notes that the networks activated in a particular micro-state represent different states of consciousness, and that each micro-state is associated with a different class of mental processes constituting consciousness. The time-series features of the micro-states include their frequency of occurrence, duration and switching pattern. In task-oriented brain activity, there is a link between the occurrence of micro-states and specific information-processing functions. The electroencephalographic micro-state can therefore serve as an objective physiological index, providing a novel method for detecting brain function states.
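The three time-series features named above (frequency of occurrence, duration, and switching pattern) can be computed directly from a per-sample micro-state label sequence. The sketch below is illustrative only; the label alphabet 'A'-'D' and the 250 Hz sampling rate are assumptions, not taken from the patent.

```python
# Minimal sketch: micro-state occurrence counts, mean durations, and
# switching pattern from a per-sample label sequence (assumed 250 Hz).
from collections import Counter

def microstate_stats(labels, fs=250.0):
    """Summarise a micro-state label sequence sampled at fs Hz."""
    # Collapse consecutive identical labels into (state, run_length) segments.
    segments = []
    for lab in labels:
        if segments and segments[-1][0] == lab:
            segments[-1][1] += 1
        else:
            segments.append([lab, 1])
    occurrences = Counter(s for s, _ in segments)      # how often each state appears
    total = Counter()
    for s, run in segments:
        total[s] += run
    # Mean duration per occurrence, in milliseconds.
    duration_ms = {s: 1000.0 * total[s] / (fs * occurrences[s]) for s in occurrences}
    # Switching pattern: counts of direct transitions between distinct states.
    switches = Counter((a, b) for (a, _), (b, _) in zip(segments, segments[1:]))
    return occurrences, duration_ms, dict(switches)

seq = ['A'] * 25 + ['B'] * 30 + ['A'] * 20 + ['C'] * 25   # 100 samples = 0.4 s
occ, dur, sw = microstate_stats(seq)
# State 'A' occurs twice, lasting (25 + 20) / 2 = 22.5 samples = 90 ms on average.
```

Note that a 2 s window at this assumed rate holds 500 samples, so with typical 80-120 ms micro-state durations each window contains on the order of 15-25 micro-state segments.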
Speech is the fundamental way humans communicate. Speech states are divided into listening, speaking and imagined speaking. A large number of speech-impaired patients still exist in the world; they can hardly communicate normally with the outside world, which brings great difficulty to their lives. With the rapid development of brain-computer interfaces in recent years, researchers have begun to seek methods that decode the speech state of speech-impaired patients by means of brain science and neural engineering, so that the patient's speech state can be recognized and further applied to brain-computer interaction. The document Microstates in Language-Related Brain Potential Maps Show Noun-Verb Differences reports that language processing involves multiple brain regions and widely distributed neuron populations; in a simple word-reading paradigm, where the brain processes nouns and verbs through different neural populations, the average position of the centre of the subject's micro-state topographic map differs when seeing nouns versus verbs.
With the rise of deep learning, medical researchers have also begun to explore efficient methods for detecting human physiological indices. Compared with traditional machine learning, deep learning saves time and, owing to weight sharing, has advantages in both model accuracy and efficiency. Deep learning has also achieved notable breakthroughs in medical image classification in recent years, and more and more researchers are applying it to medical detection. The GoogLeNet neural network developed by Google has received wide attention from researchers, and the residual block has been recognized as an effective means of preventing vanishing and exploding gradients.
Disclosure of Invention
The invention aims to solve the technical problem of providing a speech state detection method based on electroencephalogram micro-state features and a neural network model, which increases the width and depth of the model, avoids overfitting, and effectively improves classification accuracy.
The technical scheme adopted by the invention is as follows: a speech state detection method based on electroencephalogram micro-state features and a neural network model comprises the following steps: constructing an improved GoogLeNet neural network model; acquiring multi-channel electroencephalogram signals of normal subjects in the listening, speaking and imagined-speaking states, extracting the micro-state time-series features within a set time window, and attaching corresponding labels to the micro-state time-series features; training the improved GoogLeNet neural network model with the labelled micro-state time-series features; then acquiring multi-channel electroencephalogram signals of a normal subject in a speech state in real time, extracting the micro-state time-series features within the set time window, and feeding them into the trained improved GoogLeNet neural network model, thereby realizing speech state detection.
The speech state is one of a listening state, a speaking state and an imagined-speaking state.
The construction of the improved GoogLeNet neural network model adds residual blocks to the GoogLeNet neural network model; the construction specifically comprises:
From the input layer, data flow into the first three parallel branches and from these branches into the first channel-merging layer. The first three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; a 3 × 3 convolutional layer followed by a pooling layer; and a 5 × 5 convolutional layer followed by a pooling layer; each convolutional layer has 32 convolution kernels;
The first channel-merging layer then feeds the second three parallel branches, whose outputs are merged by the second channel-merging layer. The second three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is again 32. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the first residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the second residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the third residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the fourth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
The second channel-merging layer then feeds the third three parallel branches, whose outputs are merged by the third channel-merging layer. The third three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is 64. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the fifth residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the sixth residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the seventh residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the eighth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
The third channel-merging layer then feeds the fourth three parallel branches, whose outputs are merged by the fourth channel-merging layer. The fourth three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is 128. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the ninth residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the tenth residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the eleventh residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the twelfth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
Finally, the data are output in sequence to an average pooling layer and an output layer.
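A quick sanity check on the branch-and-merge structure described above: because each channel-merging layer concatenates the feature maps of its three parallel branches, the merged channel count is the sum of the branches' kernel counts. The sketch below only does this bookkeeping; the per-stage kernel counts (32, 32, 64, 128) come from the text, and everything else is illustrative.

```python
# Channel bookkeeping for the four branch-and-merge stages described above.
# Each stage runs three parallel branches (1x1, 3x3, 5x5 convolutions, each
# ending in pooling); channel merging concatenates their feature maps.
STAGES = [32, 32, 64, 128]   # kernels per convolutional layer, per the text

def merged_channels(kernels_per_branch, n_branches=3):
    # Concatenation along the channel axis: widths add across branches.
    return kernels_per_branch * n_branches

for i, k in enumerate(STAGES, 1):
    print(f"stage {i}: 3 branches x {k} kernels -> {merged_channels(k)} merged channels")
```

So the merged widths grow 96, 96, 192, 384 across the four stages, which is the "increased network width" the summary refers to.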
The set time window is 2 s; that is, each 2 s micro-state time sequence is taken as one input feature, and a micro-state typically lasts 80-120 ms before it transitions into another micro-state.
The extraction of the micro-state time series characteristics in the set time window comprises the following steps:
(1) computing a global field power curve from the multichannel electroencephalogram signals by the following formula:

GFP(t) = sqrt( (1/K) Σ_{i=1}^{K} ( V_i(t) − V_mean(t) )² )

wherein V_i(t) denotes the voltage of electrode i at time t, V_mean(t) denotes the mean instantaneous potential across the electrodes, K denotes the number of electrodes, and GFP denotes the global field power curve;
then plotting the potentials at the local-maximum moments of the global field power curve to generate topographic maps of the electrode array;
(2) submitting the topographic maps corresponding to the local-maximum moments of the global field power curve to a K-means clustering algorithm, which divides them into four classes of micro-state maps;
(3) arranging the four classes of micro-states in time order according to the sequence of the peaks of the global field power curve, thereby obtaining the input features.
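Step (1) above can be sketched directly: the global field power at each sample is the spatial standard deviation of the electrode voltages. The function below is a minimal illustration; the 4-electrode, 5-sample array is synthetic data, not from the patent.

```python
# Minimal sketch of the GFP formula: spatial standard deviation across
# electrodes at each time sample.
import numpy as np

def global_field_power(eeg):
    """eeg: (K electrodes, T samples) array -> GFP curve of length T."""
    v_mean = eeg.mean(axis=0, keepdims=True)            # V_mean(t)
    return np.sqrt(((eeg - v_mean) ** 2).mean(axis=0))  # sqrt of spatial variance

eeg = np.array([[1.0, 2.0, 0.0, 1.0, 3.0],
                [3.0, 2.0, 0.0, 1.0, 1.0],
                [1.0, 2.0, 0.0, 3.0, 3.0],
                [3.0, 2.0, 0.0, 3.0, 1.0]])
gfp = global_field_power(eeg)
# Columns where all electrodes agree give GFP 0; the local maxima of this
# curve mark the time points whose topographies are passed to clustering.
```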
Corresponding labels are attached to the micro-state time-series features: according to the speech state, the features are divided into three classes of labels, namely listening, speaking and imagined speaking.
The method for training the improved GoogLeNet neural network model by using the micro-state time sequence features with the labels comprises the following steps:
(1) dividing the micro-state time sequence features with the labels into a training set, a verification set and a test set according to the ratio of 8:1:1, wherein the labels in the test set are removed;
(2) inputting the training set into the input layer of the improved GoogLeNet neural network model for training and carrying out forward propagation, transforming layer by layer until the data reach the output layer of the network;
(3) detecting the effectiveness of the improved GoogLeNet neural network model with a cross-entropy loss function; adjusting the parameters of the model obtained in step (2) with an Adam optimizer, updating the parameters and weights of each layer by back propagation; verifying the adjusted model with the verification set and tuning until the accuracy of the model no longer changes, then stopping training; and testing the tuned model with the test set, thereby obtaining the trained improved GoogLeNet neural network model.
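Step (1) of this training procedure can be sketched as follows; the shuffling seed and sample format are illustrative assumptions, and the test-set labels are stripped as the text specifies.

```python
# Minimal sketch of the 8:1:1 split of labelled feature windows.
import random

def split_8_1_1(samples, seed=0):
    """Split (feature, label) pairs 8:1:1; test-set labels are removed."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)                 # reproducible shuffle
    n = len(samples)
    n_train, n_val = (8 * n) // 10, n // 10
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    # Keep only the features for the test set, dropping the labels.
    test = [feat for feat, _lab in (samples[i] for i in idx[n_train + n_val:])]
    return train, val, test

# 100 labelled 2 s windows with the three speech-state labels 0/1/2.
data = [(f"window_{i}", i % 3) for i in range(100)]
train, val, test = split_8_1_1(data)
```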
According to the speech state detection method based on electroencephalogram micro-state features and a neural network model, the improved GoogLeNet neural network adds residual blocks to GoogLeNet, so that the network makes full use of computing resources, increases the network width, and resists overfitting. The invention innovatively applies electroencephalogram micro-state time-series features to a neural network model for speech detection, and can effectively improve the accuracy of speech state classification.
Drawings
FIG. 1 is a block diagram of a speech state detection method based on electroencephalogram micro-state features and a neural network model according to the present invention;
FIG. 2 is a flow chart of a speech state detection method based on electroencephalogram micro-state features and a neural network model according to the present invention;
FIG. 3 is a flow chart of the micro-state time series feature acquisition of the present invention;
fig. 4 is a schematic diagram of a residual block according to an embodiment of the present invention.
Detailed Description
The speech state detection method based on the electroencephalogram micro-state features and the neural network model of the invention is explained in detail below with reference to the embodiments and the accompanying drawings.
The invention discloses a speech state detection method based on electroencephalogram micro-state characteristics and a neural network model, which comprises the following steps of:
1) Constructing an improved GoogLeNet neural network model; acquiring multi-channel electroencephalogram signals of normal subjects in the listening, speaking and imagined-speaking states; extracting the micro-state time-series features within a set time window; and attaching corresponding labels to the micro-state time-series features, specifically dividing them into three classes of labels (listening, speaking and imagined speaking) according to the speech state; wherein,
the construction of the improved GoogLeNet neural network model is characterized in that a residual block network is added on the basis of the GoogLeNet neural network model, and the construction method specifically comprises the following steps:
From the input layer, data flow into the first three parallel branches and from these branches into the first channel-merging layer. The first three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; a 3 × 3 convolutional layer followed by a pooling layer; and a 5 × 5 convolutional layer followed by a pooling layer; each convolutional layer has 32 convolution kernels;
The first channel-merging layer then feeds the second three parallel branches, whose outputs are merged by the second channel-merging layer. The second three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is again 32. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the first residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the second residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the third residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the fourth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
The second channel-merging layer then feeds the third three parallel branches, whose outputs are merged by the third channel-merging layer. The third three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is 64. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the fifth residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the sixth residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the seventh residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the eighth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
The third channel-merging layer then feeds the fourth three parallel branches, whose outputs are merged by the fourth channel-merging layer. The fourth three parallel branches are, respectively: a 1 × 1 convolutional layer followed by a pooling layer; five 3 × 3 convolutional layers connected in sequence followed by a pooling layer; and five 5 × 5 convolutional layers connected in sequence followed by a pooling layer; the number of convolution kernels of each convolutional layer is 128. The residual connections are as follows: the input of the first 3 × 3 convolutional layer serves as the input of the ninth residual connection, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the tenth residual connection, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the eleventh residual connection, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the twelfth residual connection, and the output of the fifth 5 × 5 convolutional layer serves as its output;
Finally, the data are output in sequence to the average pooling layer and the output layer.
2) Training an improved GoogLeNet neural network model with labeled micro-state time series features, comprising:
(1) dividing the micro-state time sequence features with the labels into a training set, a verification set and a test set according to the ratio of 8:1:1, wherein the labels in the test set are removed;
(2) inputting the training set into the input layer of the improved GoogLeNet neural network model for training and carrying out forward propagation, transforming layer by layer until the data reach the output layer of the network;
(3) detecting the effectiveness of the improved GoogLeNet neural network model with a cross-entropy loss function; adjusting the parameters of the model obtained in step (2) with an Adam optimizer, updating the parameters and weights of each layer by back propagation; verifying the adjusted model with the verification set and tuning until the accuracy of the model no longer changes, then stopping training; and testing the tuned model with the test set, thereby obtaining the trained improved GoogLeNet neural network model.
(1) In the present example, the cross-entropy loss function used is:

L = − Σ_{l=1}^{m} y log(y_l)

where m denotes the number of classes, y denotes the correct label value, and y_l denotes the actual output for class l.
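A minimal numeric check of this loss, using the same symbols (m classes, one-hot label y, network output y_l); the probability values below are illustrative.

```python
# Cross-entropy for one sample: only the true class contributes when the
# label vector is one-hot.
import math

def cross_entropy(y_true, y_pred):
    # L = -sum_l y * log(y_l)
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# 3 classes: listening / speaking / imagined speaking.
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
# loss == -ln(0.7) ≈ 0.357; a confident correct prediction drives it to 0.
```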
(2) The Adam algorithm used in the embodiment of the invention is as follows:

s ← ρ₁ s + (1 − ρ₁) g
γ ← ρ₂ γ + (1 − ρ₂) g ⊙ g

where ρ₁ and ρ₂ are constants (default values ρ₁ = 0.9, ρ₂ = 0.999), g denotes the first-order gradient of the loss function, s denotes the biased first-order moment estimate, and γ denotes the biased second-order moment estimate;

s_l ← s / (1 − ρ₁ᵗ)
γ_l ← γ / (1 − ρ₂ᵗ)

where s_l denotes the bias-corrected first-order moment estimate and γ_l denotes the bias-corrected second-order moment estimate;

Δθ = −ε s_l / ( sqrt(γ_l) + δ )

where ε and δ are constants (default values ε = 0.001, δ = 10⁻⁸) and Δθ denotes the change of the parameter θ.
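The Adam update above, taken as one step on a scalar parameter, can be sketched as follows; the gradient value and step count are illustrative, while the default constants match those given in the text.

```python
# One Adam step: biased moment accumulation, bias correction, parameter step.
import math

def adam_step(theta, g, s, gamma, t, rho1=0.9, rho2=0.999, eps=0.001, delta=1e-8):
    s = rho1 * s + (1 - rho1) * g                 # biased 1st-moment estimate
    gamma = rho2 * gamma + (1 - rho2) * g * g     # biased 2nd-moment estimate
    s_hat = s / (1 - rho1 ** t)                   # bias corrections
    gamma_hat = gamma / (1 - rho2 ** t)
    delta_theta = -eps * s_hat / (math.sqrt(gamma_hat) + delta)
    return theta + delta_theta, s, gamma

theta, s, gamma = 0.5, 0.0, 0.0
theta, s, gamma = adam_step(theta, g=2.0, s=s, gamma=gamma, t=1)
# On the first step s_hat = g and gamma_hat = g^2, so the step is ≈ -eps.
```

The bias correction is what keeps the first step at roughly the learning rate ε even though s and γ start from zero.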
3) Acquiring multi-channel electroencephalogram signals of a normal subject in a speech state in real time, the speech state being one of the listening, speaking and imagined-speaking states; extracting the micro-state time-series features within the set time window; and feeding them into the trained improved GoogLeNet neural network model, thereby realizing speech state detection.
The time window set in step 1) and step 3) of the invention is 2 s; that is, each 2 s micro-state time sequence is used as one input feature, and a micro-state typically lasts 80-120 ms before switching to another micro-state.
The extraction of the micro-state time series characteristics in the set time window in the steps 1) and 3) of the invention comprises the following steps:
(1) computing a Global Field Power (GFP) curve from the multichannel electroencephalogram signals by the following formula:

GFP(t) = sqrt( (1/K) Σ_{i=1}^{K} ( V_i(t) − V_mean(t) )² )

wherein V_i(t) denotes the voltage of electrode i at time t, V_mean(t) denotes the mean instantaneous potential across the electrodes, K denotes the number of electrodes, and GFP denotes the global field power curve;
then plotting the potentials at the local-maximum moments of the global field power curve to generate topographic maps of the electrode array;
(2) submitting the topographic maps corresponding to the local-maximum moments of the global field power curve to a K-means clustering algorithm, which divides them into four classes of micro-state maps;
(3) arranging the four classes of micro-states in time order according to the sequence of the peaks of the global field power curve, thereby obtaining the input features.
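Step (2)'s clustering can be sketched with a minimal k-means (Lloyd's algorithm). The toy two-dimensional "topographies" and the fixed initial centres are illustrative assumptions; a real micro-state pipeline clusters full electrode-space maps, typically with polarity-invariant distances.

```python
# Minimal k-means (Lloyd's algorithm) over peak-GFP topographies.
import numpy as np

def kmeans(maps, centres, n_iter=10):
    """maps: (N, D) topographies; centres: (k, D) initial cluster centres."""
    labels = np.zeros(len(maps), dtype=int)
    for _ in range(n_iter):
        # Assign each topography to its nearest centre (squared distance).
        d = ((maps[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned maps.
        for k in range(len(centres)):
            if (labels == k).any():
                centres[k] = maps[labels == k].mean(axis=0)
    return labels, centres

maps = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
init = np.array([[0.0, 0.0], [5.0, 5.0]])   # 2 clusters for the toy data
labels, centres = kmeans(maps, init.copy())
```

In the method itself k would be 4, yielding the four micro-state map classes that are then ordered along the GFP peak sequence in step (3).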
Claims (6)
1. A speech state detection method based on electroencephalogram micro-state features and a neural network model, characterized by comprising the following steps: constructing an improved GoogLeNet neural network model; acquiring multi-channel electroencephalogram signals of normal subjects in the listening, speaking and imagined-speaking states, extracting the micro-state time-series features within a set time window, and attaching corresponding labels to the micro-state time-series features; training the improved GoogLeNet neural network model with the labelled micro-state time-series features; acquiring multi-channel electroencephalogram signals of a normal subject in a speech state in real time, extracting the micro-state time-series features within the set time window, and feeding them into the trained improved GoogLeNet neural network model, thereby realizing speech state detection;
the construction of the improved GoogLeNet neural network model is characterized in that a residual block network is added on the basis of the GoogLeNet neural network model, and the construction method specifically comprises the following steps:
from the input layer to the first three groups of parallel structures, and then from the three groups of parallel structures to the first channel merging layer, the first three groups of parallel structures are respectively: a 1 × 1 convolutional layer and a pooling layer connected; a 3 × 3 convolutional layer and a pooling layer connected; connecting the 5 × 5 convolutional layer and the pooling layer; the number of convolution kernels of the convolution layers is 32;
from the first channel merging layer the output is fed uniformly to the second three parallel groups and from these to the second channel merging layer, the second three parallel groups being, respectively: a connected 1 × 1 convolutional layer and pooling layer; five 3 × 3 convolutional layers and one pooling layer connected in sequence; and five 5 × 5 convolutional layers and one pooling layer connected in sequence; each convolutional layer likewise has 32 convolution kernels; the residual networks are connected as follows: the input of the first 3 × 3 convolutional layer serves as the input of the first residual network, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the second residual network, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the third residual network, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the fourth residual network, and the output of the fifth 5 × 5 convolutional layer serves as its output;
from the second channel merging layer the output is fed uniformly to the third three parallel groups and from these to the third channel merging layer, the third three parallel groups being, respectively: a connected 1 × 1 convolutional layer and pooling layer; five 3 × 3 convolutional layers and one pooling layer connected in sequence; and five 5 × 5 convolutional layers and one pooling layer connected in sequence; each convolutional layer has 64 convolution kernels; the residual networks are connected as follows: the input of the first 3 × 3 convolutional layer serves as the input of the fifth residual network, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the sixth residual network, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the seventh residual network, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the eighth residual network, and the output of the fifth 5 × 5 convolutional layer serves as its output;
from the third channel merging layer the output is fed uniformly to the fourth three parallel groups and from these to the fourth channel merging layer, the fourth three parallel groups being, respectively: a connected 1 × 1 convolutional layer and pooling layer; five 3 × 3 convolutional layers and one pooling layer connected in sequence; and five 5 × 5 convolutional layers and one pooling layer connected in sequence; each convolutional layer has 128 convolution kernels;
the residual networks are connected as follows: the input of the first 3 × 3 convolutional layer serves as the input of the ninth residual network, and the output of the second 3 × 3 convolutional layer serves as its output; the input of the fourth 3 × 3 convolutional layer serves as the input of the tenth residual network, and the output of the fifth 3 × 3 convolutional layer serves as its output; the input of the first 5 × 5 convolutional layer serves as the input of the eleventh residual network, and the output of the second 5 × 5 convolutional layer serves as its output; the input of the fourth 5 × 5 convolutional layer serves as the input of the twelfth residual network, and the output of the fifth 5 × 5 convolutional layer serves as its output;
and finally the output is passed in sequence to the average pooling layer and the output layer.
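One of the parallel groups with residual connections (the second through fourth groups of the claim) can be sketched in PyTorch as follows. The claim specifies only the kernel sizes, the layer counts, the kernel numbers, and which convolution inputs/outputs the residual skips join; everything else here — 2D convolutions, ReLU activations, 2 × 2 max pooling, same-padding, and a 1 × 1 projection to match channels on the first skip — is an assumption of this sketch:

```python
import torch
import torch.nn as nn

class ResidualBranch(nn.Module):
    """Five k x k conv layers; residual skips span convs 1-2 and 4-5, then pooling."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        p = k // 2  # same-padding so the residual sums have matching spatial size
        conv = lambda ci, co: nn.Conv2d(ci, co, k, padding=p)
        self.c1, self.c2 = conv(in_ch, out_ch), conv(out_ch, out_ch)
        self.c3 = conv(out_ch, out_ch)
        self.c4, self.c5 = conv(out_ch, out_ch), conv(out_ch, out_ch)
        self.proj = nn.Conv2d(in_ch, out_ch, 1)  # channel match for the first skip
        self.pool = nn.MaxPool2d(2)
        self.act = nn.ReLU()

    def forward(self, x):
        # residual 1: input of conv 1 -> added to output of conv 2
        y = self.act(self.c2(self.act(self.c1(x))) + self.proj(x))
        y = self.act(self.c3(y))
        # residual 2: input of conv 4 -> added to output of conv 5
        y = self.act(self.c5(self.act(self.c4(y))) + y)
        return self.pool(y)

class InceptionResidualGroup(nn.Module):
    """Three parallel branches (1x1, 3x3, 5x5) followed by a channel merging layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(), nn.MaxPool2d(2))
        self.b3 = ResidualBranch(in_ch, out_ch, 3)
        self.b5 = ResidualBranch(in_ch, out_ch, 5)

    def forward(self, x):
        # channel merging layer = concatenation along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
```

With `out_ch` set to 32, 64 and 128, three such groups mimic the second through fourth groups of the claimed model; the first group has only one convolution per branch and no residual skips.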
2. The speech state detection method based on the electroencephalogram micro-state features and the neural network model as claimed in claim 1, wherein the speech state is one of an auditory state, a speaking state and an imaginary speaking state.
3. The speech state detection method based on electroencephalogram micro-state features and a neural network model according to claim 1, wherein the set time window is 2 s, i.e., every 2 s of the micro-state time series serves as one input feature, and each micro-state persists for 80-120 ms before transitioning into another micro-state.
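The 2 s windowing can be sketched as follows, assuming a per-sample micro-state label sequence and non-overlapping windows (the claim does not state whether windows overlap, and the function name and sampling-rate parameter are illustrative):

```python
import numpy as np

def window_microstates(labels, fs, win_s=2.0):
    """Split a per-sample micro-state label sequence into fixed-length windows.

    labels: sequence of micro-state labels, one per EEG sample
    fs:     sampling rate in Hz
    win_s:  window length in seconds (2 s in the claim)
    """
    n = int(win_s * fs)                 # samples per window
    n_win = len(labels) // n            # trailing partial window is dropped
    return np.asarray(labels[: n_win * n]).reshape(n_win, n)
```

At 250 Hz, for example, each 2 s window holds 500 labels; since each micro-state persists roughly 80-120 ms, a window contains on the order of 17-25 micro-state segments.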
4. The method for detecting the speech state based on the electroencephalogram micro-state features and the neural network model according to claim 1, wherein the step of extracting the micro-state time series features in the set time window comprises the following steps:
(1) calculating the global field power curve of the multichannel electroencephalogram signals by the following formula:

GFP(t) = sqrt( (1/K) · Σ_{i=1}^{K} ( V_i(t) − V_mean(t) )² )

wherein V_i(t) denotes the voltage of the i-th electrode at time t, V_mean(t) denotes the average instantaneous potential across the electrodes, K denotes the number of electrodes, and GFP denotes the global field power curve;
then plotting the electrode potentials at the moments of the local maxima of the global field power curve to generate topographic maps of the electrode array;
(2) submitting the topographic maps corresponding to the local-maximum moments of the global field power curve to a K-means clustering algorithm, which divides them into four classes of micro-state maps;
(3) arranging the four classes of micro-states in temporal order according to the sequence of the global field power curve peaks to obtain the input features.
5. The speech state detection method based on electroencephalogram micro-state features and a neural network model according to claim 1, wherein corresponding labels are attached to the micro-state time-series features, which are divided into three classes of labels, namely listening, speaking and imagined speaking, according to the different speech states.
6. The speech state detection method based on electroencephalogram micro-state features and a neural network model according to claim 1, wherein training the improved GoogLeNet neural network model with the labelled micro-state time-series features comprises the following steps:
(1) dividing the micro-state time sequence features with the labels into a training set, a verification set and a test set according to the ratio of 8:1:1, wherein the labels in the test set are removed;
(2) inputting the training set into the input layer of the improved GoogLeNet neural network model for training with forward propagation, transforming layer by layer and propagating to the output layer of the residual neural network;
(3) evaluating the improved GoogLeNet neural network model with a cross-entropy loss function; adjusting the parameters of the model obtained in step (2) with an Adam optimizer via back propagation to update the parameters and weights of each layer; verifying the adjusted model with the verification set and tuning until the accuracy of the model no longer changes, then stopping training; and testing the tuned model with the test set, thereby obtaining the trained improved GoogLeNet neural network model.
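The training procedure of claim 6 — an 8:1:1 split, cross-entropy loss, and Adam with back propagation — can be sketched in PyTorch as follows. The batch size, learning rate, and fixed epoch count are assumptions of this sketch; the claim's stop-when-accuracy-stops-changing criterion is simplified to a fixed number of epochs:

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader, random_split

def train_model(model, features, labels, epochs=10, lr=1e-3):
    """features: float tensor (N, ...); labels: long tensor (N,) in {0,1,2}."""
    ds = TensorDataset(features, labels)
    n = len(ds)
    # (1) split into training / verification / test sets at 8:1:1
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train_ds, val_ds, test_ds = random_split(ds, [n_train, n_val, n - n_train - n_val])
    loader = DataLoader(train_ds, batch_size=32, shuffle=True)
    opt = optim.Adam(model.parameters(), lr=lr)   # Adam optimizer, as claimed
    loss_fn = nn.CrossEntropyLoss()               # cross-entropy loss, as claimed
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            # (2) forward propagation, then (3) back propagation of the loss
            loss_fn(model(xb), yb).backward()
            opt.step()
    return train_ds, val_ds, test_ds
```

In a full implementation the verification set would be scored after each epoch to decide when accuracy has stopped improving, and the test set (with labels withheld, per claim 6) would be used only for the final evaluation.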
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010007821.3A CN111310783B (en) | 2020-01-05 | 2020-01-05 | Speech state detection method based on electroencephalogram micro-state features and neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010007821.3A CN111310783B (en) | 2020-01-05 | 2020-01-05 | Speech state detection method based on electroencephalogram micro-state features and neural network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111310783A CN111310783A (en) | 2020-06-19 |
CN111310783B true CN111310783B (en) | 2022-08-30 |
Family
ID=71146806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010007821.3A Active CN111310783B (en) | 2020-01-05 | 2020-01-05 | Speech state detection method based on electroencephalogram micro-state features and neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111310783B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113367705A (en) * | 2021-04-07 | 2021-09-10 | 西北工业大学 | Motor imagery electroencephalogram signal classification method based on improved micro-state analysis |
CN113558637B (en) * | 2021-07-05 | 2024-01-05 | 杭州电子科技大学 | Music perception brain network construction method based on phase transfer entropy |
CN117130490B (en) * | 2023-10-26 | 2024-01-26 | 天津大学 | Brain-computer interface control system, control method and implementation method thereof |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107479702A (en) * | 2017-08-04 | 2017-12-15 | 西南大学 | A kind of human emotion's dominance classifying identification method using EEG signals |
WO2019068200A1 (en) * | 2017-10-06 | 2019-04-11 | Holland Bloorview Kids Rehabilitation Hospital | Brain-computer interface platform and process for classification of covert speech |
CN108022647B (en) * | 2017-11-30 | 2022-01-25 | 东北大学 | Lung nodule benign and malignant prediction method based on ResNet-inclusion model |
CN108577835B (en) * | 2018-05-17 | 2019-07-19 | 太原理工大学 | A kind of brain function network establishing method based on micro- state |
CN108764471B (en) * | 2018-05-17 | 2020-04-14 | 西安电子科技大学 | Neural network cross-layer pruning method based on feature redundancy analysis |
CN109784023B (en) * | 2018-11-28 | 2022-02-25 | 西安电子科技大学 | Steady-state vision-evoked electroencephalogram identity recognition method and system based on deep learning |
CN109846477B (en) * | 2019-01-29 | 2021-08-06 | 北京工业大学 | Electroencephalogram classification method based on frequency band attention residual error network |
CN109620185B (en) * | 2019-01-31 | 2020-07-21 | 山东大学 | Autism auxiliary diagnosis system, device and medium based on multi-modal information |
CN110236533A (en) * | 2019-05-10 | 2019-09-17 | 杭州电子科技大学 | Epileptic seizure prediction method based on the study of more deep neural network migration features |
CN110163180A (en) * | 2019-05-29 | 2019-08-23 | 长春思帕德科技有限公司 | Mental imagery eeg data classification method and system |
- 2020-01-05 CN CN202010007821.3A patent/CN111310783B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111310783A (en) | 2020-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Linking attention-based multiscale CNN with dynamical GCN for driving fatigue detection | |
CN111310783B (en) | Speech state detection method based on electroencephalogram micro-state features and neural network model | |
Chinara | Automatic classification methods for detecting drowsiness using wavelet packet transform extracted time-domain features from single-channel EEG signal | |
Chao et al. | Recognition of Emotions Using Multichannel EEG Data and DBN‐GC‐Based Ensemble Deep Learning Framework | |
Li et al. | Densely feature fusion based on convolutional neural networks for motor imagery EEG classification | |
CN111553295A (en) | Multi-mode emotion recognition method based on self-attention mechanism | |
Li et al. | Combined long short-term memory based network employing wavelet coefficients for MI-EEG recognition | |
CN113180692A (en) | Electroencephalogram signal classification and identification method based on feature fusion and attention mechanism | |
CN108280414A (en) | A kind of recognition methods of the Mental imagery EEG signals based on energy feature | |
Satapathy et al. | ADASYN and ABC-optimized RBF convergence network for classification of electroencephalograph signal | |
US20230101539A1 (en) | Physiological electric signal classification processing method and apparatus, computer device and storage medium | |
Balam et al. | Statistical channel selection method for detecting drowsiness through single-channel EEG-based BCI system | |
Mo et al. | Motor imagery electroencephalograph classification based on optimized support vector machine by magnetic bacteria optimization algorithm | |
CN114676720B (en) | Mental state identification method and system based on graph neural network | |
Dehzangi et al. | EEG based driver inattention identification via feature profiling and dimensionality reduction | |
Chanu et al. | An automated epileptic seizure detection using optimized neural network from EEG signals | |
Yang et al. | An aggressive driving state recognition model using EEG based on stacking ensemble learning | |
Saranya et al. | An efficient AP-ANN-based multimethod fusion model to detect stress through EEG signal analysis | |
Abdulghani et al. | EEG Classifier Using Wavelet Scattering Transform-Based Features and Deep Learning for Wheelchair Steering | |
Palaniappan et al. | Using genetic algorithm to identify the discriminatory subset of multi-channel spectral bands for visual response | |
Huynh et al. | An investigation of ensemble methods to classify electroencephalogram signaling modes | |
Xu et al. | Emotion Recognition from Multi-channel EEG via an Attention-Based CNN Model | |
Sun et al. | MEEG-Transformer: transformer Network based on Multi-domain EEG for emotion recognition | |
CN114052734A (en) | Electroencephalogram emotion recognition method based on progressive graph convolution neural network | |
Du et al. | Improving motor imagery EEG classification by CNN with data augmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||