CN107256393B - Feature extraction and state recognition of one-dimensional physiological signals based on deep learning


Info

Publication number
CN107256393B
CN107256393B (application CN201710414832.1A)
Authority
CN
China
Prior art keywords
training
input
network
layer
hidden layer
Prior art date
Legal status
Active
Application number
CN201710414832.1A
Other languages
Chinese (zh)
Other versions
CN107256393A (en)
Inventor
张俊然
杨豪
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201710414832.1A
Publication of CN107256393A
Application granted
Publication of CN107256393B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 - Feature extraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12 - Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a feature extraction and state recognition method for one-dimensional physiological signals based on deep learning. A deep belief network (DBN) model of one-dimensional physiological signals is established based on deep learning. The DBN model adopts a "pre-training + fine-tuning" training process: in the pre-training stage, the first restricted Boltzmann machine (RBM) is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged. The invention effectively solves the problem of low classification accuracy caused by manually selecting feature inputs in the traditional classification of one-dimensional physiological signals: highly separable features and feature combinations are obtained automatically for classification through the nonlinear mapping of the deep belief network, and the network structure can be optimized continuously for a better classification effect.

Description

Feature extraction and state recognition of one-dimensional physiological signals based on deep learning
Technical Field
The invention relates to the technical field of medical data processing, in particular to a physiological signal feature extraction and classification identification method, and specifically relates to feature extraction and state identification of one-dimensional physiological signals based on deep learning.
Background
Physiological signals are governed by the autonomic nervous system and the endocrine system, are not controlled by subjective consciousness, and can objectively and truly reflect the physiological, mental and emotional states of an individual, so their study and application are spreading. Since physiological signals are the outward expression of an individual's physiological, mental and emotional states and can directly and truly reflect changes in those states, many researchers have used different classifiers to recognize individual states from physiological signals (electroencephalogram (EEG), electrocardiogram, electromyogram, respiration and electrodermal activity). Although the number of classifiers suitable for recognizing individual states from physiological signals keeps growing and recognition rates keep rising, most classifiers require manual feature extraction, so the recognition rate depends on manual experience, is unstable, and remains some distance from practical application. For example, Moghimi et al. used a linear discriminant analysis classifier to recognize emotional states from cerebral blood-oxygen changes, with a recognition rate of about 72%.
In 2011, Li Shufang et al. used empirical mode decomposition (EMD) and a support vector machine (SVM) to classify the epileptic states of EEG signals: EMD first decomposes the EEG signal into several empirical mode components, effective features are then extracted, and an SVM classifies the signal; the recognition rate for the interictal and seizure periods of epilepsy finally reached 99%. In 2014, Niu et al. performed feature selection with a genetic algorithm and recognized emotional states from electrocardiogram, electromyogram, respiration and electrodermal signals with a K-nearest-neighbor classifier, reaching a recognition rate of 96%. However, the high recognition rates obtained by relying on particular combinations of methods or of signals are highly specific, so such approaches are difficult to generalize, and finding a particular effective combination owes much to chance.
Since Hinton et al.'s deep convolutional neural network stood out in the 2012 ImageNet competition, research on deep learning has been pushed to a climax; it has attracted wide attention and application in the field of signal and information processing, and has achieved unprecedented results in directions such as image processing and speech recognition. With its rapid development, deep learning has also been applied preliminarily to the processing of physiological electrical signals such as EEG, electromyogram, electrocardiogram and electrodermal signals, with striking results. Through continuous development, a large number of deep learning frameworks (such as DeepLearn Toolbox, Caffe, etc.) and models (such as deep belief networks (DBN), sparse autoencoders, recurrent neural networks, etc.) have appeared. How to use and improve these frameworks and models so that they fit practical problems, however, is the current research question.
Disclosure of Invention
The invention aims to provide a deep-learning-based feature extraction and state recognition method for one-dimensional physiological signals, so as to solve the problem of low classification accuracy caused by the need to manually select feature inputs in the traditional classification of one-dimensional physiological signals, to obtain highly separable features and feature combinations for classification automatically through the nonlinear mapping of a deep belief network, and to optimize the network structure continuously for a better classification effect.
The basic idea of the invention is as follows: a deep belief network (DBN) model is established for feature extraction and state recognition of one-dimensional physiological signals based on deep learning, and the model adopts a "pre-training + fine-tuning" training process. The pre-training process uses bottom-up unsupervised training: the first hidden layer is trained first, and then the next hidden layer is trained layer by layer, the output of the previous hidden layer's nodes serving as the input, and the output of the current hidden layer's nodes serving as the input of the next hidden layer. The fine-tuning process performs top-down supervised training on labeled data, propagating errors backward and fine-tuning the model parameters; the error back-propagation (BP) algorithm is generally used for fine-tuning. The "pre-training + fine-tuning" process can be regarded as grouping a large number of parameters, finding locally good settings for each group, and then combining these locally good solutions in the search for a globally optimal solution; different activation functions, the CD algorithm, and a mini-batch gradient descent algorithm are used to iteratively update the weights.
The DBN model is formed by stacking several RBMs. The training process of the DBN model is as follows: in the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Because the DBN is stacked from several RBMs, each layer's processing of the previous layer can be regarded as layer-by-layer processing of the input, converting an input whose relationship to the output category is initially distant into a representation more closely tied to the category.
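As an illustration of this stacking, the following minimal numpy sketch pre-trains a DBN greedily, layer by layer; rbm_update stands for any single-RBM training step (for instance, the CD-k step described later), and all function and variable names are illustrative rather than the patent's implementation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pretrain_dbn(data, layer_sizes, rbm_update, epochs=10, lr=0.1):
        # greedy layer-wise pre-training: train the first RBM on the raw input,
        # then use its hidden activations as the input of the next RBM, and so on
        rbms, layer_input = [], data
        for n_hidden in layer_sizes:
            n_visible = layer_input.shape[1]
            w = 0.01 * np.random.randn(n_visible, n_hidden)  # weights near 0
            b = np.zeros(n_visible)                          # visible-layer bias
            c = np.zeros(n_hidden)                           # hidden-layer bias
            for _ in range(epochs):
                w, b, c = rbm_update(layer_input, w, b, c, lr)
            rbms.append((w, b, c))
            # the trained nodes' outputs become the next RBM's input
            layer_input = sigmoid(layer_input @ w + c)
        return rbms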
The purpose of the invention is achieved by the following steps:
A one-dimensional physiological signal feature extraction and state recognition data analysis model, a DBN, is established based on deep learning, and the DBN adopts a "pre-training + fine-tuning" training process: the pre-training process uses bottom-up unsupervised training, training the first hidden layer first and then the following hidden layers layer by layer, the output of the previous hidden layer's nodes serving as the input and the output of the current hidden layer's nodes serving as the input of the next hidden layer; the fine-tuning process performs top-down supervised training on labeled data. In the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged.
The extraction and classification method comprises the following steps:
S1: take in one-dimensional physiological signals, including one or more of electroencephalogram, electrocardiogram, electromyogram, respiration and electrodermal signals; perform preprocessing and feature mapping on the signals, the feature mapping being carried out in a standard space to obtain a feature map in that space; the preprocessing comprises denoising, filtering, and hierarchical decomposition and reconstruction;
S2: construct a deep belief network (DBN) comprising an input layer, several restricted Boltzmann machines (RBMs), a back-propagation structure and a classifier, the RBMs being the core structure of the whole network, numbering from 1 to N and nested one after another in the structure;
S3: use the deep belief network constructed in step S2 to extract features from the one-dimensional physiological signals preprocessed and feature-mapped in step S1, the extraction process comprising RBM training and fine-tuning of the network by the BP algorithm; the RBM training and BP algorithm comprise:
1) in RBM training and BP fine-tuning, batch normalization is performed before each layer's output;
2) the CD-k algorithm with k iterations is adopted for the multiple iterations in Gibbs sampling;
3) when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling, so as to fit the input data as closely as possible, the Dropout method is selected to prevent overfitting;
4) during fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples;
5) the Sigmoid activation function is selected for the bottom-up forward propagation, and the ReLU activation function for the top-down back propagation;
S4: input the feature vector output by the deep belief network in step S3 into a Softmax classifier, and judge the individual state of the incorporated one-dimensional physiological signal.
S31: in RBM training and BP fine-tuning, batch normalization is performed before each layer's output. The Z-score standardization method is selected for the normalization: the training set and the test set are each converted with Z-score into a distribution with mean 0 and standard deviation 1, and the data are then mapped into the range [0, 1]. The Z-score method normalizes with the mean and standard deviation of the original data, using the formula:
x* = (x − u) / σ
where u denotes the mean of each dimension and σ the standard deviation of each dimension; the processed data follow the standard normal distribution with mean 0 and standard deviation 1.
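As a concrete illustration of this normalization (a minimal numpy sketch; the array names train and test are placeholders for the actual feature matrices, one row per sample):

    import numpy as np

    def zscore_unit_range(x):
        # Z-score each dimension (mean 0, std 1): x* = (x - u) / sigma
        z = (x - x.mean(axis=0)) / x.std(axis=0)
        # then map each dimension into the range [0, 1]
        return (z - z.min(axis=0)) / (z.max(axis=0) - z.min(axis=0))

    train_n = zscore_unit_range(train)  # the document normalizes the training
    test_n = zscore_unit_range(test)    # set and the test set separately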
S32: the CD-k algorithm with k iterations, used for the multiple iterations in Gibbs sampling, is as follows.
For an input sample v = (v_1, v_2, …, v_m), according to the RBM, the encoded output sample h = (h_1, h_2, …, h_n) is obtained; the n-dimensional encoded output can be understood as the input sample with n features extracted:
1) input a training sample x_0, the number of hidden layers d and the learning rate ε;
2) initialize the visible layer v_1 = x_0, with the weights w, the visible-layer bias b and the hidden-layer bias c close to 0;
3) for g < s:
compute the hidden-layer distribution from the current visible layer using
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j);
substitute the result into
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i)
to calculate the distribution of the visible-layer reconstruction
(i and j denote the neuron indices of the hidden layer and the visible layer, with i ≤ n and j ≤ m);
substitute the reconstructed visible layer into the first formula again to obtain the reconstructed hidden-layer distribution;
according to the gradient descent algorithm, update w, b and c (the subscript rec denotes values after reconstruction):
Δw = ε(<v_i h_j>_data − <v_i h_j>_rec)
Δb = ε(<v_i>_data − <v_i>_rec)
Δc = ε(<h_j>_data − <h_j>_rec)
end for;
4) output the updated w, b and c (see the sketch below).
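The following self-contained numpy sketch shows one CD-k parameter update consistent with the steps above; it works on a batch of binary samples, and all names are illustrative (a sketch under the stated assumptions, not the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd_k_update(v0, w, b, c, eps=0.1, k=1):
        # data phase: hidden distribution p(h_i = 1 | v) from the input
        ph0 = sigmoid(v0 @ w + c)
        h = (np.random.rand(*ph0.shape) < ph0).astype(float)
        for _ in range(k):  # k Gibbs steps
            pv = sigmoid(h @ w.T + b)   # visible-layer reconstruction
            ph = sigmoid(pv @ w + c)    # reconstructed hidden distribution
            h = (np.random.rand(*ph.shape) < ph).astype(float)
        n = v0.shape[0]
        w += eps * (v0.T @ ph0 - pv.T @ ph) / n  # eps(<v_i h_j>_data - <v_i h_j>_rec)
        b += eps * (v0 - pv).mean(axis=0)        # eps(<v_i>_data - <v_i>_rec)
        c += eps * (ph0 - ph).mean(axis=0)       # eps(<h_j>_data - <h_j>_rec)
        return w, b, c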
S33: when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling, so as to fit the input data as closely as possible, the Dropout method is selected to prevent overfitting; Dropout prevents overfitting by changing the model itself. Dropout randomly "deletes" part of the hidden-layer nodes; the "deleted" nodes are only temporarily regarded as nonexistent, and their parameters, though temporarily not updated, must be retained, since those nodes may take part in training in the next iteration.
S34: during fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples, as follows (a sketch follows the list):
1) randomly draw a group of small samples from all input samples each time, the number of samples per group being Mini-batch;
2) iteratively update the weights on each group of small samples with the batch gradient descent algorithm;
3) repeat steps 1) and 2) (total number of input samples / Mini-batch) times.
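A minimal numpy sketch of this loop (update_fn stands for whatever gradient step is applied to one group of small samples; all names are assumptions for illustration):

    import numpy as np

    def minibatch_epoch(x, y, mini_batch, update_fn):
        # one pass: (total number of input samples / Mini-batch) updates
        n = x.shape[0]
        order = np.random.permutation(n)  # 1) draw random small samples
        for start in range(0, n, mini_batch):
            idx = order[start:start + mini_batch]
            update_fn(x[idx], y[idx])     # 2) one weight update per group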
S35: when parameters are adjusted in the negative gradient direction of the target, the Sigmoid activation function is selected in the bottom-up forward propagation;
the selection process is as follows: the maximum-likelihood estimation of the input samples,
L(θ) = Σ_{t=1}^{s} ln P(v^(t)),
is differentiated with respect to the parameters so as to maximize the likelihood function, and the objective is improved continuously by gradient ascent until a stopping condition is reached; maximizing the likelihood function yields the probability that the ith hidden-layer node is activated (takes the value "1") and the probability that the jth visible-layer node is activated, respectively:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j) (4)
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i) (5)
where f is the Sigmoid activation function;
The Sigmoid activation function is defined as:
f(x) = 1 / (1 + e^(−x)) (6)
Differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x)) (7)
An activation function whose derivative tends to 0 as |x| tends to infinity is called a soft-saturating activation function, while one whose derivative equals 0 whenever |x| exceeds some constant c is called a hard-saturating activation function, i.e.:
f′(x) = 0 when |x| > c
The ReLU activation function is selected in the top-down back propagation. ReLU(x) exhibits hard saturation when x < 0, but for x > 0 its derivative is 1 and the gradient does not vanish, so gradient diffusion is milder and convergence faster during back propagation, effectively alleviating the gradient-vanishing phenomenon.
The ReLU function is defined as:
ReLU(x)=max(0,x) (8)
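For reference, the two activation functions and the derivatives used above can be written directly (a small numpy sketch; nothing here is specific to the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))  # soft-saturating: f'(x) -> 0 as |x| grows

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)             # formula (7): f'(x) = f(x)(1 - f(x))

    def relu(x):
        return np.maximum(0.0, x)        # formula (8): hard saturation for x < 0

    def relu_grad(x):
        return (x > 0).astype(float)     # derivative 1 for x > 0, so no vanishing gradient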
The Dropout method selected in S33 prevents overfitting as follows. Before Dropout is used, the training procedure of the network is to propagate the input forward through the network and then propagate the error backward with the BP algorithm; after Dropout is used, the training procedure becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forwards through the residual nodes, and then the error is propagated reversely through the residual nodes by using a BP algorithm;
3) restoring the deleted nodes, wherein the parameters of the nodes which are deleted are not updated at the moment, and the parameters of the nodes which are not deleted are updated; and repeating the three steps until the iteration is completed.
The feature vector output by the deep belief network is input into a Softmax classifier, and the parameter C is searched over the range [2^(−10), 2^(10)] for the optimal classification accuracy.
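One way to realize this search is a log-scale grid over the exponent, sketched below; scikit-learn's LogisticRegression (a multinomial softmax model whose regularization parameter is named C) is used here only as a stand-in classifier, which is an assumption rather than the patent's implementation, and the feature/label arrays are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    best_c, best_acc = None, -1.0
    for e in range(-10, 11):  # C in [2**-10, 2**10]
        clf = LogisticRegression(C=2.0 ** e, max_iter=1000)
        clf.fit(train_features, train_labels)       # DBN output feature vectors
        acc = clf.score(test_features, test_labels)
        if acc > best_acc:
            best_c, best_acc = 2.0 ** e, acc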
When Gibbs sampling is used, the specific steps for extracting the n features of an input sample are as follows. Maximizing the likelihood function yields the probability that the ith hidden-layer node is activated (takes the value "1") and the probability that the jth visible-layer node is activated, respectively:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j) (4)
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i) (5)
where f is the Sigmoid activation function;
1) first, the probability p(h_i = 1 | v) that the ith hidden-layer node is activated (takes the value "1") is calculated using formula (4);
2) the input data are then fitted by Gibbs sampling to obtain h = (h_1, h_2, …, h_n); the specific process is: generate a random number in [0, 1]; if the random number is less than p(h_i = 1 | v), then h_i is "1", otherwise "0";
3) the code h obtained in steps 1) and 2) is decoded to recover the original input as v′; likewise, p(v_j = 1 | h), the probability that the jth visible-layer node is activated, is first calculated with formula (5);
4) a random number in [0, 1] is generated as in step 2); if it is less than p(v_j = 1 | h), the value of v_j′ is "1", otherwise "0";
5) v′ obtained in step 4) is substituted into formula (4), and h′ is computed by Gibbs sampling in the same way as in step 2);
6) finally, the weights, the visible-layer bias and the hidden-layer bias are updated according to formulas (9), (10) and (11), where η is the learning rate, i.e. the rate of increase or decrease when a weight or bias is updated (see the sketch below):
Δw = η(vh − v′h′) (9)
Δb = η(v − v′) (10)
Δc = η(h − h′) (11)
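A per-sample numpy sketch of steps 1) to 6), with explicit binary sampling; v is one input vector and all names are illustrative (a sketch under the stated assumptions, not the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gibbs_encode_decode(v, w, b, c, eta=0.1):
        p_h = sigmoid(v @ w + c)                                # step 1), formula (4)
        h = (np.random.rand(p_h.size) < p_h).astype(float)      # step 2)
        p_v = sigmoid(h @ w.T + b)                              # step 3), formula (5)
        v_rec = (np.random.rand(p_v.size) < p_v).astype(float)  # step 4)
        p_h2 = sigmoid(v_rec @ w + c)                           # step 5)
        h_rec = (np.random.rand(p_h2.size) < p_h2).astype(float)
        w += eta * (np.outer(v, h) - np.outer(v_rec, h_rec))    # formula (9)
        b += eta * (v - v_rec)                                  # formula (10)
        c += eta * (h - h_rec)                                  # formula (11)
        return w, b, c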
the invention has the positive effects that:
1. The invention effectively solves the problem of low classification accuracy caused by the need to manually select feature inputs in the traditional classification of one-dimensional physiological signals; highly separable features and feature combinations are obtained automatically for classification through the nonlinear mapping of the deep belief network, and the network structure can be optimized continuously for a better classification effect. The "pre-training + fine-tuning" process can be regarded as grouping a large number of parameters, finding locally good settings for each group, and then combining these locally good solutions in the search for a globally optimal solution; the scheme thus exploits the freedom provided by the model's many parameters while effectively saving training overhead.
2. Gibbs sampling is a sampling method based on Markov chain Monte Carlo; each component of x is sampled iteratively, making full use of the conditional probability distributions, and as the number of iterations grows the conditional distributions converge to the joint distribution at a rate geometric in the number of sampling steps, shortening the convergence time.
3. Batch normalization is performed before each layer's output: the training set and the test set are each converted with Z-score into a distribution with mean 0 and standard deviation 1 and then mapped into the range [0, 1], which greatly improves the generalization ability of the network and speeds up its training.
4. The selectable activation functions of the invention are Sigmoid and ReLU; the deep belief network involved comprises a forward propagation process and a backward propagation process, which may use the same activation function or different ones, suiting a variety of physiological-signal requirements.
5. For the problems that the Gibbs algorithm needs many iterations and converges slowly, the invention uses the contrastive divergence (CD-k) algorithm on the basis of Gibbs sampling to compute the model's expected values quickly; the model estimate is obtained after k iterations, and a good approximation is already achieved for small k.
6. The invention adopts the Dropout method to prevent overfitting, reducing overfitting overall and improving efficiency.
Drawings
FIG. 1 is a diagram of a DBN network model structure and training process of the present invention.
Fig. 2 is a diagram of a BP network architecture of the present invention.
Fig. 3 is a diagram of Sigmoid activation function.
Fig. 4 is a graph of the ReLU activation function.
Fig. 5 is a network structure diagram before and after Dropout, in which the network structure before Dropout is on the left; on the right is the network structure after Dropout.
FIG. 6-1 is a diagram of a confusion matrix of the recognition results of the SVM classifier.
Fig. 6-2 is a diagram of a classifier DBN recognition result confusion matrix.
FIG. 7 is a graph of the mean absolute values of the weights of the first layer of the DBN after training in an embodiment.
Detailed Description
The hardware and software environment used in the experiment of this example is shown in table 4-1:
TABLE 4-1
[table image: the hardware and software environment used in the experiments]
Data acquisition:
The experimental data come from the SJTU Emotion EEG Dataset (SEED) provided by Shanghai Jiao Tong University. This database contains EEG data for three emotions (positive, negative, neutral). The data were collected from 15 subjects; in each experiment every subject watched 15 movie clips capable of inducing the three emotions. While the subjects watched the clips, a 62-channel dry-electrode EEG cap recorded their EEG signals, so each subject yielded 15 groups of EEG signals per experiment. Each group was labeled according to the subject's self-report (positive = +1, negative = −1, neutral = 0), giving 5 positive, 5 negative and 5 neutral groups. The experiment was repeated at intervals of at least 7 days, each subject participating in 3 experiments in total, so the 15 subjects yielded 15 × 3 × 15 = 675 groups of EEG data. Here, the first 12 groups of data in each experiment (4 groups of positive, 4 groups of neutral and 4 groups of negative emotion) serve as the training set, and the last 3 groups (1 positive, 1 neutral, 1 negative) serve as the test set.
After the raw data were collected, the data provider preprocessed the original EEG signals and filtered them into five frequency bands: Delta (1-3 Hz), Theta (4-7 Hz), Alpha (8-13 Hz), Beta (14-30 Hz) and Gamma (31-50 Hz). On the basis of these five bands, six feature transformations were used to extract features in each band: PSD, DE, ASM, DASM, RASM and DCAU. These six transformations are simple to compute and represent the EEG signal effectively. DE extends the concept of Shannon entropy and can effectively measure the complexity of a continuous random variable; since low-frequency energy dominates EEG signals, DE can effectively distinguish the low-frequency and high-frequency energy components. With 62 channels, the sample dimension of DE is 62 × 5 = 310. Other studies have shown that asymmetric brain activity has a significant influence on emotion processing, so DASM and RASM were extracted on the basis of DE as the differential and rational asymmetries between the DE values of 27 pairs of asymmetric electrodes, and combining DASM and RASM yields ASM. DCAU represents the DE differences of 23 pairs of frontal-posterior electrodes. Besides the DE-based transformations, PSD features were also extracted. The sample feature dimensions of the six transformations PSD, DE, ASM, DASM, RASM and DCAU are 310, 310, 270, 135, 135 and 115, respectively.
The experimental process comprises the following steps:
The experiment is based on a DBN model built on the DeepLearn Toolbox framework, with a batch-normalization algorithm and the ReLU activation function introduced on that basis. The CD-k algorithm with k iterations is used for the multiple iterations in Gibbs sampling. The Dropout method is selected to prevent overfitting when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling so as to fit the input data as closely as possible. During fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples. The parameters of the DBN model were adjusted through repeated experiments to determine the optimal DBN model, which was then compared with the classification results of an SVM. The results are analyzed per subject, per EEG feature transformation and per frequency band, and the influence of the number of iterations, the learning rate and the number of hidden-layer nodes on the classification results is discussed.
As shown in fig. 1, the flowchart of training and classification with the DBN model used in the invention: the original training set and test set are normalized and then brought into the model for training and classification. FIG. 2 is the structure diagram of the BP network. As shown in figs. 1 and 2, training is divided into the two steps of pre-training and fine-tuning; the adjusted, updated weights and biases are then brought into the classifier for prediction, and the classification accuracy is finally computed from the difference between the predicted and actual results. The RBM training parameters are: the connection weights w_ij between the hidden layer and the visible layer (i = 1, 2, …, n; j = 1, 2, …, m), the visible-layer bias b = (b_1, b_2, …, b_m) and the hidden-layer bias c = (c_1, c_2, …, c_n).
Training the DBN is mainly a process of continually adjusting the weights and biases, and what influences them most is the depth of the network, i.e. the number of hidden layers and the number of nodes in each. When there are too few hidden layers, the learning capacity of the network is insufficient and only shallow features can be learned; when the number of hidden layers drops to 1, the network degenerates into an ordinary artificial neural network. In theory, adding hidden layers abstracts the nature of the input data more accurately and thus classifies better, but more layers bring more parameters to the whole model, lengthening training, reducing the generalization ability of the DBN and causing overfitting. In this embodiment, combining the actual characteristics of the original data, 2 hidden layers are selected, making 4 layers in total with the input and output layers. Taking the DE features as an example, the input layer has 310 nodes and the output layer 3, with two hidden layers in between whose node counts are chosen from the ranges 50-500 and 20-500, respectively.
When parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples, as follows:
1) randomly draw a group of small samples from all input samples each time, the number of samples per group being Mini-batch;
2) iteratively update the weights on each group of small samples with the batch gradient descent algorithm;
3) repeat steps 1) and 2) (total number of input samples / Mini-batch) times.
The specific steps of this embodiment, summarized as DBN plus BP, are as follows:
1): initialize the DBN: the number of hidden layers, the number of hidden-layer nodes, the number of iterations, the learning rate and the momentum, and the number of samples per small group, Mini-batch, i.e. m, which must divide the total number of input samples evenly; initialize the connection weights w, the visible-layer bias b and the hidden-layer bias c to 0;
2): for i < the number of RBMs (one RBM per hidden layer);
3):repeat;
4):for j<(N1/Mini-batch1);
5): train the RBM, and according to formulas (12)-(14)
Δw′ = m × Δw + η(vh − v′h′) (12)
Δb′ = m × Δb + η(v − v′) (13)
Δc′ = m × Δc + η(h − h′) (14)
update the connection weights w, the visible-layer bias b and the hidden-layer bias c;
6): calculate the output of the current layer according to formula (4) and use it as the input of the next hidden layer:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j)
7):end for
8): until the number of cycles equals the number of iterations;
9):end for
10): initialize BP: the number of categories, the activation function, the learning rate, the momentum, the number of iterations and the classifier; initialize BP with the connection weights w, visible-layer bias b and hidden-layer bias c obtained above;
11):repeat;
12):for l<(N1/Mini-batch2);
13): calculate the output of each hidden layer according to formula (18) and compute the error e:
ŷ_j^k = f(β_j − θ_j) (18)
14): update the connection weights w, the visible-layer bias b and the hidden-layer bias c according to formulas (26)-(28);
15):end for
16): until the number of cycles equals the number of iterations;
17): bring the test set, together with the connection weights w, the visible-layer bias b and the hidden-layer bias c, into formula (18) to calculate the predicted labels y′;
18): calculating a real label y of each category;
19): and outputting the classification accuracy of each category.
As seen from these steps, training is divided into pre-training (steps 1) to 9)) and fine-tuning (steps 10) to 16)); the adjusted, updated weights and biases are then brought into the classifier for prediction, and the classification accuracy is finally computed from the difference between the predicted and actual results. Since the invention introduces a batch-normalization algorithm on the basis of the DeepLearn Toolbox framework, batch normalization is performed before each layer's output, i.e. before the activation function is applied in steps 6), 13) and 17).
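The momentum form of the updates in formulas (12)-(14) can be written generically as below (a minimal sketch; m is the momentum coefficient and grad the current gradient estimate, e.g. <v h> − <v′ h′> for the weights; names and default values are illustrative):

    import numpy as np

    def momentum_step(param, delta_prev, grad, eta=0.1, m=0.5):
        # formulas (12)-(14): new increment = m * previous increment + eta * gradient
        delta = m * delta_prev + eta * grad
        return param + delta, delta

    # e.g. for the weight matrix: w, dw = momentum_step(w, dw, grad_w)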
Step 13) is illustrated by fig. 2, which shows a BP network with d input nodes, q hidden-layer nodes and l output nodes, where the input-layer nodes are x = (x_1, x_2, …, x_i, …, x_d), the hidden-layer nodes are b = (b_1, b_2, …, b_h, …, b_q) and the output nodes are y = (y_1, y_2, …, y_j, …, y_l); θ_j denotes the threshold of the jth output node, γ_h the threshold of the hth hidden node, v_ih the weight between the ith input node and the hth hidden node, and w_hj the weight between the hth hidden node and the jth output node.
The input of the hth hidden-layer node, obtained from the input-layer nodes and the weights, is:
α_h = Σ_{i=1}^{d} v_ih·x_i (15)
The input of the jth output-layer node, obtained from the hidden-layer nodes and the weights, is:
β_j = Σ_{h=1}^{q} w_hj·b_h (16)
where b_h denotes the output of the hth hidden-layer node, computed as:
b_h = f(α_h − γ_h) (17)
For an input sample (x_k, y_k), the output of the trained BP network is ŷ^k = (ŷ_1^k, ŷ_2^k, …, ŷ_l^k), computed as:
ŷ_j^k = f(β_j − θ_j) (18)
The final mean-square error E_k of the network is then:
E_k = (1/2) Σ_{j=1}^{l} (ŷ_j^k − y_j^k)^2 (19)
For the BP network of fig. 2, the parameters to be determined number (d + l + 1) × q + l in total: d × q weights between the input and hidden layers, q × l weights between the hidden and output layers, q hidden-node thresholds, and l output-node thresholds. The BP algorithm is a continual iterative updating process, and each of the above parameters is updated according to the following formula (where v stands for any one of the parameters):
v ← v + Δv (20)
The BP algorithm adjusts the parameters in the negative gradient direction of the target based on gradient descent, so for a given learning rate η the change of a weight is:
Δw_hj = −η ∂E_k/∂w_hj (21)
As can be seen from fig. 2, w_hj first affects the input β_j of the jth output node, which in turn affects that node's output value ŷ_j^k, and finally affects the mean-square error E_k; the above formula can therefore be expanded as:
∂E_k/∂w_hj = (∂E_k/∂ŷ_j^k)·(∂ŷ_j^k/∂β_j)·(∂β_j/∂w_hj) (22)
If the Sigmoid function is adopted as the activation function, then:
f′(x) = f(x)(1 − f(x)) (23)
Combining formulas (18), (19) and (23), the gradient term g_j of the jth output-layer node is:
g_j = −(∂E_k/∂ŷ_j^k)·(∂ŷ_j^k/∂β_j) = ŷ_j^k(1 − ŷ_j^k)(y_j^k − ŷ_j^k) (24)
Similarly, the gradient term e_h of the hth hidden-layer node is:
e_h = b_h(1 − b_h) Σ_{j=1}^{l} w_hj·g_j (25)
Substituting formulas (22) and (24) into formula (21) yields the update formula for the weight w_hj:
Δw_hj = η·g_j·b_h (26)
Likewise, the update formulas for θ_j, v_ih and γ_h are, respectively:
Δθ_j = −η·g_j (27)
Δv_ih = η·e_h·x_i (28)
Δγ_h = −η·e_h (29)
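The whole forward pass and the updates (26)-(29) fit in a short numpy sketch for the d-q-l network of fig. 2 (Sigmoid activations are assumed throughout, the formula numbering follows the reconstruction above, and all names are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bp_step(x, y, v, gamma, w, theta, eta=0.1):
        bh = sigmoid(x @ v - gamma)            # (15), (17): hidden-layer outputs
        y_hat = sigmoid(bh @ w - theta)        # (16), (18): network outputs
        g = y_hat * (1 - y_hat) * (y - y_hat)  # (24): output-layer gradient term
        e = bh * (1 - bh) * (w @ g)            # (25): hidden-layer gradient term
        w += eta * np.outer(bh, g)             # (26)
        theta -= eta * g                       # (27)
        v += eta * np.outer(x, e)              # (28)
        gamma -= eta * e                       # (29)
        return v, gamma, w, theta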
see fig. 3 and 4.
The activation function is to add nonlinear factors in the learning process to solve the problem of inseparability of linearity, and the selectable activation functions of the invention are as follows: sigmoid, ReLU; the deep belief network related by the invention is divided into a forward propagation process from bottom to top and a backward propagation process from top to bottom, wherein the forward propagation and the backward propagation can select the same activating function or different activating functions.
Sigmoid is the most widely used activation function, defined as:
f(x) = 1 / (1 + e^(−x)) (6)
The function curve is shown in fig. 3. Differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x)) (7)
An activation function whose derivative tends to 0 as |x| tends to infinity is called a soft-saturating activation function, while one whose derivative equals 0 whenever |x| exceeds some constant c is called a hard-saturating activation function, i.e.:
f′(x) = 0 when |x| > c
Owing to the soft saturation of Sigmoid, during back propagation the gradient conducted downward contains a factor involving the derivative f′(x); if the input falls into the saturation region, f′(x) approaches 0, so the gradient passed downward becomes very small and the network parameters train poorly. This was once an important obstacle to the development of neural networks. The phenomenon, known as "gradient vanishing", typically appears once the number of network layers exceeds about 5. Although the Sigmoid activation function suffers from gradient vanishing, it has advantages: Sigmoid is physically closest to the biological neuron model, and it compresses the input into the range (0, 1), which can be regarded both as normalizing the input and as a classification probability (for example, an activation output of 0.9 can be interpreted as a 90% probability of being a positive sample).
Compared with the Sigmoid function, the Rectified Linear Unit (ReLU) can effectively alleviate the gradient-vanishing phenomenon. The ReLU function is defined as:
ReLU(x)=max(0,x) (8)
As shown in fig. 4, ReLU(x) exhibits hard saturation when x < 0, but for x > 0 its derivative is 1, so gradient dispersion is milder and convergence faster during back propagation.
The multiple iterations of this embodiment use the CD-k algorithm with k iterations:
for an input sample v = (v_1, v_2, …, v_m), according to the RBM, the encoded output sample h = (h_1, h_2, …, h_n) is obtained; the n-dimensional encoded output can be understood as the input sample from which n features have been extracted.
The steps are as follows:
1) input a training sample x_0, the number of hidden layers d and the learning rate ε;
2) initialize the visible layer v_1 = x_0, with the weights w, the visible-layer bias b and the hidden-layer bias c close to 0;
3) for g < s, where g denotes the gth training pass (g a positive integer, bounded by the number of samples s):
compute the hidden-layer distribution using
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j);
substitute the result into
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i)
to calculate the distribution of the visible-layer reconstruction
(i and j denote the neuron indices of the hidden layer and the visible layer, with i ≤ n and j ≤ m);
substitute the result into the first formula again to obtain the reconstructed hidden-layer distribution;
according to the gradient descent algorithm, update w, b and c (the subscript rec denotes values after reconstruction):
Δw = ε(<v_i h_j>_data − <v_i h_j>_rec)
Δb = ε(<v_i>_data − <v_i>_rec)
Δc = ε(<h_j>_data − <h_j>_rec)
end for;
4) output the updated w, b and c.
In the training process of the DBN, overfitting is likely when the number of hidden layers is large, the number of hidden-layer nodes is large, or the sample data are few, and overfitting degrades the classification effect. The invention chooses the Dropout method to prevent overfitting.
Dropout is also one of the regularization methods, implemented by changing the model itself to prevent overfitting. The idea of Dropout is: nodes that are part of the hidden layer are "deleted" randomly, e.g., 50%. The nodes that are "deleted" are only temporarily regarded as not existing, the parameters are not updated temporarily, but need to be retained, and the nodes may participate in training in the next iteration.
Before Dropout is used, the weight W_2 between H_1 and H_2 is:
W_2 = (w_11, w_12, w_13, w_14, w_21, w_22, w_23, w_24, w_31, w_32, w_33, w_34) (30)
If a node-filter function m = [1, 0, 1] is applied after H_1, part of the nodes of H_1 are randomly deleted (here the middle node), giving a new hidden layer H_1′:
H_1′ = H_1 ⊙ m = (h_1^1, 0, h_1^3) (31)
As the above formula shows, node h_1^2 is randomly "deleted"; during training the parameters related to node h_1^2, namely (w_21, w_22, w_23, w_24), are not updated, but neither are they set to zero; they are simply left untouched during this iteration, and if node h_1^2 is not "deleted" in the next iteration, its parameters continue to be updated.
Before using the Dropout method, the training process of the network is to propagate the input forward through the network, then to propagate the error backward using the BP algorithm, and after using the Dropout method, the training process becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forwards through the residual nodes, and then the error is propagated reversely through the residual nodes by using a BP algorithm;
3) restoring the deleted nodes, wherein the parameters of the nodes which are deleted are not updated at the moment, and the parameters of the nodes which are not deleted are updated; and repeating the three steps until the iteration is completed.
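A minimal numpy sketch of the random node filter (the mask is kept so that the same "deleted" nodes are skipped during back propagation; their weights are frozen rather than zeroed, and all names are illustrative):

    import numpy as np

    def dropout_forward(h, keep_prob=0.5):
        # randomly "delete" hidden nodes by zeroing their outputs this iteration
        mask = (np.random.rand(h.shape[-1]) < keep_prob).astype(float)
        return h * mask, mask

    # one possible draw of the mask is m = [1, 0, 1], which, as in formula (31),
    # zeroes the middle node of a three-node hidden layer:
    h1_dropped, m = dropout_forward(np.array([0.7, 0.4, 0.9]))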
The parameter settings of the unsupervised pre-training and the supervised fine-tuning in the DBN model under the DeepLearn Toolbox framework in this embodiment are shown in table 4-2.
It can be seen from the table that positive emotions have higher energy in the Gamma and Beta bands than negative and neutral emotions, that negative and neutral emotions have similar energy in the Gamma and Beta bands, and that negative emotions have higher energy in the Alpha band. These findings indicate that the three emotions have specific neural patterns in the high-frequency bands, which provides a basis for the subsequent emotion classification.
TABLE 4-2
[table image: parameter settings of the unsupervised pre-training and supervised fine-tuning in the DBN model]
The emotion recognition results and analysis of this example are as follows:
A very important problem in EEG-based emotion recognition research is whether the same emotion induced in the same subject at different times and in different states can be recognized accurately and reliably; the emotion data of each subject's three experiments are therefore recognized separately. Taking the DE features as an example, tables 4-3 show the recognition results of each experiment on the 15 subjects using the two classifiers, SVM and DBN.
Tables 4 to 3
[table image: recognition accuracy (%) of the SVM and DBN classifiers for each of the 15 subjects across the three experiments]
As can be seen from tables 4-3, although the acquisition equipment, the subjects' psychological condition and other factors varied to different degrees between experiments, each subject obtained similar accuracy in the three experiments (the standard deviations across the three experiments average 1.44%). EEG-based emotion recognition is therefore stable and repeatable, so in practical applications the emotions of the same subject at different times can be recognized from EEG signals.
Meanwhile, the average accuracy rate of recognition by using the DBN is 89.12%, the standard deviation is 6.54%, the recognition effect is better than that of a data provider (the average recognition rate is 86.08%, and the standard deviation is 8.34%), the average recognition rate is improved by 3.04%, and the standard deviation is reduced by 1.80%.
In addition, it can be found from the table that the average classification accuracy of SVM is 84.2% and the standard deviation is 9.24%, while the average classification accuracy based on DBN is 89.12% and the standard deviation is 6.54%, and the classification effect of DBN is significantly better than that of SVM, with higher classification accuracy and better stability (higher average value and lower standard deviation).
As shown in figs. 6-1 and 6-2, classification-accuracy confusion matrices were obtained by recognizing the data of one experiment of one subject with the two classifiers, the deep belief network (DBN) and the support vector machine (SVM). The rows represent the original classes of the samples and the columns the classes predicted by the classifier; the entry (i, j) of the matrix is the probability that class i is recognized as class j, and the color bar on the right maps to the size of that probability. Both positive and neutral emotions are recognized well by the SVM and the DBN. Negative emotions are the hardest for both classifiers, but the SVM confuses them badly (31% of negative samples are recognized as neutral and 24% as positive), whereas the DBN improves markedly (only 5% recognized as neutral and 9% as positive).
The emotion recognition results based on different feature transformations are as follows:
In order to study the effect of the six feature transformations PSD, DE, DASM, RASM, ASM and DCAU on EEG-based emotion recognition, tables 4-4 show the recognition results using the different feature transformations over the full frequency band.
Tables 4 to 4
[table image: recognition results using the six feature transformations over the full frequency band]
As can be seen from tables 4-4, compared with the traditionally used PSD features, both classifiers (DBN and SVM) achieve their best recognition with the DE features, which give the highest mean and the lowest standard deviation. This is because the DE feature balances the low- and high-frequency components of the EEG, strengthening the contribution of the discriminative high-frequency information, so DE is better suited than PSD to EEG-based emotion recognition. Meanwhile, the four asymmetric features DASM, RASM, ASM and DCAU also reach high accuracy in emotion recognition; although they have far fewer dimensions than the DE and PSD features (DASM and RASM are built on 27 electrode pairs, ASM on 54, DCAU on 23), they achieve accuracy comparable to DE, which shows that EEG signals are asymmetric when emotion is generated and that the brain's asymmetric activity is meaningful for emotion recognition. Subsequent experiments are still needed to verify whether the slightly lower accuracy of DASM, RASM, ASM and DCAU relative to DE is caused by their lower feature dimensionality.
In order to further study the influence of the frequency band on emotion recognition based on electroencephalogram signals, as shown in tables 4-5, the DE characteristics are taken as an example, and the electroencephalogram signals under different frequency bands and full frequency bands are used for recognition results (%).
Tables 4 to 5
[table image: recognition results (%) using DE features in the individual frequency bands and the full band]
It can be found from tables 4 to 5 that the use of data of different frequency bands has different effects on emotion recognition, and the use of data of full frequency bands has the best effect. In the five frequency bands, the recognition rates of the Beta frequency band and the Gamma frequency band have higher average values and lower standard deviations compared with those of other three frequency bands, so that the Beta frequency band and the Gamma frequency band have key functions in emotion recognition.
See fig. 7.
The DBN combines feature extraction and feature selection: it automatically selects the features useful for classification while filtering out those irrelevant to it. Fig. 7 shows the distribution of the mean absolute values of the first-hidden-layer weights of the trained DBN; the larger trained weights are concentrated in the Beta and Gamma bands. A larger weight means the input connected to it contributes more to the final classification result, indicating that the Beta and Gamma bands contain more emotion-related information; they can thus be called the key bands of emotion.

Claims (5)

1. A method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning, characterized in that:
a one-dimensional physiological signal feature extraction and state recognition data analysis model, a DBN, is established based on deep learning, and the DBN adopts a "pre-training + fine-tuning" training process: the pre-training process uses bottom-up unsupervised training, training the first hidden layer first and then the following hidden layers layer by layer, the output of the previous hidden layer's nodes serving as the input and the output of the current hidden layer's nodes serving as the input of the next hidden layer; the fine-tuning process performs top-down supervised training on labeled data; in the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm; finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged;
and (3) extracting and state identifying:
s1: bringing in one-dimensional physiological signals including one or more of electroencephalogram, electrocardio, myoelectricity, respiration and electrodermal, performing preprocessing operation and feature mapping operation on the signals, performing feature mapping in a standard space to obtain a feature mapping image in the standard space, wherein the preprocessing comprises denoising, filtering, hierarchical decomposition and reconstruction operation;
s2: constructing a deep confidence network DBN which comprises an input layer, a plurality of limited Boltzmann machines RBM and a back propagation structure and finally comprises a classifier, wherein the limited Boltzmann machines RBM are used as core structures of the whole network, have 1-N in number and are nested with each other structurally;
s3: performing feature extraction on the one-dimensional physiological signal subjected to preprocessing and feature mapping in the step S1 by using the deep confidence network constructed in the step S2, wherein the extraction process comprises RBM training and BP algorithm to perform fine adjustment on the network; the RBM training and BP algorithm comprises:
1) in RBM training and BP-algorithm fine-tuning, batch normalization is performed before the output of each layer;
2) a CD-k algorithm with k iterations is adopted for the multiple iterations in Gibbs sampling;
3) when Gibbs sampling is used to fit the input data as closely as possible, which converts the problem into solving the maximum likelihood estimation of the input samples, the Dropout method is selected to prevent overfitting;
4) in the process of fine-tuning the network with the BP algorithm, when the parameters are adjusted along the negative gradient direction of the objective, a mini-batch gradient descent algorithm is adopted to iteratively update the weights on each group of small samples;
5) the Sigmoid activation function is selected in the bottom-up forward propagation, and the ReLU activation function is selected in the top-down back propagation;
s4: and inputting the feature vector output by the deep belief network in the step S3 into a Softmax classifier, and judging the individual state of the one-dimensional physiological signal included in the feature vector.
2. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein:
s31: in RBM training and BP algorithm fine tuning, batch normalization processing is carried out before each layer of output, a Z-score normalization method is selected for normalization processing, data are converted into normal distribution with the mean value of 0 and the standard deviation of 1 by respectively using Z-score for a training set and a test set, then the data are converted into the range of [0,1], the Z-score normalization method carries out normalization by using the mean value and the standard deviation of arithmetic data, and the formula is as follows:
Figure FDA0002242905450000021
in the formula, u represents the average value of each dimension, sigma represents the standard deviation of each dimension, and the processed data conform to the standard normal distribution with the average value of 0 and the standard deviation of 1;
s32: the CD-k algorithm, which uses k iterations for multiple iterations in Gibbs sampling, is:
for an input sample v = (v1, v2, …, vm), the output sample h = (h1, h2, …, hn) obtained after v is encoded by the RBM can be understood as the input sample with n features extracted:
1) inputting a training sample x0, the number of Gibbs sampling steps k, and the learning rate ε;
2) initializing the visible layer v1 = x0, and initializing the weights w, the visible layer bias b and the hidden layer bias c to values close to 0;
3) for g = 1 to k:
using the formula
p(h_i = 1 | v) = f(c_i + Σ_j w_ij · v_j)    (4)
calculating the distribution of the hidden layer;
substituting the obtained result into the formula
p(v_j = 1 | h) = f(b_j + Σ_i w_ij · h_i)    (3)
calculating the distribution of the visible layer reconstruction;
i and j denote the neuron node indices of the hidden layer and the visible layer respectively (i ≤ n, j ≤ m);
substituting the obtained reconstruction into formula (4) again to obtain the reconstructed hidden layer distribution;
according to the gradient descent algorithm, updating w, b and c, where ⟨·⟩_data and ⟨·⟩_rec denote the values computed from the data and from the reconstruction respectively:
Δw = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_rec)
Δb = ε(⟨v_i⟩_data − ⟨v_i⟩_rec)
Δc = ε(⟨h_j⟩_data − ⟨h_j⟩_rec)
end for;
4) outputting the updated w, b and c;
s33: selecting Dropout method to prevent overfitting in maximum likelihood estimation of transforming to solve input sample using Gibbs sampling to fit input data most likely, it is Dropout that prevents overfitting by changing the model itself; dropout randomly "deletes" nodes that are part of the hidden layer, the "deleted" nodes are only temporarily regarded as nonexistent, the parameters are not updated temporarily, but need to be retained, and the nodes may participate in training in the next iteration;
s34: in the process of fine-tuning the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of a target, a small-batch gradient descent algorithm is adopted to carry out iterative update of weights on each group of small samples, and the steps are as follows:
1) randomly extracting a group of small samples from all input samples each time, wherein the number of samples contained in each group of small samples is Mini-batch;
2) carrying out iterative updating on the weight value of each group of small samples by adopting a batch gradient descent algorithm;
3) repeating steps 1) and 2) (total number of input samples / Mini-batch) times;
s35: when parameters are adjusted in the negative gradient direction of a target, a Sigmoid activation function is selected in the bottom-up forward propagation process;
the selection process is as follows: the maximum likelihood estimation of the input samples,
ln L(θ) = Σ_{t=1}^{T} ln P(v^(t))    (2)
is differentiated with respect to the parameters so as to maximize the likelihood function, and the objective function is continuously improved by gradient ascent until a stopping condition is reached; the process of maximizing the likelihood function yields the probability that the jth visible layer node is activated (takes the value "1") and the probability that the ith hidden layer node is activated, respectively:
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
in the above formulas, f is the Sigmoid activation function;
the Sigmoid activation function is defined as
f(x) = 1 / (1 + e^(−x))    (5)
differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x))    (6)
an activation function whose derivative tends to 0 as x tends to infinity is called a soft saturating activation function, while an activation function whose derivative is exactly 0 once |x| exceeds some constant c is called a hard saturating activation function, i.e.:
f′(x) = 0 for |x| > c    (7)
the ReLU activation function is selected in the top-down back propagation; ReLU(x) exhibits hard saturation when x < 0, but when x > 0 the derivative of ReLU(x) is 1, so no gradient vanishing occurs;
the ReLU function is defined as:
ReLU(x) = max(0, x)    (8).
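As a rough illustration of the S31 normalization in claim 2, the sketch below z-scores a training and a test set separately and then rescales each to [0,1]. The array shapes and data are assumptions; formula (1) is applied column-wise.

```python
import numpy as np

def zscore_then_unit(train: np.ndarray, test: np.ndarray):
    """Sketch of the S31 normalization: z-score each set to mean 0 / std 1,
    then rescale to [0, 1]. Z-scoring train and test separately follows the
    claim wording; reusing the training statistics on the test set is the
    more common practice and would be a one-line change."""
    out = []
    for X in (train, test):
        z = (X - X.mean(axis=0)) / X.std(axis=0)                    # formula (1)
        z = (z - z.min(axis=0)) / (z.max(axis=0) - z.min(axis=0))   # -> [0, 1]
        out.append(z)
    return out

train = np.random.default_rng(1).normal(5, 2, (100, 310))  # stand-in data
test = np.random.default_rng(2).normal(5, 2, (40, 310))
tr, te = zscore_then_unit(train, test)
print(round(float(tr.mean()), 2), tr.min(), tr.max())
```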
3. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 2, wherein:
the Dropout method is selected in S33 to prevent overfitting; before the Dropout method is used, the training procedure of the network is to propagate the input forward through the network and then propagate the error backward using the BP algorithm; after the Dropout method is used, the training procedure becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forward through the remaining nodes, and the error is then propagated backward through the remaining nodes using the BP algorithm;
3) restoring the deleted nodes, at which point the parameters of the deleted nodes remain un-updated while the parameters of the nodes that were not deleted have been updated; and repeating the above three steps until the iterations are completed.
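A minimal sketch of the Dropout step in claim 3, assuming a single hidden layer and a drop probability of 0.5 (both assumptions): nodes are "deleted" by a random mask, the same mask gates the backward pass so only surviving nodes are updated, and the mask is redrawn each iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def dropout_forward(h: np.ndarray, p_drop: float = 0.5):
    """Randomly 'delete' hidden nodes by zeroing them; their parameters are
    kept and may rejoin training next iteration. (Illustrative sketch; the
    common 'inverted' variant also scales survivors by 1/(1 - p_drop).)"""
    mask = (rng.random(h.shape) > p_drop).astype(h.dtype)
    return h * mask, mask  # the mask also gates the backward pass

W = rng.normal(0, 0.1, (310, 128))       # assumed layer sizes
x = rng.random((8, 310))
h = sigmoid(x @ W)
h_dropped, mask = dropout_forward(h)
# Backward pass: gradients of 'deleted' nodes are zeroed by the same mask,
# so only the surviving nodes' parameters are updated this iteration:
grad_h = rng.normal(size=h.shape) * mask
print(h_dropped.shape, round(float(mask.mean()), 2))
```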
4. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein: the feature vector output by the deep belief network is input into a Softmax classifier, and the parameter hidden layer bias C is searched over the range [2^(−10), 2^(10)] for the optimal classification accuracy.
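Claim 4's search of C over [2^(−10), 2^(10)] can be illustrated as a log2 grid search. The sketch below treats C as the regularization strength of a multinomial logistic (softmax) classifier from scikit-learn and uses random stand-in features and labels; all of these are assumptions rather than the patented setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical grid search over C in [2^-10, 2^10], picking the value that
# maximizes cross-validated classification accuracy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # stand-in for DBN output feature vectors
y = rng.integers(0, 3, size=200)    # stand-in emotion labels

best_c, best_acc = None, -1.0
for exp in range(-10, 11):
    clf = LogisticRegression(C=2.0 ** exp, max_iter=1000)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    if acc > best_acc:
        best_c, best_acc = 2.0 ** exp, acc
print(f"best C = 2^{int(np.log2(best_c))}, accuracy = {best_acc:.3f}")
```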
5. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein: when Gibbs sampling is used, the specific steps for extracting the n features of an input sample are as follows: the process of maximizing the likelihood function yields the probability that the jth visible layer node is activated (takes the value "1") and the probability that the ith hidden layer node is activated, respectively:
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
in the above formulas, f is the Sigmoid activation function;
1) firstly, the probability p(h_i = 1 | v) that the ith hidden layer node is activated (takes the value "1") is calculated using formula (4):
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
2) the input data is then fitted according to Gibbs sampling to yield h = (h1, h2, …, hn), the specific process being: a random number between 0 and 1 is generated; if the value of the random number is less than p(h_i = 1 | v), then h_i is "1", otherwise it is "0";
3) the encoded h obtained in steps 1) and 2) is decoded to obtain the original input v′; similarly, p(v_j = 1 | h), the probability that the jth visible layer node is activated, is first calculated using formula (3):
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
4) a random number between 0 and 1 is generated as in step 2); if the value of the random number is less than p(v_j = 1 | h), then v_j′ is "1", otherwise it is "0";
5) the v′ obtained in step 4) is substituted into formula (4), and h′ is obtained by Gibbs sampling in the same way as in step 2);
6) finally, the weights, the visible layer bias and the hidden layer bias are updated according to formulas (9), (10) and (11), where η is the learning rate and represents the rate of increase or decrease when the weights or biases are updated;
Δw = η(vh − v′h′)    (9)
Δb = η(v − v′)    (10)
Δc = η(h − h′)    (11).
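The six steps of claim 5 can be read as one Gibbs sampling pass followed by the updates (9)-(11). The following sketch implements them for a single binary input vector; the shapes and the learning rate η are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gibbs_update(v, W, b, c, eta=0.05):
    """One pass through steps 1)-6) of claim 5 for a single binary sample v
    (illustrative sketch; shapes and learning rate are assumptions)."""
    p_h = sigmoid(W.T @ v + c)                   # formula (4): p(h_i = 1 | v)
    h = (rng.random(p_h.shape) < p_h) * 1.0      # step 2): sample h
    p_v = sigmoid(W @ h + b)                     # formula (3): p(v_j = 1 | h)
    v_p = (rng.random(p_v.shape) < p_v) * 1.0    # step 4): sample v'
    p_h2 = sigmoid(W.T @ v_p + c)
    h_p = (rng.random(p_h2.shape) < p_h2) * 1.0  # step 5): sample h'
    W += eta * (np.outer(v, h) - np.outer(v_p, h_p))  # formula (9)
    b += eta * (v - v_p)                              # formula (10)
    c += eta * (h - h_p)                              # formula (11)
    return W, b, c

m, n = 16, 8                               # visible / hidden sizes (assumed)
W = rng.normal(0, 0.01, (m, n))
b, c = np.zeros(m), np.zeros(n)
v = (rng.random(m) < 0.5) * 1.0            # a binary input sample
W, b, c = gibbs_update(v, W, b, c)
print(W.shape, b.shape, c.shape)
```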
CN201710414832.1A 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning Active CN107256393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710414832.1A CN107256393B (en) 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning

Publications (2)

Publication Number Publication Date
CN107256393A CN107256393A (en) 2017-10-17
CN107256393B (en) 2020-04-24

Family

ID=60024431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710414832.1A Active CN107256393B (en) 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning

Country Status (1)

Country Link
CN (1) CN107256393B (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107634911B (en) * 2017-10-31 2020-03-10 河南科技大学 Adaptive congestion control method based on deep learning in information center network
CN108209870A (en) * 2017-12-25 2018-06-29 河海大学常州校区 Long-term EEG monitoring automatic seizure detection method based on convolutional neural networks
CN108062572B (en) * 2017-12-28 2021-04-06 华中科技大学 Hydroelectric generating set fault diagnosis method and system based on DdAE deep learning model
CN108523907B (en) * 2018-01-22 2021-07-16 上海交通大学 Fatigue state identification method and system based on deep shrinkage sparse self-coding network
CN108347764A (en) * 2018-01-23 2018-07-31 南京航空航天大学 Examination hall radio cheating signal framing method and system based on deep learning
CN108040073A (en) * 2018-01-23 2018-05-15 杭州电子科技大学 Malicious attack detection method based on deep learning in information physical traffic system
CN108287763A (en) * 2018-01-29 2018-07-17 中兴飞流信息科技有限公司 Parameter exchange method, working node and parameter server system
CN108229664B (en) * 2018-01-31 2021-04-30 北京市商汤科技开发有限公司 Batch standardization processing method and device and computer equipment
CN108449295A (en) * 2018-02-05 2018-08-24 西安电子科技大学昆山创新研究院 Combined modulation recognition methods based on RBM networks and BP neural network
US20190279082A1 (en) * 2018-03-07 2019-09-12 Movidius Ltd. Methods and apparatus to determine weights for use with convolutional neural networks
CN108926341A (en) * 2018-04-20 2018-12-04 平安科技(深圳)有限公司 Detection method, device, computer equipment and the storage medium of ECG signal
CN112055878B (en) * 2018-04-30 2024-04-02 皇家飞利浦有限公司 Adjusting a machine learning model based on the second set of training data
CN108710974B (en) * 2018-05-18 2020-09-11 中国农业大学 Water ammonia nitrogen prediction method and device based on deep belief network
CN109033936A (en) * 2018-06-01 2018-12-18 齐鲁工业大学 A kind of cervical exfoliated cell core image-recognizing method
CN108805204B (en) * 2018-06-12 2021-12-03 东北大学 Electric energy quality disturbance analysis device based on deep neural network and use method thereof
CN109060892B (en) * 2018-06-26 2020-12-25 西安交通大学 SF based on graphene composite material sensor array6Method for detecting decomposition product
CN109106384B (en) * 2018-07-24 2021-12-24 安庆师范大学 Psychological stress condition prediction method and system
CN109308471B (en) * 2018-09-29 2022-07-15 河海大学常州校区 Electromyographic signal feature extraction method
CN109394205B (en) * 2018-09-30 2022-06-17 安徽心之声医疗科技有限公司 Electrocardiosignal analysis method based on deep neural network
CN109492751A (en) * 2018-11-02 2019-03-19 重庆邮电大学 Network safety situation element securing mechanism based on BN-DBN
CN109602414B (en) * 2018-11-12 2022-01-28 安徽心之声医疗科技有限公司 Multi-view-angle conversion electrocardiosignal data enhancement method
CN109602415B (en) * 2018-11-12 2022-02-18 安徽心之声医疗科技有限公司 Electrocardio equipment lead inversion identification method based on machine learning
CN109222963A (en) * 2018-11-21 2019-01-18 燕山大学 A kind of anomalous ecg method for identifying and classifying based on convolutional neural networks
CN109787926A (en) * 2018-12-24 2019-05-21 合肥工业大学 A kind of digital signal modulation mode recognition methods
CN110045335A (en) * 2019-03-01 2019-07-23 合肥工业大学 Based on the Radar Target Track recognition methods and device for generating confrontation network
CN109998525B (en) * 2019-04-03 2022-05-20 哈尔滨理工大学 Arrhythmia automatic classification method based on discriminant deep belief network
CN110378286B (en) * 2019-07-19 2023-03-28 东北大学 DBN-ELM-based electric energy quality disturbance signal classification method
CN110766099A (en) * 2019-11-08 2020-02-07 哈尔滨理工大学 Electrocardio classification method combining discriminant deep belief network and active learning
CN112949671B (en) * 2019-12-11 2023-06-30 中国科学院声学研究所 Signal classification method and system based on unsupervised feature optimization
CN110782025B (en) * 2019-12-31 2020-04-14 长沙荣业智能制造有限公司 Rice processing online process detection method
CN111488968A (en) * 2020-03-03 2020-08-04 国网天津市电力公司电力科学研究院 Method and system for extracting comprehensive energy metering data features
CN112438733A (en) * 2020-11-06 2021-03-05 南京大学 Portable neonatal convulsion electroencephalogram monitoring system
CN112347984A (en) * 2020-11-27 2021-02-09 安徽大学 Olfactory stimulus-based EEG (electroencephalogram) acquisition and emotion recognition method and system
CN112508088A (en) * 2020-12-03 2021-03-16 重庆邮智机器人研究院有限公司 DEDBN-ELM-based electroencephalogram emotion recognition method
CN113017568A (en) * 2021-03-03 2021-06-25 中国人民解放军海军军医大学 Method and system for predicting physiological changes and death risks of severely wounded patients
CN113141375A (en) * 2021-05-08 2021-07-20 国网新疆电力有限公司喀什供电公司 Network security monitoring method and device, storage medium and server
CN113554131B (en) * 2021-09-22 2021-12-03 四川大学华西医院 Medical image processing and analyzing method, computer device, system and storage medium
CN115105079B (en) * 2022-07-26 2022-12-09 杭州罗莱迪思科技股份有限公司 Electroencephalogram emotion recognition method based on self-attention mechanism and application thereof
CN115722797A (en) * 2022-11-03 2023-03-03 深圳市微谱感知智能科技有限公司 Laser welding signal analysis method based on machine learning
CN116662742B (en) * 2023-06-28 2024-07-12 北京理工大学 Brain electrolysis code method based on hidden Markov model and mask empirical mode decomposition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9418334B2 (en) * 2012-12-06 2016-08-16 Nuance Communications, Inc. Hybrid pre-training of deep belief networks
CN105654046B (en) * 2015-12-29 2019-01-18 中国科学院深圳先进技术研究院 Electrocardiosignal personal identification method and device
CN106096616A (en) * 2016-06-08 2016-11-09 四川大学华西医院 Magnetic resonance image feature extraction and classification method based on deep learning
CN106214145B (en) * 2016-07-20 2019-12-10 杨一平 Electrocardiogram classification method based on deep learning algorithm
CN106503654A (en) * 2016-10-24 2017-03-15 中国地质大学(武汉) A kind of face emotion identification method based on the sparse autoencoder network of depth
CN106778685A (en) * 2017-01-12 2017-05-31 司马大大(北京)智能系统有限公司 Electrocardiogram image-recognizing method, device and service terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant