CN107256393A - The feature extraction and state recognition of one-dimensional physiological signal based on deep learning - Google Patents
- Publication number
- CN107256393A CN107256393A CN201710414832.1A CN201710414832A CN107256393A CN 107256393 A CN107256393 A CN 107256393A CN 201710414832 A CN201710414832 A CN 201710414832A CN 107256393 A CN107256393 A CN 107256393A
- Authority
- CN
- China
- Prior art keywords
- training
- node
- input
- hidden layer
- rbm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
Abstract
The invention discloses a feature extraction and state recognition method for one-dimensional physiological signals based on deep learning. A deep belief network (DBN) analysis model is established for feature extraction and state recognition of one-dimensional physiological signals. The DBN model uses a "pre-training + fine-tuning" training process: in the pre-training stage, the first RBM is trained first, then its trained nodes serve as the input of the second RBM, which is trained next, and so on. After all RBMs are trained, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is fed into a Softmax classifier, which judges the individual state carried by the one-dimensional physiological signal. The invention effectively solves the problem that traditional one-dimensional physiological signal classification requires manually selected feature inputs, which limits classification accuracy: through the nonlinear mapping of the deep belief network, highly separable features and feature combinations are derived automatically for classification, yielding better classification results, and the network structure can be continuously optimized.
Description
Technical field
The present invention relates to the field of medical data processing technology, in particular to a physiological signal feature extraction and classification method, and specifically to the feature extraction and state recognition of one-dimensional physiological signals based on deep learning.
Background technology
Physiological signals are governed by the autonomic nervous system and the endocrine system and are not under conscious control, so they objectively and truthfully reflect an individual's physiological, mental, and emotional state; they have therefore attracted increasingly wide research and application. Because physiological signals are the external manifestation of such states and directly and truthfully reflect their changes, many researchers have used different classifiers to recognize individual states from physiological signals (EEG, ECG, EMG, respiration, skin conductance, etc.). Although the classifiers suitable for individual-state recognition from physiological signals keep increasing in number, and recognition rates keep improving, most classifiers require manually extracted features; the recognition rate then depends on human experience and is unstable, leaving a gap to practical application. For example, Moghimi et al. used a linear discriminant analysis classifier to recognize emotional state from cerebral blood oxygenation changes, with a recognition rate of about 72%.
In 2011, Li Shufang et al. classified epileptic states from EEG signals using empirical mode decomposition (EMD) and a support vector machine (SVM): the EEG signal was first decomposed by EMD into multiple empirical mode components and effective features were extracted, then the EEG signal was classified with the SVM, finally reaching a recognition rate of 99% for interictal versus ictal periods. In 2014, Niu et al. used a genetic algorithm for feature selection and a k-nearest-neighbor classifier to recognize emotional state from ECG, EMG, respiration, and skin conductance, with a recognition rate of up to 96%. However, the higher recognition rates obtained by such combinations of different methods or different signals are highly specific and difficult to generalize to the ordinary case, and finding a particular combination is itself largely a matter of chance.
Since Hinton et al. stood out with convolutional neural networks in the 2012 ImageNet competition, research on deep learning has surged, attracting much attention and many applications in signal and information processing, especially in directions such as image processing and speech recognition, where unprecedented results have been achieved. With the rapid development of deep learning, it has also found preliminary application to the processing of electrophysiological signals such as EEG, EMG, ECG, and skin conductance, with impressive results. Through continuous development, a large number of deep learning frameworks (such as DeepLearn Toolbox, Caffe, DeepLearning) and models (such as the deep belief network (Deep Belief Net, DBN), sparse autoencoders, and recurrent neural networks) have appeared. How to use and improve these frameworks and models so that they can be applied to practical problems is the subject of current research.
Summary of the invention
In view of the problems that traditional shallow classifiers require manual feature extraction and that their recognition performance is unstable, the present invention aims to provide a one-dimensional physiological signal feature extraction and state recognition method based on deep learning. It effectively solves the problem that manual feature selection in traditional one-dimensional signal classification limits classification accuracy: through the nonlinear mapping of a deep belief network, highly separable features and feature combinations are derived automatically for classification, and the network structure is continuously optimized to obtain better classification results.
The basic idea of the present invention is as follows. A deep belief network (DBN) model performs feature extraction and state recognition of one-dimensional physiological signals based on deep learning. The model uses a "pre-training + fine-tuning" training process. Pre-training is bottom-up unsupervised training: the first hidden layer is trained first, then each subsequent hidden layer in turn, with the output of the previous hidden layer's nodes used as input and the output of the current hidden layer's nodes used as the input of the next hidden layer. Fine-tuning performs top-down supervised training on labeled data so that errors are back-propagated and the model parameters are adjusted; the fine-tuning stage typically uses the error backpropagation (BP) algorithm. The "pre-training + fine-tuning" process can be viewed as grouping a large number of parameters, finding a locally good setting for each group, and then combining these local near-optimal solutions to search for the global optimum. Performing the iterative weight updates with different activation functions, the CD algorithm, and mini-batch gradient descent both exploits the freedom afforded by the model's many parameters and effectively reduces the cost of training.
The DBN model is formed by stacking multiple RBMs. The training process of the DBN model is: in the pre-training stage, the first RBM is trained first, then its trained nodes serve as the input of the second RBM, which is trained next, and so on; after all RBM training is complete, the network is fine-tuned with the BP algorithm. Because the DBN is a stack of RBMs, each layer's processing of the previous layer can be viewed as stage-by-stage processing of the input, transforming an initial representation only loosely related to the output class into a representation more closely tied to the class.
The purpose of the present invention is achieved as follows. A deep belief network (DBN) analysis model is established for feature extraction and state recognition of one-dimensional physiological signals based on deep learning. The DBN model uses a "pre-training + fine-tuning" training process. Pre-training uses bottom-up unsupervised training: the first hidden layer is trained first, then each subsequent hidden layer in turn, with the output of the previous hidden layer's nodes used as input and the output of the current hidden layer's nodes used as the input of the next hidden layer. Fine-tuning performs top-down supervised training on labeled data. In the pre-training stage, the first RBM is trained first, then its trained nodes serve as the input of the second RBM, which is trained next, and so on. After all RBM training is complete, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is fed into a Softmax classifier, which judges the individual state carried by the one-dimensional physiological signal.
The steps of the extraction and classification method are:
S1: Take one or more one-dimensional physiological signals, including EEG, ECG, EMG, respiration, and skin conductance, and apply preprocessing and feature mapping to them. Feature mapping is performed in a normed space, yielding a feature map in that space; preprocessing includes denoising, filtering, hierarchical decomposition, and reconstruction.
S2: Build a deep belief network (DBN) consisting of an input layer, multiple restricted Boltzmann machines (RBMs), a backpropagation structure, and a classifier. The RBMs form the core structure of the whole network; there may be 1 to N of them, nested one upon another.
S3: Use the deep belief network built in step S2 to extract features from the one-dimensional physiological signals preprocessed and feature-mapped in step S1. The extraction process includes RBM training and BP fine-tuning of the network, as follows:
1) During RBM training and BP fine-tuning, batch normalization is performed before the output of each layer;
2) The multiple iterations of Gibbs sampling use the contrastive divergence algorithm with k iterations (CD-k);
3) When Gibbs sampling is used to fit the input data as closely as possible, i.e. to solve the maximum likelihood estimate of the input samples, the Dropout method is chosen to prevent overfitting;
4) During BP fine-tuning of the network, when parameters are adjusted along the negative gradient direction of the objective, mini-batch gradient descent is used to update the weights iteratively on each small group of samples;
5) The Sigmoid activation function is chosen for the bottom-up forward propagation; the ReLU activation function is chosen for the top-down backpropagation.
S4: Feed the feature vector output by the deep belief network of step S3 into a Softmax classifier, which judges the individual state carried by the one-dimensional physiological signal.
S31: The batch normalization performed before each layer's output during RBM training and BP fine-tuning uses the Z-score standardization method. The training set and the test set are each Z-score transformed into a distribution with mean 0 and standard deviation 1, and the data are then mapped into the range [0, 1]. The Z-score method normalizes using the mean and standard deviation of the original data; the formula is:

x' = (x − u) / σ

where u denotes the mean of each dimension and σ the standard deviation of each dimension; the processed data follow a standard normal distribution with mean 0 and standard deviation 1.
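The S31 normalization can be sketched in a few lines of NumPy. This is an illustration of the described transform (Z-score per dimension, then rescale into [0, 1]), not the patent's own code; the small epsilon guarding against zero variance is an added assumption.

```python
import numpy as np

def zscore_unit_range(x):
    """Z-score each dimension to mean 0 / std 1, then rescale into [0, 1]
    (a sketch of step S31; epsilon guard is an assumption, not in the text)."""
    u = x.mean(axis=0)                      # per-dimension mean u
    sigma = x.std(axis=0) + 1e-8            # per-dimension std sigma
    z = (x - u) / sigma                     # x' = (x - u) / sigma
    lo, hi = z.min(axis=0), z.max(axis=0)
    return (z - lo) / np.where(hi > lo, hi - lo, 1.0)   # map into [0, 1]
```

Per the text, the training set and test set would each be passed through this transform separately.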
S32: The multiple iterations of Gibbs sampling use the contrastive divergence algorithm with k iterations (CD-k). For an input sample v = (v1, v2, …, vm), the RBM produces the encoded output sample h = (h1, h2, …, hn); this n-dimensional encoded output can be understood as the input sample with n features extracted:
1) Input a training sample x0, the iteration count m, and the learning rate ε;
2) Initialize the visible layer v1 = x0, and set the weights w, visible-layer bias b, and hidden-layer bias c close to 0;
3) for i < m:
4) Compute the distribution of the hidden layer using the formula for p(h = 1 | v);
5) Substitute the result of step 4) into the formula for p(v = 1 | h) to compute the distribution of the visible-layer reconstruction;
6) Substitute the result of step 5) into the formula to obtain the hidden-layer distribution after reconstruction;
7) Update w, b, c by gradient descent (the subscript rec denotes the reconstructed model):
Δw = ε(<vihj>data − <vihj>rec)
Δb = ε(<vi>data − <vi>rec)
Δc = ε(<hj>data − <hj>rec)
8) end for;
9) Output the updated w, b, c.
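Steps 1)–9) can be sketched as a single CD-k update for a binary RBM. This is a minimal NumPy illustration under assumed conventions (w of shape m×n with m visible and n hidden units; fixed random seed), not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, w, b, c, eps=0.1, k=1, rng=np.random.default_rng(0)):
    """One CD-k update for a binary RBM (sketch of steps 1-9 above).
    v0: (m,) visible sample; w: (m, n) weights; b: (m,) visible bias; c: (n,) hidden bias."""
    ph0 = sigmoid(v0 @ w + c)               # positive phase: p(h=1 | v0)
    v, ph = v0, ph0
    for _ in range(k):                      # k Gibbs steps
        h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden states
        pv = sigmoid(h @ w.T + b)           # visible reconstruction distribution
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ w + c)             # hidden distribution after reconstruction
    # gradient estimates: <v h>_data - <v h>_rec, etc.
    w += eps * (np.outer(v0, ph0) - np.outer(v, ph))
    b += eps * (v0 - v)
    c += eps * (ph0 - ph)
    return w, b, c
```

In practice k = 1 already gives a usable approximation, as the text notes for CD-k.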
S33: When Gibbs sampling is used to fit the input data as closely as possible, i.e. to solve the maximum likelihood estimate of the input samples, the Dropout method is chosen to prevent overfitting; Dropout prevents overfitting by changing the model itself. Dropout randomly "deletes" part of the hidden-layer nodes: a "deleted" node is temporarily treated as absent and its parameters are not updated for the moment, but they are retained, so these nodes may participate in training again in the next iteration.
S34: During BP fine-tuning of the network, when parameters are adjusted along the negative gradient direction of the objective, mini-batch gradient descent is used to update the weights iteratively on each small group of samples. The steps are:
1) Each time, randomly select a small group of samples from the full input set; the number of samples in each group is the mini-batch size;
2) Perform an iterative weight update on each group using batch gradient descent;
3) Repeat steps 1) and 2); the number of repetitions is: total number of input samples / mini-batch size.
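The three steps above reduce to a short shuffling loop. The sketch below assumes a caller-supplied `step_fn` that performs one weight update on a group; it illustrates the S34 loop structure only, not the patent's code:

```python
import numpy as np

def minibatch_epochs(x, y, step_fn, mini_batch=10, epochs=1, rng=np.random.default_rng(0)):
    """Mini-batch loop sketch for S34: shuffle, split into groups of `mini_batch`
    samples, and call `step_fn(xb, yb)` (one weight update) per group."""
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)              # random selection of groups
        for start in range(0, n, mini_batch):
            idx = order[start:start + mini_batch]
            step_fn(x[idx], y[idx])             # one batch-gradient update
    return n // mini_batch                      # updates per epoch, as in step 3)
```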
S35: When parameters are adjusted along the negative gradient direction of the objective, the Sigmoid activation function is chosen in the bottom-up forward propagation.

The selection process is as follows. Taking the derivative of the maximum likelihood estimate of the input samples with respect to the parameters, the likelihood function is maximized; gradient ascent is used to raise the objective continuously until a stopping condition is reached. Maximizing the likelihood function yields the probability that the j-th visible-layer node is activated (takes the value 1) and the probability that the i-th hidden-layer node is activated, respectively:

p(vj = 1 | h) = f(bj + Σi wij·hi)   (2-12)
p(hi = 1 | v) = f(ci + Σj wij·vj)   (2-13)

where f is the Sigmoid activation function.

The Sigmoid activation function is defined as

f(x) = 1 / (1 + e^(−x))

Differentiating the Sigmoid function gives

f'(x) = f(x)(1 − f(x))

An activation function whose derivative tends to 0 is called a soft-saturating activation function, and one whose derivative is exactly 0 once |x| exceeds a certain value is called a hard-saturating activation function.

The ReLU activation function is chosen in the top-down backpropagation. ReLU(x) shows hard saturation for x < 0, but for x > 0 its derivative is 1, so no "gradient vanishing" occurs; in backpropagation the gradient-dispersion phenomenon is therefore milder and convergence is faster, effectively alleviating the "vanishing gradient" problem. The ReLU function is defined as:

ReLU(x) = max(0, x)   (0-7)
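The two activation functions and their derivatives can be written out directly; a plain NumPy illustration (not the patent's code):

```python
import numpy as np

def sigmoid(x):
    """Soft-saturating activation used in bottom-up forward propagation."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """f'(x) = f(x)(1 - f(x)); tends to 0 for large |x| (soft saturation)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """ReLU(x) = max(0, x): hard saturation for x < 0, derivative 1 for x > 0."""
    return np.maximum(0.0, x)

def relu_grad(x):
    return np.where(np.asarray(x) > 0, 1.0, 0.0)
```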
The Dropout method chosen in S33 to prevent overfitting changes the training flow. Before Dropout, the network's training flow is: first propagate the input forward through the network, then back-propagate the error using the BP algorithm. With Dropout, the training flow becomes:
1) Randomly delete part of the hidden-layer nodes in the network;
2) Propagate the input forward through the remaining nodes, and back-propagate the error through the remaining nodes using the BP algorithm;
3) Restore the deleted nodes; at this point the parameters of the "deleted" nodes have not been updated, while those of the nodes that were not deleted have been updated. Repeat the above three steps until the iterations are complete.
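The random "deletion" in step 1) amounts to drawing a binary mask over the hidden nodes; the same mask then gates both the forward outputs and the parameter updates. A minimal sketch (the drop probability and seed are assumptions, not stated in the text):

```python
import numpy as np

def dropout_forward(h, p_drop=0.5, rng=np.random.default_rng(0)):
    """Randomly 'delete' hidden nodes for this iteration: deleted nodes output 0
    and receive no parameter update; their weights are kept and may train again
    on a later iteration (sketch of the three-step flow above)."""
    mask = rng.random(h.shape) >= p_drop    # True = node kept this iteration
    return h * mask, mask                   # reuse mask to gate the updates
```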
When the feature vector output by the deep belief network is fed into the Softmax classifier, the parameter C is searched within the range [2^-10, 2^10] for the optimal classification accuracy.
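The search over C can be sketched as a grid over powers of two. The `train_fn` callback below is a hypothetical stand-in (not named in the patent) assumed to train the classifier with a given C and return its validation accuracy:

```python
def search_c(train_fn, c_exponents=range(-10, 11)):
    """Grid-search sketch for the classifier parameter C over [2^-10, 2^10].
    `train_fn(C)` is an assumed callback returning accuracy for that C."""
    best_c, best_acc = None, float("-inf")
    for e in c_exponents:
        c = 2.0 ** e
        acc = train_fn(c)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc
```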
When Gibbs sampling is used, the specific steps for obtaining the input sample with n extracted features are as follows. Maximizing the likelihood function yields the probability that the j-th visible-layer node is activated (takes the value 1) and the probability that the i-th hidden-layer node is activated, respectively, where f is the Sigmoid activation function:
1) First compute, from formula (2-13), the probability p(hi = 1 | v) that the i-th hidden-layer node is activated (takes the value 1);
2) Then fit the input data by Gibbs sampling to obtain h = (h1, h2, …, hn). The detailed process is: generate a random number between 0 and 1; if the random number is less than p(hi = 1 | v), then hi takes the value 1, otherwise 0;
3) Decode the encoded h obtained in steps 1) and 2) to recover the original input v'. Similarly, first compute p(vj = 1 | h) from formula (2-12) to obtain the probability that the j-th visible-layer node is activated;
4) As in step 2), generate a random number between 0 and 1; if the random number is less than p(vj = 1 | h), then vj' takes the value 1, otherwise 0;
5) Substitute the v' obtained in step 4) into the formula and, as in step 2), compute h' by Gibbs sampling;
6) Finally update the weights, visible-layer bias, and hidden-layer bias according to formulas (2-14), (2-15), (2-16), where η is the learning rate, i.e. the speed at which a weight or bias increases or decreases when updated:
Δw = η(vh − v'h') (0-9)
Δb = η(v − v') (0-10)
Δc = η(h − h') (0-11)
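The sampling trick in steps 2) and 4) — set a unit to 1 when a uniform random number falls below its activation probability — can be sketched as one encode/decode round. Assumed conventions (w of shape m×n with m visible and n hidden units, fixed seed); not the patent's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, w, b, c, rng=np.random.default_rng(0)):
    """One round of steps 1)-5): sample h from p(h=1|v), decode v' from p(v=1|h),
    then h' from v'. A binary unit takes value 1 when a uniform random number
    falls below its activation probability."""
    ph = sigmoid(v @ w + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ w.T + b)
    v_prime = (rng.random(pv.shape) < pv).astype(float)
    h_prime = (rng.random(c.shape) < sigmoid(v_prime @ w + c)).astype(float)
    return h, v_prime, h_prime
```

The returned (v, h, v', h') quadruple then feeds the updates Δw = η(vh − v'h'), Δb = η(v − v'), Δc = η(h − h') of step 6).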
The positive effects of the present invention are:
1. It effectively solves the problem that traditional one-dimensional physiological signal classification requires manually selected feature inputs, which limits classification accuracy. Through the nonlinear mapping of the deep belief network, highly separable features and feature combinations are derived automatically for classification, and the network structure can be continuously optimized to obtain better classification results. The "pre-training + fine-tuning" process can be viewed as grouping a large number of parameters, finding a locally good setting for each group, and then combining these local near-optimal solutions to search for the global optimum; this scheme both exploits the freedom afforded by the model's many parameters and effectively reduces the cost of training.
2. Gibbs sampling is a Markov chain Monte Carlo sampling method that makes full use of conditional probability distributions, iteratively sampling each component of x. As the number of iterations increases, the sampled conditional distributions converge to the joint distribution at the rate of a geometric series, shortening the convergence time.
3. Batch normalization is performed before each layer's output: the training set and test set are each Z-score transformed into a distribution with mean 0 and standard deviation 1, and the data are then mapped into the range [0, 1], greatly improving the network's generalization ability and training speed.
4. The selectable activation functions of the invention are Sigmoid and ReLU. The deep belief network involved in the invention is divided into a forward propagation process and a backpropagation process; forward and backward propagation may use the same activation function or different ones, suiting the needs of various physiological signals.
5. Addressing the problem that the Gibbs algorithm requires many iterations and converges slowly, the present invention uses the contrastive divergence CD-k algorithm on top of the Gibbs algorithm, which quickly obtains the expectations the model requires: an estimate of the model is obtained after only k iterations, and a good approximation is obtained even when k takes a small value.
6. The present invention chooses the Dropout method to prevent overfitting, reducing overfitting overall and improving efficiency.
Brief description of the drawings
Fig. 1 is the DBN network model structure and training process diagram of the present invention.
Fig. 2 is the BP network structure diagram of the present invention.
Fig. 3 is the Sigmoid activation function plot.
Fig. 4 is the ReLU activation function plot.
Fig. 5 shows the network structure before and after Dropout: on the left is the structure before Dropout, on the right the structure after Dropout.
Fig. 6-1 is the confusion matrix of the SVM classifier's recognition results.
Fig. 6-2 is the confusion matrix of the DBN classifier's recognition results.
Fig. 7 is the distribution of the mean absolute values of the first-layer DBN weights after training in the embodiment.
Embodiment
The hardware and software environment used in this embodiment's experiment is shown in Table 4-1:
Table 4-1
Data acquisition:
The experimental data are from the SJTU Emotion EEG Dataset (SEED) provided by Shanghai Jiao Tong University. This database contains EEG data for three kinds of emotion (positive, negative, neutral). The data were collected from 15 subjects. In each experiment, each subject was asked to watch 15 film clips that can induce these three emotions; while the subject watched the clips, EEG signals were recorded with a 62-channel dry-electrode EEG cap. Each experiment thus yields 15 groups of EEG signals per subject, and each group is labeled according to the subject's report (positive "+1", negative "-1", neutral "0"), giving 5 "positive", 5 "negative", and 5 "neutral" groups. Each subject repeated the experiment at intervals of 7 days or more, participating in 3 experiments in total; for 15 subjects, a total of 15 × 3 × 15 = 675 groups of EEG data were therefore obtained. Here, within one experiment, the first 12 groups of data (4 positive, 4 neutral, 4 negative) are used as the training set and the last 3 groups (1 positive, 1 neutral, 1 negative) as the test set.
After collecting the raw data, the data provider preprocessed the original EEG signals and then obtained, by filtering, the signals of five EEG frequency bands (Delta band (1–3 Hz), Theta band (4–7 Hz), Alpha band (8–13 Hz), Beta band (14–30 Hz), Gamma band (31–50 Hz)). On the basis of these five bands, six feature transforms were applied to the data of each band: PSD, DE, ASM, DASM, RASM, and DCAU. These six transforms are simple to compute and represent EEG signals effectively. DE is a conceptual extension of Shannon entropy that can effectively measure the complexity of a continuous random variable; since low-frequency energy components dominate EEG signals, DE can effectively distinguish the low-frequency energy components of the EEG signal from the high-frequency ones. With 62 channels, the DE sample dimension is 62 × 5 = 310. In addition, studies have shown that asymmetric brain activity has a significant effect on emotion processing, so DASM and RASM were extracted on the basis of DE as the differential and rational asymmetries between the DE of 27 pairs of hemispherically asymmetric electrodes, and combining DASM and RASM yields ASM. DCAU represents the differences between the DE of 23 pairs of frontal and posterior electrodes. Besides the DE transform, PSD features were also extracted. For the six feature transforms PSD, DE, ASM, DASM, RASM, and DCAU, the feature dimensions per sample are 310, 310, 270, 135, 135, and 115 respectively.
Experiment flow:
The experiment's DBN model is based on the DeepLearn Toolbox framework, with the batch normalization algorithm and the ReLU activation function introduced on top of it. The multiple iterations of Gibbs sampling use the contrastive divergence algorithm with k iterations (CD-k). Dropout is chosen to prevent overfitting when Gibbs sampling is used to fit the input data as closely as possible, i.e. to solve the maximum likelihood estimate of the input samples. During BP fine-tuning of the network, when parameters are adjusted along the negative gradient direction of the objective, mini-batch gradient descent updates the weights iteratively on each small group of samples. The parameters of the DBN model were adjusted through repeated experiments to determine the optimal DBN model, and its results were compared with SVM classification results. The recognition results for different subjects, different experiments, different EEG feature transforms, and different frequency bands are analyzed, and the influence of the number of iterations, the learning rate, and the number of hidden-layer nodes on the classification results is discussed.
Fig. 1 shows the training and classification flow of the DBN model used in the present invention: the original training and test sets are first normalized, and then fed into the model for training and classification. Fig. 2 is the BP network structure diagram. From Figs. 1 and 2, training is divided into two steps, pre-training and fine-tuning; the updated weights and biases are then fed into the classifier for prediction, and the classification accuracy is computed from the difference between the predicted and actual results. The RBM training parameters are: the connection weights between the hidden and visible layers wij (i = 1, 2, 3, …, n; j = 1, 2, 3, …, m), the visible-layer biases b = (b1, b2, b3, …, bm), and the hidden-layer biases c = (c1, c2, c3, …, cn).
DBN training is mainly a process of continually adjusting the weights and biases, and the factors with the greatest influence on them are the depth of the network, i.e. the number of hidden layers, and the number of nodes in each hidden layer. When the number of hidden layers is too small, the network's learning capacity is insufficient and it can only learn shallow features; when the number of hidden layers is reduced to 1, it degenerates into an ordinary artificial neural network. In theory, adding hidden layers allows the essence of the input data to be abstracted more accurately and improves classification; but as the number of layers grows, the whole model gains more parameters and longer training times, the DBN's generalization ability declines, and overfitting results. Based on the actual properties of the raw data, this embodiment uses 2 hidden layers plus the input and output layers, 4 layers in total. Taking the DE features as an example, the input layer has 310 nodes and the output layer 3 nodes, with two hidden layers in between whose node counts are selected within the ranges 50–500 and 20–500 respectively.
When parameters are adjusted along the negative gradient direction of the objective, mini-batch gradient descent is used to update the weights iteratively on each group of samples. The steps are:
1) Each time, randomly select a small group of samples from the full input set; the number of samples in each group is the mini-batch size;
2) Perform an iterative weight update on each group using batch gradient descent;
3) Repeat steps 1) and 2); the number of repetitions is: total number of input samples / mini-batch size.
The specific steps of this embodiment are summarized in two parts, DBN and BP, as follows:
1) Initialize the DBN: the number of hidden layers, number of hidden-layer nodes, number of iterations, learning rate, momentum, and the sample count Mini-batch of each small group, which must divide the total number of input samples evenly; the connection weights w, visible-layer bias b, and hidden-layer bias c are all set to 0;
2) for i < number of hidden layers  % RBM training
3) repeat
4) for j < (N1 / Mini-batch1)
5) Train the RBM and, according to formulas (2-17)–(2-19),
Δw' = m × Δw + η(vh − v'h') (0-12)
Δb' = m × Δb + η(v − v') (0-13)
Δc' = m × Δc + η(h − h') (0-14)
(m being the momentum), update the connection weights w, visible-layer bias b, and hidden-layer bias c;
6) Compute the output of the current layer according to formula (2-13) and use it as the input of the next hidden layer;
7) end for
8) until cycle count = number of iterations;
9) end for
10) Initialize the BP stage: number of classes, activation function, learning rate, momentum, number of iterations, classifier; initialize BP with the connection weights w, visible-layer bias b, and hidden-layer bias c obtained above;
11) repeat
12) for l < (N1 / Mini-batch2)
13) Compute the output of each hidden layer according to formula (2-23), and compute the error e;
14) Update the connection weights w, visible-layer bias b, and hidden-layer bias c according to formulas (2-31)–(2-33);
15) end for
16) until cycle count = number of iterations;
17) Substitute the test set and the connection weights w, visible-layer bias b, and hidden-layer bias c into formula (2-23) to compute the predicted labels y';
18) Compute the true labels y of each class;
19) Output the classification accuracy of each class.
As can be seen from the above steps, training is divided into two phases, pre-training (steps 1~9) and fine-tuning (steps 10~16); the updated weights and biases are then brought into the classifier for prediction and classification, and the classification accuracy is finally calculated from the difference between the predicted and actual results. Because the present invention introduces a batch normalization algorithm on the basis of the DeepLearn Toolbox framework, batch normalization must be carried out before the output of each layer: in steps 6, 13 and 17, batch normalization is performed before the values are brought into the activation function.
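As a minimal sketch of "batch normalization before the activation function" (this is an illustration under assumed names, not the DeepLearn Toolbox code itself):

```python
import numpy as np

def batch_norm(z, eps=1e-8):
    """Normalize each feature of the pre-activation batch to zero
    mean and unit variance before it enters the activation function."""
    mu = z.mean(axis=0)
    sigma = z.std(axis=0)
    return (z - mu) / (sigma + eps)

def layer_forward(x, w, b):
    z = x @ w + b                    # linear pre-activation
    z = batch_norm(z)                # batch normalization first ...
    return 1.0 / (1.0 + np.exp(-z))  # ... then the Sigmoid activation

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 5))         # a batch of 32 input vectors
w = rng.normal(size=(5, 3))
b = np.zeros(3)
h = layer_forward(x, w, b)
print(h.shape)  # (32, 3)
```

Normalizing the pre-activations keeps them away from the saturated regions of the Sigmoid, which is the motivation the text gives for applying it before every layer output.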
Step 13) is illustrated in Fig. 2, which shows a BP network structure with d input nodes, q hidden-layer nodes and l output nodes, where the input-layer nodes are x = (x1, x2, …, xi, …, xd), the hidden-layer nodes are b = (b1, b2, …, bh, …, bq), and the output nodes are y = (y1, y2, …, yj, …, yl); θj denotes the threshold of the j-th output-layer node, γh denotes the threshold of the h-th hidden-layer node, vih denotes the weight between the i-th input-layer node and the h-th hidden-layer node, and whj denotes the weight between the h-th hidden-layer node and the j-th output-layer node.
The input to the h-th hidden-layer node, obtained from the input-layer nodes and the weights, is:
αh = Σ(i=1..d) vih·xi  (2-20)
The input to the j-th output-layer node, obtained from the hidden-layer nodes and the weights, is:
βj = Σ(h=1..q) whj·bh  (2-21)
where bh denotes the output of the h-th hidden-layer node; from the above, its calculation formula is:
bh = f(αh - γh)  (2-22)
For an input sample (xk, yk), the output ŷk = (ŷ1^k, ŷ2^k, …, ŷl^k) obtained by BP network training is calculated as:
ŷj^k = f(βj - θj)  (2-23)
The final mean squared error Ek of the network is then:
Ek = (1/2)·Σ(j=1..l) (ŷj^k - yj^k)²  (2-24)
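The forward pass and mean squared error described above can be sketched in numpy as follows; variable names (V for the weights v_ih, W for w_hj, gamma and theta for the thresholds) are illustrative, not taken from the patent's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_forward(x, V, W, gamma, theta):
    """Forward pass in the Fig. 2 notation: V[i,h] are input-to-hidden
    weights, W[h,j] hidden-to-output weights, gamma/theta thresholds."""
    alpha = x @ V                 # input to each hidden node
    bh = sigmoid(alpha - gamma)   # hidden outputs b_h = f(alpha_h - gamma_h)
    beta = bh @ W                 # input to each output node
    yhat = sigmoid(beta - theta)  # outputs yhat_j = f(beta_j - theta_j)
    return bh, yhat

def mse(yhat, y):
    """E_k = 1/2 * sum_j (yhat_j - y_j)^2"""
    return 0.5 * np.sum((yhat - y) ** 2)

rng = np.random.default_rng(2)
d, q, l = 4, 3, 2                 # input, hidden, output node counts
x, y = rng.normal(size=d), np.array([0.0, 1.0])
V, W = rng.normal(size=(d, q)), rng.normal(size=(q, l))
gamma, theta = np.zeros(q), np.zeros(l)
bh, yhat = bp_forward(x, V, W, gamma, theta)
print(mse(yhat, y) >= 0.0)  # True
```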
For the BP network structure shown in Fig. 2, the parameters that need to be determined number (d+l+1)×q+l, namely: the d×q weights between the input layer and the hidden layer, the q×l weights between the hidden layer and the output layer, the q thresholds of the hidden-layer nodes, and the l thresholds of the output-layer nodes. The BP algorithm is a process of continual iterative updating, and the above parameters are re-estimated according to the following formula (where v denotes any one parameter):
v ← v + Δv  (2-25)
The BP algorithm is based on gradient descent and adjusts the parameters in the negative gradient direction of the objective; therefore, for a given learning rate η, the change of the weight whj is:
Δwhj = -η·∂Ek/∂whj  (2-26)
As can be seen from Fig. 2, whj first influences the input βj of the j-th output node, then influences the output ŷj^k of that node, and finally influences the mean squared error Ek; therefore the above formula can also be expressed as:
∂Ek/∂whj = (∂Ek/∂ŷj^k)·(∂ŷj^k/∂βj)·(∂βj/∂whj)  (2-27)
With the Sigmoid function as the activation function:
f'(x) = f(x)(1 - f(x))  (2-28)
The gradient term gj of the j-th output-layer node can then be obtained from formulas (2-23), (2-24) and (2-28):
gj = ŷj^k(1 - ŷj^k)(yj^k - ŷj^k)  (2-29)
Similarly, the gradient term eh of the h-th hidden-layer node is:
eh = bh(1 - bh)·Σ(j=1..l) whj·gj  (2-30)
Therefore, substituting formulas (2-22) and (2-29) into formula (2-27) yields the update formula for the weight whj:
Δwhj = η·gj·bh  (2-31)
Likewise, the update formulas for θj, vih and γh are, respectively:
Δθj = -η·gj  (2-32)
Δvih = η·eh·xi  (2-33)
Δγh = -η·eh  (2-34)
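The gradient terms and update rules above are standard BP; a hedged numpy sketch (names illustrative, not the patent's implementation) that verifies the error actually decreases:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, V, W, gamma, theta):
    bh = sigmoid(x @ V - gamma)       # hidden outputs b_h
    yhat = sigmoid(bh @ W - theta)    # network outputs
    return bh, yhat

def bp_update(x, y, V, W, gamma, theta, eta=0.5):
    """One BP step using the gradient terms from the text."""
    bh, yhat = forward(x, V, W, gamma, theta)
    g = yhat * (1 - yhat) * (y - yhat)   # g_j, output-layer gradient term
    e = bh * (1 - bh) * (W @ g)          # e_h, hidden-layer gradient term
    W += eta * np.outer(bh, g)           # Δw_hj = η g_j b_h
    theta -= eta * g                     # Δθ_j = -η g_j
    V += eta * np.outer(x, e)            # Δv_ih = η e_h x_i
    gamma -= eta * e                     # Δγ_h = -η e_h
    return V, W, gamma, theta

rng = np.random.default_rng(3)
x, y = rng.normal(size=4), np.array([0.0, 1.0])
V, W = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
gamma, theta = np.zeros(3), np.zeros(2)
e0 = 0.5 * np.sum((forward(x, V, W, gamma, theta)[1] - y) ** 2)
for _ in range(200):
    V, W, gamma, theta = bp_update(x, y, V, W, gamma, theta)
e1 = 0.5 * np.sum((forward(x, V, W, gamma, theta)[1] - y) ** 2)
print(e1 < e0)  # True: the mean squared error decreases
```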
See Figs. 3 and 4.
The activation function introduces non-linear factors into the learning process to solve problems that are linearly inseparable. The activation functions selectable in the present invention are Sigmoid and ReLU. The deep belief network involved in the present invention is divided into a bottom-up forward propagation process and a top-down back-propagation process; forward propagation and back-propagation may select the same activation function or different activation functions.
Sigmoid is the most widely used activation function and is defined as:
f(x) = 1 / (1 + e^(-x))  (2-35)
The function curve is shown in Fig. 3; differentiating the Sigmoid function gives:
f'(x) = f(x)(1 - f(x))  (2-36)
An activation function whose derivative tends to 0 (as |x| → ∞) is called a soft-saturating activation function, while an activation function whose derivative is exactly 0 once |x| exceeds a certain number is called a hard-saturating activation function, i.e.:
f'(x) = 0, |x| > c, where c is a constant  (2-37)
Owing to the soft saturation of Sigmoid, during backward propagation the gradient that Sigmoid conducts downward contains a factor related to the derivative f'(x). If the input falls into the soft-saturation region, the value of f'(x) approaches 0, so the gradient transmitted downward becomes very small and the network parameters train poorly; this was once a major obstacle to the development of neural networks. This phenomenon is also known as "gradient vanishing", and it generally becomes apparent once the number of network layers exceeds about 5. Although the Sigmoid activation function suffers from "gradient vanishing", it also has advantages: physically, Sigmoid is the closest to the biological neuron model, and Sigmoid compresses the input into the range (0, 1), which can be regarded as a normalization of the input, or as the probability of a class (e.g. if the activation function outputs 0.9, it can be interpreted as a 90% probability of a positive sample).
Compared with the Sigmoid function, the rectified linear unit (Rectified Linear Unit, ReLU) can effectively alleviate the "gradient vanishing" phenomenon. The ReLU function is defined as:
ReLU(x) = max(0, x)  (2-38)
The function curve is shown in Fig. 4. ReLU(x) exhibits hard saturation when x < 0, but when x > 0 the derivative of ReLU(x) is 1 and "gradient vanishing" does not occur; hence in the back-propagation process gradient diffusion is milder and convergence is faster.
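The saturation contrast between the two activation functions can be checked numerically; a small sketch (illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    return sigmoid(x) * (1.0 - sigmoid(x))   # f'(x) = f(x)(1 - f(x))

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(float)             # derivative 1 for x > 0, 0 for x < 0

# soft saturation: the Sigmoid derivative vanishes for large |x| ...
print(d_sigmoid(np.array([0.0, 10.0])))      # roughly [0.25, 4.5e-05]
# ... while the ReLU derivative stays exactly 1 on the positive side
print(d_relu(np.array([-3.0, 3.0])))         # [0. 1.]
```

The factor `d_sigmoid(x)` is exactly the derivative-related factor the text says is multiplied into the back-propagated gradient, which is why deep Sigmoid stacks lose gradient magnitude while ReLU stacks do not.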
The successive iterations of this embodiment use the contrastive divergence algorithm with k iterations (the CD-k algorithm):
For an input sample v = (v1, v2, …, vm), the RBM yields the encoded output sample h = (h1, h2, …, hn); this n-dimensional output after encoding may be understood as an input sample from which n features have been extracted. The steps are:
1) input a training sample x0, the number of hidden layers m, and the learning rate ε;
2) initialize the visible layer v1 = x0; the weights w, visible-layer biases b and hidden-layer biases c are initialized close to 0;
3) for i < m
4) calculate the distribution of the hidden layer using formula (2-13), p(hi = 1 | v) = f(ci + Σj wij·vj);
5) substitute the result of step 4) into formula (2-12), p(vj = 1 | h) = f(bj + Σi wij·hi), to calculate the reconstructed distribution of the visible layer;
6) substitute the result of step 5) into formula (2-13) to obtain the distribution of the hidden layer after reconstruction;
7) update w, b, c according to the gradient descent algorithm (rec denotes the model after reconstruction):
Δw = ε(<vihj>data - <vihj>rec)
Δb = ε(<vi>data - <vi>rec)
Δc = ε(<hj>data - <hj>rec)
8) end for
9) output the updated w, b, c.
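The CD-k steps above can be sketched for a binary RBM as follows. This is a minimal sketch under the usual RBM conditionals, not the patent's code; names and the weight indexing convention (`w[j, i]` connecting visible node j to hidden node i) are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k(v0, w, b, c, eps=0.1, k=1, rng=None):
    """One CD-k update for a binary RBM; b, c are visible/hidden biases."""
    rng = rng or np.random.default_rng()
    ph0 = sigmoid(c + v0 @ w)                        # p(h=1|v), the data-side statistics
    h = (rng.random(ph0.shape) < ph0).astype(float)  # Gibbs sample of the hidden layer
    for _ in range(k):
        pv = sigmoid(b + h @ w.T)                    # p(v=1|h): reconstruct the visible layer
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(c + v @ w)                      # hidden distribution after reconstruction
        h = (rng.random(ph.shape) < ph).astype(float)
    # gradient step: <v h>_data - <v h>_rec
    w += eps * (np.outer(v0, ph0) - np.outer(v, ph))
    b += eps * (v0 - v)
    c += eps * (ph0 - ph)
    return w, b, c

rng = np.random.default_rng(4)
v0 = np.array([1.0, 0.0, 1.0, 1.0])      # one binary training sample
w = 0.01 * rng.normal(size=(4, 3))        # 4 visible, 3 hidden nodes
b, c = np.zeros(4), np.zeros(3)
w, b, c = cd_k(v0, w, b, c, rng=rng)
print(w.shape, b.shape, c.shape)  # (4, 3) (4,) (3,)
```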
In the DBN training process, over-fitting is likely to arise because the number of hidden layers is large, the number of hidden-layer nodes is large, or the volume of sample data is small; over-fitting leads to poor classification. The present invention uses the Dropout method to prevent over-fitting.
Dropout is also a kind of regularization method; it prevents over-fitting by modifying the model itself. The idea of Dropout is to randomly "delete" part of the hidden-layer nodes, e.g. 50% of them. A "deleted" node is temporarily regarded as absent and its parameters are temporarily not updated, but they must be retained, since the node may participate in training again in the next iteration.
Before Dropout is applied, the weights W2 between H1 and H2 are:
W2 = (w11, w12, w13, w14, w21, w22, w23, w24, w31, w32, w33, w34)  (2-39)
If the node filter function m = [1, 0, 1] is applied after H1, part of the nodes of H1 will be randomly "deleted" (here the middle node), giving the new hidden layer H1' = (h1(1), 0, h3(1)).
As can be seen from the above, node h2(1) is randomly "deleted"; then, in this training pass, the parameters related to node h2(1), namely (w21, w22, w23, w24), will not be updated. These parameters are not zeroed out, however, but merely left un-updated during this iteration; if node h2(1) is not "deleted" in the next iteration, these parameters continue to be updated.
Before the Dropout method is used, the training flow of the network is: the input is forward-propagated through the network, then the error is back-propagated using the BP algorithm. With the Dropout method, the training flow becomes:
1) randomly delete part of the hidden-layer nodes in the network;
2) forward-propagate the input through the remaining nodes, then back-propagate the error through the remaining nodes using the BP algorithm;
3) restore the deleted nodes; at this point the parameters of the "deleted" nodes have not been updated, while the parameters of the nodes that were not deleted have been updated. Repeat the above three steps until the iterations are complete.
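The three-step Dropout flow above can be sketched with a 0/1 filter mask; a minimal illustration (names assumed, not the patent's code):

```python
import numpy as np

def dropout_forward(h, p_keep=0.5, rng=None):
    """Randomly 'delete' hidden nodes: build a 0/1 filter mask and
    zero out the dropped activations for this iteration only."""
    rng = rng or np.random.default_rng()
    mask = (rng.random(h.shape) < p_keep).astype(float)
    return h * mask, mask

def dropout_update(w, grad, mask, lr=0.1):
    """Only parameters attached to surviving nodes are updated;
    the weights of 'deleted' nodes keep their old values."""
    return w - lr * grad * mask

rng = np.random.default_rng(5)
h = np.ones(3)                                   # hidden activations
h_drop, mask = dropout_forward(h, p_keep=0.5, rng=rng)
w = np.ones(3)
grad = np.full(3, 0.2)
w_new = dropout_update(w, grad, mask)
# weights of dropped nodes are unchanged; survivors moved by lr*grad
print(np.all(w_new[mask == 0] == 1.0))  # True
```

Re-drawing `mask` each iteration reproduces the text's point: a parameter skipped this pass resumes updating whenever its node survives a later pass.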
The parameter settings of the unsupervised pre-training and the supervised training in the DBN model of this embodiment under the DeepLearn Toolbox framework are shown in Table 4-2.
As can be seen from the table, positive emotion has higher energy in the Gamma and Beta bands than negative and neutral emotion; negative and neutral emotion have similar energy levels in the Gamma and Beta bands; and negative emotion has higher energy in the Alpha band. These findings indicate that the three emotions have specific neural patterns in the high-frequency bands, which provides a basis for the subsequent classification of emotions.
Table 4-2
DBN training is mainly a process of continually adjusting the weights and biases, and what influences the weights and biases most is the depth of the network, i.e. the number of hidden layers and the number of nodes in each hidden layer. When the number of hidden layers is small, the learning ability of the network is insufficient and only shallow features can be learned; when it is reduced to 1 hidden layer, the model degenerates into an ordinary artificial neural network. In theory, increasing the number of hidden layers allows the essence of the input data to be abstracted more accurately and improves classification, but as the number of layers increases, more parameters are introduced into the whole model and the training time grows, so the generalization ability of the DBN declines and over-fitting results. Combining the actual conditions of the raw data, this embodiment uses 2 hidden layers, for a total of 4 layers including the input and output layers. Taking the DE feature as an example, the input layer has 310 nodes, the output layer has 3 nodes, and the two hidden layers in between have node counts selected in the ranges 50~500 and 20~500, respectively.
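The trade-off between depth/width and parameter count described above can be made concrete with a rough count (one weight matrix per adjacent layer pair, one bias per node; this accounting is an assumption for illustration, not the patent's exact parameterization):

```python
import numpy as np

def dbn_param_count(sizes):
    """Rough number of parameters in a layer stack with the given
    sizes: weights n_in*n_out per pair, plus one bias per node."""
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes)
    return weights + biases

# DE-feature setup of the embodiment: 310 inputs, two hidden layers
# (sizes searched in 50~500 and 20~500), 3 output classes
for h1, h2 in [(50, 20), (200, 100), (500, 500)]:
    print((h1, h2), dbn_param_count([310, h1, h2, 3]))
```

The count grows rapidly with hidden-layer width, which illustrates why larger networks need more training time and over-fit more easily on small samples.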
The emotion recognition results of this embodiment are analyzed as follows:
An extremely important question in emotion recognition research based on EEG signals is whether the same emotion induced in the same subject at different times and in different states can be recognized accurately and reliably; this embodiment therefore recognizes the emotion data of the three experiments of each subject. Taking the DE feature as an example, Table 4-3 shows the recognition results obtained with the SVM and DBN classifiers for each of the three experiments of the 15 subjects.
Table 4-3
As can be seen from Table 4-3, although the acquisition devices, the psychological states of the subjects, etc. may differ to varying degrees in each experiment, each subject obtains similar accuracies across the three experiments (the average of the standard deviations is 1.44%). The experiment of recognizing emotions based on EEG signals is therefore stable and repeatable, so in practical application it is feasible to recognize the emotions of the same subject at different times using EEG signals.
At the same time, it can be seen that the average accuracy of recognition using DBN is 89.12% with a standard deviation of 6.54%, better than the recognition result of the data provider (average recognition rate 86.08%, standard deviation 8.34%): the average recognition rate is improved by 3.04% and the standard deviation is reduced by 1.80%.
In addition, it can be seen from the table that the average classification accuracy of SVM is 84.2% with a standard deviation of 9.24%, while the average classification accuracy based on DBN is 89.12% with a standard deviation of 6.54%. The classification effect of DBN is significantly better than that of SVM, with higher classification accuracy and better stability (higher average, lower standard deviation).
Figs. 6-1 and 6-2 show the confusion matrices of the classification accuracies obtained when the two classifiers, the deep belief network (DBN) and the support vector machine (SVM), recognize the experimental data of one subject. The rows in the figures represent the original classes of the samples and the columns represent the classes predicted by the classifier; the number (i, j) in the matrix represents the probability that class i is identified as class j, and the color bar on the right of the figures corresponds to the magnitude of the probability. It can be seen that positive and neutral emotions are well recognized by both SVM and DBN. Although the recognition of negative emotion is not very good in either SVM or DBN, in SVM negative emotion is heavily confused with both neutral and positive emotion (31% of negative samples are identified as neutral and 24% as positive), whereas DBN markedly improves the recognition of negative emotion (only 5% of negative samples are identified as neutral and 9% as positive).
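A row-normalized confusion matrix of the kind shown in Figs. 6-1 and 6-2 can be computed as follows (toy labels, illustrative only; the real matrices come from the subject's experimental data):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns predicted classes; entry (i, j)
    is the fraction of class-i samples identified as class j."""
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m / m.sum(axis=1, keepdims=True)

# toy labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred)
print(cm[0])  # negative row: [0.5, 0.25, 0.25]
```

Each row sums to 1, so off-diagonal entries in a row directly give confusion percentages like the 31%/24% figures quoted in the text.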
The emotion recognition results based on different feature transforms are as follows:
To study the influence of the six feature transforms PSD, DE, DASM, RASM, ASM and DCAU on EEG-based emotion recognition, Table 4-4 shows the recognition results obtained with the different feature transforms over the full frequency band.
Table 4-4
As can be seen from Table 4-4, compared with the traditionally used PSD feature, both the DBN and SVM classifiers achieve the best recognition with the DE feature, with the highest average and the smallest standard deviation. This is because the DE feature balances, to a certain extent, the high-frequency characteristics of emotional brain activity, strengthening the effect of the high-frequency features; the DE feature is therefore more suitable than the PSD feature for EEG-based emotion recognition. Meanwhile, the four asymmetry features DASM, RASM, ASM and DCAU also recognize emotion with high accuracy. Although these four features have fewer dimensions than the DE and PSD features (DASM is 27-dimensional, RASM 27-dimensional, ASM 54-dimensional, DCAU 23-dimensional), they still reach accuracies comparable to the DE feature, which indicates that the EEG signals generated during emotion are asymmetric and that the asymmetric activity of the brain is meaningful for emotion recognition. Subsequent experiments are still needed, however, to verify whether the lower accuracy of DASM, RASM, ASM and DCAU relative to the DE feature is caused by their feature dimensionality.
To further study the influence of frequency band on EEG-based emotion recognition, Table 4-5 shows, taking the DE feature as an example, the results (%) of recognition using the EEG signals of each individual frequency band and of the full band.
Table 4-5
From Table 4-5 it can be found that the data of different frequency bands have different effects on emotion recognition, and the full-band data give the best effect. Among the five frequency bands, the recognition rates of the Beta and Gamma bands have higher averages and lower standard deviations than the other three bands, which indicates that the Beta and Gamma bands play a key role in emotion recognition.
See Fig. 7.
The DBN combines feature extraction and feature selection: it can automatically select the features useful for classification and filter out the features irrelevant to classification. Fig. 7 is the distribution of the mean absolute values of the first-hidden-layer weights of the DBN after training; it can be seen that the larger weight values after training are mainly distributed in the Beta and Gamma bands. A larger weight indicates that the input connected to that weight contributes more to the final classification output, which shows that the Beta and Gamma bands contain more emotion-related information. The Beta and Gamma bands can therefore be called the key frequency bands of emotion.
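The band-wise weight analysis behind Fig. 7 can be sketched as follows. The feature layout (310 DE inputs = 62 channels × 5 bands stored band by band) and the artificially enlarged Beta/Gamma weights are assumptions made so the example is self-contained; the real weights come from the trained DBN:

```python
import numpy as np

def band_weight_importance(W1, band_slices):
    """Mean absolute value of the first-hidden-layer weights,
    grouped by the frequency band of their input feature."""
    return {band: float(np.mean(np.abs(W1[sl])))
            for band, sl in band_slices.items()}

rng = np.random.default_rng(6)
W1 = 0.1 * rng.normal(size=(310, 100))   # 310 inputs -> 100 hidden nodes
W1[62 * 3:] *= 3.0                       # pretend training enlarged Beta/Gamma weights
bands = {name: slice(62 * k, 62 * (k + 1))
         for k, name in enumerate(["Delta", "Theta", "Alpha", "Beta", "Gamma"])}
imp = band_weight_importance(W1, bands)
print(max(imp, key=imp.get) in ("Beta", "Gamma"))  # True
```

Ranking the bands by this statistic is exactly how Fig. 7 identifies Beta and Gamma as the key frequency bands.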
Claims (5)
1. A method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning, characterized in that:
a feature extraction and state recognition data analysis model DBN for one-dimensional physiological signals based on deep learning is established; the DBN model uses a "pre-training + fine-tuning" training process: the pre-training process uses bottom-up unsupervised training, first training the first hidden layer and then training the following hidden layers layer by layer, taking the node output of the previous hidden layer as input and the node output of the current hidden layer as the input of the next hidden layer; the fine-tuning process performs top-down supervised training on the labeled data; in the pre-training stage, the first RBM is trained first, then the trained nodes are taken as the input of the second RBM and the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned using the BP algorithm, and finally the feature vector output by the deep belief network is input into a Softmax classifier, which judges the individual state of the one-dimensional physiological signal brought in;
the steps of the extraction and classification method are:
S1: the one-dimensional physiological signals include one or more of EEG, ECG, EMG, respiration and galvanic skin response signals; preprocessing and feature mapping operations are performed on them, the feature mapping being carried out in a normed space to obtain a feature map in the normed space; the preprocessing includes denoising, filtering, hierarchical decomposition and reconstruction operations;
S2: a deep belief network DBN is built comprising an input layer, a plurality of restricted Boltzmann machines RBM, a back-propagation structure and a classifier, wherein the restricted Boltzmann machines RBM, as the core structure of the whole network, number from 1 to N and are nested with one another in structure;
S3: the deep belief network built in step S2 performs feature extraction on the one-dimensional physiological signals preprocessed and feature-mapped in step S1; the extraction process includes RBM training and fine-tuning of the network by the BP algorithm; the RBM training and BP algorithm include:
1) in the RBM training and BP fine-tuning, batch normalization is carried out before the output of each layer;
2) the successive iterations in the Gibbs sampling employ the contrastive divergence algorithm with k iterations, the CD-k algorithm;
3) in using Gibbs sampling to fit the input data as closely as possible, which is transformed into solving the maximum likelihood estimate of the input samples, the Dropout method is selected to prevent over-fitting;
4) in the process of fine-tuning the network by the BP algorithm, when the parameters are adjusted in the negative gradient direction of the objective, the mini-batch gradient descent algorithm is used to update the weights iteratively for each group of small samples;
5) the Sigmoid activation function is selected in the bottom-up forward propagation process, and the ReLU activation function is selected in the top-down back-propagation;
S4: the feature vector output by the deep belief network in step S3 is input into the Softmax classifier, and the individual state of the one-dimensional physiological signal brought in is judged.
2. The feature extraction and state recognition of one-dimensional physiological signals based on deep learning as claimed in claim 1, characterized in that:
S31: the batch normalization carried out in the RBM training and BP fine-tuning before the output of each layer uses the Z-score standardization method: the training set and the test set are each transformed with Z-score into data following a normal distribution with mean 0 and standard deviation 1, and the data are then transformed into the range [0, 1]; the Z-score standardization method normalizes using the arithmetic mean and standard deviation of the data, with the following formula:
z = (x - u) / σ
where u denotes the mean of each dimension and σ denotes the standard deviation of each dimension; the processed data follow the standard normal distribution with mean 0 and standard deviation 1;
S32: the successive iterations in the Gibbs sampling use the contrastive divergence algorithm with k iterations, the CD-k algorithm:
for an input sample v = (v1, v2, …, vm), the RBM yields the encoded output sample h = (h1, h2, …, hn); this n-dimensional output after encoding may be understood as an input sample from which n features have been extracted:
1) input a training sample x0, the number of hidden layers m, and the learning rate ε;
2) initialize the visible layer v1 = x0; the weights w, visible-layer biases b and hidden-layer biases c are initialized close to 0;
3) for i < m
4) calculate the distribution of the hidden layer using formula (2-13);
5) substitute the result of step 4) into formula (2-12) to calculate the reconstructed distribution of the visible layer;
6) substitute the result of step 5) into formula (2-13) to obtain the distribution of the hidden layer after reconstruction;
7) update w, b, c according to the gradient descent algorithm (rec denotes the model after reconstruction):
Δw = ε(<vihj>data - <vihj>rec)
Δb = ε(<vi>data - <vi>rec)
Δc = ε(<hj>data - <hj>rec)
8) end for
9) output the updated w, b, c;
S33: in using Gibbs sampling to fit the input data as closely as possible, which is transformed into solving the maximum likelihood estimate of the input samples, the Dropout method is selected to prevent over-fitting; Dropout prevents over-fitting by modifying the model itself: Dropout randomly "deletes" part of the hidden-layer nodes; a "deleted" node is temporarily regarded as absent and its parameters are temporarily not updated but must be retained, since the node may participate in training again in the next iteration;
S34: in the process of fine-tuning the network by the BP algorithm, when the parameters are adjusted in the negative gradient direction of the objective, the mini-batch gradient descent algorithm is used to update the weights iteratively for each group of small samples, the steps being:
1) each time, randomly select one group of small samples from the full input sample set; the number of samples each group contains is the Mini-batch size;
2) update the weights iteratively for each group of small samples using the batch gradient descent algorithm;
3) repeat steps 1) and 2); the number of repetitions is: total number of input samples / Mini-batch;
S35: when the parameters are adjusted in the negative gradient direction of the objective, the Sigmoid activation function is selected in the bottom-up forward propagation process;
the selection process is: the maximum likelihood estimate of the input samples is differentiated with respect to the parameters, the likelihood function is solved for its maximum, and the objective function is continually raised by the gradient ascent method until the stopping condition is reached; from the process of maximizing the likelihood function, the probability that the j-th visible-layer node is activated (takes the value "1") and the probability that the i-th hidden-layer node is activated are obtained respectively as:
p(vj = 1 | h) = f(bj + Σi wij·hi)
p(hi = 1 | v) = f(ci + Σj wij·vj)
where f is the Sigmoid activation function;
the Sigmoid activation function is defined as:
f(x) = 1 / (1 + e^(-x))
differentiating the Sigmoid function gives:
f'(x) = f(x)(1 - f(x))
an activation function whose derivative tends to 0 is called a soft-saturating activation function, and an activation function whose derivative is 0 once |x| exceeds a certain number is called a hard-saturating activation function, i.e.:
f'(x) = 0, |x| > c, where c is a constant (0-6)
the ReLU activation function is selected in the top-down back-propagation; ReLU(x) exhibits hard saturation when x < 0, but when x > 0 the derivative of ReLU(x) is 1 and "gradient vanishing" does not occur; the ReLU function is defined as:
ReLU(x) = max(0, x) (0-7).
3. The feature extraction and state recognition of one-dimensional physiological signals based on deep learning as claimed in claim 2, characterized in that:
in the selection of the Dropout method to prevent over-fitting in S33, before the Dropout method the training flow of the network is: the input is forward-propagated through the network, then the error is back-propagated using the BP algorithm; with the Dropout method, the training flow becomes:
1) randomly delete part of the hidden-layer nodes in the network;
2) forward-propagate the input through the remaining nodes, then back-propagate the error through the remaining nodes using the BP algorithm;
3) restore the deleted nodes; at this point the parameters of the "deleted" nodes have not been updated, while the parameters of the nodes that were not deleted have been updated; repeat the above three steps until the iterations are complete.
4. The feature extraction and state recognition of one-dimensional physiological signals based on deep learning as claimed in claim 1, characterized in that: the feature vector output by the deep belief network is input into the Softmax classifier, and the hidden-layer bias parameter C is searched within the range [2^-10, 2^10] for the optimal classification accuracy.
5. The feature extraction and state recognition of one-dimensional physiological signals based on deep learning as claimed in claim 1, characterized in that: in using Gibbs sampling, the specific steps of extracting the input sample with n features are as follows: from the process of maximizing the likelihood function, the probability that the j-th visible-layer node is activated (takes the value "1") and the probability that the i-th hidden-layer node is activated are obtained respectively as:
p(vj = 1 | h) = f(bj + Σi wij·hi)
p(hi = 1 | v) = f(ci + Σj wij·vj)
where f is the Sigmoid activation function;
1) first calculate the probability p(hi = 1 | v) that the i-th hidden-layer node is activated (takes the value "1") using formula (2-13);
2) then fit the input data according to Gibbs sampling to obtain h = (h1, h2, …, hn); the specific process is: generate a random number between 0 and 1; if the value of the random number is less than p(hi = 1 | v), the value of hi is "1", otherwise "0";
3) decode the encoded h obtained in steps 1) and 2) to obtain the reconstructed input v'; similarly, first calculate p(vj = 1 | h) using formula (2-12) to obtain the probability that the j-th visible-layer node is activated;
4) as in step 2), generate a random number between 0 and 1; if the value of the random number is less than p(vj = 1 | h), the value of vj' is "1", otherwise "0";
5) substitute the v' obtained in step 4) into the formula and, as in step 2), obtain h' by the Gibbs sampling calculation;
6) finally update the weights, visible-layer biases and hidden-layer biases according to formulas (2-14), (2-15), (2-16), where η is the learning rate, representing the speed of increase or decrease when the weights or biases are updated:
Δw = η(vh - v'h') (2-14)
Δb = η(v - v') (2-15)
Δc = η(h - h') (2-16).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710414832.1A CN107256393B (en) | 2017-06-05 | 2017-06-05 | Feature extraction and state recognition of one-dimensional physiological signals based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107256393A true CN107256393A (en) | 2017-10-17 |
CN107256393B CN107256393B (en) | 2020-04-24 |
Family
ID=60024431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710414832.1A Active CN107256393B (en) | 2017-06-05 | 2017-06-05 | Feature extraction and state recognition of one-dimensional physiological signals based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107256393B (en) |
CN112949671A (en) * | 2019-12-11 | 2021-06-11 | 中国科学院声学研究所 | Signal classification method and system based on unsupervised feature optimization |
CN113017568A (en) * | 2021-03-03 | 2021-06-25 | 中国人民解放军海军军医大学 | Method and system for predicting physiological changes and death risks of severely wounded patients |
CN113141375A (en) * | 2021-05-08 | 2021-07-20 | 国网新疆电力有限公司喀什供电公司 | Network security monitoring method and device, storage medium and server |
CN113554131A (en) * | 2021-09-22 | 2021-10-26 | 四川大学华西医院 | Medical image processing and analyzing method, computer device, system and storage medium |
CN115105079A (en) * | 2022-07-26 | 2022-09-27 | 杭州罗莱迪思科技股份有限公司 | Electroencephalogram emotion recognition method based on self-attention mechanism and application thereof |
CN116662742A (en) * | 2023-06-28 | 2023-08-29 | 北京理工大学 | Brain electrolysis code method based on hidden Markov model and mask empirical mode decomposition |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140164299A1 (en) * | 2012-12-06 | 2014-06-12 | Tara Sainath | Hybrid pre-training of deep belief networks |
CN105654046A (en) * | 2015-12-29 | 2016-06-08 | 中国科学院深圳先进技术研究院 | ECG signal identity recognition method and device |
CN106096616A (en) * | 2016-06-08 | 2016-11-09 | 四川大学华西医院 | Nuclear magnetic resonance image feature extraction and classification method based on deep learning |
CN106214145A (en) * | 2016-07-20 | 2016-12-14 | 杨平 | Electrocardiogram classification method based on a deep learning algorithm |
CN106503654A (en) * | 2016-10-24 | 2017-03-15 | 中国地质大学(武汉) | Facial emotion recognition method based on a deep sparse autoencoder network |
CN106778685A (en) * | 2017-01-12 | 2017-05-31 | 司马大大(北京)智能系统有限公司 | Electrocardiogram image recognition method, device, and service terminal |
2017-06-05: Application CN201710414832.1A filed in China; granted as CN107256393B (status: Active)
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107634911A (en) * | 2017-10-31 | 2018-01-26 | 河南科技大学 | Adaptive congestion control method based on deep learning in an information-centric network |
CN107634911B (en) * | 2017-10-31 | 2020-03-10 | 河南科技大学 | Adaptive congestion control method based on deep learning in information center network |
CN108209870A (en) * | 2017-12-25 | 2018-06-29 | 河海大学常州校区 | Long-term EEG monitoring automatic seizure detection method based on convolutional neural networks |
CN108062572B (en) * | 2017-12-28 | 2021-04-06 | 华中科技大学 | Hydroelectric generating set fault diagnosis method and system based on DdAE deep learning model |
CN108062572A (en) * | 2017-12-28 | 2018-05-22 | 华中科技大学 | Hydroelectric generating set fault diagnosis method and system based on DdAE deep learning models |
CN108523907A (en) * | 2018-01-22 | 2018-09-14 | 上海交通大学 | Fatigue state recognition method and system based on deep contractive sparse autoencoder networks |
CN108040073A (en) * | 2018-01-23 | 2018-05-15 | 杭州电子科技大学 | Malicious attack detection method based on deep learning in a traffic cyber-physical system |
CN108347764A (en) * | 2018-01-23 | 2018-07-31 | 南京航空航天大学 | Deep-learning-based method and system for locating radio cheating signals in examination halls |
US11777957B2 (en) | 2018-01-23 | 2023-10-03 | Hangzhou Dianzi University | Method for detecting malicious attacks based on deep learning in traffic cyber physical system |
CN108287763A (en) * | 2018-01-29 | 2018-07-17 | 中兴飞流信息科技有限公司 | Parameter exchange method, working node and parameter server system |
CN108229664A (en) * | 2018-01-31 | 2018-06-29 | 北京市商汤科技开发有限公司 | Batch standardization processing method and device, and computer equipment |
CN108229664B (en) * | 2018-01-31 | 2021-04-30 | 北京市商汤科技开发有限公司 | Batch standardization processing method and device and computer equipment |
CN108449295A (en) * | 2018-02-05 | 2018-08-24 | 西安电子科技大学昆山创新研究院 | Combined modulation recognition method based on RBM networks and a BP neural network |
CN112088379A (en) * | 2018-03-07 | 2020-12-15 | 莫维迪乌斯有限公司 | Method and apparatus for determining weights for convolutional neural networks |
CN108926341A (en) * | 2018-04-20 | 2018-12-04 | 平安科技(深圳)有限公司 | ECG signal detection method and device, computer equipment, and storage medium |
CN112055878A (en) * | 2018-04-30 | 2020-12-08 | 皇家飞利浦有限公司 | Adjusting machine learning model based on second set of training data |
CN112055878B (en) * | 2018-04-30 | 2024-04-02 | 皇家飞利浦有限公司 | Adjusting a machine learning model based on the second set of training data |
CN108710974A (en) * | 2018-05-18 | 2018-10-26 | 中国农业大学 | Water ammonia nitrogen prediction method and device based on a deep belief network |
CN108710974B (en) * | 2018-05-18 | 2020-09-11 | 中国农业大学 | Water ammonia nitrogen prediction method and device based on deep belief network |
CN109033936A (en) * | 2018-06-01 | 2018-12-18 | 齐鲁工业大学 | Cervical exfoliated cell nucleus image recognition method |
CN108805204A (en) * | 2018-06-12 | 2018-11-13 | 东北大学 | Power quality disturbance analysis device based on a deep neural network and its application method |
CN109060892B (en) * | 2018-06-26 | 2020-12-25 | 西安交通大学 | SF6 decomposition product detection method based on a graphene composite material sensor array |
CN109060892A (en) * | 2018-06-26 | 2018-12-21 | 西安交通大学 | SF6 decomposition product detection method based on a graphene composite material sensor array |
CN109106384A (en) * | 2018-07-24 | 2019-01-01 | 安庆师范大学 | Psychological stress condition prediction method and system |
CN109106384B (en) * | 2018-07-24 | 2021-12-24 | 安庆师范大学 | Psychological stress condition prediction method and system |
CN109308471A (en) * | 2018-09-29 | 2019-02-05 | 河海大学常州校区 | Electromyographic signal feature extraction method |
CN109308471B (en) * | 2018-09-29 | 2022-07-15 | 河海大学常州校区 | Electromyographic signal feature extraction method |
CN109394205A (en) * | 2018-09-30 | 2019-03-01 | 安徽心之声医疗科技有限公司 | Multi-disease ECG signal analysis method based on a deep neural network |
CN109492751A (en) * | 2018-11-02 | 2019-03-19 | 重庆邮电大学 | Network security situation element acquisition mechanism based on BN-DBN |
CN109602415A (en) * | 2018-11-12 | 2019-04-12 | 安徽心之声医疗科技有限公司 | Electrocardio equipment lead inversion identification method based on machine learning |
CN109602414A (en) * | 2018-11-12 | 2019-04-12 | 安徽心之声医疗科技有限公司 | ECG signal data enhancement method based on multi-view transformation |
CN109602415B (en) * | 2018-11-12 | 2022-02-18 | 安徽心之声医疗科技有限公司 | Electrocardio equipment lead inversion identification method based on machine learning |
CN109222963A (en) * | 2018-11-21 | 2019-01-18 | 燕山大学 | Abnormal ECG identification and classification method based on convolutional neural networks |
CN109787926A (en) * | 2018-12-24 | 2019-05-21 | 合肥工业大学 | Digital signal modulation mode recognition method |
CN110045335A (en) * | 2019-03-01 | 2019-07-23 | 合肥工业大学 | Radar target track recognition method and device based on generative adversarial networks |
CN109998525A (en) * | 2019-04-03 | 2019-07-12 | 哈尔滨理工大学 | Automatic arrhythmia classification method based on a discriminative deep belief network |
CN110378286B (en) * | 2019-07-19 | 2023-03-28 | 东北大学 | DBN-ELM-based electric energy quality disturbance signal classification method |
CN110378286A (en) * | 2019-07-19 | 2019-10-25 | 东北大学 | Power quality disturbance signal classification method based on DBN-ELM |
CN110766099A (en) * | 2019-11-08 | 2020-02-07 | 哈尔滨理工大学 | ECG classification method combining a discriminative deep belief network and active learning |
CN112949671A (en) * | 2019-12-11 | 2021-06-11 | 中国科学院声学研究所 | Signal classification method and system based on unsupervised feature optimization |
CN112949671B (en) * | 2019-12-11 | 2023-06-30 | 中国科学院声学研究所 | Signal classification method and system based on unsupervised feature optimization |
CN110782025A (en) * | 2019-12-31 | 2020-02-11 | 长沙荣业智能制造有限公司 | Rice processing online process detection method |
CN110782025B (en) * | 2019-12-31 | 2020-04-14 | 长沙荣业智能制造有限公司 | Rice processing online process detection method |
CN111488968A (en) * | 2020-03-03 | 2020-08-04 | 国网天津市电力公司电力科学研究院 | Method and system for extracting comprehensive energy metering data features |
CN112438733A (en) * | 2020-11-06 | 2021-03-05 | 南京大学 | Portable neonatal convulsion electroencephalogram monitoring system |
CN112347984A (en) * | 2020-11-27 | 2021-02-09 | 安徽大学 | Olfactory stimulus-based EEG (electroencephalogram) acquisition and emotion recognition method and system |
CN112508088A (en) * | 2020-12-03 | 2021-03-16 | 重庆邮智机器人研究院有限公司 | DEDBN-ELM-based electroencephalogram emotion recognition method |
CN113017568A (en) * | 2021-03-03 | 2021-06-25 | 中国人民解放军海军军医大学 | Method and system for predicting physiological changes and death risks of severely wounded patients |
CN113141375A (en) * | 2021-05-08 | 2021-07-20 | 国网新疆电力有限公司喀什供电公司 | Network security monitoring method and device, storage medium and server |
CN113554131A (en) * | 2021-09-22 | 2021-10-26 | 四川大学华西医院 | Medical image processing and analyzing method, computer device, system and storage medium |
CN115105079A (en) * | 2022-07-26 | 2022-09-27 | 杭州罗莱迪思科技股份有限公司 | Electroencephalogram emotion recognition method based on self-attention mechanism and application thereof |
CN116662742A (en) * | 2023-06-28 | 2023-08-29 | 北京理工大学 | EEG decoding method based on hidden Markov models and masked empirical mode decomposition |
Also Published As
Publication number | Publication date |
---|---|
CN107256393B (en) | 2020-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107256393A (en) | Feature extraction and state recognition of one-dimensional physiological signals based on deep learning | |
CN109992779B (en) | Emotion analysis method, device, equipment and storage medium based on CNN | |
CN109583346A (en) | EEG feature extraction and classification recognition method based on LSTM-FC | |
CN103778414A (en) | Real-time face recognition method based on deep neural network | |
CN102622515B (en) | Weather prediction method | |
CN108304826A (en) | Facial expression recognizing method based on convolutional neural networks | |
CN109829541A (en) | Deep neural network incremental training method and system based on learning automaton | |
CN109190665A (en) | General image classification method and device based on semi-supervised generative adversarial networks | |
CN108921285B (en) | Power quality disturbance classification method based on bidirectional gated recurrent neural networks | |
CN102622418B (en) | Prediction device and equipment based on a BP (Back Propagation) neural network | |
CN107273845A (en) | Facial expression recognition method based on confidence regions and weighted multi-feature fusion | |
CN110046656A (en) | Multi-modal scene recognition method based on deep learning | |
CN109002917A (en) | Multidimensional time-series prediction method for total grain output based on LSTM neural networks | |
CN107102727A (en) | Dynamic gesture learning and recognition method based on ELM neural networks | |
CN109783910A (en) | Optimal structure design method accelerated by generative adversarial networks | |
CN109567793A (en) | ECG signal processing method for cardiac arrhythmia classification | |
CN106647272A (en) | Robot route planning method using an improved convolutional neural network based on K-means | |
CN104408470A (en) | Gender detection method based on average face preliminary learning | |
CN109934422A (en) | Neural network wind speed prediction method based on time series data analysis | |
Cheng et al. | Emotion recognition algorithm based on convolution neural network | |
CN108122004A (en) | EEG classification method based on a Fisher-discriminant sparse extreme learning machine | |
CN106407690B (en) | Outpatient volume forecasting method and system based on an automatic deep belief network | |
CN109359610A (en) | Method and system for constructing a CNN-GB model and data feature classification method | |
CN109948427A (en) | Mental intention recognition method based on long short-term memory models | |
Li et al. | Adaptive dropout method based on biological principles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||