CN107256393B - Feature extraction and state recognition of one-dimensional physiological signals based on deep learning


Info

Publication number
CN107256393B
CN107256393B (application CN201710414832.1A)
Authority
CN
China
Prior art keywords
training
input
network
layer
hidden layer
Prior art date
Legal status
Active
Application number
CN201710414832.1A
Other languages
Chinese (zh)
Other versions
CN107256393A (en)
Inventor
张俊然
杨豪
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201710414832.1A
Publication of CN107256393A
Application granted
Publication of CN107256393B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 - Feature extraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12 - Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a feature extraction and state recognition method for one-dimensional physiological signals based on deep learning. A deep belief network (DBN) model of one-dimensional physiological signals is established based on deep learning. The DBN model adopts a "pre-training + fine-tuning" training process: in the pre-training stage, the first restricted Boltzmann machine (RBM) is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged. The invention effectively solves the problem of low classification accuracy caused by manually selecting feature inputs in the traditional classification of one-dimensional physiological signals: highly separable features and feature combinations are obtained automatically for classification through the nonlinear mapping of the deep belief network, and the network structure can be optimized continuously for a better classification effect.

Description

Feature extraction and state recognition of one-dimensional physiological signals based on deep learning
Technical Field
The invention relates to the technical field of medical data processing, in particular to a physiological signal feature extraction and classification identification method, and specifically relates to feature extraction and state identification of one-dimensional physiological signals based on deep learning.
Background
Physiological signals are governed by the autonomic nervous system and the endocrine system, are not controlled by subjective consciousness, and can objectively and truly reflect the physiological, mental and emotional states of an individual, so their study and application are spreading. Since physiological signals are the outward expression of an individual's physiological, mental and emotional states and can directly and truly reflect changes in those states, many researchers have used different classifiers to recognize individual states from physiological signals (electroencephalogram (EEG), electrocardiogram, electromyogram, respiration and electrodermal activity). Although the number of classifiers suitable for recognizing individual states from physiological signals keeps growing and recognition rates keep rising, most classifiers require manual feature extraction, so the recognition rate depends on manual experience, is unstable, and remains some distance from practical application. For example, Moghimi et al. used a linear discriminant analysis classifier to recognize emotional states from cerebral blood-oxygen changes, with a recognition rate of about 72%.
In 2011, Li Shufang et al. used empirical mode decomposition (EMD) and a support vector machine (SVM) to classify the epileptic states of EEG signals: EMD first decomposes the EEG signal into several empirical mode components, effective features are then extracted, and an SVM classifies the signal; the recognition rate for the interictal and seizure periods of epilepsy finally reached 99%. In 2014, Niu et al. performed feature selection with a genetic algorithm and recognized emotional states from electrocardiogram, electromyogram, respiration and electrodermal signals with a K-nearest-neighbor classifier, reaching a recognition rate of 96%. However, the high recognition rates obtained by relying on particular combinations of methods or of signals are highly specific, so such approaches are difficult to generalize, and finding a particular effective combination owes much to chance.
Since Hinton et al.'s deep convolutional neural network stood out in the 2012 ImageNet competition, research on deep learning has been pushed to a climax; it has attracted wide attention and application in the field of signal and information processing, and has achieved unprecedented results in directions such as image processing and speech recognition. With its rapid development, deep learning has also been applied preliminarily to the processing of physiological electrical signals such as EEG, electromyogram, electrocardiogram and electrodermal signals, with striking results. Through continuous development, a large number of deep learning frameworks (such as DeepLearn Toolbox, Caffe, etc.) and models (such as deep belief networks (DBN), sparse autoencoders, recurrent neural networks, etc.) have appeared. How to use and improve these frameworks and models so that they fit practical problems, however, is the current research question.
Disclosure of Invention
The invention aims to provide a deep-learning-based feature extraction and state recognition method for one-dimensional physiological signals, so as to solve the problem of low classification accuracy caused by the need to manually select feature inputs in the traditional classification of one-dimensional physiological signals, to obtain highly separable features and feature combinations for classification automatically through the nonlinear mapping of a deep belief network, and to optimize the network structure continuously for a better classification effect.
The basic idea of the invention is as follows: a deep belief network (DBN) model is established for feature extraction and state recognition of one-dimensional physiological signals based on deep learning, and the model adopts a "pre-training + fine-tuning" training process. The pre-training process uses bottom-up unsupervised training: the first hidden layer is trained first, and then the next hidden layer is trained layer by layer, the output of the previous hidden layer's nodes serving as the input, and the output of the current hidden layer's nodes serving as the input of the next hidden layer. The fine-tuning process performs top-down supervised training on labeled data, propagating errors backward and fine-tuning the model parameters; the error back-propagation (BP) algorithm is generally used for fine-tuning. The "pre-training + fine-tuning" process can be regarded as grouping a large number of parameters, finding locally good settings for each group, and then combining these locally good solutions in the search for a globally optimal solution; different activation functions, the CD algorithm, and a mini-batch gradient descent algorithm are used to iteratively update the weights.
The DBN model is formed by stacking several RBMs. The training process of the DBN model is as follows: in the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Because the DBN is stacked from several RBMs, each layer's processing of the previous layer can be regarded as layer-by-layer processing of the input, converting an input whose relationship to the output category is initially distant into a representation more closely tied to the category.
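As an illustration of this stacking, the following minimal numpy sketch pre-trains a DBN greedily, layer by layer; rbm_update stands for any single-RBM training step (for instance, the CD-k step described later), and all function and variable names are illustrative rather than the patent's implementation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pretrain_dbn(data, layer_sizes, rbm_update, epochs=10, lr=0.1):
        # greedy layer-wise pre-training: train the first RBM on the raw input,
        # then use its hidden activations as the input of the next RBM, and so on
        rbms, layer_input = [], data
        for n_hidden in layer_sizes:
            n_visible = layer_input.shape[1]
            w = 0.01 * np.random.randn(n_visible, n_hidden)  # weights near 0
            b = np.zeros(n_visible)                          # visible-layer bias
            c = np.zeros(n_hidden)                           # hidden-layer bias
            for _ in range(epochs):
                w, b, c = rbm_update(layer_input, w, b, c, lr)
            rbms.append((w, b, c))
            # the trained nodes' outputs become the next RBM's input
            layer_input = sigmoid(layer_input @ w + c)
        return rbms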
The purpose of the invention is achieved by the following steps:
A one-dimensional physiological signal feature extraction and state recognition data analysis model, a DBN, is established based on deep learning, and the DBN adopts a "pre-training + fine-tuning" training process: the pre-training process uses bottom-up unsupervised training, training the first hidden layer first and then the following hidden layers layer by layer, the output of the previous hidden layer's nodes serving as the input and the output of the current hidden layer's nodes serving as the input of the next hidden layer; the fine-tuning process performs top-down supervised training on labeled data. In the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm. Finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged.
The extraction and classification method comprises the following steps:
S1: take in one-dimensional physiological signals, including one or more of electroencephalogram, electrocardiogram, electromyogram, respiration and electrodermal signals; perform preprocessing and feature mapping on the signals, the feature mapping being carried out in a standard space to obtain a feature map in that space; the preprocessing comprises denoising, filtering, and hierarchical decomposition and reconstruction;
S2: construct a deep belief network (DBN) comprising an input layer, several restricted Boltzmann machines (RBMs), a back-propagation structure and a classifier, the RBMs being the core structure of the whole network, numbering from 1 to N and nested one after another in the structure;
S3: use the deep belief network constructed in step S2 to extract features from the one-dimensional physiological signals preprocessed and feature-mapped in step S1, the extraction process comprising RBM training and fine-tuning of the network by the BP algorithm; the RBM training and BP algorithm comprise:
1) in RBM training and BP fine-tuning, batch normalization is performed before each layer's output;
2) the CD-k algorithm with k iterations is adopted for the multiple iterations in Gibbs sampling;
3) when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling, so as to fit the input data as closely as possible, the Dropout method is selected to prevent overfitting;
4) during fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples;
5) the Sigmoid activation function is selected for the bottom-up forward propagation, and the ReLU activation function for the top-down back propagation;
S4: input the feature vector output by the deep belief network in step S3 into a Softmax classifier, and judge the individual state of the incorporated one-dimensional physiological signal.
S31: in RBM training and BP fine-tuning, batch normalization is performed before each layer's output. The Z-score standardization method is selected for the normalization: the training set and the test set are each converted with Z-score into a distribution with mean 0 and standard deviation 1, and the data are then mapped into the range [0, 1]. The Z-score method normalizes with the mean and standard deviation of the original data, using the formula:
x* = (x − u) / σ
where u denotes the mean of each dimension and σ the standard deviation of each dimension; the processed data follow the standard normal distribution with mean 0 and standard deviation 1.
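As a concrete illustration of this normalization (a minimal numpy sketch; the array names train and test are placeholders for the actual feature matrices, one row per sample):

    import numpy as np

    def zscore_unit_range(x):
        # Z-score each dimension (mean 0, std 1): x* = (x - u) / sigma
        z = (x - x.mean(axis=0)) / x.std(axis=0)
        # then map each dimension into the range [0, 1]
        return (z - z.min(axis=0)) / (z.max(axis=0) - z.min(axis=0))

    train_n = zscore_unit_range(train)  # the document normalizes the training
    test_n = zscore_unit_range(test)    # set and the test set separately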
S32: the CD-k algorithm with k iterations, used for the multiple iterations in Gibbs sampling, is as follows.
For an input sample v = (v_1, v_2, …, v_m), according to the RBM, the encoded output sample h = (h_1, h_2, …, h_n) is obtained; the n-dimensional encoded output can be understood as the input sample with n features extracted:
1) input a training sample x_0, the number of hidden layers d and the learning rate ε;
2) initialize the visible layer v_1 = x_0, with the weights w, the visible-layer bias b and the hidden-layer bias c close to 0;
3) for g < s:
compute the hidden-layer distribution from the current visible layer using
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j);
substitute the result into
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i)
to calculate the distribution of the visible-layer reconstruction
(i and j denote the neuron indices of the hidden layer and the visible layer, with i ≤ n and j ≤ m);
substitute the reconstructed visible layer into the first formula again to obtain the reconstructed hidden-layer distribution;
according to the gradient descent algorithm, update w, b and c (the subscript rec denotes values after reconstruction):
Δw = ε(<v_i h_j>_data − <v_i h_j>_rec)
Δb = ε(<v_i>_data − <v_i>_rec)
Δc = ε(<h_j>_data − <h_j>_rec)
end for;
4) output the updated w, b and c (see the sketch below).
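The following self-contained numpy sketch shows one CD-k parameter update consistent with the steps above; it works on a batch of binary samples, and all names are illustrative (a sketch under the stated assumptions, not the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd_k_update(v0, w, b, c, eps=0.1, k=1):
        # data phase: hidden distribution p(h_i = 1 | v) from the input
        ph0 = sigmoid(v0 @ w + c)
        h = (np.random.rand(*ph0.shape) < ph0).astype(float)
        for _ in range(k):  # k Gibbs steps
            pv = sigmoid(h @ w.T + b)   # visible-layer reconstruction
            ph = sigmoid(pv @ w + c)    # reconstructed hidden distribution
            h = (np.random.rand(*ph.shape) < ph).astype(float)
        n = v0.shape[0]
        w += eps * (v0.T @ ph0 - pv.T @ ph) / n  # eps(<v_i h_j>_data - <v_i h_j>_rec)
        b += eps * (v0 - pv).mean(axis=0)        # eps(<v_i>_data - <v_i>_rec)
        c += eps * (ph0 - ph).mean(axis=0)       # eps(<h_j>_data - <h_j>_rec)
        return w, b, c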
S33: when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling, so as to fit the input data as closely as possible, the Dropout method is selected to prevent overfitting; Dropout prevents overfitting by changing the model itself. Dropout randomly "deletes" part of the hidden-layer nodes; the "deleted" nodes are only temporarily regarded as nonexistent, and their parameters, though temporarily not updated, must be retained, since those nodes may take part in training in the next iteration.
S34: during fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples, as follows (a sketch follows the list):
1) randomly draw a group of small samples from all input samples each time, the number of samples per group being Mini-batch;
2) iteratively update the weights on each group of small samples with the batch gradient descent algorithm;
3) repeat steps 1) and 2) (total number of input samples / Mini-batch) times.
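A minimal numpy sketch of this loop (update_fn stands for whatever gradient step is applied to one group of small samples; all names are assumptions for illustration):

    import numpy as np

    def minibatch_epoch(x, y, mini_batch, update_fn):
        # one pass: (total number of input samples / Mini-batch) updates
        n = x.shape[0]
        order = np.random.permutation(n)  # 1) draw random small samples
        for start in range(0, n, mini_batch):
            idx = order[start:start + mini_batch]
            update_fn(x[idx], y[idx])     # 2) one weight update per group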
S35: when parameters are adjusted in the negative gradient direction of the target, the Sigmoid activation function is selected in the bottom-up forward propagation;
the selection process is as follows: the maximum-likelihood estimation of the input samples,
L(θ) = Σ_{t=1}^{s} ln P(v^(t)),
is differentiated with respect to the parameters so as to maximize the likelihood function, and the objective is improved continuously by gradient ascent until a stopping condition is reached; maximizing the likelihood function yields the probability that the ith hidden-layer node is activated (takes the value "1") and the probability that the jth visible-layer node is activated, respectively:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j) (4)
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i) (5)
where f is the Sigmoid activation function;
The Sigmoid activation function is defined as:
f(x) = 1 / (1 + e^(−x)) (6)
Differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x)) (7)
An activation function whose derivative tends to 0 as |x| tends to infinity is called a soft-saturating activation function, while one whose derivative equals 0 whenever |x| exceeds some constant c is called a hard-saturating activation function, i.e.:
f′(x) = 0 when |x| > c
The ReLU activation function is selected in the top-down back propagation. ReLU(x) exhibits hard saturation when x < 0, but for x > 0 its derivative is 1 and the gradient does not vanish, so gradient diffusion is milder and convergence faster during back propagation, effectively alleviating the gradient-vanishing phenomenon.
The ReLU function is defined as:
ReLU(x)=max(0,x) (8)
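For reference, the two activation functions and the derivatives used above can be written directly (a small numpy sketch; nothing here is specific to the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))  # soft-saturating: f'(x) -> 0 as |x| grows

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)             # formula (7): f'(x) = f(x)(1 - f(x))

    def relu(x):
        return np.maximum(0.0, x)        # formula (8): hard saturation for x < 0

    def relu_grad(x):
        return (x > 0).astype(float)     # derivative 1 for x > 0, so no vanishing gradient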
The Dropout method selected in S33 prevents overfitting as follows. Before Dropout is used, the training procedure of the network is to propagate the input forward through the network and then propagate the error backward with the BP algorithm; after Dropout is used, the training procedure becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forwards through the residual nodes, and then the error is propagated reversely through the residual nodes by using a BP algorithm;
3) restoring the deleted nodes, wherein the parameters of the nodes which are deleted are not updated at the moment, and the parameters of the nodes which are not deleted are updated; and repeating the three steps until the iteration is completed.
The feature vector output by the deep belief network is input into a Softmax classifier, and the parameter C is searched over the range [2^(−10), 2^(10)] for the optimal classification accuracy.
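One way to realize this search is a log-scale grid over the exponent, sketched below; scikit-learn's LogisticRegression (a multinomial softmax model whose regularization parameter is named C) is used here only as a stand-in classifier, which is an assumption rather than the patent's implementation, and the feature/label arrays are placeholders:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    best_c, best_acc = None, -1.0
    for e in range(-10, 11):  # C in [2**-10, 2**10]
        clf = LogisticRegression(C=2.0 ** e, max_iter=1000)
        clf.fit(train_features, train_labels)       # DBN output feature vectors
        acc = clf.score(test_features, test_labels)
        if acc > best_acc:
            best_c, best_acc = 2.0 ** e, acc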
When Gibbs sampling is used, the specific steps for extracting the n features of an input sample are as follows. Maximizing the likelihood function yields the probability that the ith hidden-layer node is activated (takes the value "1") and the probability that the jth visible-layer node is activated, respectively:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j) (4)
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i) (5)
where f is the Sigmoid activation function;
1) first, the probability p(h_i = 1 | v) that the ith hidden-layer node is activated (takes the value "1") is calculated using formula (4);
2) the input data are then fitted by Gibbs sampling to obtain h = (h_1, h_2, …, h_n); the specific process is: generate a random number in [0, 1]; if the random number is less than p(h_i = 1 | v), then h_i is "1", otherwise "0";
3) the code h obtained in steps 1) and 2) is decoded to recover the original input as v′; likewise, p(v_j = 1 | h), the probability that the jth visible-layer node is activated, is first calculated with formula (5);
4) a random number in [0, 1] is generated as in step 2); if it is less than p(v_j = 1 | h), the value of v_j′ is "1", otherwise "0";
5) v′ obtained in step 4) is substituted into formula (4), and h′ is computed by Gibbs sampling in the same way as in step 2);
6) finally, the weights, the visible-layer bias and the hidden-layer bias are updated according to formulas (9), (10) and (11), where η is the learning rate, i.e. the rate of increase or decrease when a weight or bias is updated (see the sketch below):
Δw = η(vh − v′h′) (9)
Δb = η(v − v′) (10)
Δc = η(h − h′) (11)
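A per-sample numpy sketch of steps 1) to 6), with explicit binary sampling; v is one input vector and all names are illustrative (a sketch under the stated assumptions, not the patent's code):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gibbs_encode_decode(v, w, b, c, eta=0.1):
        p_h = sigmoid(v @ w + c)                                # step 1), formula (4)
        h = (np.random.rand(p_h.size) < p_h).astype(float)      # step 2)
        p_v = sigmoid(h @ w.T + b)                              # step 3), formula (5)
        v_rec = (np.random.rand(p_v.size) < p_v).astype(float)  # step 4)
        p_h2 = sigmoid(v_rec @ w + c)                           # step 5)
        h_rec = (np.random.rand(p_h2.size) < p_h2).astype(float)
        w += eta * (np.outer(v, h) - np.outer(v_rec, h_rec))    # formula (9)
        b += eta * (v - v_rec)                                  # formula (10)
        c += eta * (h - h_rec)                                  # formula (11)
        return w, b, c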
the invention has the positive effects that:
1. The invention effectively solves the problem of low classification accuracy caused by the need to manually select feature inputs in the traditional classification of one-dimensional physiological signals; highly separable features and feature combinations are obtained automatically for classification through the nonlinear mapping of the deep belief network, and the network structure can be optimized continuously for a better classification effect. The "pre-training + fine-tuning" process can be regarded as grouping a large number of parameters, finding locally good settings for each group, and then combining these locally good solutions in the search for a globally optimal solution; the scheme thus exploits the freedom provided by the model's many parameters while effectively saving training overhead.
2. Gibbs sampling is a sampling method based on Markov chain Monte Carlo; each component of x is sampled iteratively, making full use of the conditional probability distributions, and as the number of iterations grows the conditional distributions converge to the joint distribution at a rate geometric in the number of sampling steps, shortening the convergence time.
3. Batch normalization is performed before each layer's output: the training set and the test set are each converted with Z-score into a distribution with mean 0 and standard deviation 1 and then mapped into the range [0, 1], which greatly improves the generalization ability of the network and speeds up its training.
4. The selectable activation functions of the invention are Sigmoid and ReLU; the deep belief network involved comprises a forward propagation process and a backward propagation process, which may use the same activation function or different ones, suiting a variety of physiological-signal requirements.
5. For the problems that the Gibbs algorithm needs many iterations and converges slowly, the invention uses the contrastive divergence (CD-k) algorithm on the basis of Gibbs sampling to compute the model's expected values quickly; the model estimate is obtained after k iterations, and a good approximation is already achieved for small k.
6. The invention adopts the Dropout method to prevent overfitting, reducing overfitting overall and improving efficiency.
Drawings
FIG. 1 is a diagram of a DBN network model structure and training process of the present invention.
Fig. 2 is a diagram of a BP network architecture of the present invention.
Fig. 3 is a diagram of Sigmoid activation function.
Fig. 4 is a graph of the ReLU activation function.
Fig. 5 is a network structure diagram before and after Dropout, in which the network structure before Dropout is on the left; on the right is the network structure after Dropout.
FIG. 6-1 is a diagram of a confusion matrix of the recognition results of the SVM classifier.
Fig. 6-2 is a diagram of a classifier DBN recognition result confusion matrix.
FIG. 7 is a graph of the mean absolute values of the weights of the first layer of the DBN after training in an embodiment.
Detailed Description
The hardware and software environment used in the experiment of this example is shown in table 4-1:
TABLE 4-1
[table image: the hardware and software environment used in the experiments]
Data acquisition:
The experimental data come from the SJTU Emotion EEG Dataset (SEED) provided by Shanghai Jiao Tong University. This database contains EEG data for three emotions (positive, negative, neutral). The data were collected from 15 subjects; in each experiment every subject watched 15 movie clips capable of inducing the three emotions. While the subjects watched the clips, a 62-channel dry-electrode EEG cap recorded their EEG signals, so each subject yielded 15 groups of EEG signals per experiment. Each group was labeled according to the subject's self-report (positive = +1, negative = −1, neutral = 0), giving 5 positive, 5 negative and 5 neutral groups. The experiment was repeated at intervals of at least 7 days, each subject participating in 3 experiments in total, so the 15 subjects yielded 15 × 3 × 15 = 675 groups of EEG data. Here, the first 12 groups of data in each experiment (4 groups of positive, 4 groups of neutral and 4 groups of negative emotion) serve as the training set, and the last 3 groups (1 positive, 1 neutral, 1 negative) serve as the test set.
After the raw data were collected, the data provider preprocessed the original EEG signals and filtered them into five frequency bands: Delta (1-3 Hz), Theta (4-7 Hz), Alpha (8-13 Hz), Beta (14-30 Hz) and Gamma (31-50 Hz). On the basis of these five bands, six feature transformations were used to extract features in each band: PSD, DE, ASM, DASM, RASM and DCAU. These six transformations are simple to compute and represent the EEG signal effectively. DE extends the concept of Shannon entropy and can effectively measure the complexity of a continuous random variable; since low-frequency energy dominates EEG signals, DE can effectively distinguish the low-frequency and high-frequency energy components. With 62 channels, the sample dimension of DE is 62 × 5 = 310. Other studies have shown that asymmetric brain activity has a significant influence on emotion processing, so DASM and RASM were extracted on the basis of DE as the differential and rational asymmetries between the DE values of 27 pairs of asymmetric electrodes, and combining DASM and RASM yields ASM. DCAU represents the DE differences of 23 pairs of frontal-posterior electrodes. Besides the DE-based transformations, PSD features were also extracted. The sample feature dimensions of the six transformations PSD, DE, ASM, DASM, RASM and DCAU are 310, 310, 270, 135, 135 and 115, respectively.
The experimental process comprises the following steps:
The experiment is based on a DBN model built on the DeepLearn Toolbox framework, with a batch-normalization algorithm and the ReLU activation function introduced on that basis. The CD-k algorithm with k iterations is used for the multiple iterations in Gibbs sampling. The Dropout method is selected to prevent overfitting when the problem is converted into maximum-likelihood estimation of the input samples solved by Gibbs sampling so as to fit the input data as closely as possible. During fine-tuning of the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples. The parameters of the DBN model were adjusted through repeated experiments to determine the optimal DBN model, which was then compared with the classification results of an SVM. The results are analyzed per subject, per EEG feature transformation and per frequency band, and the influence of the number of iterations, the learning rate and the number of hidden-layer nodes on the classification results is discussed.
As shown in fig. 1, the flowchart of training and classification with the DBN model used in the invention: the original training set and test set are normalized and then brought into the model for training and classification. FIG. 2 is the structure diagram of the BP network. As shown in figs. 1 and 2, training is divided into the two steps of pre-training and fine-tuning; the adjusted, updated weights and biases are then brought into the classifier for prediction, and the classification accuracy is finally computed from the difference between the predicted and actual results. The RBM training parameters are: the connection weights w_ij between the hidden layer and the visible layer (i = 1, 2, …, n; j = 1, 2, …, m), the visible-layer bias b = (b_1, b_2, …, b_m) and the hidden-layer bias c = (c_1, c_2, …, c_n).
Training the DBN is mainly a process of continually adjusting the weights and biases, and what influences them most is the depth of the network, i.e. the number of hidden layers and the number of nodes in each. When there are too few hidden layers, the learning capacity of the network is insufficient and only shallow features can be learned; when the number of hidden layers drops to 1, the network degenerates into an ordinary artificial neural network. In theory, adding hidden layers abstracts the nature of the input data more accurately and thus classifies better, but more layers bring more parameters to the whole model, lengthening training, reducing the generalization ability of the DBN and causing overfitting. In this embodiment, combining the actual characteristics of the original data, 2 hidden layers are selected, making 4 layers in total with the input and output layers. Taking the DE features as an example, the input layer has 310 nodes and the output layer 3, with two hidden layers in between whose node counts are chosen from the ranges 50-500 and 20-500, respectively.
When parameters are adjusted in the negative gradient direction of the target, a mini-batch gradient descent algorithm performs iterative weight updates on each group of small samples, as follows:
1) randomly draw a group of small samples from all input samples each time, the number of samples per group being Mini-batch;
2) iteratively update the weights on each group of small samples with the batch gradient descent algorithm;
3) repeat steps 1) and 2) (total number of input samples / Mini-batch) times.
The specific steps of this embodiment, summarized as DBN plus BP, are as follows:
1): initialize the DBN: the number of hidden layers, the number of hidden-layer nodes, the number of iterations, the learning rate and the momentum, and the number of samples per small group, Mini-batch, i.e. m, which must divide the total number of input samples evenly; initialize the connection weights w, the visible-layer bias b and the hidden-layer bias c to 0;
2): for i < the number of RBMs (one RBM per hidden layer);
3):repeat;
4):for j<(N1/Mini-batch1);
5): train the RBM, and according to formulas (12)-(14)
Δw′ = m × Δw + η(vh − v′h′) (12)
Δb′ = m × Δb + η(v − v′) (13)
Δc′ = m × Δc + η(h − h′) (14)
update the connection weights w, the visible-layer bias b and the hidden-layer bias c;
6): calculate the output of the current layer according to formula (4) and use it as the input of the next hidden layer:
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j)
7):end for
8): until the number of cycles equals the number of iterations;
9):end for
10): initialize BP: the number of categories, the activation function, the learning rate, the momentum, the number of iterations and the classifier; initialize BP with the connection weights w, visible-layer bias b and hidden-layer bias c obtained above;
11):repeat;
12):for l<(N1/Mini-batch2);
13): calculate the output of each hidden layer according to formula (18) and compute the error e:
ŷ_j^k = f(β_j − θ_j) (18)
14): update the connection weights w, the visible-layer bias b and the hidden-layer bias c according to formulas (26)-(28);
15):end for
16): until the number of cycles equals the number of iterations;
17): bring the test set, together with the connection weights w, the visible-layer bias b and the hidden-layer bias c, into formula (18) to calculate the predicted labels y′;
18): calculating a real label y of each category;
19): and outputting the classification accuracy of each category.
As seen from these steps, training is divided into pre-training (steps 1) to 9)) and fine-tuning (steps 10) to 16)); the adjusted, updated weights and biases are then brought into the classifier for prediction, and the classification accuracy is finally computed from the difference between the predicted and actual results. Since the invention introduces a batch-normalization algorithm on the basis of the DeepLearn Toolbox framework, batch normalization is performed before each layer's output, i.e. before the activation function is applied in steps 6), 13) and 17).
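The momentum form of the updates in formulas (12)-(14) can be written generically as below (a minimal sketch; m is the momentum coefficient and grad the current gradient estimate, e.g. <v h> − <v′ h′> for the weights; names and default values are illustrative):

    import numpy as np

    def momentum_step(param, delta_prev, grad, eta=0.1, m=0.5):
        # formulas (12)-(14): new increment = m * previous increment + eta * gradient
        delta = m * delta_prev + eta * grad
        return param + delta, delta

    # e.g. for the weight matrix: w, dw = momentum_step(w, dw, grad_w)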
Step 13) is illustrated by fig. 2, which shows a BP network with d input nodes, q hidden-layer nodes and l output nodes, where the input-layer nodes are x = (x_1, x_2, …, x_i, …, x_d), the hidden-layer nodes are b = (b_1, b_2, …, b_h, …, b_q) and the output nodes are y = (y_1, y_2, …, y_j, …, y_l); θ_j denotes the threshold of the jth output node, γ_h the threshold of the hth hidden node, v_ih the weight between the ith input node and the hth hidden node, and w_hj the weight between the hth hidden node and the jth output node.
The input of the hth hidden-layer node, obtained from the input-layer nodes and the weights, is:
α_h = Σ_{i=1}^{d} v_ih·x_i (15)
The input of the jth output-layer node, obtained from the hidden-layer nodes and the weights, is:
β_j = Σ_{h=1}^{q} w_hj·b_h (16)
where b_h denotes the output of the hth hidden-layer node, computed as:
b_h = f(α_h − γ_h) (17)
For an input sample (x_k, y_k), the output of the trained BP network is ŷ^k = (ŷ_1^k, ŷ_2^k, …, ŷ_l^k), computed as:
ŷ_j^k = f(β_j − θ_j) (18)
The final mean-square error E_k of the network is then:
E_k = (1/2) Σ_{j=1}^{l} (ŷ_j^k − y_j^k)^2 (19)
For the BP network of fig. 2, the parameters to be determined number (d + l + 1) × q + l in total: d × q weights between the input and hidden layers, q × l weights between the hidden and output layers, q hidden-node thresholds, and l output-node thresholds. The BP algorithm is a continual iterative updating process, and each of the above parameters is updated according to the following formula (where v stands for any one of the parameters):
v ← v + Δv (20)
The BP algorithm adjusts the parameters in the negative gradient direction of the target based on gradient descent, so for a given learning rate η the change of a weight is:
Δw_hj = −η ∂E_k/∂w_hj (21)
As can be seen from fig. 2, w_hj first affects the input β_j of the jth output node, which in turn affects that node's output value ŷ_j^k, and finally affects the mean-square error E_k; the above formula can therefore be expanded as:
∂E_k/∂w_hj = (∂E_k/∂ŷ_j^k)·(∂ŷ_j^k/∂β_j)·(∂β_j/∂w_hj) (22)
If the Sigmoid function is adopted as the activation function, then:
f′(x) = f(x)(1 − f(x)) (23)
Combining formulas (18), (19) and (23), the gradient term g_j of the jth output-layer node is:
g_j = −(∂E_k/∂ŷ_j^k)·(∂ŷ_j^k/∂β_j) = ŷ_j^k(1 − ŷ_j^k)(y_j^k − ŷ_j^k) (24)
Similarly, the gradient term e_h of the hth hidden-layer node is:
e_h = b_h(1 − b_h) Σ_{j=1}^{l} w_hj·g_j (25)
Substituting formulas (22) and (24) into formula (21) yields the update formula for the weight w_hj:
Δw_hj = η·g_j·b_h (26)
Likewise, the update formulas for θ_j, v_ih and γ_h are, respectively:
Δθ_j = −η·g_j (27)
Δv_ih = η·e_h·x_i (28)
Δγ_h = −η·e_h (29)
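The whole forward pass and the updates (26)-(29) fit in a short numpy sketch for the d-q-l network of fig. 2 (Sigmoid activations are assumed throughout, the formula numbering follows the reconstruction above, and all names are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bp_step(x, y, v, gamma, w, theta, eta=0.1):
        bh = sigmoid(x @ v - gamma)            # (15), (17): hidden-layer outputs
        y_hat = sigmoid(bh @ w - theta)        # (16), (18): network outputs
        g = y_hat * (1 - y_hat) * (y - y_hat)  # (24): output-layer gradient term
        e = bh * (1 - bh) * (w @ g)            # (25): hidden-layer gradient term
        w += eta * np.outer(bh, g)             # (26)
        theta -= eta * g                       # (27)
        v += eta * np.outer(x, e)              # (28)
        gamma -= eta * e                       # (29)
        return v, gamma, w, theta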
see fig. 3 and 4.
The activation function is to add nonlinear factors in the learning process to solve the problem of inseparability of linearity, and the selectable activation functions of the invention are as follows: sigmoid, ReLU; the deep belief network related by the invention is divided into a forward propagation process from bottom to top and a backward propagation process from top to bottom, wherein the forward propagation and the backward propagation can select the same activating function or different activating functions.
Sigmoid is the most widely used activation function, defined as:
f(x) = 1 / (1 + e^(−x)) (6)
The function curve is shown in fig. 3. Differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x)) (7)
An activation function whose derivative tends to 0 as |x| tends to infinity is called a soft-saturating activation function, while one whose derivative equals 0 whenever |x| exceeds some constant c is called a hard-saturating activation function, i.e.:
f′(x) = 0 when |x| > c
Owing to the soft saturation of Sigmoid, during back propagation the gradient conducted downward contains a factor involving the derivative f′(x); if the input falls into the saturation region, f′(x) approaches 0, so the gradient passed downward becomes very small and the network parameters train poorly. This was once an important obstacle to the development of neural networks. The phenomenon, known as "gradient vanishing", typically appears once the number of network layers exceeds about 5. Although the Sigmoid activation function suffers from gradient vanishing, it has advantages: Sigmoid is physically closest to the biological neuron model, and it compresses the input into the range (0, 1), which can be regarded both as normalizing the input and as a classification probability (for example, an activation output of 0.9 can be interpreted as a 90% probability of being a positive sample).
Compared with the Sigmoid function, the Rectified Linear Unit (ReLU) can effectively alleviate the gradient-vanishing phenomenon. The ReLU function is defined as:
ReLU(x)=max(0,x) (8)
As shown in fig. 4, ReLU(x) exhibits hard saturation when x < 0, but for x > 0 its derivative is 1, so gradient dispersion is milder and convergence faster during back propagation.
The multiple iterations of this embodiment use the CD-k algorithm with k iterations:
for an input sample v = (v_1, v_2, …, v_m), according to the RBM, the encoded output sample h = (h_1, h_2, …, h_n) is obtained; the n-dimensional encoded output can be understood as the input sample from which n features have been extracted.
The steps are as follows:
1) input a training sample x_0, the number of hidden layers d and the learning rate ε;
2) initialize the visible layer v_1 = x_0, with the weights w, the visible-layer bias b and the hidden-layer bias c close to 0;
3) for g < s, where g denotes the gth training pass (g a positive integer, bounded by the number of samples s):
compute the hidden-layer distribution using
p(h_i = 1 | v) = f(c_i + Σ_j w_ij·v_j);
substitute the result into
p(v_j = 1 | h) = f(b_j + Σ_i w_ij·h_i)
to calculate the distribution of the visible-layer reconstruction
(i and j denote the neuron indices of the hidden layer and the visible layer, with i ≤ n and j ≤ m);
substitute the result into the first formula again to obtain the reconstructed hidden-layer distribution;
according to the gradient descent algorithm, update w, b and c (the subscript rec denotes values after reconstruction):
Δw = ε(<v_i h_j>_data − <v_i h_j>_rec)
Δb = ε(<v_i>_data − <v_i>_rec)
Δc = ε(<h_j>_data − <h_j>_rec)
end for;
4) output the updated w, b and c.
In the training process of the DBN, overfitting is likely when the number of hidden layers is large, the number of hidden-layer nodes is large, or the sample data are few, and overfitting degrades the classification effect. The invention chooses the Dropout method to prevent overfitting.
Dropout is also one of the regularization methods, implemented by changing the model itself to prevent overfitting. The idea of Dropout is: nodes that are part of the hidden layer are "deleted" randomly, e.g., 50%. The nodes that are "deleted" are only temporarily regarded as not existing, the parameters are not updated temporarily, but need to be retained, and the nodes may participate in training in the next iteration.
Before Dropout is used, the weight W_2 between H_1 and H_2 is:
W_2 = (w_11, w_12, w_13, w_14, w_21, w_22, w_23, w_24, w_31, w_32, w_33, w_34) (30)
If a node-filter function m = [1, 0, 1] is applied after H_1, part of the nodes of H_1 are randomly deleted (here the middle node), giving a new hidden layer H_1′:
H_1′ = H_1 ⊙ m = (h_1^1, 0, h_1^3) (31)
As the above formula shows, node h_1^2 is randomly "deleted"; during training the parameters related to node h_1^2, namely (w_21, w_22, w_23, w_24), are not updated, but neither are they set to zero; they are simply left untouched during this iteration, and if node h_1^2 is not "deleted" in the next iteration, its parameters continue to be updated.
Before using the Dropout method, the training process of the network is to propagate the input forward through the network, then to propagate the error backward using the BP algorithm, and after using the Dropout method, the training process becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forwards through the residual nodes, and then the error is propagated reversely through the residual nodes by using a BP algorithm;
3) restoring the deleted nodes, wherein the parameters of the nodes which are deleted are not updated at the moment, and the parameters of the nodes which are not deleted are updated; and repeating the three steps until the iteration is completed.
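A minimal numpy sketch of the random node filter (the mask is kept so that the same "deleted" nodes are skipped during back propagation; their weights are frozen rather than zeroed, and all names are illustrative):

    import numpy as np

    def dropout_forward(h, keep_prob=0.5):
        # randomly "delete" hidden nodes by zeroing their outputs this iteration
        mask = (np.random.rand(h.shape[-1]) < keep_prob).astype(float)
        return h * mask, mask

    # one possible draw of the mask is m = [1, 0, 1], which, as in formula (31),
    # zeroes the middle node of a three-node hidden layer:
    h1_dropped, m = dropout_forward(np.array([0.7, 0.4, 0.9]))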
The parameter settings of the unsupervised pre-training and the supervised fine-tuning in the DBN model under the DeepLearn Toolbox framework in this embodiment are shown in table 4-2.
It can be seen from the table that positive emotions have higher energy in the Gamma and Beta bands than negative and neutral emotions, that negative and neutral emotions have similar energy in the Gamma and Beta bands, and that negative emotions have higher energy in the Alpha band. These findings indicate that the three emotions have specific neural patterns in the high-frequency bands, which provides a basis for the subsequent emotion classification.
TABLE 4-2
[table image: parameter settings of the unsupervised pre-training and supervised fine-tuning in the DBN model]
The emotion recognition results and analysis of this example are as follows:
A very important problem in EEG-based emotion recognition research is whether the same emotion induced in the same subject at different times and in different states can be recognized accurately and reliably; the emotion data of each subject's three experiments are therefore recognized separately. Taking the DE features as an example, tables 4-3 show the recognition results of each experiment on the 15 subjects using the two classifiers, SVM and DBN.
Tables 4 to 3
[table image: recognition accuracy (%) of the SVM and DBN classifiers for each of the 15 subjects across the three experiments]
As can be seen from tables 4-3, although the acquisition equipment, the subjects' psychological condition and other factors varied to different degrees between experiments, each subject obtained similar accuracy in the three experiments (the standard deviations across the three experiments average 1.44%). EEG-based emotion recognition is therefore stable and repeatable, so in practical applications the emotions of the same subject at different times can be recognized from EEG signals.
Meanwhile, the average accuracy rate of recognition by using the DBN is 89.12%, the standard deviation is 6.54%, the recognition effect is better than that of a data provider (the average recognition rate is 86.08%, and the standard deviation is 8.34%), the average recognition rate is improved by 3.04%, and the standard deviation is reduced by 1.80%.
In addition, it can be found from the table that the average classification accuracy of SVM is 84.2% and the standard deviation is 9.24%, while the average classification accuracy based on DBN is 89.12% and the standard deviation is 6.54%, and the classification effect of DBN is significantly better than that of SVM, with higher classification accuracy and better stability (higher average value and lower standard deviation).
As shown in figs. 6-1 and 6-2, classification-accuracy confusion matrices were obtained by recognizing the data of one experiment of one subject with the two classifiers, the deep belief network (DBN) and the support vector machine (SVM). The rows represent the original classes of the samples and the columns the classes predicted by the classifier; the entry (i, j) of the matrix is the probability that class i is recognized as class j, and the color bar on the right maps to the size of that probability. Both positive and neutral emotions are recognized well by the SVM and the DBN. Negative emotions are the hardest for both classifiers, but the SVM confuses them badly (31% of negative samples are recognized as neutral and 24% as positive), whereas the DBN improves markedly (only 5% recognized as neutral and 9% as positive).
The emotion recognition results based on different feature transformations are as follows:
In order to study the effect of the six feature transformations PSD, DE, DASM, RASM, ASM and DCAU on EEG-based emotion recognition, tables 4-4 show the recognition results using the different feature transformations over the full frequency band.
Tables 4 to 4
[table image: recognition results using the six feature transformations over the full frequency band]
As can be seen from tables 4-4, compared with the traditionally used PSD features, both classifiers (DBN and SVM) achieve their best recognition with the DE features, which give the highest mean and the lowest standard deviation. This is because the DE feature balances the low- and high-frequency components of the EEG, strengthening the contribution of the discriminative high-frequency information, so DE is better suited than PSD to EEG-based emotion recognition. Meanwhile, the four asymmetric features DASM, RASM, ASM and DCAU also reach high accuracy in emotion recognition; although they have far fewer dimensions than the DE and PSD features (DASM and RASM are built on 27 electrode pairs, ASM on 54, DCAU on 23), they achieve accuracy comparable to DE, which shows that EEG signals are asymmetric when emotion is generated and that the brain's asymmetric activity is meaningful for emotion recognition. Subsequent experiments are still needed to verify whether the slightly lower accuracy of DASM, RASM, ASM and DCAU relative to DE is caused by their lower feature dimensionality.
In order to further study the influence of the frequency band on emotion recognition based on electroencephalogram signals, as shown in tables 4-5, the DE characteristics are taken as an example, and the electroencephalogram signals under different frequency bands and full frequency bands are used for recognition results (%).
Tables 4 to 5
[table image: recognition results (%) using DE features in the individual frequency bands and the full band]
It can be found from tables 4 to 5 that the use of data of different frequency bands has different effects on emotion recognition, and the use of data of full frequency bands has the best effect. In the five frequency bands, the recognition rates of the Beta frequency band and the Gamma frequency band have higher average values and lower standard deviations compared with those of other three frequency bands, so that the Beta frequency band and the Gamma frequency band have key functions in emotion recognition.
See fig. 7.
The DBN combines feature extraction and feature selection: it automatically selects the features useful for classification while filtering out those irrelevant to it. Fig. 7 shows the distribution of the mean absolute values of the first-hidden-layer weights of the trained DBN; the larger trained weights are concentrated in the Beta and Gamma bands. A larger weight means the input connected to it contributes more to the final classification result, indicating that the Beta and Gamma bands contain more emotion-related information; they can thus be called the key bands of emotion.

Claims (5)

1. A method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning, characterized in that:
a one-dimensional physiological signal feature extraction and state recognition data analysis model, a DBN, is established based on deep learning, and the DBN adopts a "pre-training + fine-tuning" training process: the pre-training process uses bottom-up unsupervised training, training the first hidden layer first and then the following hidden layers layer by layer, the output of the previous hidden layer's nodes serving as the input and the output of the current hidden layer's nodes serving as the input of the next hidden layer; the fine-tuning process performs top-down supervised training on labeled data; in the pre-training stage, the first RBM is trained first, the trained nodes are then taken as the input of the second RBM, the second RBM is trained, and so on; after all RBMs are trained, the network is fine-tuned with the BP algorithm; finally, the feature vector output by the deep belief network is input into a Softmax classifier, and the individual state of the incorporated one-dimensional physiological signal is judged;
and (3) extracting and state identifying:
s1: bringing in one-dimensional physiological signals including one or more of electroencephalogram, electrocardio, myoelectricity, respiration and electrodermal, performing preprocessing operation and feature mapping operation on the signals, performing feature mapping in a standard space to obtain a feature mapping image in the standard space, wherein the preprocessing comprises denoising, filtering, hierarchical decomposition and reconstruction operation;
s2: constructing a deep confidence network DBN which comprises an input layer, a plurality of limited Boltzmann machines RBM and a back propagation structure and finally comprises a classifier, wherein the limited Boltzmann machines RBM are used as core structures of the whole network, have 1-N in number and are nested with each other structurally;
s3: performing feature extraction on the one-dimensional physiological signal subjected to preprocessing and feature mapping in the step S1 by using the deep confidence network constructed in the step S2, wherein the extraction process comprises RBM training and BP algorithm to perform fine adjustment on the network; the RBM training and BP algorithm comprises:
1) in RBM training and BP-algorithm fine-tuning, batch normalization is performed before the output of each layer;
2) a CD-k algorithm with k iterations is adopted for the multiple iterations in Gibbs sampling;
3) when Gibbs sampling is used to fit the input data as closely as possible, which converts the problem into solving the maximum likelihood estimation of the input samples, the Dropout method is selected to prevent overfitting;
4) in the process of fine-tuning the network with the BP algorithm, when the parameters are adjusted along the negative gradient direction of the objective, a mini-batch gradient descent algorithm is adopted to iteratively update the weights on each group of small samples;
5) the Sigmoid activation function is selected in the bottom-up forward propagation, and the ReLU activation function is selected in the top-down back propagation;
s4: and inputting the feature vector output by the deep belief network in the step S3 into a Softmax classifier, and judging the individual state of the one-dimensional physiological signal included in the feature vector.
2. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein:
s31: in RBM training and BP algorithm fine tuning, batch normalization processing is carried out before each layer of output, a Z-score normalization method is selected for normalization processing, data are converted into normal distribution with the mean value of 0 and the standard deviation of 1 by respectively using Z-score for a training set and a test set, then the data are converted into the range of [0,1], the Z-score normalization method carries out normalization by using the mean value and the standard deviation of arithmetic data, and the formula is as follows:
Figure FDA0002242905450000021
in the formula, u represents the average value of each dimension, sigma represents the standard deviation of each dimension, and the processed data conform to the standard normal distribution with the average value of 0 and the standard deviation of 1;
s32: the CD-k algorithm, which uses k iterations for multiple iterations in Gibbs sampling, is:
for an input sample v = (v1, v2, …, vm), the output sample h = (h1, h2, …, hn) obtained after v is encoded by the RBM can be understood as the input sample with n features extracted:
1) inputting a training sample x0, the number of Gibbs sampling steps k, and the learning rate ε;
2) initializing the visible layer v1 = x0, and initializing the weights w, the visible layer bias b and the hidden layer bias c to values close to 0;
3) for g = 1 to k:
using the formula
p(h_i = 1 | v) = f(c_i + Σ_j w_ij · v_j)    (4)
calculating the distribution of the hidden layer;
substituting the obtained result into the formula
p(v_j = 1 | h) = f(b_j + Σ_i w_ij · h_i)    (3)
calculating the distribution of the visible layer reconstruction;
i and j denote the neuron node indices of the hidden layer and the visible layer respectively (i ≤ n, j ≤ m);
substituting the obtained reconstruction into formula (4) again to obtain the reconstructed hidden layer distribution;
according to the gradient descent algorithm, updating w, b and c, where ⟨·⟩_data and ⟨·⟩_rec denote the values computed from the data and from the reconstruction respectively:
Δw = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_rec)
Δb = ε(⟨v_i⟩_data − ⟨v_i⟩_rec)
Δc = ε(⟨h_j⟩_data − ⟨h_j⟩_rec)
end for;
4) outputting the updated w, b and c;
s33: selecting Dropout method to prevent overfitting in maximum likelihood estimation of transforming to solve input sample using Gibbs sampling to fit input data most likely, it is Dropout that prevents overfitting by changing the model itself; dropout randomly "deletes" nodes that are part of the hidden layer, the "deleted" nodes are only temporarily regarded as nonexistent, the parameters are not updated temporarily, but need to be retained, and the nodes may participate in training in the next iteration;
s34: in the process of fine-tuning the network by the BP algorithm, when parameters are adjusted in the negative gradient direction of a target, a small-batch gradient descent algorithm is adopted to carry out iterative update of weights on each group of small samples, and the steps are as follows:
1) randomly extracting a group of small samples from all input samples each time, wherein the number of samples contained in each group of small samples is Mini-batch;
2) carrying out iterative updating on the weight value of each group of small samples by adopting a batch gradient descent algorithm;
3) repeating steps 1) and 2) (total number of input samples / Mini-batch) times;
s35: when parameters are adjusted in the negative gradient direction of a target, a Sigmoid activation function is selected in the bottom-up forward propagation process;
the selection process is as follows: the maximum likelihood estimation of the input samples,
ln L(θ) = Σ_{t=1}^{T} ln P(v^(t))    (2)
is differentiated with respect to the parameters so as to maximize the likelihood function, and the objective function is continuously improved by gradient ascent until a stopping condition is reached; the process of maximizing the likelihood function yields the probability that the jth visible layer node is activated (takes the value "1") and the probability that the ith hidden layer node is activated, respectively:
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
in the above formulas, f is the Sigmoid activation function;
the Sigmoid activation function is defined as
f(x) = 1 / (1 + e^(−x))    (5)
differentiating the Sigmoid function yields:
f′(x) = f(x)(1 − f(x))    (6)
an activation function whose derivative tends to 0 as x tends to infinity is called a soft saturating activation function, while an activation function whose derivative is exactly 0 once |x| exceeds some constant c is called a hard saturating activation function, i.e.:
f′(x) = 0 for |x| > c    (7)
the ReLU activation function is selected in the top-down back propagation; ReLU(x) exhibits hard saturation when x < 0, but when x > 0 the derivative of ReLU(x) is 1, so no gradient vanishing occurs;
the ReLU function is defined as:
ReLU(x) = max(0, x)    (8).
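As a rough illustration of the S31 normalization in claim 2, the sketch below z-scores a training and a test set separately and then rescales each to [0,1]. The array shapes and data are assumptions; formula (1) is applied column-wise.

```python
import numpy as np

def zscore_then_unit(train: np.ndarray, test: np.ndarray):
    """Sketch of the S31 normalization: z-score each set to mean 0 / std 1,
    then rescale to [0, 1]. Z-scoring train and test separately follows the
    claim wording; reusing the training statistics on the test set is the
    more common practice and would be a one-line change."""
    out = []
    for X in (train, test):
        z = (X - X.mean(axis=0)) / X.std(axis=0)                    # formula (1)
        z = (z - z.min(axis=0)) / (z.max(axis=0) - z.min(axis=0))   # -> [0, 1]
        out.append(z)
    return out

train = np.random.default_rng(1).normal(5, 2, (100, 310))  # stand-in data
test = np.random.default_rng(2).normal(5, 2, (40, 310))
tr, te = zscore_then_unit(train, test)
print(round(float(tr.mean()), 2), tr.min(), tr.max())
```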
3. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 2, wherein:
the Dropout method is selected in S33 to prevent overfitting; before the Dropout method is used, the training procedure of the network is to propagate the input forward through the network and then propagate the error backward using the BP algorithm; after the Dropout method is used, the training procedure becomes:
1) randomly deleting part of hidden layer nodes in the network;
2) the input is propagated forward through the remaining nodes, and the error is then propagated backward through the remaining nodes using the BP algorithm;
3) restoring the deleted nodes, at which point the parameters of the deleted nodes remain un-updated while the parameters of the nodes that were not deleted have been updated; and repeating the above three steps until the iterations are completed.
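A minimal sketch of the Dropout step in claim 3, assuming a single hidden layer and a drop probability of 0.5 (both assumptions): nodes are "deleted" by a random mask, the same mask gates the backward pass so only surviving nodes are updated, and the mask is redrawn each iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def dropout_forward(h: np.ndarray, p_drop: float = 0.5):
    """Randomly 'delete' hidden nodes by zeroing them; their parameters are
    kept and may rejoin training next iteration. (Illustrative sketch; the
    common 'inverted' variant also scales survivors by 1/(1 - p_drop).)"""
    mask = (rng.random(h.shape) > p_drop).astype(h.dtype)
    return h * mask, mask  # the mask also gates the backward pass

W = rng.normal(0, 0.1, (310, 128))       # assumed layer sizes
x = rng.random((8, 310))
h = sigmoid(x @ W)
h_dropped, mask = dropout_forward(h)
# Backward pass: gradients of 'deleted' nodes are zeroed by the same mask,
# so only the surviving nodes' parameters are updated this iteration:
grad_h = rng.normal(size=h.shape) * mask
print(h_dropped.shape, round(float(mask.mean()), 2))
```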
4. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein: the feature vector output by the deep belief network is input into a Softmax classifier, and the parameter hidden layer bias C is searched over the range [2^(−10), 2^(10)] for the optimal classification accuracy.
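Claim 4's search of C over [2^(−10), 2^(10)] can be illustrated as a log2 grid search. The sketch below treats C as the regularization strength of a multinomial logistic (softmax) classifier from scikit-learn and uses random stand-in features and labels; all of these are assumptions rather than the patented setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical grid search over C in [2^-10, 2^10], picking the value that
# maximizes cross-validated classification accuracy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # stand-in for DBN output feature vectors
y = rng.integers(0, 3, size=200)    # stand-in emotion labels

best_c, best_acc = None, -1.0
for exp in range(-10, 11):
    clf = LogisticRegression(C=2.0 ** exp, max_iter=1000)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    if acc > best_acc:
        best_c, best_acc = 2.0 ** exp, acc
print(f"best C = 2^{int(np.log2(best_c))}, accuracy = {best_acc:.3f}")
```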
5. The method for feature extraction and state recognition of one-dimensional physiological signals based on deep learning of claim 1, wherein: when Gibbs sampling is used, the specific steps for extracting the n features of an input sample are as follows: the process of maximizing the likelihood function yields the probability that the jth visible layer node is activated (takes the value "1") and the probability that the ith hidden layer node is activated, respectively:
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
in the above formulas, f is the Sigmoid activation function;
1) firstly, the probability p(h_i = 1 | v) that the ith hidden layer node is activated (takes the value "1") is calculated using formula (4):
p(h_i = 1 | v) = f(c_i + Σ_{j=1}^{m} w_ij · v_j)    (4)
2) the input data is then fitted according to Gibbs sampling to yield h = (h1, h2, …, hn), the specific process being: a random number between 0 and 1 is generated; if the value of the random number is less than p(h_i = 1 | v), then h_i is "1", otherwise it is "0";
3) the encoded h obtained in steps 1) and 2) is decoded to obtain the original input v′; similarly, p(v_j = 1 | h), the probability that the jth visible layer node is activated, is first calculated using formula (3):
p(v_j = 1 | h) = f(b_j + Σ_{i=1}^{n} w_ij · h_i)    (3)
4) a random number between 0 and 1 is generated as in step 2); if the value of the random number is less than p(v_j = 1 | h), then v_j′ is "1", otherwise it is "0";
5) the v′ obtained in step 4) is substituted into formula (4), and h′ is obtained by Gibbs sampling in the same way as in step 2);
6) finally, the weights, the visible layer bias and the hidden layer bias are updated according to formulas (9), (10) and (11), where η is the learning rate and represents the rate of increase or decrease when the weights or biases are updated;
Δw = η(vh − v′h′)    (9)
Δb = η(v − v′)    (10)
Δc = η(h − h′)    (11).
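The six steps of claim 5 can be read as one Gibbs sampling pass followed by the updates (9)-(11). The following sketch implements them for a single binary input vector; the shapes and the learning rate η are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gibbs_update(v, W, b, c, eta=0.05):
    """One pass through steps 1)-6) of claim 5 for a single binary sample v
    (illustrative sketch; shapes and learning rate are assumptions)."""
    p_h = sigmoid(W.T @ v + c)                   # formula (4): p(h_i = 1 | v)
    h = (rng.random(p_h.shape) < p_h) * 1.0      # step 2): sample h
    p_v = sigmoid(W @ h + b)                     # formula (3): p(v_j = 1 | h)
    v_p = (rng.random(p_v.shape) < p_v) * 1.0    # step 4): sample v'
    p_h2 = sigmoid(W.T @ v_p + c)
    h_p = (rng.random(p_h2.shape) < p_h2) * 1.0  # step 5): sample h'
    W += eta * (np.outer(v, h) - np.outer(v_p, h_p))  # formula (9)
    b += eta * (v - v_p)                              # formula (10)
    c += eta * (h - h_p)                              # formula (11)
    return W, b, c

m, n = 16, 8                               # visible / hidden sizes (assumed)
W = rng.normal(0, 0.01, (m, n))
b, c = np.zeros(m), np.zeros(n)
v = (rng.random(m) < 0.5) * 1.0            # a binary input sample
W, b, c = gibbs_update(v, W, b, c)
print(W.shape, b.shape, c.shape)
```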
CN201710414832.1A 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning Active CN107256393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710414832.1A CN107256393B (en) 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning

Publications (2)

Publication Number Publication Date
CN107256393A CN107256393A (en) 2017-10-17
CN107256393B (en) 2020-04-24

Family

ID=60024431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710414832.1A Active CN107256393B (en) 2017-06-05 2017-06-05 Feature extraction and state recognition of one-dimensional physiological signals based on deep learning

Country Status (1)

Country Link
CN (1) CN107256393B (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107634911B (en) * 2017-10-31 2020-03-10 河南科技大学 Adaptive congestion control method based on deep learning in information center network
CN108209870A (en) * 2017-12-25 2018-06-29 河海大学常州校区 Long-term EEG monitoring automatic seizure detection method based on convolutional neural networks
CN108062572B (en) * 2017-12-28 2021-04-06 华中科技大学 Hydroelectric generating set fault diagnosis method and system based on DdAE deep learning model
CN108523907B (en) * 2018-01-22 2021-07-16 上海交通大学 Fatigue state identification method and system based on deep shrinkage sparse self-coding network
CN108347764A (en) * 2018-01-23 2018-07-31 南京航空航天大学 Examination hall radio cheating signal framing method and system based on deep learning
CN108040073A (en) * 2018-01-23 2018-05-15 杭州电子科技大学 Malicious attack detection method based on deep learning in information physical traffic system
CN108287763A (en) * 2018-01-29 2018-07-17 中兴飞流信息科技有限公司 Parameter exchange method, working node and parameter server system
CN108229664B (en) * 2018-01-31 2021-04-30 北京市商汤科技开发有限公司 Batch standardization processing method and device and computer equipment
CN108449295A (en) * 2018-02-05 2018-08-24 西安电子科技大学昆山创新研究院 Combined modulation recognition methods based on RBM networks and BP neural network
US20190279082A1 (en) * 2018-03-07 2019-09-12 Movidius Ltd. Methods and apparatus to determine weights for use with convolutional neural networks
CN108926341A (en) * 2018-04-20 2018-12-04 平安科技(深圳)有限公司 Detection method, device, computer equipment and the storage medium of ECG signal
CN112055878B (en) * 2018-04-30 2024-04-02 皇家飞利浦有限公司 Adjusting a machine learning model based on the second set of training data
CN108710974B (en) * 2018-05-18 2020-09-11 中国农业大学 Water ammonia nitrogen prediction method and device based on deep belief network
CN109033936A (en) * 2018-06-01 2018-12-18 齐鲁工业大学 A kind of cervical exfoliated cell core image-recognizing method
CN108805204B (en) * 2018-06-12 2021-12-03 东北大学 Electric energy quality disturbance analysis device based on deep neural network and use method thereof
CN109060892B (en) * 2018-06-26 2020-12-25 西安交通大学 SF based on graphene composite material sensor array6Method for detecting decomposition product
CN109106384B (en) * 2018-07-24 2021-12-24 安庆师范大学 Psychological stress condition prediction method and system
CN109308471B (en) * 2018-09-29 2022-07-15 河海大学常州校区 Electromyographic signal feature extraction method
CN109394205B (en) * 2018-09-30 2022-06-17 安徽心之声医疗科技有限公司 Electrocardiosignal analysis method based on deep neural network
CN109492751A (en) * 2018-11-02 2019-03-19 重庆邮电大学 Network safety situation element securing mechanism based on BN-DBN
CN109602414B (en) * 2018-11-12 2022-01-28 安徽心之声医疗科技有限公司 Multi-view-angle conversion electrocardiosignal data enhancement method
CN109602415B (en) * 2018-11-12 2022-02-18 安徽心之声医疗科技有限公司 Electrocardio equipment lead inversion identification method based on machine learning
CN109222963A (en) * 2018-11-21 2019-01-18 燕山大学 A kind of anomalous ecg method for identifying and classifying based on convolutional neural networks
CN109787926A (en) * 2018-12-24 2019-05-21 合肥工业大学 A kind of digital signal modulation mode recognition methods
CN110045335A (en) * 2019-03-01 2019-07-23 合肥工业大学 Based on the Radar Target Track recognition methods and device for generating confrontation network
CN109998525B (en) * 2019-04-03 2022-05-20 哈尔滨理工大学 Arrhythmia automatic classification method based on discriminant deep belief network
CN110378286B (en) * 2019-07-19 2023-03-28 东北大学 DBN-ELM-based electric energy quality disturbance signal classification method
CN110766099A (en) * 2019-11-08 2020-02-07 哈尔滨理工大学 Electrocardio classification method combining discriminant deep belief network and active learning
CN112949671B (en) * 2019-12-11 2023-06-30 中国科学院声学研究所 Signal classification method and system based on unsupervised feature optimization
CN110782025B (en) * 2019-12-31 2020-04-14 长沙荣业智能制造有限公司 Rice processing online process detection method
CN111488968A (en) * 2020-03-03 2020-08-04 国网天津市电力公司电力科学研究院 Method and system for extracting comprehensive energy metering data features
CN112438733A (en) * 2020-11-06 2021-03-05 南京大学 Portable neonatal convulsion electroencephalogram monitoring system
CN112347984A (en) * 2020-11-27 2021-02-09 安徽大学 Olfactory stimulus-based EEG (electroencephalogram) acquisition and emotion recognition method and system
CN112508088A (en) * 2020-12-03 2021-03-16 重庆邮智机器人研究院有限公司 DEDBN-ELM-based electroencephalogram emotion recognition method
CN113017568A (en) * 2021-03-03 2021-06-25 中国人民解放军海军军医大学 Method and system for predicting physiological changes and death risks of severely wounded patients
CN113141375A (en) * 2021-05-08 2021-07-20 国网新疆电力有限公司喀什供电公司 Network security monitoring method and device, storage medium and server
CN113554131B (en) * 2021-09-22 2021-12-03 四川大学华西医院 Medical image processing and analyzing method, computer device, system and storage medium
CN115105079B (en) * 2022-07-26 2022-12-09 杭州罗莱迪思科技股份有限公司 Electroencephalogram emotion recognition method based on self-attention mechanism and application thereof
CN115722797A (en) * 2022-11-03 2023-03-03 深圳市微谱感知智能科技有限公司 Laser welding signal analysis method based on machine learning
CN116662742B (en) * 2023-06-28 2024-07-12 北京理工大学 Brain electrolysis code method based on hidden Markov model and mask empirical mode decomposition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9418334B2 (en) * 2012-12-06 2016-08-16 Nuance Communications, Inc. Hybrid pre-training of deep belief networks
CN105654046B (en) * 2015-12-29 2019-01-18 中国科学院深圳先进技术研究院 Electrocardiosignal personal identification method and device
CN106096616A (en) * 2016-06-08 2016-11-09 四川大学华西医院 Magnetic resonance image feature extraction and classification method based on deep learning
CN106214145B (en) * 2016-07-20 2019-12-10 杨一平 Electrocardiogram classification method based on deep learning algorithm
CN106503654A (en) * 2016-10-24 2017-03-15 中国地质大学(武汉) A kind of face emotion identification method based on the sparse autoencoder network of depth
CN106778685A (en) * 2017-01-12 2017-05-31 司马大大(北京)智能系统有限公司 Electrocardiogram image-recognizing method, device and service terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant