CN116089874A - Emotion recognition method and device based on ensemble learning and transfer learning - Google Patents
Emotion recognition method and device based on ensemble learning and transfer learning
- Publication number: CN116089874A
- Application number: CN202310020744.9A
- Authority
- CN
- China
- Prior art keywords
- data
- target domain
- emotion recognition
- training
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an emotion recognition method and device based on ensemble learning and transfer learning, belonging to the field of brain-computer interfaces for affective computing. First, an emotion recognition model is pre-trained on source-domain labeled data using a deep transfer learning algorithm; the pre-trained model then predicts the target-domain unlabeled data, and the predictions with the highest confidence are selected as pseudo labels to obtain target-domain pseudo-labeled data. Next, feature diversity is increased by applying a random nonlinear mapping to the features. A first data sample set is then constructed from the source-domain labeled data and the target-domain pseudo-labeled data, and a base classifier is trained on it; a second data sample set is constructed from the target-domain pseudo-labeled data and the remaining target-domain data, and another base classifier is trained on it. These steps are iterated, and the outputs of all base classifiers are finally summed to obtain the final output, thereby recognizing the unlabeled target-domain data.
Description
Technical Field
The invention belongs to the field of brain-computer interfaces for affective computing, and particularly relates to an emotion recognition method and device based on ensemble learning and transfer learning.
Background
Emotion recognition is an important component of human-computer interaction systems. The affective brain-computer interface is one approach to emotion recognition: it collects scalp electroencephalogram (EEG) signals from a subject and analyzes which emotion category the current signal corresponds to. To reduce the amount of labeled data required to train the emotion recognition model (i.e., to reduce the calibration data needed from the subject), labeled data already collected from other users can be used to assist training. However, EEG signals often differ greatly across users, so data from other users and from the new subject violate the independent-and-identically-distributed assumption, and a model trained directly on other users performs poorly on the new subject. Transfer learning addresses this problem: it aims to learn knowledge from auxiliary data (the source domain) and transfer it to the target data (the target domain), so that a high-performance model can be trained with little or even no labeled data in the target domain.
Transfer learning methods are widely used in affective brain-computer interfaces and can be broadly divided into two categories:
1) Traditional transfer learning algorithms: handcrafted features are extracted from the EEG signals and a traditional transfer learning algorithm is trained on them. Differential entropy (DE) features, i.e., the average energy of the EEG signal in a specific frequency band, have been proposed and are widely used in the field of affective brain-computer interfaces; transfer learning methods such as transfer component analysis (TCA), kernel principal component analysis (KPCA), and transductive parameter transfer (TPT) have been applied to affective brain-computer interface tasks. An individual-similarity transfer learning framework has also been proposed that measures individual differences using the maximum mean discrepancy (MMD).
2) Deep transfer learning algorithms: features are extracted directly from the raw EEG signals and classification is completed by a transfer learning algorithm based on convolutional neural networks. Deep transfer learning methods such as the domain-adversarial neural network (DANN), the joint adaptation network (JAN), and the conditional domain adversarial network (CDAN) have been applied to affective brain-computer interface tasks, and multi-model fusion strategies combining EEG models with eye-movement-signal models have been tried to improve performance.
However, existing emotion recognition algorithms based solely on transfer learning perform the feature mapping to align different distributions purely from the viewpoint of feature extraction, ignoring the effect of the mapped features on classification accuracy, so there is still room to improve accuracy.
Disclosure of Invention
In view of the defects and improvement demands of the prior art, the invention provides an emotion recognition method and device based on ensemble learning and transfer learning, aiming to use ensemble learning to improve the performance of existing deep transfer learning algorithms on cross-subject affective brain-computer interface tasks. The invention targets an unsupervised transfer learning scenario: a large amount of unlabeled EEG data has been collected from the target subject (the target domain), and a model must be trained with the help of already collected and labeled data from other subjects (the source domain), so as to achieve the highest possible accuracy on the unlabeled target-domain data.
In order to achieve the above object, in a first aspect, the present invention provides an emotion recognition method based on ensemble learning and transfer learning, including:
s1, pre-training an emotion recognition model by using source domain tagged data, predicting target domain untagged data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo tags to obtain target domain tagged data; the emotion recognition model comprises a feature extractor and a classifier, wherein the feature extractor is used for extracting features from original electroencephalogram signals, and the classifier is used for recognizing emotion categories;
s2, carrying out random nonlinear mapping on the data features extracted in the S1;
S3, constructing a first data sample set from the source-domain labeled data and the target-domain pseudo-labeled data mapped in S2, and training a base classifier; if a sample is a source-domain labeled sample and its prediction is wrong, its weight is set to zero;
S4, constructing a second data sample set from the target-domain pseudo-labeled data mapped in S2 and the remaining target-domain data, and training another base classifier by constraining the model output to be consistent before and after random perturbations are added to the samples; the labels of the remaining target-domain data are computed by the base classifier trained in S3;
S5, repeating steps S2 to S4 until the iteration stop condition is reached, and summing the outputs of all base classifiers to obtain the final output, thereby recognizing the unlabeled target-domain data.
Further, in S2, the random nonlinear mapping is expressed as:

h_k(z) = δ[ZS(z^T M_k, μ, σ)],

ZS(z^T M_k, μ, σ) = (z^T M_k − μ) ./ σ,

where h_k(z) denotes the mapped feature, δ denotes any nonlinear activation function, Z = [z_1, z_2, …, z_n] denotes the data matrix composed of all samples, n is the number of samples, z^T denotes the transpose of z, M_k denotes a randomly generated matrix, μ and σ denote the mean and standard deviation of Z^T M_k respectively, and ./ denotes the element-wise operation.
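As an illustration only, the mapping above can be sketched in NumPy; the matrix dimensions, the tanh activation, and the small epsilon guarding against zero variance are assumptions for the sketch, not taken from the patent:

```python
import numpy as np

def random_nonlinear_map(Z, M_k, delta=np.tanh):
    # Z   : (n, d) data matrix with one sample per row (z^T stacked over samples)
    # M_k : (d, m) randomly generated projection matrix
    # delta: any nonlinear activation function (tanh here; sigmoid also fits)
    P = Z @ M_k                     # project features through the random matrix
    mu = P.mean(axis=0)             # per-dimension mean of Z^T M_k
    sigma = P.std(axis=0) + 1e-12   # per-dimension std; epsilon avoids division by zero
    return delta((P - mu) / sigma)  # z-score, then element-wise nonlinearity

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 32))      # 100 samples with 32-dimensional features
M_k = rng.normal(size=(32, 64))     # mapped dimension of 64, chosen arbitrarily
X = random_nonlinear_map(Z, M_k)
print(X.shape)                      # (100, 64)
```

A fresh M_k is drawn at every iteration k, which is what gives the base classifiers diverse inputs.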
Further, in S3, at the k-th iteration, the loss function of the (2k−1)-th base classifier is expressed as:

L(f_{2k−1}) = Σ_{i=1}^{N_T+N_S} ℓ(F_{2k−2}(x_i) + f_{2k−1}(x_i), y_i),

where N_T and N_S denote the numbers of target-domain pseudo-labeled samples and source-domain labeled samples respectively; ℓ denotes the cross-entropy loss; x_i and y_i denote the feature and label of the i-th sample respectively; f_{2k−1} denotes the (2k−1)-th base classifier; and F_{2k−2} denotes the sum of all base classifiers already trained before the current step.
Further, in S3, when minimizing the loss function of the (2k−1)-th base classifier, the multi-class loss is converted into a sum of a series of binary classification losses by means of a Taylor expansion.
Further, in S4, a perturbation composed of randomly generated Gaussian noise is added to the features of the remaining target-domain data in the second data sample set, and the other base classifier is trained by constraining the model outputs on the data before and after the perturbation to be consistent.
In order to achieve the above object, in a second aspect, the present invention provides an emotion recognition device based on ensemble learning and transfer learning, including:
the data processing unit is used for pre-training the emotion recognition model by using the source domain labeled data, predicting the target domain unlabeled data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo labels to obtain target domain labeled data; the emotion recognition model comprises a feature extractor and a classifier, wherein the feature extractor is used for extracting features from original electroencephalogram signals, and the classifier is used for recognizing emotion categories;
the feature mapping unit is used for carrying out random nonlinear mapping on the extracted data features;
the first training unit, configured to construct a first data sample set from the mapped source-domain labeled data and target-domain pseudo-labeled data, and to train a base classifier; if a sample is a source-domain labeled sample and its prediction is wrong, its weight is set to zero;
the second training unit, configured to construct a second data sample set from the mapped target-domain pseudo-labeled data and the remaining target-domain data, and to train another base classifier by constraining the model output to be consistent before and after random perturbations are added to the samples; the labels of the remaining target-domain data are computed by the base classifier trained by the first training unit;
and the emotion recognition unit, configured to repeat the operations of the feature mapping unit, the first training unit, and the second training unit until the iteration stop condition is reached, and to sum the outputs of all base classifiers to obtain the final output, thereby recognizing the unlabeled target-domain data.
To achieve the above object, in a third aspect, the present invention provides an electronic device, comprising: a processor; a memory storing a computer executable program which, when executed by the processor, causes the processor to perform the emotion recognition method based on ensemble learning and transfer learning as described in the first aspect.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
the method comprises the steps of firstly, based on a deep migration learning algorithm, pre-training an emotion recognition model by using source domain tagged data, then predicting target domain untagged data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo tags to obtain target domain tagged data. And secondly, carrying out random nonlinear mapping on the features to increase feature diversity. Thirdly, constructing a first data sample set by using the source domain labeled data and the target domain pseudo-labeled data, and training a base classifier; this is a supervised transfer learning task because all data is annotated. Fourthly, constructing a second data sample set by using the pseudo tag data of the target domain and the rest tag data of the target domain, and training another base classifier; because the data all come from the target domain, the data accords with independent homodisperse assumption, and is a common semi-supervised learning task. Repeating the second to fourth steps, and continuously training to obtain a new base classifier, wherein the updated weight of each step of sample is taken as the weight of the sample in the next iteration to calculate the loss. And finally, summing the results of all the base classifiers to obtain final output so as to realize the identification of the label-free data of the target domain. Therefore, the invention combines the integrated learning on the basis of the transfer learning, and can further improve the accuracy of emotion recognition.
Drawings
FIG. 1 is a flowchart of an emotion recognition method based on ensemble learning and transfer learning according to an embodiment of the present invention;
FIG. 2 is a second flowchart of an emotion recognition method based on ensemble learning and transfer learning according to an embodiment of the present invention;
fig. 3 is a third flowchart of an emotion recognition method based on ensemble learning and transfer learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Ensemble learning trains and fuses multiple models to improve the performance of a basic model. It is mainly divided into two strategies: Bagging and Boosting. The Bagging strategy selects a subset of the data set each time to train a model, repeats this many times, and integrates the resulting models; it improves the stability of the basic model (reducing the variance of results across repeated experiments). The Boosting strategy trains a basic model on the initial data set, adjusts the weights of the samples according to the model's classification accuracy — misclassified samples receive larger weights so that the model pays more attention to them — repeats this process many times, and integrates the resulting basic models; it improves the accuracy of the basic model. LogitBoost is an improvement of Boosting that uses the gradient of the training loss of the base classifier as the basis for updating sample weights, which effectively alleviates overfitting and improves the generalization performance of the model. At present, no algorithm combines ensemble learning and transfer learning for emotion recognition.
Based on the above, the invention fine-tunes an existing deep transfer learning algorithm through ensemble learning to obtain better performance. The method is named SS-TrBoosting and is detailed in the following embodiments.
Example 1
Referring to fig. 1, in combination with fig. 2 and 3, the present invention provides an emotion recognition method based on ensemble learning and transfer learning, the method including operations S1 to S5.
S1: pre-train an emotion recognition model on source-domain labeled data, predict the target-domain unlabeled data with the pre-trained model, and select the predictions with the highest confidence as pseudo labels to obtain target-domain pseudo-labeled data; the emotion recognition model comprises a feature extractor for extracting features from raw EEG signals and a classifier for recognizing emotion categories.
In this embodiment, emotion recognition may be understood as follows: for example, a person watches a video while their EEG signal is recorded, and the label corresponding to the EEG signal may be happy, sad, or neutral.
As shown in fig. 2, the "deep unsupervised/semi-supervised transfer learning algorithm" section shows an existing deep transfer learning algorithm comprising two parts: a feature extractor composed of a convolutional neural network and a classifier composed of a single-layer fully connected neural network. The features output by the feature extractor serve as the input of the algorithm of the invention, and the classifier serves as the initial classifier for ensemble learning.
After pre-training, take one sample as an example: the feature extracted by the feature extractor from the raw EEG signal is z, and the model output is f(z) = [a_1, a_2, …, a_C], where a_c denotes the predicted probability of the c-th class; the final prediction of the model is the class with the highest predicted probability, i.e., argmax_c(a_c), c = 1, 2, …, C. After all samples pass through the pre-trained model, each sample's predicted class and the probability of that class are recorded as (c_i, p_i), where c_i denotes the predicted class of sample i, p_i denotes the probability that the i-th sample is predicted as class c_i, and n is the number of samples. The samples are then sorted by p_i, and for the top 10% of samples the predicted class is taken as the true class, so that 10% of the target-subject data receive labels (hereinafter referred to as target-domain pseudo-labeled data). The output of this step is the features of all samples and the pre-trained classifier f.
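The confidence-based selection in this step can be sketched as follows; the function name is hypothetical, and the 50% ratio in the usage lines is only to make the toy example visible (the patent keeps the top 10%):

```python
import numpy as np

def select_pseudo_labels(probs, ratio=0.1):
    # probs: (n, C) predicted class probabilities for the unlabeled target data
    pred = probs.argmax(axis=1)          # predicted class c_i for each sample
    conf = probs.max(axis=1)             # probability p_i of that predicted class
    k = max(1, int(len(probs) * ratio))  # number of most-confident samples to keep
    idx = np.argsort(-conf)[:k]          # indices sorted by descending confidence
    return idx, pred[idx]                # pseudo-labeled subset of the target domain

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]])
idx, labels = select_pseudo_labels(probs, ratio=0.5)
print(idx, labels)  # the two most confident samples and their pseudo labels
```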
And S2, carrying out random nonlinear mapping on the data features extracted in the step S1.
In this embodiment, as shown in fig. 3, let the feature and label of one source-domain sample be (z_S, y_S), one target-domain pseudo-labeled sample be (z_T, y_T), and the feature of an unlabeled target-domain sample be z_U. A random nonlinear mapping is applied to each of the three groups of samples; the k-th iteration proceeds as:

h_k(z) = δ[ZS(z^T M_k, μ, σ)],

ZS(z^T M_k, μ, σ) = (z^T M_k − μ) ./ σ,

where h_k(z) denotes the mapped feature and δ denotes any nonlinear activation function, such as the sigmoid function; M_k is a randomly generated matrix; μ and σ are the mean and standard deviation of Z^T M_k; Z = [z_1, z_2, …, z_n] denotes the data matrix composed of all samples; z^T denotes the transpose of z; and ./ denotes the element-wise operation. Finally, let the mapped feature of a sample be x, i.e., x = h_k(z).
S3, constructing a first data sample set from the source-domain labeled data and the target-domain pseudo-labeled data mapped in S2, and training a base classifier; if a sample is a source-domain labeled sample and its prediction is wrong, its weight is set to zero.
As shown in the "supervised transfer learning" module of fig. 3, the source-domain labeled data and the target-domain pseudo-labeled data obtained in S2 are selected to form a data set:

D_1 = {(x_i^T, y_i^T)}_{i=1}^{N_T} ∪ {(x_i^S, y_i^S)}_{i=1}^{N_S},

where N_T and N_S denote the numbers of target-domain pseudo-labeled samples and source-domain labeled samples respectively, and x^T and x^S indicate that the sample belongs to the target domain and the source domain respectively.
Since all of this data is labeled, this is a supervised transfer learning task. The invention improves the LogitBoost algorithm to train the base classifier and updates the weight of each sample according to the training loss. At the k-th iteration, the following loss is optimized to train the (2k−1)-th base classifier:

L(f_{2k−1}) = Σ_{i=1}^{N_T+N_S} ℓ(F_{2k−2}(x_i) + f_{2k−1}(x_i), y_i),

where f_{2k−1} denotes the newly trained base classifier, ℓ denotes the cross-entropy loss, and F_{2k−2} denotes the sum of all base classifiers trained before the current step: if the current iteration is the first (k = 1), F_0 is the initial classifier f output by S1; otherwise F_{2k−2} = f + f_1 + … + f_{2k−2}.
Through a Taylor expansion of the loss formula (adding and subtracting terms), the multi-class loss is converted into a sum of a series of binary classification losses: the C-class problem is converted into C binary problems, where the j-th binary problem judges whether a sample belongs to the j-th class, j = 1, 2, …, C:

L(f_{2k−1}) = Σ_{j=1}^{C} Σ_{i=1}^{N_T+N_S} w_{i,j} · ℓ_j(f_{2k−1}(x_i), y_i),

where w_{i,j} denotes the weight of sample i in the j-th binary classification problem, and ℓ_j denotes the binary classification loss of the j-th problem.

p_j(x_i) denotes the probability that the i-th sample is predicted as the j-th class by the model F_{2k−2}. When sample x_i is a source-domain sample (i > N_T) and is mispredicted by the model, i.e., argmax[F_{2k−2}(x_i)] ≠ y_i, the source-domain sample is considered too different from the target domain, and its weight is set to zero, i.e., w_{i,j} = 0. Training with this loss yields the base classifier f_{2k−1}; the integrated model at this point is denoted F_{2k−1} = F_{2k−2} + f_{2k−1}, and the sample weights are updated according to the gradient of the training loss, following LogitBoost.
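The zeroing rule for mispredicted source-domain samples can be sketched as follows; the function name, the sample ordering (target-domain samples first), and the toy logits are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def zero_misclassified_source(weights, F_logits, y, n_target):
    # Zero the weight of any source-domain sample (index >= n_target)
    # whose current ensemble prediction argmax F_{2k-2}(x_i) disagrees with y_i.
    w = weights.copy()
    pred = F_logits.argmax(axis=1)
    is_source = np.arange(len(y)) >= n_target  # first n_target samples are target-domain data
    w[is_source & (pred != y)] = 0.0
    return w

weights = np.ones(4)
F_logits = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]])
y = np.array([0, 1, 1, 1])
w = zero_misclassified_source(weights, F_logits, y, n_target=2)
print(w)  # sample 2 is a mispredicted source sample, so its weight becomes 0
```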
S4, constructing a second data sample set from the target-domain pseudo-labeled data mapped in S2 and the remaining target-domain data, and training another base classifier by constraining the model output to be consistent before and after random perturbations are added to the samples; the labels of the remaining target-domain data are computed by the base classifier trained in S3.
As shown in the "semi-supervised learning" module of fig. 3, taking the k-th iteration as an example, the model F_{2k−1} trained in S3 is first used to compute pseudo labels for all unlabeled target-domain data. The target-domain pseudo-labeled data obtained in S2 (whose pseudo labels were computed by the deep transfer learning model pre-trained in S1, not truly annotated) and the remaining target-domain data (using the pseudo labels computed by the F_{2k−1} model as labels) are selected to form a data set, and a perturbation composed of random Gaussian noise is added to the features of the remaining target-domain data:

D_2 = {(x_i^T, y_i^T)}_{i=1}^{N_T} ∪ {(x_i^U + ε_i, y_i^U)}_{i=1}^{N_U},

where N_T and N_U denote the numbers of target-domain pseudo-labeled samples and remaining target-domain samples respectively, and the perturbation ε_i obeys a Gaussian distribution with mean 0 and standard deviation Σ.

Since all of this data comes from the target domain, it satisfies the independent-and-identically-distributed assumption, making this an ordinary semi-supervised learning task. For the remaining target-domain data, the invention adds a perturbation and uses the model output before the perturbation as the pseudo label of the perturbed sample, training a new base classifier with the LogitBoost algorithm of S3; for the target-domain pseudo-labeled data, the cross-entropy loss is used. The loss is then computed on the current data set as in S3, the only difference from S3 being that the weight-zeroing operation is removed. This yields the trained 2k-th base classifier f_{2k}; the integrated model is updated to F_{2k} = F_{2k−1} + f_{2k}, and the sample weights are computed exactly as in S3 (without the zeroing operation).
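The consistency constraint of this step — using the clean-input prediction as the pseudo label for the Gaussian-perturbed input — can be sketched as follows; the linear stand-in model, the noise scale, and the function names are assumptions, not the patent's network:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(model, X, sigma=0.1, rng=None):
    # Perturb features with Gaussian noise eps_i ~ N(0, sigma^2) and penalize
    # the perturbed output against the pseudo label from the clean output.
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.normal(0.0, sigma, size=X.shape)  # random Gaussian disturbance
    p_clean = softmax(model(X))                 # model output before perturbation
    p_noisy = softmax(model(X + eps))           # model output after perturbation
    pseudo = p_clean.argmax(axis=1)             # pseudo label from the clean output
    n = len(X)
    return -np.mean(np.log(p_noisy[np.arange(n), pseudo] + 1e-12))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
model = lambda X: X @ W            # toy stand-in for the current base classifier
X_u = rng.normal(size=(20, 8))     # remaining (unlabeled) target-domain features
loss = consistency_loss(model, X_u, sigma=0.05)
print(loss >= 0.0)                 # True: cross-entropy is non-negative
```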
S5, repeating steps S2 to S4 until the iteration stop condition is reached, and summing the outputs of all base classifiers to obtain the final output, thereby recognizing the unlabeled target-domain data.
In this embodiment, operations S2 to S4 are repeated K times, with S3 and S4 alternating within each iteration, so that the k-th iteration generates the (2k−1)-th and 2k-th base classifiers respectively. New base classifiers are trained continually, with the sample weights updated at each step used as the sample weights for computing the loss in the next iteration; finally, the outputs of all base classifiers are summed to obtain the final output.
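The final summation can be sketched as follows; note that each base classifier f_j must be applied to its own random mapping h_j of the features. The identity "classifiers" and "mappings" in the usage lines are toy stand-ins for illustration:

```python
import numpy as np

def ensemble_predict(classifiers, mappings, Z):
    # F(x) = sum_j f_j(h_j(Z)); the predicted class is the argmax of summed logits.
    logits = sum(f(h(Z)) for f, h in zip(classifiers, mappings))
    return logits.argmax(axis=1)

Z = np.array([[0.0, 1.0], [2.0, 0.0]])
identity = lambda z: z
classifiers = [identity, identity]  # toy base classifiers that return logits directly
mappings = [identity, identity]     # toy stand-ins for the random nonlinear mappings
print(ensemble_predict(classifiers, mappings, Z))  # [1 0]
```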
Example two
An emotion recognition device based on ensemble learning and transfer learning, comprising:
the data processing unit is used for pre-training the emotion recognition model by using the source domain labeled data, predicting the target domain unlabeled data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo labels to obtain target domain labeled data; the emotion recognition model comprises a feature extractor and a classifier, wherein the feature extractor is used for extracting features from original electroencephalogram signals, and the classifier is used for recognizing emotion categories;
the feature mapping unit is used for carrying out random nonlinear mapping on the extracted data features;
the first training unit, configured to construct a first data sample set from the mapped source-domain labeled data and target-domain pseudo-labeled data, and to train a base classifier; if a sample is a source-domain labeled sample and its prediction is wrong, its weight is set to zero;
the second training unit, configured to construct a second data sample set from the mapped target-domain pseudo-labeled data and the remaining target-domain data, and to train another base classifier by constraining the model output to be consistent before and after random perturbations are added to the samples; the labels of the remaining target-domain data are computed by the base classifier trained by the first training unit;
and the emotion recognition unit, configured to repeat the operations of the feature mapping unit, the first training unit, and the second training unit until the iteration stop condition is reached, and to sum the outputs of all base classifiers to obtain the final output, thereby recognizing the unlabeled target-domain data.
The related technical solution is the same as the first embodiment, and will not be described herein.
Example III
An electronic device, comprising: a processor; a memory storing a computer executable program that, when executed by the processor, causes the processor to perform the ensemble learning and transfer learning based emotion recognition method as described in embodiment one.
The related technical solution is the same as the first embodiment, and will not be described herein.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (7)
1. An emotion recognition method based on ensemble learning and transfer learning is characterized by comprising the following steps:
s1, pre-training an emotion recognition model by using source domain tagged data, predicting target domain untagged data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo tags to obtain target domain tagged data; the emotion recognition model comprises a feature extractor and a classifier, wherein the feature extractor is used for extracting features from original electroencephalogram signals, and the classifier is used for recognizing emotion categories;
s2, carrying out random nonlinear mapping on the data features extracted in the S1;
S3, constructing a first data sample set from the source domain labeled data and the target domain pseudo-label data mapped in S2, and training a base classifier; during training, if a sample belongs to the source domain labeled data and its prediction result is wrong, the weight of the sample is set to zero;
S4, constructing a second data sample set from the target domain pseudo-label data mapped in S2 and the remaining target domain label data, and training another base classifier by constraining the model to produce consistent outputs before and after random perturbations are added to the samples; the labels of the remaining target domain label data are obtained through calculation by the base classifier trained in S3;
S5, repeating steps S2 to S4 until an iteration stop condition is reached, and summing the results of all the base classifiers to obtain the final output, thereby realizing recognition of the unlabeled target domain data.
2. The emotion recognition method based on ensemble learning and transfer learning as set forth in claim 1, wherein in S2, the process of random nonlinear mapping is expressed as:
h_k(Z) = δ[ZS(Z^τ M_k, μ, σ)],
ZS(Z^τ M_k, μ, σ) = (Z^τ M_k − μ)./σ,
wherein h_k(Z) represents the mapped feature, δ represents any nonlinear activation function, Z = [z_1, z_2, …, z_n] represents the data matrix composed of all samples, n is the number of samples, Z^τ represents the transpose of Z, M_k represents a randomly generated matrix, μ and σ respectively represent the mean and standard deviation of Z^τ M_k, and ./ represents the element-wise operation.
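The random nonlinear mapping above can be illustrated with a short numpy sketch. This is not the patent's implementation: `tanh` stands in for the arbitrary activation δ, and the function name and dimensions are illustrative assumptions.

```python
import numpy as np

def random_nonlinear_map(Z, out_dim, rng, delta=np.tanh):
    """Sketch of h_k(Z) = delta[ZS(Z^t M_k, mu, sigma)], where ZS is the
    element-wise standardisation (Z^t M_k - mu) ./ sigma."""
    M_k = rng.standard_normal((Z.shape[0], out_dim))  # randomly generated matrix M_k
    P = Z.T @ M_k                                     # Z^t M_k: one row per sample
    mu, sigma = P.mean(axis=0), P.std(axis=0) + 1e-8  # mean / std of Z^t M_k
    return delta((P - mu) / sigma)                    # ./ : element-wise operation

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 100))       # data matrix: 5 features x 100 samples
H = random_nonlinear_map(Z, 16, rng)
print(H.shape)                          # one mapped feature vector per sample
```

Because each cycle draws a fresh M_k, every base classifier sees a differently projected feature space, which is what injects diversity into the ensemble.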
3. The emotion recognition method based on ensemble learning and transfer learning as set forth in claim 1, wherein in S3, at the kth cycle, a loss function of the 2k-1 th basis classifier is expressed as:
wherein N_T and N_S respectively represent the numbers of samples of the target domain pseudo-label data and the source domain labeled data; ℓ represents the cross-entropy loss; x_i and y_i respectively represent the features and the label of the i-th sample; f_{2k-1} represents the (2k-1)-th base classifier, and F_{2k-2} represents the sum of all the base classifiers already trained before the current step.
4. The emotion recognition method based on ensemble learning and transfer learning as set forth in claim 3, wherein in S3, when solving the loss function of the (2k-1)-th base classifier, the multi-class loss is converted into a sum of a series of two-class losses by means of Taylor expansion.
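One common way to realize such a conversion (popularized by gradient-boosting libraries, and offered here only as an illustrative reading of claim 4) is a second-order Taylor expansion of the softmax cross-entropy with a diagonal Hessian: per class c the gradient is g_c = p_c - y_c and the second-order term is h_c = p_c(1 - p_c), so the multi-class loss decomposes into independent per-class quadratic terms. All values below are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

F = np.array([0.2, -0.1, 0.5])     # current ensemble scores for 3 emotion classes
y = np.array([0.0, 0.0, 1.0])      # one-hot true label
p = softmax(F)
g = p - y                          # per-class first-order (gradient) term
h = p * (1.0 - p)                  # per-class diagonal second-order term

def surrogate(f):
    """Quadratic surrogate loss for a candidate update f from the new base
    classifier: a sum of independent per-class ("two-class") terms."""
    return float(np.sum(g * f + 0.5 * h * f * f))

print(surrogate(-g))               # stepping against the gradient lowers the surrogate
```

Each class contributes g_c f_c + h_c f_c^2 / 2 independently, which is the "sum of a series of two-class losses" structure the claim refers to.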
5. The emotion recognition method based on ensemble learning and transfer learning according to claim 1, wherein in S4, a perturbation consisting of randomly generated Gaussian noise is added to the features of the remaining target domain label data in the second data sample set; and the other base classifier is trained by constraining the model outputs for the two groups of data before and after the perturbation to be consistent.
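The consistency constraint can be sketched as a penalty on the difference between model outputs for clean and Gaussian-perturbed features. The linear scoring model, noise level, and dimensions below are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(W, X, rng, noise_std=0.1):
    """Mean squared difference between the model's outputs for the clean
    features and for the same features with random Gaussian noise added;
    minimising it constrains the two outputs to be consistent."""
    X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)  # random Gaussian perturbation
    p_clean = softmax(X @ W)
    p_noisy = softmax(X_noisy @ W)
    return float(np.mean((p_clean - p_noisy) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 8))     # remaining target-domain features
W = rng.standard_normal((8, 3))      # linear scores for 3 emotion classes
print(consistency_loss(W, X, rng))
```

In training, this term would be added to the classification loss so that the second base classifier becomes robust to small input perturbations on the unlabeled-rich target domain.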
6. An emotion recognition device based on ensemble learning and transfer learning, comprising:
the data processing unit is used for pre-training the emotion recognition model by using the source domain labeled data, predicting the target domain unlabeled data by using the pre-trained emotion recognition model, and selecting a plurality of prediction results with high confidence as pseudo labels to obtain target domain labeled data; the emotion recognition model comprises a feature extractor and a classifier, wherein the feature extractor is used for extracting features from original electroencephalogram signals, and the classifier is used for recognizing emotion categories;
the feature mapping unit is used for carrying out random nonlinear mapping on the extracted data features;
the first training unit is used for constructing a first data sample set from the mapped source domain labeled data and target domain pseudo-label data, and training a base classifier; during training, if a sample belongs to the source domain labeled data and its prediction result is wrong, the weight of the sample is set to zero;
the second training unit is used for constructing a second data sample set from the mapped target domain pseudo-label data and the remaining target domain label data, and training another base classifier by constraining the model to produce consistent outputs before and after random perturbations are added to the samples; the labels of the remaining target domain label data are obtained through calculation by the base classifier trained by the first training unit;
and the emotion recognition unit is used for repeating the operations of the feature mapping unit, the first training unit and the second training unit until an iteration stop condition is reached, and summing the results of all the base classifiers to obtain the final output, thereby realizing recognition of the unlabeled target domain data.
7. An electronic device, comprising:
a processor;
a memory storing a computer executable program that, when executed by the processor, causes the processor to perform the ensemble learning and transfer learning based emotion recognition method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310020744.9A CN116089874A (en) | 2023-01-06 | 2023-01-06 | Emotion recognition method and device based on ensemble learning and migration learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116089874A true CN116089874A (en) | 2023-05-09 |
Family
ID=86186465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310020744.9A Pending CN116089874A (en) | 2023-01-06 | 2023-01-06 | Emotion recognition method and device based on ensemble learning and migration learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116089874A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN116679981A (en) * | 2023-08-03 | 2023-09-01 | 北京电科智芯科技有限公司 | Software system configuration optimizing method and device based on transfer learning |
CN116679981B (en) * | 2023-08-03 | 2023-11-24 | 北京电科智芯科技有限公司 | Software system configuration optimizing method and device based on transfer learning |
CN118296390A (en) * | 2024-06-06 | 2024-07-05 | 齐鲁工业大学(山东省科学院) | Training method of wearable behavior recognition model, behavior recognition method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113610126B (en) | Label-free knowledge distillation method based on multi-target detection model and storage medium | |
CN107480261B (en) | Fine-grained face image fast retrieval method based on deep learning | |
Chen et al. | Automatic social signal analysis: Facial expression recognition using difference convolution neural network | |
CN116089874A (en) | Emotion recognition method and device based on ensemble learning and migration learning | |
CN112257449B (en) | Named entity recognition method and device, computer equipment and storage medium | |
Bargshady et al. | The modeling of human facial pain intensity based on Temporal Convolutional Networks trained with video frames in HSV color space | |
CN105631479A (en) | Imbalance-learning-based depth convolution network image marking method and apparatus | |
Jiang et al. | An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language | |
CN111598041B (en) | Image text generation method for searching articles | |
CN110705490B (en) | Visual emotion recognition method | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
Xu et al. | Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning | |
An et al. | Pedestrian re-identification algorithm based on visual attention-positive sample generation network deep learning model | |
CN113642482A (en) | Video character relation analysis method based on video space-time context | |
Hu et al. | An efficient Long Short-Term Memory model based on Laplacian Eigenmap in artificial neural networks | |
Pereira-Ferrero et al. | Feature augmentation based on manifold ranking and LSTM for image classification | |
Okokpujie et al. | Predictive modeling of trait-aging invariant face recognition system using machine learning | |
Gong et al. | Human interaction recognition based on deep learning and HMM | |
Ajallouda et al. | Automatic keyphrases extraction: an overview of deep learning approaches | |
CN115795037B (en) | Multi-label text classification method based on label perception | |
Buckchash et al. | Sustained self-supervised pretraining for temporal order verification | |
Kaleem et al. | A Comprehensive Review of Knowledge Distillation in Computer Vision | |
CN115035595A (en) | 3D model compression method based on spatio-temporal information transfer knowledge distillation technology | |
CN114612713A (en) | Human body activity recognition method, system, computer equipment and storage medium | |
Shi | Image Recognition of Skeletal Action for Online Physical Education Class based on Convolutional Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||