CN106096642B - Multi-modal emotion feature fusion method based on discriminative locality preserving projection - Google Patents

Multi-modal emotion feature fusion method based on discriminative locality preserving projection

Info

Publication number
CN106096642B
Authority
CN
China
Prior art keywords
emotion
matrix
mode
equal
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610397708.4A
Other languages
Chinese (zh)
Other versions
CN106096642A (en
Inventor
徐嵚嵛
卢官明
闫静杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201610397708.4A
Publication of CN106096642A
Application granted
Publication of CN106096642B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Abstract

The invention discloses a multi-modal emotion feature fusion method based on discriminative locality preserving projection. The method first extracts emotion features, such as speech features, facial expression features and posture features, from the sample data of each modality in a multi-modal emotion database; it then maps the emotion features of the various modalities into a unified discriminative subspace using discriminative locality preserving projection, and finally concatenates the mapped feature sets in series to obtain the fused multi-modal emotion features. A classifier that takes the fused multi-modal emotion features as input can effectively recognize basic emotions such as anger, disgust, fear, happiness, sadness and surprise, providing a new method and approach for building human emotion classification and recognition systems and realizing human-computer interaction.

Description

Multi-modal emotion feature fusion method based on discriminative locality preserving projection
Technical Field
The invention belongs to the field of image processing and pattern recognition, relates to a feature fusion method applied to multi-modal emotion recognition, and particularly relates to a multi-modal emotion feature fusion method based on discriminative locality preserving projection.
Background
Emotional expression has always been the most prominent way for humans to communicate and understand each other. As computer technology has developed, human-computer interaction (HCI) has gained increasing research and practical value, and how computers recognize human emotions has become important. With the continuous development of information technology, the emotional information that humans express, whether in the laboratory or in real life, can easily be captured by various sensors. Images and speech are the most readily available emotional information, and also the most important information for emotion recognition.
Which emotions a computer can recognize is a complicated question: the emotions people express in real life often differ only slightly, in ways that even humans find hard to distinguish, so at present computers can recognize only basic emotions such as anger, disgust, fear, happiness, sadness and surprise. Even so, technologies for recognizing these basic emotions have already been widely applied, for example in education, medical treatment, human-computer interaction and video entertainment.
Over the past decades there has been much work on single-modality emotion recognition, most commonly facial expression recognition, speech emotion recognition and gesture-based emotion recognition. Single-modality emotion recognition has a fundamental limitation, however, because the emotional information a person expresses is inherently multi-modal: when a person expresses anger, for example, his voice, facial expression, body posture, heart rate and body temperature all differ greatly from their normal state. Recognition that uses the emotion features of only one modality therefore often fails to achieve good results, especially in real environments. Research results show that, compared with single-modality emotion recognition, multi-modal emotion recognition is more reliable and accurate: it takes into account the various kinds of emotional information a person expresses, measures the expressed emotion comprehensively, and is robust to the interference found in real life (for example, facial image information may suffer from varying illumination, viewing angles and other problems).
For multi-modal emotion recognition, feature fusion is the most important step: the different emotion features obtained from different sensors are fused, and the fused features are sent to a classifier for recognition. Common feature fusion methods fall mainly into three categories: data-layer fusion, feature-layer fusion and decision-layer fusion. To remain real-time, all three must retain enough of the important information while compressing it, so some information loss is unavoidable and recognition accuracy is reduced. Feature-layer fusion is widely used in the speech and image fields. At present, research on multi-modal emotion recognition is still far from the maturity and richness of single-modality emotion recognition.
In the prior art, the invention patent with publication number CN105138991A, entitled "A video emotion recognition method based on fusion of emotionally significant features", discloses a video emotion recognition method based on fusing emotionally significant features. Its shortcomings are as follows: it can only fuse the image features and speech features in a video, has poor extensibility, and cannot fuse features of additional modalities; the extracted image and speech features are not direct emotion features but are represented by color emotion intensity values and an audio emotion dictionary; and the fusion algorithm is too simple, so the emotion features obtained by its simple weighting are poorly discriminative.
Disclosure of Invention
The technical problems to be solved by the invention are that the fused emotion features produced by existing multi-modal feature fusion methods are poorly discriminative, and that existing single-modality emotion recognition technology cannot achieve sufficiently accurate recognition results.
To solve these problems, and aiming at the requirements of automatic human emotion assessment systems and human-computer interaction systems, the invention provides a multi-modal emotion feature fusion method based on discriminative locality preserving projection, offering a more accurate and reliable approach for human-computer interaction. The specific technical scheme is as follows:
The multi-modal emotion feature fusion method based on discriminative locality preserving projection comprises the following steps:
A. First, extract emotion features from the sample data of each modality in a multi-modal emotion database, then reduce the dimension of the emotion feature vectors of the various modalities; a sample of the j-th modality is represented by a $d_j$-dimensional feature vector $x_{ijr}$, i.e.
$$X_{ij} = [x_{ij1}, x_{ij2}, \ldots, x_{ijn_{ij}}], \quad x_{ijr} \in \mathbb{R}^{d_j}$$
where $1 \le j \le m$ and $m$ is the number of modalities; $1 \le i \le c$ and $c$ is the number of emotion categories; $1 \le r \le n_{ij}$ and $n_{ij}$ is the number of samples belonging to the i-th emotion and the j-th modality; $x_{ijr}$ denotes the feature vector of the r-th sample belonging to the i-th emotion and the j-th modality;
B. Apply discriminative locality preserving projection to the dimension-reduced feature vectors of the different modalities to obtain the optimal projection direction $\alpha$;
C. Map the feature vectors of the different modalities: $Y_j = \alpha^{T} X_j$, where $X_j$ is the matrix formed by the $c$ matrices $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^{T}$;
D. Concatenate the mapped features in series to obtain the fused feature:
$$Z = [\alpha^{T} X_1, \ldots, \alpha^{T} X_j, \ldots, \alpha^{T} X_m]^{T}$$
further, step (ii)B, after dimension reduction, identifying local preserving projection is carried out to solve the optimal projection matrix alpha and obtain the emotional characteristic vectors x of various modesijrMapping to a uniform identification subspace to obtain a mapped feature vector yijrThe method comprises the following specific steps:
B1: Define the within-class scatter matrix $S_w$:
$$S_w = \sum_{i=1}^{c} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} \sum_{l=1}^{n_{ik}} W_{rl}\,(y_{ikr} - y_{ikl})(y_{ikr} - y_{ikl})^{T}$$
where $y_{ikl}$ denotes the mapped feature vector of the l-th sample from the i-th emotion and the k-th modality, $1 \le k \le m$, and $W_{rl}$ is the locality preserving weight between feature vectors from the same emotion and modality;
B2: Define the between-class scatter matrix $S_b$:
$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} B_{ih}\,(\mu_i - \mu_h)(\mu_i - \mu_h)^{T}$$
where $B_{ih}$ is the locality preserving weight between feature-vector means from the same modality, and $\mu_i$ is the mean of the mapped feature vectors of the class-$i$ samples:
$$\mu_i = \frac{1}{n_i} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} y_{ikr}$$
where $n_i$ is the number of samples in class $i$, and $\mu_h$ is the mean feature vector of the class-$h$ samples;
B3: Maximize the between-class scatter matrix while minimizing the within-class scatter matrix; this goal can be expressed as the following optimization problem:
$$\max \; \frac{\operatorname{Tr}(S_b)}{\operatorname{Tr}(S_w)}$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix.
Further, in the within-class scatter matrix $S_w$ defined in step B1, the locality preserving weight matrix $W_{rl}$ between feature vectors is defined as follows:
For feature vectors $x_{ijr}$ and $x_{ijl}$ from the same emotion and modality, the locality preserving weight matrix $W_{rl}$ is defined as
$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^{2}}{t}\right)$$
where $x_{ijl}$ denotes the feature vector of the l-th sample from the i-th emotion and the j-th modality, $1 \le l \le n_{ij}$; the parameter $t$ may be set empirically, and weights between feature vectors from different emotions or modalities are not considered.
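For illustration, the following Python sketch computes such a within-class locality preserving weight matrix for the samples of one emotion class and one modality, assuming the common heat-kernel form $W_{rl} = \exp(-\|x_{ijr} - x_{ijl}\|^{2}/t)$; the function name, the use of NumPy and the default value of $t$ are illustrative choices rather than part of the patent.

```python
import numpy as np

def heat_kernel_weights(X, t=1.0):
    """Locality preserving weights W[r, l] = exp(-||x_r - x_l||^2 / t) between
    the samples of one emotion class and one modality.

    X : array of shape (d_j, n_ij), one column per sample x_ijr.
    The heat-kernel form and the default t are assumptions; the patent only
    states that t is set empirically.
    """
    n = X.shape[1]
    W = np.zeros((n, n))
    for r in range(n):
        for l in range(n):
            diff = X[:, r] - X[:, l]
            W[r, l] = np.exp(-np.dot(diff, diff) / t)
    return W
```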
Further, in the between-class scatter matrix $S_b$ defined in step B2, the locality preserving weight matrix $B_{ih}$ between the feature-vector means is computed and defined as follows:
First, compute the mean feature vector of the i-th emotion and the j-th modality:
$$\mu_{ij}^{(x)} = \frac{1}{n_{ij}} \sum_{r=1}^{n_{ij}} x_{ijr}$$
where the superscript $(x)$ denotes the original sample space. Likewise, compute the mean feature vector of the h-th emotion and the j-th modality:
$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr}$$
where $n_{hj}$ is the number of samples belonging to the h-th emotion and the j-th modality, $x_{hjr}$ denotes the feature vector of the r-th sample of the h-th emotion and the j-th modality, and $1 \le h \le c$;
The locality preserving weight matrix $B_{ih}$ between the feature-vector means from the same modality,
$\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$, is defined as
$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^{2}}{t}\right)$$
The parameter $t$ can likewise be set empirically, and weights between feature-vector means from different modalities are not considered.
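A corresponding sketch for the weights between the per-class mean vectors of one modality, again assuming a heat-kernel form consistent with the within-class weights above; the helper name and default parameter are hypothetical.

```python
import numpy as np

def mean_locality_weights(class_means, t=1.0):
    """Weights B[i, h] = exp(-||mu_i - mu_h||^2 / t) between the per-class
    mean feature vectors of one modality.

    class_means : array of shape (d_j, c), column i holds mu_ij^(x).
    """
    c = class_means.shape[1]
    B = np.zeros((c, c))
    for i in range(c):
        for h in range(c):
            diff = class_means[:, i] - class_means[:, h]
            B[i, h] = np.exp(-np.dot(diff, diff) / t)
    return B
```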
Further, in step B3, the optimization problem of maximizing the between-class scatter while minimizing the within-class scatter is solved to obtain the optimal projection direction $\alpha^{*}$. The specific steps are as follows:
B3.1: Transform the optimization problem in B3 into the following optimization problem over the projection matrix $\alpha$:
$$\alpha^{*} = \arg\max_{\alpha} \frac{\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)}{\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)}$$
The denominator of this expression is the within-class scatter term
$\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)$, where the superscript $(x)$ denotes the original sample space and the matrix $S_w^{(x)}$ is expressed in terms of the following quantities: $\mu_{ik}^{(x)}$ is the mean feature vector of the i-th emotion and the k-th modality, $n_{ik}$ is the number of samples of the i-th emotion and the k-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, $L = mD_{rr} - W_{rl}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the weight matrix $W$ between sample feature vectors ($W$ is a symmetric matrix), i.e.
$$D_{rr} = \sum_{l} W_{rl}$$
The numerator is the between-class scatter term
$\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)$, where the matrix $S_b^{(x)}$ is expressed in terms of the matrix composed of the $c$ mean vectors $\mu_{ij}^{(x)}$ and the diagonal matrix $E$ whose diagonal entries are the row (or column) sums of the mean locality preserving weight matrix $B$, i.e.
$$E_{ii} = \sum_{h=1}^{c} B_{ih}$$
B3.2: because the optimization problem in B3.1 does not have a closed-form solution, the ratio of traces needs to be converted into the trace of the ratio, and the following optimization problem is finally obtained:
$$\alpha^{*} = \arg\max_{\alpha} \operatorname{Tr}\!\left[(\alpha^{T} S_w^{(x)} \alpha)^{-1}(\alpha^{T} S_b^{(x)} \alpha)\right]$$
the optimal projection matrix is obtained by solving the above formula through a generalized eigenvalue decomposition method
that is, the columns of the optimal projection matrix $\alpha^{*}$ are the leading generalized eigenvectors of $S_b^{(x)} \alpha = \lambda S_w^{(x)} \alpha$.
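A minimal sketch of this final step, assuming the two scatter matrices have already been assembled; it uses SciPy's generalized symmetric eigensolver and adds a small regularizer for numerical stability (the regularizer is an assumption not mentioned in the patent).

```python
import numpy as np
from scipy.linalg import eigh

def solve_projection(Sb, Sw, n_dims, reg=1e-6):
    """Solve Sb a = lambda Sw a and keep the n_dims leading generalized
    eigenvectors as the columns of the projection matrix alpha."""
    Sw_reg = Sw + reg * np.eye(Sw.shape[0])   # keep Sw positive definite
    eigvals, eigvecs = eigh(Sb, Sw_reg)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, order]
```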
Compared with the prior art, the invention has the following advantages:
(1) Compared with single-modality emotion features, the fused multi-modal emotion features used for emotion recognition are more accurate and objective, and more robust in real situations.
(2) The multi-modal emotion feature fusion method based on discriminative locality preserving projection considers not only the between-class scatter but also the within-class scatter, so it discriminates better between samples of different classes, and the locality preserving projection it introduces adapts well to nonlinear conditions. The result is fused multi-modal emotion features that are better suited to emotion recognition.
The invention introduces a multi-modal emotion feature fusion method based on discriminative locality preserving projection and applies it to multi-modal expression classification and recognition. It can effectively recognize the six basic emotions of anger, disgust, fear, happiness, sadness and surprise, and provides a new method and approach for developing automatic human emotion assessment systems and human-computer interaction systems.
Drawings
FIG. 1 is a flow chart of the multi-modal emotion feature fusion method based on discriminative locality preserving projection of the present invention.
FIG. 2 shows example images from the bimodal emotion database.
Detailed Description
The embodiments of the present invention will now be described in further detail with reference to the accompanying drawings. As shown in FIG. 1, the implementation of the multi-modal emotion feature fusion method based on discriminative locality preserving projection mainly comprises the following steps:
Step 1: Capture still images and speech segments from the videos in the multi-modal database
In this embodiment, the eNTERFACE bimodal emotion database is used. The database contains 1260 video clips from 42 subjects, each with an emotion label, expressing the six basic emotions anger, disgust, fear, happiness, sadness and surprise (labelled 1-6 respectively), as shown in FIG. 2. The video frame size is 720 × 576 at a frame rate of 25 fps, and the audio in the videos is sampled at 48 kHz. Each video is split into frames, and the frame with the richest expression is taken as the still image for that video; the speech of each video is separated out as its corresponding speech segment, so each video clip finally corresponds to one still image and one speech segment. 75% of the images and their corresponding speech segments are randomly selected as training samples, and the remaining 25% are used as test samples.
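The patent does not name any tooling for this step; the sketch below uses OpenCV and ffmpeg purely as stand-ins to grab a representative frame and separate the audio track. Choosing the most expressive frame is done by inspection in the embodiment, so the middle frame is used here only as a placeholder choice.

```python
import subprocess
import cv2  # OpenCV

def extract_middle_frame(video_path, image_path):
    """Save one frame of a video as its still image (middle frame as a
    stand-in for the most expressive frame)."""
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, n_frames // 2)
    ok, frame = cap.read()
    cap.release()
    if ok:
        cv2.imwrite(image_path, frame)
    return ok

def extract_audio(video_path, wav_path):
    """Separate the 48 kHz audio track of a clip into a WAV file via ffmpeg."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn",
                    "-acodec", "pcm_s16le", "-ar", "48000", wav_path],
                   check=True)
```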
Step 2: Extract features from the image and speech information, reduce their dimension, and represent them as feature vectors
First, the still image obtained in the previous step is cropped to a 128 × 128 face image, then preprocessed by alignment, scale normalization, grey-level equalization and similar operations, and finally features such as Gabor, SIFT and LBP are extracted from it (in this embodiment, Gabor features are extracted). For the speech segments, the speech processing toolbox openSMILE is used to extract various features (in this embodiment, the emobase2010 feature set). Because the extracted feature vectors usually have too high a dimension, PCA is used to reduce them to a suitable dimension, and each dimension-reduced image or speech feature vector is represented by a $d_j$-dimensional feature vector, i.e.
$$X_{ij} = [x_{ij1}, x_{ij2}, \ldots, x_{ijn_{ij}}], \quad x_{ijr} \in \mathbb{R}^{d_j}$$
where $1 \le j \le m$ and $m$ is the number of modalities; $1 \le i \le c$ and $c$ is the number of emotion categories; $1 \le r \le n_{ij}$ and $n_{ij}$ is the number of samples belonging to the i-th emotion and the j-th modality; $x_{ijr}$ denotes the feature vector of the r-th sample of the i-th emotion and the j-th modality; $n_i$ is the number of samples in class $i$; and $n$ is the total number of samples. In this embodiment $c = 6$, $m = 2$ and $n_{ij} = 210$; for other multi-modal databases only these parameters need to be changed, for example $m = 3$ for a tri-modal database.
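A short sketch of the PCA dimension-reduction step, assuming scikit-learn; the target dimension $d_j$ is a free parameter that the patent leaves to be chosen per modality.

```python
from sklearn.decomposition import PCA

def reduce_dimension(features, d_j):
    """Reduce one modality's emotion features to d_j dimensions with PCA.

    features : array of shape (n_samples, original_dim)
    returns  : array of shape (n_samples, d_j)
    """
    pca = PCA(n_components=d_j)
    return pca.fit_transform(features)
```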
Step 3: Solve for the optimal projection matrix $\alpha$ by the discriminative locality preserving projection method, mapping the emotion feature vectors $x_{ijr}$ of the various modalities into a unified discriminative subspace to obtain the mapped feature vectors $y_{ijr}$. The specific steps are as follows:
First, for feature vectors $x_{ijr}$ and $x_{ijl}$ from the same class and modality, define the locality preserving weight matrix
$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^{2}}{t}\right)$$
where $x_{ijl}$ denotes the feature vector of the l-th sample from the i-th class and the j-th modality, $1 \le l \le n_{ij}$; the parameter $t$ may be set empirically, and weights between feature vectors from different modalities or classes are not considered. Then define the within-class scatter matrix $S_w$:
$$S_w = \sum_{i=1}^{c} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} \sum_{l=1}^{n_{ik}} W_{rl}\,(y_{ikr} - y_{ikl})(y_{ikr} - y_{ikl})^{T}$$
where $y_{ikl}$ denotes the mapped feature vector of the l-th sample from the i-th emotion and the k-th modality, and $1 \le k \le m$.
Then compute the mean feature vector of the i-th emotion and the j-th modality:
$$\mu_{ij}^{(x)} = \frac{1}{n_{ij}} \sum_{r=1}^{n_{ij}} x_{ijr}$$
where the superscript $(x)$ denotes the original sample space. Likewise, compute the mean feature vector of the h-th emotion and the j-th modality:
$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr}$$
where $n_{hj}$ is the number of samples belonging to the h-th emotion and the j-th modality, $x_{hjr}$ denotes the feature vector of the r-th sample of the h-th emotion and the j-th modality, and $1 \le h \le c$.
Similarly to the within-class case, a locality preserving weight matrix $B_{ih}$ is defined between the feature-vector means $\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$ from the same modality:
$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^{2}}{t}\right)$$
The parameter $t$ can likewise be set empirically, and weights between feature-vector means from different modalities are not considered.
Next, define the between-class scatter matrix $S_b$:
$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} B_{ih}\,(\mu_i - \mu_h)(\mu_i - \mu_h)^{T}$$
where $\mu_i$ is the mean of the mapped feature vectors of the class-$i$ samples:
$$\mu_i = \frac{1}{n_i} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} y_{ikr}$$
and, similarly, $\mu_h$ is the mean of the mapped feature vectors of the class-$h$ samples.
Finally, to maximize the between-class scatter matrix while minimizing the within-class scatter matrix, the following optimization problem is obtained:
$$\max \; \frac{\operatorname{Tr}(S_b)}{\operatorname{Tr}(S_w)}$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix. After simplification and transformation, the following optimization problem over the projection matrix $\alpha$ is obtained:
$$\alpha^{*} = \arg\max_{\alpha} \frac{\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)}{\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)}$$
The denominator of this expression is the within-class scatter term $\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)$, where the matrix $S_w^{(x)}$ is expressed in terms of the following quantities: $\mu_{ik}^{(x)}$ is the mean feature vector of the i-th emotion and the k-th modality, $n_{ik}$ is the number of samples of the i-th emotion and the k-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, $L = mD_{rr} - W_{rl}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the weight matrix $W$ between sample feature vectors ($W$ is a symmetric matrix), i.e.
$$D_{rr} = \sum_{l} W_{rl}$$
The numerator is the between-class scatter term $\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)$, where the matrix $S_b^{(x)}$ is expressed in terms of the matrix composed of the $c$ mean vectors $\mu_{ij}^{(x)}$ and the diagonal matrix $E$ whose diagonal entries are the row (or column) sums of the mean locality preserving weight matrix $B$, i.e.
$$E_{ii} = \sum_{h=1}^{c} B_{ih}$$
Since this trace-ratio problem has no closed-form solution, it is converted into a ratio-trace problem:
$$\alpha^{*} = \arg\max_{\alpha} \operatorname{Tr}\!\left[(\alpha^{T} S_w^{(x)} \alpha)^{-1}(\alpha^{T} S_b^{(x)} \alpha)\right]$$
which is solved by generalized eigenvalue decomposition to obtain the optimal mapping $\alpha^{*}$.
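Putting step 3 together, the sketch below assembles the within-class and between-class scatter matrices directly from the definitions above and solves the generalized eigenvalue problem. It assumes all modalities have been PCA-reduced to the same dimension so that one projection matrix $\alpha$ applies to every modality, uses heat-kernel weights, and averages the mean-level weights $B_{ih}$ over the modalities; these choices and the parameter defaults are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.linalg import eigh

def dlpp_fit(samples, t=1.0, n_dims=50, reg=1e-6):
    """Sketch of the discriminative locality preserving projection step.

    samples[i][k] : array of shape (d, n_ik) holding the feature vectors of
                    emotion class i and modality k (all reduced to dimension d).
    Returns a projection matrix alpha of shape (d, n_dims).
    """
    c, m = len(samples), len(samples[0])
    d = samples[0][0].shape[0]
    n_dims = min(n_dims, d)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))

    # Within-class scatter: weighted pairwise differences inside each
    # emotion class and modality.
    for i in range(c):
        for k in range(m):
            X = samples[i][k]
            for r in range(X.shape[1]):
                for l in range(X.shape[1]):
                    diff = X[:, r] - X[:, l]
                    w = np.exp(-np.dot(diff, diff) / t)
                    Sw += w * np.outer(diff, diff)

    # Between-class scatter: weighted differences of class means, with the
    # weights computed from the per-modality class means and averaged over
    # the modalities (one plausible choice).
    class_means = [np.hstack([samples[i][k] for k in range(m)]).mean(axis=1)
                   for i in range(c)]
    modality_means = [[samples[i][k].mean(axis=1) for k in range(m)]
                      for i in range(c)]
    for i in range(c):
        for h in range(c):
            b = np.mean([np.exp(-np.sum((modality_means[i][j]
                                         - modality_means[h][j]) ** 2) / t)
                         for j in range(m)])
            diff = class_means[i] - class_means[h]
            Sb += b * np.outer(diff, diff)

    # Generalized eigenproblem Sb a = lambda Sw a; keep leading directions.
    vals, vecs = eigh(Sb, Sw + reg * np.eye(d))
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order]
```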
Step 4: Project the training samples and the test samples to obtain the mapped features, and concatenate the mapped features in series to obtain the fused features
The image features and speech features are mapped by multiplying with $\alpha$: $Y_j = \alpha^{T} X_j$, where $X_j$ is the matrix formed by the $c$ matrices $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^{T}$. The mapped features are then concatenated in series as follows:
$$Z = [\alpha^{T} X_1, \ldots, \alpha^{T} X_j, \ldots, \alpha^{T} X_m]^{T}$$
This is done for the training samples and the test samples alike.
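A sketch of the mapping and serial-concatenation step; the helper name and the (features × samples) layout are assumptions.

```python
import numpy as np

def fuse_features(alpha, X_per_modality):
    """Map each modality's feature matrix with alpha and concatenate the
    mapped features in series.

    X_per_modality : list [X_1, ..., X_m], each of shape (d, n_samples)
                     with the samples aligned across modalities.
    returns        : fused matrix Z of shape (m * n_dims, n_samples).
    """
    mapped = [alpha.T @ X_j for X_j in X_per_modality]   # each (n_dims, n)
    return np.vstack(mapped)
```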
and 5: sending the fusion characteristics of the training samples into a classifier for training and testing by using the test samples
The fused features of the training samples obtained in the previous step are fed to a classifier (libSVM in this embodiment); suitable models and parameters are obtained by training the classifier, and finally the test data are fed to the trained classifier to obtain the recognition results.
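A sketch of the final training and testing step. The embodiment uses libSVM; scikit-learn's SVC (a libSVM wrapper) is used here as a stand-in, and the kernel and C value are assumptions. Fused feature matrices are assumed to be arranged as (n_samples, n_features), with labels 1-6.

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def train_and_test(Z_train, y_train, Z_test, y_test):
    """Train an SVM on the fused training features and score it on the
    fused test features."""
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(Z_train, y_train)
    pred = clf.predict(Z_test)
    return accuracy_score(y_test, pred)
```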
The above embodiments are not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A multi-modal emotion feature fusion method based on discriminative locality preserving projection, characterized by comprising the following steps:
A. First, extract emotion features from the sample data of each modality in a multi-modal emotion database, then reduce the dimension of the emotion feature vectors of the various modalities; a sample of the j-th modality is represented by a $d_j$-dimensional feature vector $x_{ijr}$, i.e.
$$X_{ij} = [x_{ij1}, x_{ij2}, \ldots, x_{ijn_{ij}}], \quad x_{ijr} \in \mathbb{R}^{d_j}$$
where $1 \le j \le m$ and $m$ is the number of modalities; $1 \le i \le c$ and $c$ is the number of emotion categories; $1 \le r \le n_{ij}$ and $n_{ij}$ is the number of samples belonging to the i-th emotion and the j-th modality; $x_{ijr}$ denotes the feature vector of the r-th sample belonging to the i-th emotion and the j-th modality;
B. Apply discriminative locality preserving projection to the dimension-reduced feature vectors of the different modalities, obtaining the optimal projection direction $\alpha$ of the discriminative locality preserving projection by maximizing the between-class scatter matrix and minimizing the within-class scatter matrix; the locality preserving weight matrix $W_{rl}$ between feature vectors is defined as follows:
for feature vectors $x_{ijr}$ and $x_{ijl}$ from the same emotion and modality, the locality preserving weight matrix is
$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^{2}}{t}\right)$$
where $x_{ijl}$ denotes the feature vector of the l-th sample from the i-th emotion and the j-th modality, $1 \le l \le n_{ij}$; the parameter $t$ can be set empirically, weights between feature vectors from different emotions or modalities are not considered, and $\beta$ is generally taken as 3-5; the method comprises:
the objective of the discriminative locality preserving projection is to solve for the optimal projection matrix $\alpha^{*}$, mapping the emotion feature vectors $x_{ijr}$ of the various modalities into a unified discriminative subspace to obtain the mapped feature vectors $y_{ijr}$; the specific steps are as follows:
B1: Define the within-class scatter matrix $S_w$:
$$S_w = \sum_{i=1}^{c} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} \sum_{l=1}^{n_{ik}} W_{rl}\,(y_{ikr} - y_{ikl})(y_{ikr} - y_{ikl})^{T}$$
where $y_{ikl}$ denotes the mapped feature vector of the l-th sample from the i-th emotion and the k-th modality, $1 \le k \le m$, and $W_{rl}$ is the locality preserving weight between feature vectors from the same emotion and modality;
B2: Define the between-class scatter matrix $S_b$:
$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} B_{ih}\,(\mu_i - \mu_h)(\mu_i - \mu_h)^{T}$$
where $B_{ih}$ is the locality preserving weight between feature-vector means from the same modality, and $\mu_i$ is the mean of the mapped feature vectors of the class-$i$ samples:
$$\mu_i = \frac{1}{n_i} \sum_{k=1}^{m} \sum_{r=1}^{n_{ik}} y_{ikr}$$
where $n_i$ is the number of samples in class $i$ and $\mu_h$ is the mean feature vector of the class-$h$ samples;
the locality preserving weight matrix $B_{ih}$ between the feature-vector means is computed and defined as follows:
First, compute the mean feature vector of the i-th emotion and the j-th modality:
$$\mu_{ij}^{(x)} = \frac{1}{n_{ij}} \sum_{r=1}^{n_{ij}} x_{ijr}$$
where the superscript $(x)$ denotes the original sample space. Likewise, compute the mean feature vector of the h-th emotion and the j-th modality:
$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr}$$
where $n_{hj}$ is the number of samples belonging to the h-th emotion and the j-th modality, $x_{hjr}$ denotes the feature vector of the r-th sample of the h-th emotion and the j-th modality, and $1 \le h \le c$;
The locality preserving weight matrix $B_{ih}$ between the feature-vector means from the same modality,
$\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$, is defined as
$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^{2}}{t}\right)$$
where the parameter $t$ can likewise be set empirically, and weights between feature-vector means from different modalities are not considered;
B3: maximizing the inter-class dispersion matrix and minimizing the intra-class dispersion matrix, this goal can be expressed as the following optimization problem:
$$\max \; \frac{\operatorname{Tr}(S_b)}{\operatorname{Tr}(S_w)}$$
where $\operatorname{Tr}(\cdot)$ denotes the trace of a matrix; the resulting projection directions form the optimal projection matrix $\alpha^{*}$, with $1 \le j \le m$ and $m$ the number of modalities;
the optimization problem is to maximize the inter-class dispersion matrix and minimize the intra-class dispersion to obtain the maximum projection direction
$\alpha^{*}$;
The method comprises the following specific steps:
b3.1: transforming the optimization problem in B3 to obtain the following optimization problem:
$$\alpha^{*} = \arg\max_{\alpha} \frac{\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)}{\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)}$$
The denominator of this expression is the within-class scatter term
$\operatorname{Tr}(\alpha^{T} S_w^{(x)} \alpha)$, where the superscript $(x)$ denotes the original sample space and the matrix $S_w^{(x)}$ is expressed in terms of the following quantities: $\mu_{ik}^{(x)}$ is the mean feature vector of the i-th emotion and the k-th modality, $n_{ik}$ is the number of samples of the i-th emotion and the k-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, $L = mD_{rr} - W_{rl}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the weight matrix $W$ between sample feature vectors, where $W$ is symmetric so that its row and column sums are equal, i.e.
$$D_{rr} = \sum_{l} W_{rl}$$
The numerator is the between-class scatter term
$\operatorname{Tr}(\alpha^{T} S_b^{(x)} \alpha)$, where the matrix $S_b^{(x)}$ is expressed in terms of the matrix composed of the $c$ mean vectors $\mu_{ij}^{(x)}$ and the diagonal matrix $E$ whose diagonal entries are the row (or column) sums of the mean locality preserving weight matrix $B$, i.e.
$$E_{ii} = \sum_{h=1}^{c} B_{ih}$$
B3.2: because the optimization problem in B3.1 does not have a closed-form solution, the ratio of traces needs to be converted into the trace of the ratio, and the following optimization problem is finally obtained:
$$\alpha^{*} = \arg\max_{\alpha} \operatorname{Tr}\!\left[(\alpha^{T} S_w^{(x)} \alpha)^{-1}(\alpha^{T} S_b^{(x)} \alpha)\right]$$
the optimal projection direction is obtained by solving the above formula through a generalized eigenvalue decomposition method
that is, the columns of $\alpha^{*}$ are the leading generalized eigenvectors of $S_b^{(x)} \alpha = \lambda S_w^{(x)} \alpha$;
C. Map the feature vectors of the different modalities: $Y_j = \alpha^{T} X_j$, where $X_j$ is the matrix formed by the $c$ matrices $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^{T}$, and $X_{ij}$ denotes the emotion features of the different modalities of the different emotion categories, with $1 \le i \le c$, $c$ the number of emotion categories, $1 \le j \le m$ and $m$ the number of modalities;
D. Concatenate the mapped features in series to obtain the fused feature:
$$Z = [\alpha^{T} X_1, \ldots, \alpha^{T} X_j, \ldots, \alpha^{T} X_m]^{T}$$
E. Send the fused features of the training samples to a classifier for training, and test with the test samples.
CN201610397708.4A 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection Active CN106096642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610397708.4A CN106096642B (en) 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610397708.4A CN106096642B (en) 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection

Publications (2)

Publication Number Publication Date
CN106096642A CN106096642A (en) 2016-11-09
CN106096642B true CN106096642B (en) 2020-11-13

Family

ID=57227299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610397708.4A Active CN106096642B (en) 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection

Country Status (1)

Country Link
CN (1) CN106096642B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776740A (en) * 2016-11-17 2017-05-31 天津大学 A kind of social networks Text Clustering Method based on convolutional neural networks
CN108122006A (en) * 2017-12-20 2018-06-05 南通大学 Embedded method for diagnosing faults is locally kept based on differential weights
CN109284783B (en) * 2018-09-27 2022-03-18 广州慧睿思通信息科技有限公司 Machine learning-based worship counting method and device, user equipment and medium
CN109584885A (en) * 2018-10-29 2019-04-05 李典 A kind of audio-video output method based on multimode emotion recognition technology
CN109872728A (en) * 2019-02-27 2019-06-11 南京邮电大学 Voice and posture bimodal emotion recognition method based on kernel canonical correlation analysis
CN112289306B (en) * 2020-11-18 2024-03-26 上海依图网络科技有限公司 Juvenile identification method and device based on human body characteristics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544963B (en) * 2013-11-07 2016-09-07 东南大学 A kind of speech-emotion recognition method based on core semi-supervised discrimination and analysis
CN104778689B (en) * 2015-03-30 2018-01-05 广西师范大学 A kind of image hashing method based on average secondary image and locality preserving projections
CN105138991B (en) * 2015-08-27 2016-08-31 山东工商学院 A kind of video feeling recognition methods merged based on emotion significant characteristics

Also Published As

Publication number Publication date
CN106096642A (en) 2016-11-09

Similar Documents

Publication Publication Date Title
CN106096642B (en) Multi-mode emotional feature fusion method based on identification of local preserving projection
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
CN106250855B (en) Multi-core learning based multi-modal emotion recognition method
CN113822192B (en) Method, equipment and medium for identifying emotion of on-press personnel based on multi-mode feature fusion of Transformer
Sidorov et al. Emotion recognition and depression diagnosis by acoustic and visual features: A multimodal approach
Sinha Recognizing complex patterns
Chao et al. Multi task sequence learning for depression scale prediction from video
Sahoo et al. Emotion recognition from audio-visual data using rule based decision level fusion
Sharma et al. D-FES: Deep facial expression recognition system
Huang et al. Characterizing types of convolution in deep convolutional recurrent neural networks for robust speech emotion recognition
Song et al. Speech emotion recognition using transfer non-negative matrix factorization
CN112418166B (en) Emotion distribution learning method based on multi-mode information
Yang et al. Modeling dynamics of expressive body gestures in dyadic interactions
Ocquaye et al. Dual exclusive attentive transfer for unsupervised deep convolutional domain adaptation in speech emotion recognition
Mangin et al. Learning semantic components from subsymbolic multimodal perception
Kumar et al. Artificial Emotional Intelligence: Conventional and deep learning approach
Liu et al. Audio-visual keyword spotting based on adaptive decision fusion under noisy conditions for human-robot interaction
Mathur et al. Unsupervised audio-visual subspace alignment for high-stakes deception detection
Sahu et al. Modeling feature representations for affective speech using generative adversarial networks
Atkar et al. Speech Emotion Recognition using Dialogue Emotion Decoder and CNN Classifier
Javaid et al. A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition.
CN111950592B (en) Multi-modal emotion feature fusion method based on supervised least square multi-class kernel canonical correlation analysis
CN116244474A (en) Learner learning state acquisition method based on multi-mode emotion feature fusion
CN111462762A (en) Speaker vector regularization method and device, electronic equipment and storage medium
Tu et al. Bimodal emotion recognition based on speech signals and facial expression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 201, building 2, phase II, No.1 Kechuang Road, Yaohua street, Qixia District, Nanjing City, Jiangsu Province, 210003

Applicant after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Address before: 210003 Gulou District, Jiangsu, Nanjing new model road, No. 66

Applicant before: NANJING University OF POSTS AND TELECOMMUNICATIONS

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant