CN106096642B - Multi-modal emotional feature fusion method based on discriminative locality preserving projection - Google Patents
Multi-modal emotional feature fusion method based on discriminative locality preserving projection
- Publication number
- CN106096642B CN106096642B CN201610397708.4A CN201610397708A CN106096642B CN 106096642 B CN106096642 B CN 106096642B CN 201610397708 A CN201610397708 A CN 201610397708A CN 106096642 B CN106096642 B CN 106096642B
- Authority
- CN
- China
- Prior art keywords
- emotion
- matrix
- mode
- equal
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses a multi-modal emotional feature fusion method based on discriminative locality preserving projection. The method first extracts emotional features, such as voice features, expression features and posture features, from the sample data of each modality in a multi-modal emotion database; it then maps the emotional features of the various modalities into a uniform discriminative subspace by the discriminative locality preserving projection method; finally, it concatenates the mapped feature groups in series to obtain the fused multi-modal emotional features. A classifier that takes the fused multi-modal emotional features as input can effectively recognize basic emotions such as anger, disgust, fear, happiness, sadness and surprise, and the method provides a new approach for developing human emotion classification and recognition systems and realizing human-computer interaction.
Description
Technical Field
The invention belongs to the field of image processing and pattern recognition, relates to a feature fusion method applied to multi-modal emotion recognition, and particularly relates to a multi-modal emotional feature fusion method based on discriminative locality preserving projection.
Background
Emotional expression has always been the most prominent way for humans to communicate and understand each other. As computer technology has developed, Human-Computer Interaction (HCI) has become increasingly valuable for research and practical purposes, and how computers recognize human emotions has become important. With the continuous development of information technology, the emotional information expressed by human beings, whether in a laboratory or in real life, is easily captured by various sensors. Among these, images and speech are the most readily available affective signals and are also the most important information for emotion recognition.
Which emotions a computer can recognize is a complicated problem: the emotions people express in real life often differ only slightly, in ways that even humans find difficult to distinguish, so computers can currently recognize only basic emotions such as anger, disgust, fear, happiness, sadness and surprise. Nevertheless, technologies for recognizing these basic emotions are already widely used, for example in education, medical treatment, human-computer interaction and video entertainment.
Over the past decades there has been much emotion recognition based on a single modality, most commonly facial expression recognition, speech emotion recognition and gesture-based emotion recognition. Single-modality emotion recognition, however, has a major limitation, because the emotion a person expresses is inherently multi-modal: when a person expresses anger, for example, his voice, facial expression, body posture, heart rate and body temperature all differ greatly from their normal state. Using the emotional features of only one modality therefore does not give good results, especially in a real environment. Research results show that, compared with single-modality emotion recognition, multi-modal emotion recognition is more reliable and accurate. Multi-modal emotion recognition considers the various kinds of emotional information a person expresses, measures the expressed emotion comprehensively, and is robust to the interference encountered in real life (for example, facial images may suffer from varying illumination, viewing angles and other problems).
For multi-modal emotion recognition, feature fusion is the most important link: the different emotional features obtained from different sensors are fused, and the fused features are sent to a classifier for recognition. Common feature fusion methods fall into three main categories: data-layer fusion, feature-layer fusion and decision-layer fusion. To remain real-time, all three must retain enough important information while compressing it, so some information loss is inevitable and the recognition accuracy drops. The feature-layer fusion method is widely applied in the speech and image fields. At present, research on multi-modal emotion recognition is far less complete and rich than that on single-modality emotion recognition.
In the prior art, the invention patent with publication number CN105138991A, titled "A video emotion recognition method based on emotion significant feature fusion", discloses a video emotion recognition method based on fusion of emotionally salient features. Its defects are as follows: only the image features and voice features in a video can be fused, the expandability is poor, and features of further modalities cannot be included; the extracted image and voice features are not direct emotional features but are represented by color emotion intensity values and an audio emotion dictionary; and the fusion algorithm is too simple, so the emotional features obtained by its simple weighting are poorly discriminative.
Disclosure of Invention
The invention aims to solve two technical problems: the fused emotional features produced by existing multi-modal feature fusion methods are poorly discriminative, and existing single-modality emotion recognition technology cannot obtain sufficiently accurate recognition results.
To solve these problems, and aiming at the requirements of automatic human emotion assessment systems and human-computer interaction systems, the invention provides a multi-modal emotional feature fusion method based on discriminative locality preserving projection, offering a more accurate and reliable way for human-computer interaction. The specific technical scheme is as follows:
The multi-modal emotional feature fusion method based on discriminative locality preserving projection comprises the following steps:
A. First, extract emotional features from the sample data of each modality in a multi-modal emotion database, then reduce the dimension of the emotional feature vectors of the various modalities. A sample of the $j$-th modality is represented by a $d_j$-dimensional feature vector $x_{ijr} \in \mathbb{R}^{d_j}$, where $1 \le j \le m$ and $m$ is the number of modalities, $1 \le i \le c$ and $c$ is the number of emotion categories, $1 \le r \le n_{ij}$ and $n_{ij}$ is the number of samples belonging to the $i$-th emotion and $j$-th modality; $x_{ijr}$ is thus the feature vector of the $r$-th sample of the $i$-th emotion and $j$-th modality;
B. Apply discriminative locality preserving projection to the dimension-reduced feature vectors of the different modalities to obtain the optimal projection direction $\alpha$;
C. Map the feature vectors of the different modalities: $Y_j = \alpha^T X_j$, where $X_j$ is the matrix composed of the $c$ blocks $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^T$;
D. Concatenate the mapped features in series to obtain the fused feature:

$$Z = [\alpha^T X_1, \ldots, \alpha^T X_j, \ldots, \alpha^T X_m]^T.$$
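The projection-and-concatenation of steps C and D can be sketched end to end with toy data. In the snippet below, `fuse_features` and the random stand-in for the learned projection $\alpha$ are our own illustrative names, not from the patent:

```python
import numpy as np

def fuse_features(X_list, alpha):
    """Steps C-D: project each modality's feature matrix X_j with the shared
    projection alpha (Y_j = alpha^T X_j), then concatenate the Y_j in series."""
    return np.concatenate([alpha.T @ X for X in X_list], axis=0)

rng = np.random.default_rng(0)
m, d, n = 2, 50, 30                       # modalities, feature dim, samples
X_list = [rng.normal(size=(d, n)) for _ in range(m)]
alpha = rng.normal(size=(d, 10))          # stand-in for the learned projection
Z = fuse_features(X_list, alpha)
print(Z.shape)                            # fused feature matrix, one column per sample
```

With $m$ modalities projected to 10 dimensions each, the fused vectors have $10m$ dimensions, which is why the projection dimension, not the raw feature dimension, controls the classifier's input size.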
Further, in step B, after dimension reduction, discriminative locality preserving projection is carried out to solve for the optimal projection matrix $\alpha$, mapping the emotional feature vectors $x_{ijr}$ of the various modalities into a uniform discriminative subspace to obtain the mapped feature vectors $y_{ijr} = \alpha^T x_{ijr}$. The specific steps are:
B1: Define the intra-class dispersion matrix

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{r=1}^{n_{ij}} \sum_{l=1}^{n_{ik}} (y_{ijr} - y_{ikl})(y_{ijr} - y_{ikl})^T W_{rl},$$

where $y_{ikl}$ is the mapped feature vector of the $l$-th sample from the $i$-th emotion and $k$-th modality, $1 \le k \le m$, and $W_{rl}$ is the local preserving weight between feature vectors from the same emotion and modality.
B2: Define the inter-class dispersion matrix

$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} (\mu_i - \mu_h)(\mu_i - \mu_h)^T B_{ih},$$

where $B_{ih}$ is the local preserving weight between feature-vector means from the same modality, and $\mu_i$ is the mean of the mapped feature vectors of the $i$-th class:

$$\mu_i = \frac{1}{n_i} \sum_{j=1}^{m} \sum_{r=1}^{n_{ij}} y_{ijr},$$

where $n_i$ is the number of samples in class $i$; $\mu_h$ is likewise the mapped feature-vector mean of the $h$-th class.
B3: Maximize the inter-class dispersion matrix while minimizing the intra-class dispersion matrix; this goal can be expressed as the optimization problem

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}(S_b)}{\mathrm{Tr}(S_w)},$$

where $\mathrm{Tr}(\cdot)$ is the trace of a matrix.
Further, in step B1, where the intra-class dispersion matrix $S_w$ is defined, the local preserving weight matrix $W_{rl}$ between feature vectors is defined as follows. For feature vectors $x_{ijr}$ and $x_{ijl}$ from the same emotion and modality, define

$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^2}{t}\right),$$

where $x_{ijl}$ is the feature vector of the $l$-th sample from the $i$-th emotion and $j$-th modality, $1 \le l \le n_{ij}$, and the parameter $t$ may be set empirically. Weights between feature vectors from different emotions or modalities are not considered (they are set to zero).
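As a sketch of this weight definition (a standard LPP-style heat kernel; the function name `local_weights` is ours), the weight matrix for the samples of one emotion/modality group can be computed as:

```python
import numpy as np

def local_weights(X, t=1.0):
    """W_rl = exp(-||x_r - x_l||^2 / t) between all samples of ONE
    emotion/modality group; weights across groups are simply zero."""
    # pairwise squared Euclidean distances via broadcasting
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / t)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))               # 5 toy samples of one group
W = local_weights(X, t=2.0)
```

The matrix is symmetric with ones on the diagonal; smaller $t$ concentrates the weights on the nearest neighbours, which is what "locality preserving" refers to.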
Further, in step B2, where the inter-class dispersion matrix $S_b$ is defined, the local preserving weight matrix $B_{ih}$ between the feature-vector means is defined as follows. In the original sample space (marked by the superscript $(x)$), the feature-vector mean of the $h$-th emotion in the $j$-th modality is

$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr},$$

where $n_{hj}$ is the number of samples belonging to the $h$-th emotion and $j$-th modality, $x_{hjr}$ is the feature vector of the $r$-th such sample, and $1 \le h \le c$. For feature-vector means $\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$ from the same modality, define

$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^2}{t}\right).$$

The parameter $t$ can again be set empirically; weights between feature-vector means from different modalities are not considered.
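A corresponding sketch for the mean-level weights, again assuming the heat-kernel form described above; `mean_weights`, the toy class means, and the diagonal row-sum matrix `E` (used later in the matrix form of $S_b$) are our illustrative names:

```python
import numpy as np

def mean_weights(mu, t=1.0):
    """B_ih = exp(-||mu_i - mu_h||^2 / t) between the c class-mean vectors
    of one modality (means from different modalities get no weight)."""
    sq = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / t)

mu = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])  # toy class means, c = 3
B = mean_weights(mu, t=1.0)
E = np.diag(B.sum(axis=1))                # diagonal matrix of row sums of B
```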
Further, in step B3, the optimization problem of maximizing the inter-class dispersion while minimizing the intra-class dispersion is solved for the optimal projection matrix $\alpha^*$ as follows.
B3.1: Substituting $y = \alpha^T x$ and rewriting in matrix form transforms the optimization problem of B3 into

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}\!\left(\alpha^T \tilde{S}_b \alpha\right)}{\mathrm{Tr}\!\left(\alpha^T \tilde{S}_w \alpha\right)}.$$

The denominator contains the intra-class dispersion matrix in the original space,

$$\tilde{S}_w = \sum_{i=1}^{c} \sum_{j=1}^{m} X_{ij} L X_{ij}^T, \qquad L = mD_{rr} - W_{rl},$$

where $\mu_{ik}^{(x)}$ is the feature-vector mean of the $i$-th emotion in the $k$-th modality, $n_{ik}$ is the number of samples of the $i$-th emotion in the $k$-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the weight matrix $W$ between sample feature vectors ($W$ is symmetric, so its row and column sums are equal):

$$D_{rr} = \sum_{l} W_{rl}.$$

The numerator contains the inter-class dispersion matrix in the original space,

$$\tilde{S}_b = \sum_{j=1}^{m} M_j (E - B) M_j^T,$$

where $M_j = [\mu_{1j}^{(x)}, \ldots, \mu_{cj}^{(x)}]$ is the matrix composed of the $c$ mean vectors and $E$ is the diagonal matrix of the row (or column) sums of the mean weight matrix $B_{ih}$:

$$E_{ii} = \sum_{h} B_{ih}.$$

B3.2: Because the trace-ratio problem in B3.1 has no closed-form solution, the ratio of traces is converted into the trace of a ratio, finally giving the optimization problem

$$\alpha^* = \arg\max_{\alpha} \mathrm{Tr}\!\left((\alpha^T \tilde{S}_w \alpha)^{-1} (\alpha^T \tilde{S}_b \alpha)\right),$$

which is solved by generalized eigenvalue decomposition: the columns of the optimal projection matrix $\alpha^*$ are the eigenvectors of $\tilde{S}_b \alpha = \lambda \tilde{S}_w \alpha$ with the largest eigenvalues.
Compared with the prior art, the invention has the advantages that:
(1) Compared with single-modality emotional features, the multi-modal fused emotional features used in the emotion recognition problem give higher accuracy and objectivity, and better robustness in real situations.
(2) The multi-modal emotional feature fusion method based on discriminative locality preserving projection considers not only the inter-class dispersion but also the intra-class dispersion, so it discriminates better between samples of different classes, and the locality preserving projection it introduces adapts well to nonlinear conditions. It finally yields multi-modal fused emotional features better suited to emotion recognition.
The invention introduces a multi-modal emotional feature fusion method based on discriminative locality preserving projection and applies it to multi-modal expression classification and recognition, effectively recognizing the six expressions of anger, disgust, fear, happiness, sadness and surprise, and providing a new method and approach for developing automatic human emotion assessment systems and human-computer interaction systems.
Drawings
FIG. 1 is a flow chart of the multi-modal emotional feature fusion method based on discriminative locality preserving projection of the present invention.
FIG. 2 is a partial image in a bimodal emotion database.
Detailed Description
The embodiments of the present invention will now be described in further detail with reference to the accompanying drawings. As shown in FIG. 1, the implementation of the multi-modal emotional feature fusion method based on discriminative locality preserving projection mainly comprises the following steps:
step 1: capturing still images and speech segments of video in a multimodal database
In the specific implementation process, the eNTERFACE bimodal emotion database is adopted. The database contains 1260 video segments from 42 people, each with an emotion label, expressing the 6 basic emotions: anger, disgust, fear, happiness, sadness and surprise (labels 1-6 respectively), as shown in FIG. 2. The video frame size is 720 × 576 at 25 fps, and the audio in the video is sampled at 48 kHz. Each video is divided into frames, and the frame with the richest expression is taken as the static picture of that video. The speech of each video is separated out as the corresponding speech segment, so that each video clip finally corresponds to one static image and one speech segment. 75% of the images and the corresponding speech are randomly selected as training samples, and the remaining 25% are used as test samples.
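The 75/25 random split described above can be sketched as an index-based split with NumPy (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n_clips = 1260                            # labelled clips in the database
idx = rng.permutation(n_clips)            # random order of clip indices
n_train = int(0.75 * n_clips)             # 945 training clips
train_idx, test_idx = idx[:n_train], idx[n_train:]
```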
Step 2: extracting the characteristics of image and voice information, reducing dimensions, and expressing by characteristic vector
First, the static image obtained in the previous step is cropped to the face region at a size of 128 × 128; image preprocessing operations such as alignment, scale normalization and gray-level equalization are then performed, and finally features such as Gabor, SIFT and LBP are extracted from the image (in this embodiment, Gabor features are extracted). For the speech segments, the specialized speech processing toolbox openSMILE is used to extract various features (in this embodiment, the emobase2010 feature set is extracted). Because the extracted feature vectors tend to have too high a dimension, PCA is used to reduce them to a suitable dimension, and each dimension-reduced image or speech feature is represented by a $d_j$-dimensional feature vector, i.e. $x_{ijr} \in \mathbb{R}^{d_j}$, where $1 \le j \le m$, $m$ is the number of modalities, $1 \le i \le c$, $c$ is the number of emotion categories, $1 \le r \le n_{ij}$, and $n_{ij}$ is the number of samples belonging to the $i$-th emotion and $j$-th modality; $x_{ijr}$ is the feature vector of the $r$-th sample of the $i$-th emotion and $j$-th modality, $n_i$ is the number of samples in class $i$, and $n$ is the number of all samples. In this embodiment $c = 6$, $m = 2$ and $n_{ij} = 210$; for other multi-modal databases only these parameters need to be changed, for example $m = 3$ for a trimodal database.
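The per-modality PCA step can be sketched with scikit-learn. The feature dimensions below are illustrative stand-ins (the openSMILE emobase2010 set is commonly reported as 1582-dimensional, and the Gabor dimension depends on the filter bank), and the target dimension 40 is an arbitrary choice for $d_j$:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
gabor = rng.normal(size=(120, 1024))      # stand-in for Gabor image features
speech = rng.normal(size=(120, 1582))     # stand-in for emobase2010 vectors

# One PCA per modality, so each modality can keep its own dimension d_j.
img_lowdim = PCA(n_components=40).fit_transform(gabor)
spk_lowdim = PCA(n_components=40).fit_transform(speech)
```

In practice the PCA models would be fitted on the training samples only and then applied to the test samples.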
Step 3: Solve for the optimal projection matrix $\alpha$ by the discriminative locality preserving projection method, mapping the emotional feature vectors $x_{ijr}$ of the various modalities into a uniform discriminative subspace to obtain the mapped feature vectors $y_{ijr} = \alpha^T x_{ijr}$. The specific steps are as follows.
First, for feature vectors $x_{ijr}$ and $x_{ijl}$ from the same class and modality, define the local preserving weight matrix

$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^2}{t}\right), \qquad (1)$$

where $x_{ijl}$ is the feature vector of the $l$-th sample from the $i$-th emotion and $j$-th modality, $1 \le l \le n_{ij}$, and the parameter $t$ may be set empirically; weights between feature vectors from different classes or modalities are not considered. Then define the intra-class dispersion matrix

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{r=1}^{n_{ij}} \sum_{l=1}^{n_{ik}} (y_{ijr} - y_{ikl})(y_{ijr} - y_{ikl})^T W_{rl}, \qquad (2)$$

where $y_{ikl}$ is the mapped feature vector of the $l$-th sample from the $i$-th emotion and $k$-th modality, $1 \le k \le m$.
In the original sample space (marked by the superscript $(x)$), the feature-vector mean of the $h$-th emotion in the $j$-th modality is

$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr}, \qquad (3)$$

where $n_{hj}$ is the number of samples belonging to the $h$-th emotion and $j$-th modality, $x_{hjr}$ is the feature vector of the $r$-th such sample, and $1 \le h \le c$. Analogously to the intra-class weights, for feature-vector means $\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$ from the same modality define the local preserving weight matrix

$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^2}{t}\right), \qquad (4)$$

where the parameter $t$ can again be set empirically; weights between means from different modalities are not considered. The inter-class dispersion matrix is then

$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} (\mu_i - \mu_h)(\mu_i - \mu_h)^T B_{ih}, \qquad (5)$$

where $\mu_i$ is the mean of the mapped feature vectors of the class-$i$ samples:

$$\mu_i = \frac{1}{n_i} \sum_{j=1}^{m} \sum_{r=1}^{n_{ij}} y_{ijr}, \qquad (6)$$

and similarly $\mu_h$ is the mean of the mapped class-$h$ sample features.
Finally, to maximize the inter-class dispersion matrix while minimizing the intra-class dispersion matrix, the following optimization problem is obtained:

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}(S_b)}{\mathrm{Tr}(S_w)}, \qquad (7)$$

where $\mathrm{Tr}(\cdot)$ is the trace of a matrix. Simplification and transformation give the optimization problem

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}(\alpha^T \tilde{S}_b \alpha)}{\mathrm{Tr}(\alpha^T \tilde{S}_w \alpha)}. \qquad (8)$$

The denominator of (8) contains the intra-class dispersion matrix

$$\tilde{S}_w = \sum_{i=1}^{c} \sum_{j=1}^{m} X_{ij} L X_{ij}^T, \qquad L = mD_{rr} - W_{rl}, \qquad (9)$$

where $\mu_{ik}^{(x)}$ is the feature-vector mean of the $i$-th emotion in the $k$-th modality, $n_{ik}$ is the number of samples of the $i$-th emotion in the $k$-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the symmetric weight matrix $W$, $D_{rr} = \sum_l W_{rl}$. The numerator of (8) contains the inter-class dispersion matrix

$$\tilde{S}_b = \sum_{j=1}^{m} M_j (E - B) M_j^T, \qquad (10)$$

where $M_j = [\mu_{1j}^{(x)}, \ldots, \mu_{cj}^{(x)}]$ is the matrix composed of the $c$ mean vectors and $E$ is the diagonal matrix of row (or column) sums of the mean weight matrix $B$, $E_{ii} = \sum_h B_{ih}$.
Since the trace ratio in (8) has no closed-form solution, it is converted into the trace of a ratio:

$$\alpha^* = \arg\max_{\alpha} \mathrm{Tr}\!\left((\alpha^T \tilde{S}_w \alpha)^{-1} (\alpha^T \tilde{S}_b \alpha)\right), \qquad (11)$$

which is solved by the generalized eigenvalue decomposition $\tilde{S}_b \alpha = \lambda \tilde{S}_w \alpha$ to obtain the optimal projection matrix $\alpha^*$.
Step 4: Project the training samples and the test samples to obtain the mapped features, and concatenate the mapped features in series to obtain the fused features.
The image features and speech features are mapped by multiplying with $\alpha$: $Y_j = \alpha^T X_j$, where $X_j$ is the matrix composed of the $c$ blocks $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^T$. The mapped features are then concatenated in series:

$$Z = [\alpha^T X_1, \ldots, \alpha^T X_j, \ldots, \alpha^T X_m]^T.$$
Step 5: Send the fused features of the training samples into a classifier for training, and test with the test samples.
The fused features of the training samples obtained in the previous step are sent to a classifier (libSVM in this embodiment); suitable models and parameters are obtained by training the classifier, and finally the test data are sent to the classifier to obtain the recognition result.
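A sketch of this final training/testing step, using scikit-learn's `SVC` in place of the libSVM package named in the embodiment (random features stand in for the fused features $Z$):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
Z_train = rng.normal(size=(90, 20))       # toy fused training features
y_train = rng.integers(1, 7, size=90)     # emotion labels 1-6
Z_test = rng.normal(size=(30, 20))        # toy fused test features

clf = SVC(kernel='linear')                # stand-in for libSVM
clf.fit(Z_train, y_train)
pred = clf.predict(Z_test)
```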
The above embodiments are not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (1)
1. A multi-modal emotional feature fusion method based on discriminative locality preserving projection, characterized by comprising the following steps:
A. First, extract emotional features from the sample data of each modality in a multi-modal emotion database, then reduce the dimension of the emotional feature vectors of the various modalities. A sample of the $j$-th modality is represented by a $d_j$-dimensional feature vector $x_{ijr} \in \mathbb{R}^{d_j}$, where $1 \le j \le m$ and $m$ is the number of modalities, $1 \le i \le c$ and $c$ is the number of emotion categories, $1 \le r \le n_{ij}$ and $n_{ij}$ is the number of samples belonging to the $i$-th emotion and $j$-th modality; $x_{ijr}$ is the feature vector of the $r$-th sample of the $i$-th emotion and $j$-th modality;
B. Apply discriminative locality preserving projection to the dimension-reduced feature vectors of the different modalities, obtaining the optimal projection direction $\alpha$ by maximizing the inter-class dispersion matrix and minimizing the intra-class dispersion matrix. The local preserving weight matrix $W_{rl}$ between feature vectors is defined as follows: for feature vectors $x_{ijr}$ and $x_{ijl}$ from the same emotion and modality,

$$W_{rl} = \exp\!\left(-\frac{\|x_{ijr} - x_{ijl}\|^2}{t}\right),$$

where $x_{ijl}$ is the feature vector of the $l$-th sample from the $i$-th emotion and $j$-th modality, $1 \le l \le n_{ij}$, the parameter $t$ can be set empirically, weights between feature vectors from different emotions or modalities are not considered, and $\beta$ is generally 3-5.
The objective of the discriminative locality preserving projection is to solve for the optimal projection matrix $\alpha^*$ that maps the emotional feature vectors $x_{ijr}$ of the various modalities into a uniform discriminative subspace, giving the mapped feature vectors $y_{ijr} = \alpha^T x_{ijr}$. Specifically, define the intra-class dispersion matrix

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{r=1}^{n_{ij}} \sum_{l=1}^{n_{ik}} (y_{ijr} - y_{ikl})(y_{ijr} - y_{ikl})^T W_{rl},$$

where $y_{ikl}$ is the mapped feature vector of the $l$-th sample from the $i$-th emotion and $k$-th modality, $1 \le k \le m$, and $W_{rl}$ is the local preserving weight between feature vectors from the same emotion and modality; and define the inter-class dispersion matrix

$$S_b = \sum_{i=1}^{c} \sum_{h=1}^{c} (\mu_i - \mu_h)(\mu_i - \mu_h)^T B_{ih},$$

where $B_{ih}$ is the local preserving weight between feature-vector means from the same modality and $\mu_i$ is the mean of the mapped feature vectors of the class-$i$ samples:

$$\mu_i = \frac{1}{n_i} \sum_{j=1}^{m} \sum_{r=1}^{n_{ij}} y_{ijr},$$

where $n_i$ is the number of samples in class $i$ and $\mu_h$ is the mapped feature-vector mean of the $h$-th class.
The local preserving weight matrix $B_{ih}$ between the feature-vector means is defined as follows. In the original sample space (marked by the superscript $(x)$), the feature-vector mean of the $h$-th emotion in the $j$-th modality is

$$\mu_{hj}^{(x)} = \frac{1}{n_{hj}} \sum_{r=1}^{n_{hj}} x_{hjr},$$

where $n_{hj}$ is the number of samples belonging to the $h$-th emotion and $j$-th modality, $x_{hjr}$ is the feature vector of the $r$-th such sample, and $1 \le h \le c$. For feature-vector means $\mu_{ij}^{(x)}$ and $\mu_{hj}^{(x)}$ from the same modality, define the local preserving weight

$$B_{ih} = \exp\!\left(-\frac{\|\mu_{ij}^{(x)} - \mu_{hj}^{(x)}\|^2}{t}\right),$$

where the parameter $t$ can also be set empirically, and weights between feature-vector means from different modalities are not considered.
B3: Maximize the inter-class dispersion matrix while minimizing the intra-class dispersion matrix; this goal can be expressed as the optimization problem

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}(S_b)}{\mathrm{Tr}(S_w)},$$

where $\mathrm{Tr}(\cdot)$ is the trace of a matrix and the projection directions found form the optimal projection matrix $\alpha^*$, with $1 \le j \le m$ and $m$ the number of modalities. The optimization problem is solved for the optimal projection direction as follows:
B3.1: Transform the optimization problem of B3 into

$$\alpha^* = \arg\max_{\alpha} \frac{\mathrm{Tr}(\alpha^T \tilde{S}_b \alpha)}{\mathrm{Tr}(\alpha^T \tilde{S}_w \alpha)}.$$

The denominator contains the intra-class dispersion matrix

$$\tilde{S}_w = \sum_{i=1}^{c} \sum_{j=1}^{m} X_{ij} L X_{ij}^T, \qquad L = mD_{rr} - W_{rl},$$

where $\mu_{ik}^{(x)}$ is the feature-vector mean of the $i$-th emotion in the $k$-th modality, $n_{ik}$ is the number of samples of the $i$-th emotion in the $k$-th modality, $X_{ij}$ is the feature matrix formed by the $n_{ij}$ feature vectors $x_{ijr}$, and $D_{rr}$ is the diagonal matrix whose entries are the row (or column) sums of the weight matrix $W$ between sample feature vectors ($W$ is symmetric, so its row and column sums are equal), $D_{rr} = \sum_l W_{rl}$. The numerator contains the inter-class dispersion matrix

$$\tilde{S}_b = \sum_{j=1}^{m} M_j (E - B) M_j^T,$$

where $M_j = [\mu_{1j}^{(x)}, \ldots, \mu_{cj}^{(x)}]$ is the matrix composed of the $c$ mean vectors and $E$ is the diagonal matrix of row (or column) sums of the mean weight matrix $B_{ih}$, $E_{ii} = \sum_h B_{ih}$.
B3.2: Because the optimization problem in B3.1 has no closed-form solution, the ratio of traces is converted into the trace of a ratio, finally giving

$$\alpha^* = \arg\max_{\alpha} \mathrm{Tr}\!\left((\alpha^T \tilde{S}_w \alpha)^{-1} (\alpha^T \tilde{S}_b \alpha)\right),$$

which is solved by the generalized eigenvalue decomposition $\tilde{S}_b \alpha = \lambda \tilde{S}_w \alpha$ to obtain the optimal projection direction $\alpha^*$;
C. Map the feature vectors of the different modalities: $Y_j = \alpha^T X_j$, where $X_j$ is the matrix composed of the $c$ blocks $X_{ij}$, i.e. $X_j = [X_{1j}, \ldots, X_{ij}, \ldots, X_{cj}]^T$, $X_{ij}$ holding the emotional features of emotion category $i$ in modality $j$, with $1 \le i \le c$, $c$ the number of emotion categories, $1 \le j \le m$, and $m$ the number of modalities;
D. Concatenate the mapped features in series to obtain the fused feature:

$$Z = [\alpha^T X_1, \ldots, \alpha^T X_j, \ldots, \alpha^T X_m]^T;$$
E. Send the fused features of the training samples into a classifier for training, and test with the test samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610397708.4A CN106096642B (en) | 2016-06-07 | 2016-06-07 | Multi-mode emotional feature fusion method based on identification of local preserving projection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106096642A CN106096642A (en) | 2016-11-09 |
CN106096642B true CN106096642B (en) | 2020-11-13 |
Family
ID=57227299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610397708.4A Active CN106096642B (en) | 2016-06-07 | 2016-06-07 | Multi-mode emotional feature fusion method based on identification of local preserving projection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096642B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776740A (en) * | 2016-11-17 | 2017-05-31 | 天津大学 | A kind of social networks Text Clustering Method based on convolutional neural networks |
CN108122006A (en) * | 2017-12-20 | 2018-06-05 | 南通大学 | Embedded method for diagnosing faults is locally kept based on differential weights |
CN109284783B (en) * | 2018-09-27 | 2022-03-18 | 广州慧睿思通信息科技有限公司 | Machine learning-based worship counting method and device, user equipment and medium |
CN109584885A (en) * | 2018-10-29 | 2019-04-05 | 李典 | A kind of audio-video output method based on multimode emotion recognition technology |
CN109872728A (en) * | 2019-02-27 | 2019-06-11 | 南京邮电大学 | Voice and posture bimodal emotion recognition method based on kernel canonical correlation analysis |
CN112289306B (en) * | 2020-11-18 | 2024-03-26 | 上海依图网络科技有限公司 | Juvenile identification method and device based on human body characteristics |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544963B (en) * | 2013-11-07 | 2016-09-07 | 东南大学 | A kind of speech-emotion recognition method based on core semi-supervised discrimination and analysis |
CN104778689B (en) * | 2015-03-30 | 2018-01-05 | 广西师范大学 | A kind of image hashing method based on average secondary image and locality preserving projections |
CN105138991B (en) * | 2015-08-27 | 2016-08-31 | 山东工商学院 | A kind of video feeling recognition methods merged based on emotion significant characteristics |
- 2016-06-07: Application CN201610397708.4A filed in China; granted as patent CN106096642B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN106096642A (en) | 2016-11-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Room 201, building 2, phase II, No.1 Kechuang Road, Yaohua street, Qixia District, Nanjing City, Jiangsu Province, 210003 Applicant after: NANJING University OF POSTS AND TELECOMMUNICATIONS Address before: 210003 Gulou District, Jiangsu, Nanjing new model road, No. 66 Applicant before: NANJING University OF POSTS AND TELECOMMUNICATIONS |
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |