CN111178389B - Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling - Google Patents
- Publication number: CN111178389B (application number CN201911244389.3A)
- Authority
- CN
- China
- Prior art keywords
- modal
- tensor
- data
- dimension
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention relates to a multi-modal deep hierarchical fusion emotion analysis method based on multi-channel tensor pooling. Using an attention mechanism, corresponding weights are set for the multi-modal data to rank the importance of each modality, so that in the fusion stage the interactions of modality data with large contributions to the task are amplified. Compared with a single-channel polynomial tensor pooling module, the multi-channel polynomial tensor pooling module obtains highly robust local high-dimensional complex nonlinear interaction information at a fine-grained level. On the basis of judging the importance of the multi-modal data, the invention captures stable local high-dimensional complex dynamic interaction information at a fine-grained level, and is an effective complement to multi-modal fusion frameworks in the current field of emotion recognition.
Description
Technical Field
The invention belongs to the field of multi-modal emotion recognition, at the intersection of natural language processing, vision and speech, and particularly relates to a method that uses an attention-based multi-channel polynomial tensor pooling technique to perform fine-grained hierarchical fusion of multi-modal information and judge a subject's emotional state.
Background
How to effectively judge an individual's emotional state has long been a research hotspot. For example, an e-commerce website can analyze a consumer's facial expression, voice or text review to judge the consumer's evaluation of a specific product, and thereby obtain the consumer's emotional feedback (negative or positive emotion) about that product.
Single-modality data, such as facial expressions, speech or text, can each be used for emotional state recognition, but a single modality is not sufficient to fully characterize an emotional state. Multi-modal data supplement the emotion recognition task with information from multiple perspectives. For example, text information alone may only yield a fuzzy judgment of the emotional state, whereas combining it with expression information can pin down the emotion type: an individual may say "you can really be annoying" while smiling broadly; from the text alone the current valence appears negative, but from the facial expression the opposite judgment, a positive emotion, is supported. Meanwhile, the interaction information among the modalities is a common feature pattern contained in the multi-modal data and enhances the robustness of the emotion recognition task.
Current multi-modal fusion methods generally analyze from a coarse-grained perspective and usually consider only two simple linear fusion schemes, bilinear and trilinear fusion, so they can only capture low-dimensional, simple interaction information between the modalities. Moreover, existing tensor-based linear fusion methods decompose the fused tensor as a whole, which increases the storage burden and computational complexity (the required storage tends to grow exponentially with the fusion order) and precludes higher-order, more complex interactions. In addition, existing multi-modal interaction models treat every modality as equally important during interaction and do not assign different weights to the modalities, which biases the final task accuracy.
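The exponential storage growth described above is easy to quantify: fusing modality vectors of dimension d by outer product yields d^2 entries for two modalities and d^3 for three. A minimal numpy sketch (the dimension 4 is illustrative, not from the patent):

```python
import numpy as np

# Three toy modality feature vectors (text, audio, video), dimension 4 each.
d = 4
rng = np.random.default_rng(0)
text, audio, video = (rng.random(d) for _ in range(3))

# Bilinear fusion: outer product of two modalities -> d^2 entries.
bilinear = np.einsum("i,j->ij", text, audio)

# Trilinear fusion: outer product of three modalities -> d^3 entries.
trilinear = np.einsum("i,j,k->ijk", text, audio, video)

assert bilinear.size == d ** 2   # 16 entries
assert trilinear.size == d ** 3  # 64 entries
# Each additional fused modality multiplies storage by d: exponential growth.
```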
Disclosure of Invention
The aim of the invention is to provide, against the defects of the prior art, a multi-modal deep hierarchical fusion emotion analysis method based on multi-channel tensor pooling. First, an attention network is applied to the multi-modal data, setting a different weight for each modality to represent its importance (so that the effect of high-contribution modality data on the interaction part can be amplified). Second, a multi-channel tensor pooling characterization is computed for the attention-weighted multi-modal data (strengthening the stability of the data representation). Finally, a deep hierarchical cyclic fusion is performed on the multi-channel tensor pooling representations, and the resulting global information representation is used to judge the emotion task.
The technical scheme adopted by the invention is as follows:
A modality is a source or form of information; the multi-modal information data comprise speech, video, text and other media data capable of recording human emotion information.
To avoid excessive differences in feature-data distribution across the modalities, a Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit (GRU) network is used to extract the short-term memory vector of each modality at each time as the feature vector of that time;
h_m^(t) = g_out ⊙ f(C^(t)) (1)
where h_m^(t) is the feature vector of the m-th modality at time t, i.e., the short-term memory vector of the LSTM network at time t; g_out is the output gate of the LSTM network, C^(t) is the long-term memory cell of the LSTM network, and f is the activation function.
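The per-timestep feature extraction in equation (1) is the standard LSTM output relation: the output gate modulates an activation of the cell state. A hedged numpy sketch of just that step (the gate and cell values here are made up for illustration; a real implementation would compute them from learned weights):

```python
import numpy as np

def lstm_output(g_out, c_t, f=np.tanh):
    """Short-term memory vector h(t) = g_out ⊙ f(C(t)), as in equation (1)."""
    return g_out * f(c_t)

g_out = np.array([0.9, 0.1, 0.5])  # output-gate activations (illustrative)
c_t = np.array([1.0, -2.0, 0.0])   # long-term memory cell state (illustrative)
h_t = lstm_output(g_out, c_t)

assert h_t.shape == (3,)
assert abs(h_t[2]) < 1e-12         # tanh(0) = 0, so the gated output is 0
```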
Step 3, multi-mode data information organization
Organize the feature vectors of the modality data preprocessed in step 2 into a pseudo two-dimensional matrix G, where the first dimension is time and the second dimension is modality, and each element of the matrix is the feature vector of the corresponding time and modality;
wherein T represents the size of the data time dimension, and M represents the number of modes;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, apply an attention network to all modality data at all times to obtain a new pseudo two-dimensional matrix G1:
G1[t][m] = a_m^(t) ⊙ G[t][m] (3)
where a_m^(t) is the attention weight of the m-th modality at time t, and ⊙ denotes element-wise multiplication.
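Step 4 scales each feature vector by a learned per-(time, modality) weight. A minimal sketch with softmax-normalized weights (the score computation here is a random placeholder, since the patent does not specify the attention network's internals):

```python
import numpy as np

T, M, d = 8, 3, 4                # time steps, modalities, feature dimension
rng = np.random.default_rng(0)
G = rng.random((T, M, d))        # pseudo two-dimensional matrix G of feature vectors

# Placeholder attention scores, one scalar per (time, modality) cell.
scores = rng.random((T, M))
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# G1[t][m] = a_m^(t) ⊙ G[t][m]: broadcast each scalar weight over its vector.
G1 = weights[:, :, None] * G

assert G1.shape == G.shape
assert np.allclose(weights.sum(axis=1), 1.0)  # weights per time step sum to 1
```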
Step 5, multi-channel high-order polynomial tensor pooling operation of multi-modal information
5.1 Initialize the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0, for the pseudo two-dimensional matrix G_k, concatenate the feature vectors of any two modalities inside a time window to obtain a new feature vector z_ij; then apply a high-order (P-order) polynomial fusion to z_ij according to equation (4) to obtain the P-order data tensor Z_P:
Z_P = z_ij ⊗ z_ij ⊗ … ⊗ z_ij (P factors) (4)
The time window has length T_1 and stride s;
Then apply C single-channel low-rank tensor pooling operations to Z_P along each dimension of the P-order tensor, yielding C new feature vectors, where the h-th element z_h of the c-th feature vector is:
z_h = Σ_{i_1,…,i_P} (W_h)_{i_1…i_P} (Z_P)_{i_1…i_P} (5)
where W_h is a P-order tensor weight and i_1, …, i_P index the dimensions of the P-order tensor;
Apply max pooling to the C new feature vectors to obtain the local feature vector of the two-modality fusion within the time window, whose h-th element z'_h is:
z'_h = max_{c=1..C} Σ_{i_1,…,i_P} (W_hc)_{i_1…i_P} (Z_P)_{i_1…i_P} (6)
where C is the number of single-channel tensor pooling operations on the modality data within the same time window, i.e., the number of channels of the multi-channel tensor pooling operation, and W_hc is the P-order tensor weight of the c-th channel;
for pseudo two-dimensional matrix GkAll the modal feature vectors are subjected to the two-modal fusion operation to obtain a plurality of modal feature vectorsThe final build size isPseudo two-dimensional matrix G ofk+1;
5.3 Check whether k ≥ N, where N is the maximum number of iterations. If so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise set k = k + 1, update the time-dimension size accordingly, and return to step 5.2.
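The pipeline of equations (4)-(6) can be sketched directly: build the P-order tensor as a repeated outer product of the concatenated vector, contract it against one weight tensor per output element, repeat for C channels, and max-pool across channels. Dimensions and dense random weights are illustrative only; a practical implementation would use the low-rank factorization rather than materializing dense P-order weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def ptp_channel(z, W):
    """Single-channel P-order polynomial tensor pooling.

    z: concatenated feature vector, shape (d,)
    W: weight tensor of shape (H, d, ..., d) with P trailing axes; returns an
       H-vector with z_h = <W_h, z^{⊗P}> as in equation (5).
    """
    P = W.ndim - 1
    Z = z
    for _ in range(P - 1):             # Z_P = z ⊗ z ⊗ ... ⊗ z  (equation (4))
        Z = np.multiply.outer(Z, z)
    return np.tensordot(W, Z, axes=P)  # contract all P tensor axes

d, H, P, C = 4, 5, 3, 4                # input dim, output dim, order, channels
z_ij = rng.random(d)                   # concatenated two-modality feature vector

# C channels, then element-wise max across channels (equation (6)).
channels = [ptp_channel(z_ij, rng.random((H,) + (d,) * P)) for _ in range(C)]
z_local = np.max(np.stack(channels), axis=0)

assert z_local.shape == (H,)
assert all(np.all(z_local >= ch) for ch in channels)  # max dominates each channel
```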
Step 6, multi-modal global interaction
For the pseudo two-dimensional matrix G_{k+1} output by step 5, concatenate all feature vectors into a new feature vector z'; then apply a high-order (P-order) polynomial fusion to z' (as in equation (4)) to obtain the P-order data tensor Z'_P, and apply the multi-channel low-rank tensor pooling operation to Z'_P along the dimensions of the P-order tensor (as in equation (6)) to obtain the global feature vector z.
Step 7, multi-modal information data classification
Feed the global interaction vector z obtained in step 6 into a classifier and compare the prediction with the pre-annotated emotion category label to obtain the final classification result.
The emotion category label is an emotion category label marked in advance when the emotion modal information data is collected in the step (1).
The beneficial effects of the invention are as follows: by combining an attention-based mechanism, the method assigns each modality a weight reflecting its importance, so that the fusion stage amplifies the interactions of modality data with large contributions to the task; the multi-channel tensor pooling operation then resolves the instability of high-dimensional complex interactions present in single-channel tensor pooling. Built on iterative fusion that respects the differing contributions of the modalities, the method captures robust, stable high-dimensional complex dynamic interaction information at a fine-grained level, and is an effective complement to multi-modal fusion frameworks in the current field of emotion recognition.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the multi-channel higher-order polynomial tensor pooling operation of the multi-modal information of the present invention;
FIG. 3 is a diagram of a layered fusion framework of the present invention;
FIG. 4 is a schematic illustration of an attention mechanism;
FIG. 5 is a schematic diagram of a single channel polynomial tensor pooling module;
FIG. 6 is a schematic diagram of a multi-channel polynomial tensor pooling module.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings.
The multi-modal depth layered fusion emotion analysis method based on multi-channel tensor pooling is shown in figure 1:
From text information alone, only a fuzzy judgment of the emotional state can be obtained; the emotion type (e.g., negative or positive) cannot be determined accurately from text alone. The emotional valence (positive or negative) can be preliminarily determined from the individual's facial expression in the video, and the emotional activation can be objectively judged from the fluctuation of the voice (e.g., its amplitude) over a period of time.
To avoid excessive differences in feature-data distribution across the modalities, a Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit (GRU) network is used to extract the short-term memory vector of each modality at each time as the feature vector of that time;
h_m^(t) = g_out ⊙ f(C^(t)) (1)
where h_m^(t) is the feature vector of the m-th modality at time t, i.e., the short-term memory vector of the LSTM network at time t; g_out is the output gate of the LSTM network, C^(t) is the long-term memory cell of the LSTM network, and f is the activation function.
Step 3, multi-mode data information organization
Organize the feature vectors of the modality data preprocessed in step 2 into a pseudo two-dimensional matrix G, where the first dimension is time (T = 8) and the second dimension is modality (M = 3), and each element of the matrix is the feature vector of the corresponding time and modality;
wherein T represents the size of the data time dimension, and M represents the number of modes;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, apply an attention network to all modality data at all times to obtain a new pseudo two-dimensional matrix G1:
G1[t][m] = a_m^(t) ⊙ G[t][m] (3)
where a_m^(t) is the attention weight of the m-th modality at time t, and ⊙ denotes element-wise multiplication.
Step 5, multi-channel high-order polynomial tensor pooling of the multi-modal information: first, scan the time window along the modality dimension to fuse the modality pairs [video, audio], [text, audio] and [text, video]; after the modality-dimension scan, scan along the time dimension, so that the first layer yields 12 new feature vectors forming the second-layer pseudo two-dimensional matrix G2. Then fuse the second-layer feature vectors pairwise to obtain 6 new feature vectors forming the third-layer pseudo two-dimensional matrix G3. Finally, at the third layer, fuse all nodes of the time window of the current layer, and take the resulting output feature vector as the basis for judging the emotional state.
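The counts in this walkthrough (12 vectors at the first layer, 6 at the second) follow from simple window arithmetic: with T = 8 time steps, window length 2 and stride 2 there are 4 windows, and 3 modalities give 3 pairs per window. A quick sketch of that bookkeeping:

```python
from itertools import combinations

T, window, stride = 8, 2, 2
modalities = ["text", "audio", "video"]

n_windows = (T - window) // stride + 1       # 4 non-overlapping time windows
pairs = list(combinations(modalities, 2))    # (text,audio), (text,video), (audio,video)

layer1_vectors = n_windows * len(pairs)      # one fused vector per (window, pair)
assert n_windows == 4 and len(pairs) == 3
assert layer1_vectors == 12                  # the 12 first-layer vectors in the text

# Second layer: windowed pairwise fusion over the 12 vectors halves the count.
layer2_vectors = layer1_vectors // 2
assert layer2_vectors == 6                   # the 6 second-layer vectors in the text
```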
5.1 Initialize the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0 (see FIG. 3), for the pseudo two-dimensional matrix G_k, concatenate the feature vectors of any two modalities inside a time window to obtain a new feature vector z_ij; then apply a high-order (P-order) polynomial fusion to z_ij according to equation (4) to obtain the P-order data tensor Z_P;
The time window has length T_1 and stride s; here T_1 = 2 (covering the data at times t_1 and t_2) and s = 2;
as shown in FIG. 5, the conventional is ZpPerforming single-channel low-rank tensor pooling operation according to dimensions of P-order tensor, and finally outputting a new eigenvector z as the output of each time windowij', wherein the feature vector zij' the h-th data element zhThe following were used:
wherein WhIs a tensor weight of the P order, i1,…,ipSubscripts for each dimension of the P-order tensor;
however, although the pooling of the single-channel high-order (P-th order) polynomial fusion tensor can obtain high-dimensional complex interaction information, the method may have an unstable model, so that the robustness of the method is stronger, the invention provides the pooling operation of the multiple single-channel high-order (P-th order) polynomial fusion tensors as shown in fig. 6, specifically:
Apply C single-channel low-rank tensor pooling operations to Z_P along each dimension of the P-order tensor, yielding C new feature vectors, where the h-th element z_h of the c-th feature vector is:
z_h = Σ_{i_1,…,i_P} (W_h)_{i_1…i_P} (Z_P)_{i_1…i_P} (5)
where W_h is a P-order tensor weight and i_1, …, i_P index the dimensions of the P-order tensor;
Apply max pooling to the C new feature vectors to obtain the local feature vector of the two-modality fusion within the time window, whose h-th element z'_h is:
z'_h = max_{c=1..C} Σ_{i_1,…,i_P} (W_hc)_{i_1…i_P} (Z_P)_{i_1…i_P} (6)
where C is the number of single-channel tensor pooling operations on the modality data within the same time window, i.e., the number of channels of the multi-channel tensor pooling operation, and W_hc is the P-order tensor weight of the c-th channel;
as shown in fig. 6, which is a schematic diagram of the multi-channel polynomial tensor pooling module of the present invention, compared to the single-channel polynomial tensor pooling module, the multi-channel pooling operation performs multiple high-order (P-order) polynomial fusion operations on the spliced data to obtain multiple P-order data tensors, and finally outputs multiple new eigenvectors in one time window, and a maximum pooling operation is performed on the multiple eigenvectors, that is, a maximum value solving operation is performed on all element sets specified by the same subscript of the multiple eigenvectors, and the obtained maximum value is used as a new element specified by the subscript, so that finally the multiple eigenvectors perform a dimension reduction operation along the channel dimension, and only one eigenvector is obtained as an output of the time window, which greatly increases robustness and reduces randomness.
Apply the above two-modality fusion operation to all modality feature vectors of the pseudo two-dimensional matrix G_k, obtaining multiple new feature vectors, and build from them a pseudo two-dimensional matrix G_{k+1} of reduced size;
5.3 Check whether k ≥ N, where N is the maximum number of iterations (here N = 2). If so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise set k = k + 1, update the time-dimension size accordingly, and return to step 5.2.
Step 6, multi-modal global interaction
For the pseudo two-dimensional matrix G_{k+1} output by step 5, concatenate all feature vectors into a new feature vector z'; then apply a high-order (P-order) polynomial fusion to z' (as in equation (4)) to obtain the P-order data tensor Z'_P, and apply the multi-channel low-rank tensor pooling operation to Z'_P along the dimensions of the P-order tensor (as in equation (6)) to obtain the global feature vector z.
Step 7, multi-modal information data classification
Feed the global interaction vector z obtained in step 6 into a classifier and compare the prediction with the pre-annotated emotion category label to obtain the final classification result.
As shown in Table 1, the proposed method and four baseline multi-modal fusion methods were evaluated on the emotion-state discrimination task on two multi-modal emotion databases, CMU-MOSI and IEMOCAP. MAE is the mean absolute error, CORR the Pearson correlation coefficient, and ACC-7 the 7-class accuracy. Across these metrics, the results of the proposed method are superior or comparable to those of the baseline models.
TABLE 1 comparison of results
Claims (1)
1. The multi-modal depth layered fusion emotion analysis method based on multi-channel tensor pooling is characterized by comprising the following steps of:
step 1, acquiring multi-modal information data
Step 2, multi-mode information data preprocessing
Use a long short-term memory (LSTM) network or a gated recurrent unit (GRU) network to extract the short-term memory vector of each modality at each time as the feature vector of that time;
h_m^(t) = g_out ⊙ f(C^(t)) (1)
where h_m^(t) is the feature vector of the m-th modality at time t, i.e., the short-term memory vector of the LSTM network at time t; g_out is the output gate of the LSTM network, C^(t) is the long-term memory cell of the LSTM network, and f is the activation function;
step 3, multi-mode data information organization
Organize the feature vectors of the modality data preprocessed in step 2 into a pseudo two-dimensional matrix G, where the first dimension is time and the second dimension is modality, and each element of the matrix is the feature vector of the corresponding time and modality;
wherein T represents the size of the data time dimension, and M represents the number of modes;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, apply an attention network to all modality data at all times to obtain a new pseudo two-dimensional matrix G1: G1[t][m] = a_m^(t) ⊙ G[t][m] (3), where a_m^(t) is the attention weight of the m-th modality at time t and ⊙ denotes element-wise multiplication;
step 5, multi-channel high-order polynomial tensor pooling operation of multi-modal information
5.1 Initialize the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0, for the pseudo two-dimensional matrix G_k, concatenate the feature vectors of any two modalities inside a time window to obtain a new feature vector z_ij; then apply a high-order polynomial fusion to z_ij according to equation (4) to obtain the P-order data tensor Z_P:
Z_P = z_ij ⊗ z_ij ⊗ … ⊗ z_ij (P factors) (4)
The time window has length T_1 and stride s;
Then apply C single-channel low-rank tensor pooling operations to Z_P along each dimension of the P-order tensor, yielding C new feature vectors, where the h-th element z_h of the c-th feature vector is:
z_h = Σ_{i_1,…,i_P} (W_h)_{i_1…i_P} (Z_P)_{i_1…i_P} (5)
where W_h is a P-order tensor weight and i_1, …, i_P index the dimensions of the P-order tensor;
Apply max pooling to the C new feature vectors to obtain the local feature vector of the two-modality fusion within the time window, whose h-th element z'_h is:
z'_h = max_{c=1..C} Σ_{i_1,…,i_P} (W_hc)_{i_1…i_P} (Z_P)_{i_1…i_P} (6)
where C is the number of single-channel tensor pooling operations on the modality data within the same time window, i.e., the number of channels of the multi-channel tensor pooling operation, and W_hc is the P-order tensor weight of the c-th channel;
Apply the above two-modality fusion operation to all modality feature vectors of the pseudo two-dimensional matrix G_k, obtaining multiple new feature vectors, and build from them a pseudo two-dimensional matrix G_{k+1} of reduced size;
5.3 Check whether k ≥ N, where N is the maximum number of iterations; if so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise set k = k + 1, update the time-dimension size accordingly, and return to step 5.2;
step 6, multi-modal global interaction
For the pseudo two-dimensional matrix G_{k+1} output by step 5, concatenate all feature vectors into a new feature vector z'; then apply a high-order polynomial fusion to z' to obtain the P-order data tensor Z'_P, and apply the multi-channel low-rank tensor pooling operation to Z'_P along the dimensions of the P-order tensor to obtain the global feature vector z;
step 7, multi-modal information data classification
Feed the global interaction vector z obtained in step 6 into a classifier and compare the prediction with the pre-annotated emotion category label to obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911244389.3A CN111178389B (en) | 2019-12-06 | 2019-12-06 | Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111178389A CN111178389A (en) | 2020-05-19 |
CN111178389B true CN111178389B (en) | 2022-02-11 |
Family
ID=70655407
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911244389.3A Active CN111178389B (en) | 2019-12-06 | 2019-12-06 | Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111178389B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753549B (en) * | 2020-05-22 | 2023-07-21 | 江苏大学 | Multi-mode emotion feature learning and identifying method based on attention mechanism |
CN111786979B (en) * | 2020-06-24 | 2022-07-22 | 杭州电子科技大学 | Power attack identification method based on multi-mode learning |
CN112199504B (en) * | 2020-10-30 | 2022-06-03 | 福州大学 | Visual angle level text emotion classification method and system integrating external knowledge and interactive attention mechanism |
CN112329604B (en) * | 2020-11-03 | 2022-09-20 | 浙江大学 | Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition |
CN112329633B (en) * | 2020-11-05 | 2022-08-23 | 南开大学 | Emotion identification method, device, medium and electronic equipment based on tensor decomposition |
CN112597841B (en) * | 2020-12-14 | 2023-04-18 | 之江实验室 | Emotion analysis method based on door mechanism multi-mode fusion |
CN112612936B (en) * | 2020-12-28 | 2022-03-08 | 杭州电子科技大学 | Multi-modal emotion classification method based on dual conversion network |
CN113064968B (en) * | 2021-04-06 | 2022-04-19 | 齐鲁工业大学 | Social media emotion analysis method and system based on tensor fusion network |
CN113208593A (en) * | 2021-04-08 | 2021-08-06 | 杭州电子科技大学 | Multi-modal physiological signal emotion classification method based on correlation dynamic fusion |
CN113469365B (en) * | 2021-06-30 | 2024-03-19 | 上海寒武纪信息科技有限公司 | Reasoning and compiling method based on neural network model and related products thereof |
CN114511494A (en) * | 2021-12-21 | 2022-05-17 | 北京医准智能科技有限公司 | Gland density grade determining method and device and computer readable storage medium |
CN116563751B (en) * | 2023-04-19 | 2024-02-06 | 湖北工业大学 | Multi-mode emotion analysis method and system based on attention mechanism |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409296A (en) * | 2018-10-30 | 2019-03-01 | 河北工业大学 | The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged |
CN110287389A (en) * | 2019-05-31 | 2019-09-27 | 南京理工大学 | The multi-modal sensibility classification method merged based on text, voice and video |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090164302A1 (en) * | 2007-12-20 | 2009-06-25 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Methods and systems for specifying a cohort-linked avatar attribute |
- 2019-12-06: application CN201911244389.3A filed; granted as CN111178389B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409296A (en) * | 2018-10-30 | 2019-03-01 | 河北工业大学 | The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged |
CN110287389A (en) * | 2019-05-31 | 2019-09-27 | 南京理工大学 | The multi-modal sensibility classification method merged based on text, voice and video |
Non-Patent Citations (2)
Title |
---|
EmoSense: Automatically Sensing Emotions From Speech By Multi-way Classification; V Ramu Reddy et al.; IEEE; 2018-10-29; pp. 4987-4990 *
Image sentiment analysis based on a multi-modal discriminative embedding space (基于多模态判别性嵌入空间的图像情感分析); Lv Guangrui; Journal of Beijing University of Posts and Telecommunications; 2019-03-19; vol. 42, no. 1; pp. 61-67 *
Also Published As
Publication number | Publication date |
---|---|
CN111178389A (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111178389B (en) | Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling | |
CN112818861B (en) | Emotion classification method and system based on multi-mode context semantic features | |
Dering et al. | A convolutional neural network model for predicting a product's function, given its form | |
Zheng et al. | An ensemble model for multi-level speech emotion recognition | |
CN112699774A (en) | Method and device for recognizing emotion of person in video, computer equipment and medium | |
CN112487949B (en) | Learner behavior recognition method based on multi-mode data fusion | |
CN112560495A (en) | Microblog rumor detection method based on emotion analysis | |
CN112508077A (en) | Social media emotion analysis method and system based on multi-modal feature fusion | |
CN114973062A (en) | Multi-modal emotion analysis method based on Transformer | |
Pandey et al. | Attention gated tensor neural network architectures for speech emotion recognition | |
CN112732921B (en) | False user comment detection method and system | |
CN111985612B (en) | Encoder network model design method for improving video text description accuracy | |
CN102663432A (en) | Kernel fuzzy c-means speech emotion identification method combined with secondary identification of support vector machine | |
CN110502757B (en) | Natural language emotion analysis method | |
CN114443899A (en) | Video classification method, device, equipment and medium | |
CN115545093A (en) | Multi-mode data fusion method, system and storage medium | |
CN112541541B (en) | Lightweight multi-modal emotion analysis method based on multi-element layering depth fusion | |
Prasath | Design of an integrated learning approach to assist real-time deaf application using voice recognition system | |
CN111160124A (en) | Depth model customization method based on knowledge reorganization | |
CN109934304B (en) | Blind domain image sample classification method based on out-of-limit hidden feature model | |
Świetlicka et al. | Graph neural networks for natural language processing in human-robot interaction | |
Zheng et al. | A two-channel speech emotion recognition model based on raw stacked waveform | |
CN112465054A (en) | Multivariate time series data classification method based on FCN | |
Ghadirian et al. | Hybrid adaptive modularized tri-factor non-negative matrix factorization for community detection in complex networks | |
Wan et al. | Co-compressing and unifying deep cnn models for efficient human face and speaker recognition |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |