CN111178389B - Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling - Google Patents

Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling

Info

Publication number
CN111178389B
CN111178389B (application CN201911244389.3A)
Authority
CN
China
Prior art keywords
modal
tensor
data
dimension
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911244389.3A
Other languages
Chinese (zh)
Other versions
CN111178389A (en)
Inventor
唐佳佳 (Tang Jiajia)
金宣妤 (Jin Xuanyu)
孔万增 (Kong Wanzeng)
张建海 (Zhang Jianhai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201911244389.3A priority Critical patent/CN111178389B/en
Publication of CN111178389A publication Critical patent/CN111178389A/en
Application granted granted Critical
Publication of CN111178389B publication Critical patent/CN111178389B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling. An attention mechanism assigns a weight to each modality and grades the importance of the different modal data, so that, according to each modality's contribution to the task, the interactions of the high-contribution modalities are amplified in the fusion stage. Compared with a single-channel polynomial tensor pooling module, the multi-channel polynomial tensor pooling module captures robust, local, high-dimensional and complex nonlinear interaction information at a fine-grained level. By first judging the importance of the multi-modal data, the invention characterises stable local high-dimensional complex dynamic interaction information at a fine-grained level, and is an effective complement to existing multi-modal fusion frameworks in the field of emotion recognition.

Description

Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling
Technical Field
The invention belongs to the field of multi-modal emotion recognition at the intersection of natural language processing, vision and speech, and particularly relates to a method that judges a subject's emotional state by hierarchically fusing multi-modal information at a fine-grained level with an attention-based multi-channel polynomial tensor pooling technique.
Background
How to effectively judge an individual's emotional state is a long-standing research topic. For example, an e-commerce website can analyse a consumer's facial expression, speech or written review of a specific product to obtain the consumer's emotional feedback on that product (negative or positive emotion).
Single-modality data such as facial expressions, speech or text can each be used for emotion recognition on its own, but a single modality is not sufficient to fully characterise an emotional state. Multi-modal data supplement the emotion recognition task with information from multiple perspectives. For example, text alone may allow only a fuzzy judgment of the emotional state, whereas combining it with facial-expression information can pin down the emotion type: an individual may say "you can really be annoying" with a broad smile; from the text alone the current valence appears negative, but the facial expression supports the opposite, positive, judgment. At the same time, the interaction information shared across the modalities acts as a common pattern contained in all of them and strengthens the robustness of the emotion recognition task.
Current multi-modal data fusion methods generally analyse from a coarse-grained perspective and usually consider only two simple linear fusion schemes, bilinear and trilinear fusion, so they can capture only low-dimensional, simple interaction information between the modalities. Moreover, existing tensor-based linear fusion methods decompose the fused tensor as a whole, which increases the storage burden and the computational complexity (the required storage tends to grow exponentially with the fusion order) and prevents higher-order, more complex interactions. In addition, existing multi-modal interaction models treat every modality as equally important during interaction and do not assign different weights to the modalities, which biases the final task accuracy.
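As a rough, purely illustrative calculation of this storage growth (the 128-dimensional feature size below is an assumption, not a figure from this patent):

```python
d = 128                     # assumed per-modality feature dimension
bilinear  = d ** 2          # 16,384 entries for a two-modality outer product
trilinear = d ** 3          # 2,097,152 entries for three modalities
order_5   = d ** 5          # ~3.4e10 entries for a 5th-order fusion: storage grows exponentially with the order
print(bilinear, trilinear, order_5)
```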
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling. First, an attention network is applied to the multi-modal data and a separate weight is set for each modality to represent its importance (the effect of high-contribution modal data on the interaction part can thereby be amplified). Second, the multi-modal data obtained through the attention network are characterised by multi-channel tensor pooling (which stabilises the data representation). Finally, the multi-channel tensor-pooled representations are fused in a deep, hierarchical and iterative manner, and the resulting global representation is used to judge the emotion task.
The technical scheme adopted by the invention is as follows:
step 1, acquiring multi-modal information data
A modality is a source or form of information; the multi-modal information data comprise speech, video, text and other media capable of recording human emotional information.
Step 2, multi-mode information data preprocessing
In order to avoid excessive differences between the feature distributions of the individual modalities, a Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit (GRU) network is used to extract, for each modality, the short-term memory vector at every time step as the feature vector of that time step:

x_t^m = g_out ⊙ f(C(t))        (1)

where x_t^m is the feature vector of the m-th modality at the t-th time step, i.e. the short-term memory vector of the LSTM network at time t, g_out is the output gate of the LSTM network, C(t) is the long-term memory cell of the LSTM network, and f is the activation function.
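As a concrete illustration of this preprocessing step, the sketch below (assuming PyTorch and illustrative input/feature dimensions that the patent does not prescribe) extracts the per-time-step short-term memory vectors of three modalities:

```python
import torch
import torch.nn as nn

T, d = 8, 32                       # assumed number of time steps and feature size
raw = {                            # one raw sequence per modality (batch of 1); input sizes are assumptions
    "text":  torch.randn(1, T, 300),
    "video": torch.randn(1, T, 35),
    "audio": torch.randn(1, T, 74),
}

encoders = nn.ModuleDict({m: nn.LSTM(x.shape[-1], d, batch_first=True)
                          for m, x in raw.items()})

features = {}
for m, x in raw.items():
    h, _ = encoders[m](x)          # h[:, t, :] is the output-gated short-term memory at time t, cf. eq. (1)
    features[m] = h                # feature vectors of modality m at every time step, shape (1, T, d)
```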
Step 3, multi-mode data information organization
The feature vectors of the modal information data preprocessed in step 2 are organised into a pseudo two-dimensional matrix G, whose first dimension is the time dimension and whose second dimension is the modality dimension; each element of the matrix is the feature vector of the corresponding modality at the corresponding time:

G = [ x_t^m ],  t = 1, …, T,  m = 1, …, M        (2)

where T is the size of the time dimension of the data and M is the number of modalities;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, an attention network is set over all modal data at all time steps to obtain a new pseudo two-dimensional matrix G_1:

G_1 = [ α_t^m ∘ x_t^m ],  t = 1, …, T,  m = 1, …, M        (3)

where α_t^m is the weight of modality x_t^m at the t-th time step and ∘ denotes the modular multiplication.
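A minimal sketch of steps 3 and 4, reusing the per-modality features from the previous sketch; the concrete attention network (here a single linear scoring layer with a softmax over the modalities) and the interpretation of the modular multiplication as element-wise scaling are assumptions, since the patent does not fix these details:

```python
import torch
import torch.nn as nn

T, M, d = 8, 3, 32                                     # assumed sizes (T and M follow the embodiment below)
G = torch.stack([features["text"], features["video"], features["audio"]], dim=2)  # (1, T, M, d): pseudo 2-D matrix G

scorer = nn.Linear(d, 1)                               # assumed attention scoring network
alpha = torch.softmax(scorer(G).squeeze(-1), dim=2)    # (1, T, M): weight of each modality at each time step
G1 = alpha.unsqueeze(-1) * G                           # modular multiplication, read here as scaling each vector by its weight
```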
Step 5, multi-channel high-order polynomial tensor pooling operation of multi-modal information
5.1 Initialise the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0, for the pseudo two-dimensional matrix G_k, splice the feature vectors of any two modalities inside a time window into a new feature vector z_ij; then perform a high-order (P-order) polynomial fusion operation on z_ij according to equation (4) to obtain the P-order data tensor Z_P:

Z_P = z_ij ⊗ z_ij ⊗ … ⊗ z_ij   (P factors)        (4)

where ⊗ denotes the tensor product and i, j ∈ [1, M]; the length of the time window is T_1 and its step size is s.

Then apply C single-channel low-rank tensor pooling operations to Z_P along the dimensions of the P-order tensor, finally obtaining C new feature vectors z_ij^1, …, z_ij^C, where the h-th data element z_h of the feature vector z_ij^c is:

z_h = Σ_{i_1,…,i_P} W_h(i_1, …, i_P) · Z_P(i_1, …, i_P)        (5)

where W_h is a P-order weight tensor and i_1, …, i_P are the indices along the dimensions of the P-order tensor.
for the C new feature vectors
Figure BDA0002307123420000036
Performing maximum pooling to obtain local feature vector of two-mode information fusion in the time window
Figure BDA0002307123420000037
Wherein
Figure BDA0002307123420000038
H-th data element z'hThe following were used:
Figure BDA0002307123420000039
wherein C is the number of times of single-channel tensor pooling operation of modal information in the same time window, namely the number of channels of the multi-channel tensor pooling operation; whcA P-order tensor weight for the c-th channel;
for pseudo two-dimensional matrix GkAll the modal feature vectors are subjected to the two-modal fusion operation to obtain a plurality of modal feature vectors
Figure BDA00023071234200000310
The final build size is
Figure BDA00023071234200000311
Pseudo two-dimensional matrix G ofk+1
5.3 Judge whether k ≥ N, where N is the maximum number of iterations. If so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise reset k = k + 1, update T_0 to the time-dimension size of the newly built matrix, and jump back to step 5.2.
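The following is a minimal sketch of the multi-channel P-order polynomial tensor pooling used in step 5.2. It assumes PyTorch and rank-R CP-factorised P-order weight tensors, so that the P-order data tensor Z_P never has to be materialised; P, R, C and the output size are illustrative choices rather than values prescribed by the patent:

```python
import torch
import torch.nn as nn

class MultiChannelPTP(nn.Module):
    """Multi-channel polynomial tensor pooling of one spliced vector z_ij (eqs. (4)-(6))."""
    def __init__(self, in_dim, out_dim, P=3, R=4, C=4):
        super().__init__()
        # factors[c][p, :, r, h] is the p-th CP factor of the c-th channel's P-order weight tensor W_hc
        self.factors = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(P, in_dim, R, out_dim)) for _ in range(C)])

    def forward(self, z):                                  # z: (batch, in_dim), the spliced two-modality vector
        channel_outputs = []
        for W in self.factors:                             # one single-channel low-rank pooling per channel, cf. eq. (5)
            proj = torch.einsum('bi,pirh->bprh', z, W)     # inner product of z with every CP factor
            channel_outputs.append(proj.prod(dim=1).sum(dim=1))   # product over the P orders, sum over rank R
        # max pooling across the C channels gives the local feature vector z'_ij, cf. eq. (6)
        return torch.stack(channel_outputs, dim=0).max(dim=0).values

# Illustrative use on one time window: splice the vectors of two modalities and pool them.
x_i, x_j = torch.randn(1, 64), torch.randn(1, 64)
z_ij = torch.cat([x_i, x_j], dim=-1)                       # spliced feature vector z_ij
local_vec = MultiChannelPTP(in_dim=128, out_dim=32)(z_ij)  # (1, 32) local fused vector for this window
```

The CP factorisation is one common way to realise a "low-rank tensor pooling" without building the full P-order tensor; the patent itself does not specify which low-rank parameterisation is used.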
Step 6, multi-modal global interaction
All feature vectors of the pseudo two-dimensional matrix G_{k+1} output by step 5 are spliced into a new feature vector z'; a high-order (P-order) polynomial fusion operation (as in equation (4)) is then applied to z' to obtain the P-order data tensor Z'_P, and a multi-channel low-rank tensor pooling operation (as in equation (6)) is applied to Z'_P along the dimensions of the P-order tensor, finally yielding the global feature vector z.
Step 7, multi-modal information data classification
The global interaction vector z obtained in step 6 is compared with the pre-annotated emotion class label to finally obtain the classification result.
The emotion class label is the label annotated in advance when the emotion modal information data were collected in step 1.
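A minimal sketch of steps 6 and 7 under the same assumptions, reusing the MultiChannelPTP module above; the list final_vectors, the hidden sizes and the binary valence label are illustrative placeholders, and the patent does not prescribe a specific classifier:

```python
import torch
import torch.nn as nn

# final_vectors: assumed stand-in for the feature vectors making up the last pseudo 2-D matrix G_{k+1}
final_vectors = [torch.randn(1, 32) for _ in range(6)]

z_prime = torch.cat(final_vectors, dim=-1)                 # step 6: splice everything into one vector z'
global_ptp = MultiChannelPTP(in_dim=z_prime.shape[-1], out_dim=64)
z_global = global_ptp(z_prime)                             # global interaction vector z

classifier = nn.Linear(64, 2)                              # step 7: assumed binary valence head (negative/positive)
logits = classifier(z_global)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))    # compare against the pre-annotated emotion label
```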
The invention has the following beneficial effects. The method combines an attention mechanism that assigns each modality a weight reflecting its importance, so that, according to the contribution of the different modal data to the task, the interactions of the high-contribution modalities are amplified in the fusion stage. In addition, the multi-channel tensor pooling operation overcomes the instability of the high-dimensional complex interactions produced by single-channel tensor pooling. Through iterative fusion that respects the different contributions of the multi-modal data, the method characterises stable, highly robust, high-dimensional complex dynamic interaction information at a fine-grained level, and is an effective complement to multi-modal fusion frameworks in the current field of emotion recognition.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the multi-channel higher-order polynomial tensor pooling operation of the multi-modal information of the present invention;
FIG. 3 is a diagram of a layered fusion framework of the present invention;
FIG. 4 is a schematic illustration of an attention mechanism;
FIG. 5 is a schematic diagram of a single channel polynomial tensor pooling module;
FIG. 6 is a schematic diagram of a multi-channel polynomial tensor pooling module.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings.
The multi-modal depth layered fusion emotion analysis method based on multi-channel tensor pooling is shown in figure 1:
step 1, obtaining three modal information data of text, video and audio of an individual by the prior art
Text information alone allows only a fuzzy judgment of the emotional state, i.e. the emotion type (such as negative or positive emotion) cannot be accurately determined from the text; the emotional valence (positive or negative) can be preliminarily determined from the individual's facial expression in the video; and the emotional activation can be objectively judged from the fluctuation of the voice (e.g. its amplitude) over a period of time.
Step 2, multi-mode information data preprocessing
In order to avoid excessive differences between the feature distributions of the individual modalities, a Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit (GRU) network is used to extract, for each modality, the short-term memory vector at every time step as the feature vector of that time step:

x_t^m = g_out ⊙ f(C(t))        (1)

where x_t^m is the feature vector of the m-th modality at the t-th time step, i.e. the short-term memory vector of the LSTM network at time t, g_out is the output gate of the LSTM network, C(t) is the long-term memory cell of the LSTM network, and f is the activation function.
Step 3, multi-mode data information organization
The feature vectors of the modal information data preprocessed in step 2 are organised into a pseudo two-dimensional matrix G, whose first dimension is the time dimension (T = 8) and whose second dimension is the modality dimension (M = 3); each element of the matrix is the feature vector of the corresponding modality at the corresponding time:

G = [ x_t^m ],  t = 1, …, T,  m = 1, …, M        (2)

where T is the size of the time dimension of the data and M is the number of modalities;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, an attention network is set over all modal data at all time steps to obtain a new pseudo two-dimensional matrix G_1:

G_1 = [ α_t^m ∘ x_t^m ],  t = 1, …, T,  m = 1, …, M        (3)

where α_t^m is the weight of modality x_t^m at the t-th time step, ∘ denotes the modular multiplication, and x_t^1, x_t^2 and x_t^3 are the feature vectors of the text, video and audio modalities, respectively.
Step 5, multi-channel high-order polynomial tensor pooling of the multi-modal information. First, a time window is scanned along the modality dimension to obtain the modality pairs [video, audio], [text, audio] and [text, video]; after the scan over the modality dimension is finished, the scan proceeds along the time dimension, so that 12 new feature vectors are obtained from the first layer and used as the pseudo two-dimensional matrix G_2 of the second layer. Pairwise modal fusion is then applied to the feature vectors of the second layer, yielding 6 new feature vectors that form the pseudo two-dimensional matrix G_3 of the third layer. Finally, at the third layer, all nodes of the time window of the current layer are fused, and the resulting output feature vector serves as the basis for judging the emotional state (a small bookkeeping sketch of these layer sizes follows below).
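Purely as a bookkeeping illustration of the layer sizes quoted above (the patent states only the resulting counts of 12 and 6), the arithmetic below shows how they can arise from T = 8, M = 3, a window length T1 = 2 and a stride s = 2:

```python
T, M, T1, s = 8, 3, 2, 2
pairs = M * (M - 1) // 2              # [video, audio], [text, audio], [text, video] -> 3 modality pairs
windows_1 = (T - T1) // s + 1         # 4 time windows in the first layer
layer2 = windows_1 * pairs            # 12 feature vectors -> pseudo 2-D matrix G2 (4 windows x 3 pairs)
windows_2 = (windows_1 - T1) // s + 1 # 2 time windows in the second layer
layer3 = windows_2 * pairs            # 6 feature vectors -> pseudo 2-D matrix G3
print(layer2, layer3)                 # 12 6
```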
5.1 Initialise the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0, as in Fig. 3, for the pseudo two-dimensional matrix G_k, splice the feature vectors of any two modalities inside a time window into a new feature vector z_ij; then perform a high-order (P-order) polynomial fusion operation on z_ij according to equation (4) to obtain the P-order data tensor Z_P:

Z_P = z_ij ⊗ z_ij ⊗ … ⊗ z_ij   (P factors)        (4)

where ⊗ denotes the tensor product and i, j ∈ [1, 3]. The length of the time window is T_1 = 2 (covering the data at times t_1 and t_2) and the step size is s = 2.

As shown in Fig. 5, the conventional approach applies a single single-channel low-rank tensor pooling operation to Z_P along the dimensions of the P-order tensor and outputs one new feature vector z'_ij for each time window, whose h-th data element z_h is:

z_h = Σ_{i_1,…,i_P} W_h(i_1, …, i_P) · Z_P(i_1, …, i_P)        (5)

where W_h is a P-order weight tensor and i_1, …, i_P are the indices along the dimensions of the P-order tensor.
However, although single-channel high-order (P-order) polynomial tensor pooling can capture high-dimensional complex interaction information, the resulting model may be unstable. To strengthen robustness, the invention therefore performs several single-channel high-order (P-order) polynomial tensor pooling operations in parallel, as shown in Fig. 6; specifically:
to ZpPerforming C single-channel low-rank tensor pooling operation according to each dimension of the P-order tensor to finally obtain C new eigenvectors
Figure BDA0002307123420000064
Wherein the feature vector
Figure BDA0002307123420000065
H-th data element zhThe following were used:
Figure BDA0002307123420000066
wherein WhIs a tensor weight of the P order, i1,…,ipSubscripts for each dimension of the P-order tensor;
for the C new feature vectors
Figure BDA0002307123420000067
Performing maximum pooling to obtain local feature vector of two-mode information fusion in the time window
Figure BDA0002307123420000068
Wherein
Figure BDA0002307123420000069
H-th data element z'hThe following were used:
Figure BDA00023071234200000610
formula (where C is the number of times of single-channel tensor pooling operation of modal information in the same time window, i.e. the number of channels of multi-channel tensor pooling operation; WhcA P-order tensor weight for the c-th channel;
as shown in fig. 6, which is a schematic diagram of the multi-channel polynomial tensor pooling module of the present invention, compared to the single-channel polynomial tensor pooling module, the multi-channel pooling operation performs multiple high-order (P-order) polynomial fusion operations on the spliced data to obtain multiple P-order data tensors, and finally outputs multiple new eigenvectors in one time window, and a maximum pooling operation is performed on the multiple eigenvectors, that is, a maximum value solving operation is performed on all element sets specified by the same subscript of the multiple eigenvectors, and the obtained maximum value is used as a new element specified by the subscript, so that finally the multiple eigenvectors perform a dimension reduction operation along the channel dimension, and only one eigenvector is obtained as an output of the time window, which greatly increases robustness and reduces randomness.
The above two-modality fusion operation is applied to all modal feature vectors of the pseudo two-dimensional matrix G_k, yielding a set of local feature vectors z'_ij that finally form the pseudo two-dimensional matrix G_{k+1} of size ((T_0 - T_1)/s + 1) × M(M - 1)/2 (number of time windows × number of modality pairs).
5.3 Judge whether k ≥ N, where N is the maximum number of iterations (N = 2). If so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise reset k = k + 1, update T_0 to the time-dimension size of the newly built matrix, and jump back to step 5.2.
Step 6, multi-modal global interaction
All feature vectors of the pseudo two-dimensional matrix G_{k+1} output by step 5 are spliced into a new feature vector z'; a high-order (P-order) polynomial fusion operation (as in equation (4)) is then applied to z' to obtain the P-order data tensor Z'_P, and a multi-channel low-rank tensor pooling operation (as in equation (6)) is applied to Z'_P along the dimensions of the P-order tensor, finally yielding the global feature vector z.
Step 7, multi-modal information data classification
The global interaction vector z obtained in step 6 is compared with the pre-annotated emotion class label to finally obtain the classification result.
As shown in Table 1, the emotion-state discrimination task was carried out with the proposed method and four baseline multi-modal fusion methods on the two multi-modal emotion databases CMU-MOSI and IEMOCAP. MAE is the mean absolute error, CORR the Pearson correlation coefficient, and ACC-7 the 7-class accuracy. Across these metrics, the results of the proposed method are better than or comparable to those of the baseline models.
Table 1. Comparison of results (the table body is reproduced only as an image in the original publication).

Claims (1)

1. The multi-modal depth layered fusion emotion analysis method based on multi-channel tensor pooling is characterized by comprising the following steps of:
step 1, acquiring multi-modal information data
Step 2, multi-mode information data preprocessing
A long short-term memory (LSTM) network or a gated recurrent unit (GRU) network is used to extract, for each modality, the short-term memory vector at every time step as the feature vector of that time step:

x_t^m = g_out ⊙ f(C(t))        (1)

where x_t^m is the feature vector of the m-th modality at the t-th time step, i.e. the short-term memory vector of the LSTM network at time t, g_out is the output gate of the LSTM network, C(t) is the long-term memory cell of the LSTM network, and f is the activation function;
step 3, multi-mode data information organization
The feature vectors of the modal information data preprocessed in step 2 are organised into a pseudo two-dimensional matrix G, whose first dimension is the time dimension and whose second dimension is the modality dimension; each element of the matrix is the feature vector of the corresponding modality at the corresponding time:

G = [ x_t^m ],  t = 1, …, T,  m = 1, …, M        (2)
wherein T represents the size of the data time dimension, and M represents the number of modes;
step 4, attention mechanism setting
For the pseudo two-dimensional matrix G obtained in step 3, an attention network is set over all modal data at all time steps to obtain a new pseudo two-dimensional matrix G_1:

G_1 = [ α_t^m ∘ x_t^m ],  t = 1, …, T,  m = 1, …, M        (3)

where α_t^m is the weight of modality x_t^m at the t-th time step and ∘ denotes the modular multiplication;
step 5, multi-channel high-order polynomial tensor pooling operation of multi-modal information
5.1 Initialise the iteration count k = 1 and the time-dimension size T_0 = T;
5.2 Within the time dimension of size T_0, for the pseudo two-dimensional matrix G_k, splice the feature vectors of any two modalities inside a time window into a new feature vector z_ij; then perform a high-order polynomial fusion operation on z_ij according to equation (4) to obtain the P-order data tensor Z_P:

Z_P = z_ij ⊗ z_ij ⊗ … ⊗ z_ij   (P factors)        (4)

where ⊗ denotes the tensor product and i, j ∈ [1, M]; the length of the time window is T_1 and its step size is s;

Then C single-channel low-rank tensor pooling operations are applied to Z_P along the dimensions of the P-order tensor, finally obtaining C new feature vectors z_ij^1, …, z_ij^C, where the h-th data element z_h of the feature vector z_ij^c is:

z_h = Σ_{i_1,…,i_P} W_h(i_1, …, i_P) · Z_P(i_1, …, i_P)        (5)

where W_h is a P-order weight tensor and i_1, …, i_P are the indices along the dimensions of the P-order tensor;
for the C new feature vectors
Figure FDA0003333210340000026
Performing maximum pooling to obtain local feature vector of two-mode information fusion in the time window
Figure FDA0003333210340000027
Wherein
Figure FDA0003333210340000028
H-th data element z'hThe following were used:
Figure FDA0003333210340000029
wherein C is the number of times of single-channel tensor pooling operation of modal information in the same time window, namely the number of channels of the multi-channel tensor pooling operation; whcA P-order tensor weight for the c-th channel;
for pseudo two-dimensional matrix GkAll the modal feature vectors are subjected to the two-modal fusion operation to obtain a plurality of modal feature vectors
Figure FDA00033332103400000210
The final build size is
Figure FDA00033332103400000211
Pseudo two-dimensional matrix G ofk+1
5.3 Judge whether k ≥ N, where N is the maximum number of iterations; if so, output the current pseudo two-dimensional matrix G_{k+1}; otherwise reset k = k + 1, update T_0 to the time-dimension size of the newly built matrix, and jump back to step 5.2;
step 6, multi-modal global interaction
All feature vectors of the pseudo two-dimensional matrix G_{k+1} output by step 5 are spliced into a new feature vector z'; a high-order polynomial fusion operation is then applied to z' to obtain the P-order data tensor Z'_P, and a multi-channel low-rank tensor pooling operation is applied to Z'_P along the dimensions of the P-order tensor, finally obtaining the global feature vector z;
step 7, multi-modal information data classification
The global interaction vector z obtained in step 6 is compared with the pre-annotated emotion class label to finally obtain the classification result.
CN201911244389.3A 2019-12-06 2019-12-06 Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling Active CN111178389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244389.3A CN111178389B (en) 2019-12-06 2019-12-06 Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244389.3A CN111178389B (en) 2019-12-06 2019-12-06 Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling

Publications (2)

Publication Number Publication Date
CN111178389A CN111178389A (en) 2020-05-19
CN111178389B true CN111178389B (en) 2022-02-11

Family

ID=70655407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244389.3A Active CN111178389B (en) 2019-12-06 2019-12-06 Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling

Country Status (1)

Country Link
CN (1) CN111178389B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753549B (en) * 2020-05-22 2023-07-21 江苏大学 Multi-mode emotion feature learning and identifying method based on attention mechanism
CN111786979B (en) * 2020-06-24 2022-07-22 杭州电子科技大学 Power attack identification method based on multi-mode learning
CN112199504B (en) * 2020-10-30 2022-06-03 福州大学 Visual angle level text emotion classification method and system integrating external knowledge and interactive attention mechanism
CN112329604B (en) * 2020-11-03 2022-09-20 浙江大学 Multi-modal emotion analysis method based on multi-dimensional low-rank decomposition
CN112329633B (en) * 2020-11-05 2022-08-23 南开大学 Emotion identification method, device, medium and electronic equipment based on tensor decomposition
CN112597841B (en) * 2020-12-14 2023-04-18 之江实验室 Emotion analysis method based on door mechanism multi-mode fusion
CN112612936B (en) * 2020-12-28 2022-03-08 杭州电子科技大学 Multi-modal emotion classification method based on dual conversion network
CN113064968B (en) * 2021-04-06 2022-04-19 齐鲁工业大学 Social media emotion analysis method and system based on tensor fusion network
CN113208593A (en) * 2021-04-08 2021-08-06 杭州电子科技大学 Multi-modal physiological signal emotion classification method based on correlation dynamic fusion
CN113469365B (en) * 2021-06-30 2024-03-19 上海寒武纪信息科技有限公司 Reasoning and compiling method based on neural network model and related products thereof
CN114511494A (en) * 2021-12-21 2022-05-17 北京医准智能科技有限公司 Gland density grade determining method and device and computer readable storage medium
CN116563751B (en) * 2023-04-19 2024-02-06 湖北工业大学 Multi-mode emotion analysis method and system based on attention mechanism

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409296A (en) * 2018-10-30 2019-03-01 河北工业大学 The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged
CN110287389A (en) * 2019-05-31 2019-09-27 南京理工大学 The multi-modal sensibility classification method merged based on text, voice and video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164302A1 (en) * 2007-12-20 2009-06-25 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Methods and systems for specifying a cohort-linked avatar attribute

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409296A (en) * 2018-10-30 2019-03-01 河北工业大学 The video feeling recognition methods that facial expression recognition and speech emotion recognition are merged
CN110287389A (en) * 2019-05-31 2019-09-27 南京理工大学 The multi-modal sensibility classification method merged based on text, voice and video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EmoSense: Automatically Sensing Emotions From Speech By Multi-way Classification; V Ramu Reddy et al.; IEEE; 2018-10-29; pp. 4987-4990 *
Image emotion analysis based on a multi-modal discriminative embedding space; Lyu Guangrui; Journal of Beijing University of Posts and Telecommunications; 2019-03-19; Vol. 42, No. 1; pp. 61-67 *

Also Published As

Publication number Publication date
CN111178389A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111178389B (en) Multi-mode depth layered fusion emotion analysis method based on multi-channel tensor pooling
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
Dering et al. A convolutional neural network model for predicting a product's function, given its form
Zheng et al. An ensemble model for multi-level speech emotion recognition
CN112699774A (en) Method and device for recognizing emotion of person in video, computer equipment and medium
CN112487949B (en) Learner behavior recognition method based on multi-mode data fusion
CN112560495A (en) Microblog rumor detection method based on emotion analysis
CN112508077A (en) Social media emotion analysis method and system based on multi-modal feature fusion
CN114973062A (en) Multi-modal emotion analysis method based on Transformer
Pandey et al. Attention gated tensor neural network architectures for speech emotion recognition
CN112732921B (en) False user comment detection method and system
CN111985612B (en) Encoder network model design method for improving video text description accuracy
CN102663432A (en) Kernel fuzzy c-means speech emotion identification method combined with secondary identification of support vector machine
CN110502757B (en) Natural language emotion analysis method
CN114443899A (en) Video classification method, device, equipment and medium
CN115545093A (en) Multi-mode data fusion method, system and storage medium
CN112541541B (en) Lightweight multi-modal emotion analysis method based on multi-element layering depth fusion
Prasath Design of an integrated learning approach to assist real-time deaf application using voice recognition system
CN111160124A (en) Depth model customization method based on knowledge reorganization
CN109934304B (en) Blind domain image sample classification method based on out-of-limit hidden feature model
Świetlicka et al. Graph neural networks for natural language processing in human-robot interaction
Zheng et al. A two-channel speech emotion recognition model based on raw stacked waveform
CN112465054A (en) Multivariate time series data classification method based on FCN
Ghadirian et al. Hybrid adaptive modularized tri-factor non-negative matrix factorization for community detection in complex networks
Wan et al. Co-compressing and unifying deep cnn models for efficient human face and speaker recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant