CN114140885A - Emotion analysis model generation method and device, electronic equipment and storage medium - Google Patents

Emotion analysis model generation method and device, electronic equipment and storage medium

Info

Publication number
CN114140885A
Authority
CN
China
Prior art keywords
emotion
feature vector
fusion
sample
analysis model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111450929.0A
Other languages
Chinese (zh)
Inventor
邱锋
谢程阳
丁彧
吕唐杰
范长杰
胡志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202111450929.0A priority Critical patent/CN114140885A/en
Publication of CN114140885A publication Critical patent/CN114140885A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for generating an emotion analysis model, electronic equipment and a computer storage medium. The method for generating the emotion analysis model comprises the following steps: acquiring multi-modal feature vectors of a sample object labeled with an emotional state; obtaining a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality; performing fusion processing on the modal feature vectors corresponding to the respective modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object; performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object; and splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training a preset initial emotion analysis model by taking the splicing result and the emotional state labeled on the sample object as a training sample, so as to obtain an emotion analysis model for determining the emotion of a user.

Description

Emotion analysis model generation method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of artificial intelligence, and in particular to a method and a device for generating an emotion analysis model, electronic equipment and a storage medium.
Background
With the development of artificial intelligence, intelligent interaction plays an increasingly important role in more and more fields.
Humans excel at interpreting the emotional state of interlocutors from various modal signals, including the speaker's tone of voice, the text of the speech, facial expressions and the like. Imparting this kind of emotional comprehension to machines has long been a research goal of those skilled in the art. The technology can be widely applied to scenarios such as interactive games, interactive movies, virtual tour guides, virtual assistants and artificial-intelligence customer service.
At present, how to decode human emotion from a complex human-computer interaction process remains an important problem faced by those skilled in the art. To address it, the prior art generally learns emotion analysis directly from single-modal or multi-modal signals of sample persons, thereby obtaining an emotion recognition model for recognizing the emotion of a target person. However, such schemes cannot sufficiently and effectively analyze a person's emotional characteristics.
Therefore, how to sufficiently and effectively analyze multi-modal information becomes a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating an emotion analysis model, electronic equipment and a computer storage medium, so as to solve the technical problem that a person's emotional characteristics cannot be sufficiently and effectively analyzed in the prior art. The application also provides an emotion analysis method and a corresponding device, electronic equipment and computer storage medium.
The method for generating the emotion analysis model provided by the embodiment of the application comprises the following steps:
acquiring a multi-modal feature vector of a sample object marked with emotional state;
obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
performing fusion processing on modal feature vectors corresponding to the modes in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object;
performing fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object;
and splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training a preset initial emotion analysis model by taking a splicing result and the emotion state marked by the sample object as training samples to obtain an emotion analysis model for determining the emotion of the user.
Optionally, the obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Optionally, the obtaining the multi-modal feature vector of the sample object marked with the emotional state includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Optionally, the performing fusion processing on the modal feature vectors corresponding to the modalities in the multi-modal feature vector to obtain the first emotion fusion feature vector of the sample object includes:
and performing fusion processing on the modal feature vectors of each mode in the multi-mode feature vector set in an outer product mode to obtain a first emotion fusion feature vector of the sample object.
Optionally, the fusing the private feature vectors of the modalities to obtain a second emotion fused feature vector of the sample object includes:
and carrying out fusion processing on the private characteristic vectors of all the modes in an outer product mode to obtain a second emotion fusion characteristic vector of the sample object.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a first sample feature matrix;
and taking the first sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector and the second emotion fusion feature vector to obtain a second sample feature matrix;
and taking the second sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector and the shared feature vector to obtain a third sample feature matrix;
and taking the third sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the second emotion fusion feature vector and the shared feature vector to obtain a fourth sample feature matrix;
and taking the fourth sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample, include:
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a splicing result containing a sample feature matrix;
inputting the sample feature matrix into the initial emotion analysis model, and obtaining a predicted value aiming at the emotional state of the sample object through the initial emotion analysis model;
and performing classification task training and regression prediction task training on the initial emotion analysis model according to the emotion state predicted value and the emotion state labeled by the sample object, and taking the trained initial emotion analysis model as the emotion analysis model.
Optionally, the method further includes:
determining the probability of each matrix element in the sample characteristic matrix;
according to the probability of each matrix element, giving a weight to each matrix element;
adjusting the sample characteristic matrix based on the weight of each matrix element to obtain a sample characteristic matrix after weight adjustment;
and taking the sample feature matrix after the weight adjustment and the emotional state labeled by the sample object as the training sample.
Optionally, the multi-modality includes at least two of audio, text, and images.
The application also provides a method for generating the emotion analysis model, which comprises the following steps:
acquiring a multi-modal feature vector of a sample object marked with emotional state;
obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
performing fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object;
and taking one of the second emotion fusion feature and the shared feature and the emotion state marked by the sample object as a training sample to train a preset initial emotion analysis model, so as to obtain an emotion analysis model for determining the emotion of the user.
This application provides an emotion analysis model's generating device simultaneously, includes:
a first obtaining module, configured to obtain a multi-modal feature vector of a sample object labeled with an emotional state;
The second acquisition module is used for acquiring shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
the first fusion module is used for performing fusion processing on modal feature vectors corresponding to all the modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object;
the second fusion module is used for carrying out fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object;
and the first training module is used for splicing at least two of the first emotion fusion characteristic vector, the second emotion fusion characteristic vector and the shared characteristic vector, training a preset initial emotion analysis model by taking a splicing result and the emotion state marked by the sample object as training samples, and obtaining an emotion analysis model for determining the emotion of the user.
This application provides an emotion analysis model's generating device simultaneously, includes:
the third acquisition module is used for acquiring the multi-modal feature vector of the sample object marked with the emotional state;
the fourth acquisition module is used for acquiring shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
the third fusion module is used for carrying out fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object;
and the second training module is used for taking one of the second emotion fusion characteristic and the shared characteristic and the emotion state marked by the sample object as a training sample to train a preset initial emotion analysis model so as to obtain an emotion analysis model for determining the emotion of the user.
The application also provides an emotion analysis method, which comprises the following steps:
acquiring a multi-modal feature vector of a target object;
obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
performing fusion processing on modal feature vectors corresponding to the modes in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the target object;
performing fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the target object;
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and inputting splicing results into an emotion analysis model for determining user emotion to obtain emotion analysis results of the target user;
wherein, the emotion analysis model is obtained by any one of the generation methods of the emotion analysis models.
Optionally, an emotion analysis method is characterized by including:
acquiring a multi-modal feature vector of a target object;
obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities;
performing fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the target object;
inputting one of the second emotion fusion feature or the shared feature into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to the generation method of any one of the emotion analysis models described above.
This application provides an emotion analysis device simultaneously, includes:
the fifth acquisition module is used for acquiring the multi-modal feature vector of the target object;
a sixth obtaining module, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to the modalities;
the fourth fusion module is used for performing fusion processing on the modal feature vectors corresponding to the modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the target object;
a fifth fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
the first analysis module is used for splicing at least two of the first emotion fusion characteristic vector, the second emotion fusion characteristic vector and the shared characteristic vector, inputting splicing results into an emotion analysis model for determining user emotion, and obtaining emotion analysis results of the target user;
wherein the emotion analysis model is obtained according to the generation method of any one of the emotion analysis models described above.
This application provides an emotion analysis device simultaneously, includes:
the seventh acquisition module is used for acquiring the multi-modal feature vector of the target object;
an eighth obtaining module, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality;
a sixth fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
the second analysis module is used for inputting one of the second emotion fusion feature or the shared feature into an emotion analysis model used for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to the generation method of any one of the emotion analysis models described above.
This application provides an electronic equipment simultaneously, includes:
a processor;
a memory for storing a program which, when read and run by the processor, performs any of the methods described above.
The present application also provides a computer storage medium storing a computer program that, when executed, performs any of the methods described above.
Compared with the prior art, the method has the following advantages:
the generating method of the emotion analysis model comprises the following steps: acquiring a multi-modal feature vector of a sample object marked with emotional state; obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to all the modalities; performing fusion processing on modal feature vectors corresponding to the modes in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object; performing fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object; and splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training a preset initial emotion analysis model by taking a splicing result and the emotion state marked by the sample object as training samples to obtain an emotion analysis model for determining the emotion of the user.
According to the emotion analysis model generation method, the collected multi-modal feature vectors are decoupled and fused from different perspectives, the multi-modal features obtained in the different processing branches are fused, and the modal features obtained in these different ways are further spliced to obtain training samples for training a neural network, thereby obtaining an emotion analysis model capable of analyzing emotion. The method fully considers the relations and differences among the modal feature vectors of the sample object and enhances the emotion characterization capability of the training samples, so the emotion analysis model obtained in this way can sufficiently and effectively analyze multi-modal information.
Drawings
FIG. 1 is a flowchart of a method for generating an emotion analysis model according to an embodiment of the present application;
FIG. 2 is a logic diagram of emotion analysis model training provided in accordance with another embodiment of the present application;
FIG. 3 is a flowchart of a method for generating an emotion analysis model according to another embodiment of the present application;
FIG. 4 is a flowchart of a sentiment analysis method according to another embodiment of the present application;
FIG. 5 is a flowchart of a sentiment analysis method according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for generating an emotion analysis model according to another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for generating an emotion analysis model according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of an emotion analyzing apparatus according to another embodiment of the present application;
FIG. 9 is a schematic structural diagram of an emotion analyzing apparatus according to another embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways different from those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The application provides a method and a device for generating an emotion analysis model, electronic equipment and a computer storage medium, and also provides an emotion analysis method and a device, electronic equipment and a computer storage medium. Details will be described in the following examples one by one.
The application provides a method for generating an emotion analysis model, which is characterized as follows: after multi-modal emotion feature data of a sample object labeled with an emotional state are analyzed to obtain multi-modal feature vectors of the sample object, the multi-modal feature vectors are fused from different perspectives and integrated into multi-modal fusion feature vectors with strong robustness and strong emotion expression capability, and the multi-modal fusion feature vectors together with the emotional state labeled on the sample object are used as training samples to train a preset initial emotion analysis model, thereby obtaining the emotion analysis model.
A first embodiment of the present application provides a method for generating an emotion analysis model, please refer to fig. 1, which is a flowchart of a method for generating an emotion analysis model according to an embodiment of the present application, the method includes: step S101 to step S105.
Step S101, obtaining a multi-modal feature vector of a sample object marked with emotional state.
In the first embodiment of the present application, the sample object labeled with an emotional state may be understood as data information corresponding to a person obtained from the Internet or a database, from which the emotional state of the person can be intuitively obtained. For example, the sample object can be a communication video of a person obtained from the Internet, in which the communication content, the person's expressions and other intuitive information directly determine the emotional state of the person. As another example, the sample object can be a piece of voice conversation data between service personnel and a user collected in a manual customer-service database, where the emotional state of the user during the conversation is directly reflected in the user's language text and tone of voice.
In an embodiment of the application, the multiple modalities comprise at least two of audio, text and image. A modality refers to an interaction channel between the senses (such as vision, hearing and the like) and the external environment (such as humans, machines and animals). For example, assuming that the sample object is a communication video of a person obtained from the Internet, multiple kinds of modal information such as images, gestures, voice, expressions and communication text of the person can be extracted from the video. Obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information specifically comprises the following steps S101-1 to S101-2:
step S101-1, multi-modal information of the sample object is obtained, wherein the multi-modal information comprises at least two of audio information, text information and image information.
The multi-modal information is a set of modal information of the sample object. In a first optional embodiment of the present application, the multi-modal information includes at least two of the audio information, text information and image information of the sample object. For example, assuming that audio, language text and images of a person can be extracted from an interactive video, the multi-modal information is the corresponding audio information, text information and facial expression information of the person. It should be understood that facial expression information is only one kind of image of the person; in other embodiments, the image information may be motion information of the person, for example gestures, posture information, etc.
S101-2, acquiring a modal feature vector corresponding to the multi-modal information through a convolutional neural network based on the multi-modal information; the convolutional neural network is a model which is established by utilizing a modal feature recognition technology in machine learning and is used for determining modal feature vectors corresponding to modal information, and the modal feature recognition technology in machine learning is a process of processing different forms of modal information when the model completes analysis and recognition tasks.
In a specific application process, the convolutional neural network is obtained by training with Machine Learning (ML). Machine learning is a multi-domain cross-discipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects; it specializes in studying how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve the performance of the computer. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning. Machine learning belongs to a branch of Artificial Intelligence (AI) technology.
In specific applications, modal features corresponding to different modalities may be extracted through a convolutional neural network, for example: the audio information sent by the sample object can be identified through a Speech recognition model (Speech-Bert), and a corresponding audio feature vector is obtained; determining a voice Text in the sample object interactive audio through a Text recognition model (Text-Bert), and obtaining a corresponding Text feature vector; the facial expression of the sample object can also be determined by the expression recognition model, and the corresponding image feature vector is obtained. The present application is not limited thereto.
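For illustration only, the following Python (PyTorch) sketch shows how step S101 might be organized; the encoder backbones, the common 512-dimensional projection and all names are assumptions made for the example rather than the actual structure of models such as Speech-Bert or Text-Bert:

    import torch
    import torch.nn as nn

    class ModalEncoders(nn.Module):
        """Hypothetical wrapper around pre-trained modal encoders (an audio model,
        a text model and an expression-recognition CNN passed in by the caller);
        each backbone output is projected into a common, assumed 512-dim space."""
        def __init__(self, audio_backbone, text_backbone, image_backbone, dim=512):
            super().__init__()
            self.audio_backbone = audio_backbone
            self.text_backbone = text_backbone
            self.image_backbone = image_backbone
            self.proj_a = nn.LazyLinear(dim)
            self.proj_t = nn.LazyLinear(dim)
            self.proj_v = nn.LazyLinear(dim)

        def forward(self, audio, text_tokens, image):
            h_a = self.proj_a(self.audio_backbone(audio))        # audio feature vector h_a
            h_t = self.proj_t(self.text_backbone(text_tokens))   # text feature vector h_t
            h_v = self.proj_v(self.image_backbone(image))        # image feature vector h_v
            return h_a, h_t, h_v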
Step S102, obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each mode.
The purpose of step S102 is to extract features common to different modalities using characteristics common to the modalities, and to extract features unique to each modality based on the extracted common features.
Specifically, the step S102 includes the following steps S102-1 to S102-2.
And S102-1, processing the modal feature vectors corresponding to the modes in the multi-modal feature vectors in a mode of solving a vector mean value, and determining the shared feature vectors.
Specifically, in the first embodiment of the present application, assume that the multi-modal feature vectors include the audio feature vector h_a, the text feature vector h_t and the image feature vector h_v; then step S102-1 is implemented by the following formula (1):
S = (h_a + h_t + h_v) / 3    (1)
where S is the shared feature vector of the multi-modal feature vectors.
And S102-2, decoupling the feature vectors of each mode in the multi-mode feature vectors based on the shared feature vectors, and determining the private feature vectors of each mode.
The step S102-2 is to decouple the shared feature vector from the feature vectors of each modality in a parameter-free decoupling manner based on the extracted shared feature vector, and retain unique features of each modality as private features.
Specifically, the step S102-2 can be implemented by the following formula (2):
i_m = h_m - S,  m ∈ {t, v, a}    (2)
where i_m is the private feature vector corresponding to modality m: i_a is the audio private feature vector, i_t is the text private feature vector, and i_v is the image private feature vector.
In the first embodiment provided by the application, the private feature vector and the common feature vector are obtained for the purpose of serving as a part of a sample for training an emotion analysis model to enhance the difference and common expression capability between the sample and multimodal feature information.
Further, in order to reduce the amount of calculation for the feature vectors and reduce the similarity between the private features, in a preferred embodiment of the present application, a Pooling layer (Pooling) may be introduced to perform dimension reduction on the private feature vectors of each modality.
The pooling layer is one of the common components in current convolutional neural networks, and reduces the amount of computation by sampling data in a partitioned manner to downsample a large matrix into a small matrix, while preventing overfitting. The pooling layer is generally provided with a maximum pooling layer and an average pooling layer, the maximum pooling layer selects the maximum value of each small region as a pooling result, the average pooling layer selects an average value as a pooling result, and the specific selection of the pooling layer is to process the private characteristic vector and the shared characteristic vector and can be selected according to actual needs. The present application is not limited thereto.
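As a minimal sketch of steps S102-1 and S102-2 together with the optional pooling-based dimension reduction (tensor shapes of [batch, dim] are assumed; average pooling is shown, but max pooling could equally be chosen as noted above):

    import torch
    import torch.nn.functional as F

    def decouple(h_a, h_t, h_v):
        """Shared/private decoupling per formulas (1) and (2)."""
        S = (h_a + h_t + h_v) / 3.0                 # shared feature vector, formula (1)
        i_a, i_t, i_v = h_a - S, h_t - S, h_v - S   # private feature vectors, formula (2)
        return S, i_a, i_t, i_v

    def reduce_dim(x, out_dim):
        """Parameter-free dimension reduction of a [batch, dim] tensor via average pooling."""
        return F.adaptive_avg_pool1d(x.unsqueeze(1), out_dim).squeeze(1)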
Step S103, carrying out fusion processing on the modal feature vectors corresponding to the modes in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object.
The purpose of step S103 is to perform complete multi-modal feature integration based on each single-modal feature vector, thereby avoiding the problem of modal bias.
In an optional embodiment of the present application, the feature vectors of the respective modalities in the multi-modal feature vector may be subjected to a fusion process by means of an outer product to obtain the first emotion fusion feature of the sample object. The high-order relation among the modal characteristics can be established through the calculation mode of the outer product, and meanwhile, the problem of modal bias can be avoided.
Specifically, it is still assumed that the modal feature vectors include the audio feature vector h_a, the text feature vector h_t and the image feature vector h_v; then the first emotion fusion feature is M = h_a × h_t × h_v, where M represents the first emotion fusion feature and × denotes the outer product.
Further, similarly to the step S102, in order to reduce the similarity between the first emotion fused feature vector and other feature vectors, a pooling layer may also be introduced to perform dimensionality reduction on the first emotion fused feature vector.
And step S104, carrying out fusion processing on the private characteristic vectors of the modes to obtain a second emotion fusion characteristic vector of the sample object.
Similar to the step S103, the step S104 also fuses the private feature vectors of each modality in an outer product manner to establish a high-order relationship between the private modality features, so as to avoid the modality bias problem.
Specifically, let i_a be the audio private feature vector, i_t the text private feature vector and i_v the image private feature vector; then the second emotion fusion feature is I = i_a × i_t × i_v, where I represents the second emotion fusion feature.
Further, similarly to the above steps S102 and S103, in order to reduce the similarity between the second emotion fused feature vector and other feature vectors, a pooling layer may also be introduced to perform dimensionality reduction on the second emotion fused feature vector.
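The outer-product fusion of steps S103 and S104 can be sketched as follows; flattening the resulting third-order tensor into a single fusion vector per sample is an implementation assumption, and in practice the pooling layer described above would typically be applied afterwards to keep the result small:

    import torch

    def outer_product_fusion(x, y, z):
        """Fuse three modal vectors with a three-way outer product and flatten
        the [batch, d, d, d] result into one fusion feature vector per sample."""
        fused = torch.einsum('bi,bj,bk->bijk', x, y, z)
        return fused.flatten(start_dim=1)

    # M = outer_product_fusion(h_a, h_t, h_v)   # first emotion fusion feature
    # I = outer_product_fusion(i_a, i_t, i_v)   # second emotion fusion feature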
And step S105, splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, taking a splicing result and the emotion state marked by the sample object as training samples to train a preset initial emotion analysis model, and obtaining an emotion analysis model for determining the emotion of the user.
In an optional implementation manner of the present application, in order to improve the robustness of the training sample and maximize its capability of representing the emotional state, the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector may all be spliced to obtain the training sample.
Specifically, the step S105 includes the following steps S105-1 to S105-2.
And S105-1, splicing the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a first sample feature matrix.
The purpose of the above step S105-1 is to integrate three different visual multi-modal features into one integrated multi-modal interactive feature, i.e. the first sample feature matrix, by means of fusion in a splicing manner.
For example, if the shared feature vector is S = [s1, s2, s3], the first emotion fusion feature vector is M = [m1, m2, m3] and the second emotion fusion feature vector is I = [i1, i2, i3], the first sample feature matrix is the 3 × 3 matrix composed of the three feature vectors, or the 1 × 9 matrix obtained by concatenating them.
It can be understood that the above method for obtaining the sample feature matrix by fusing three feature vectors is only an optional implementation manner given in the embodiment of the present application, and other different implementation manners may also be adopted to obtain the sample feature matrix.
Optionally, the first emotion fusion feature vector and the second emotion fusion feature vector may be spliced to obtain a second sample feature matrix, or the first emotion fusion feature vector and the shared feature vector may be spliced to obtain a third sample feature matrix, or the second emotion fusion feature vector and the shared feature vector may be spliced to obtain a fourth sample feature matrix. The above ways of splicing the sample feature matrices are only simple modifications of step S105-1 in the first embodiment of the present application, and the present application is not limited thereto.
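For illustration, the four splicing options above reduce to simple concatenations; the sketch below assumes the 1 × n (flat) layout:

    import torch

    def build_sample_feature(M, I, S, variant="all"):
        """Concatenate fusion/shared features into a sample feature vector;
        the variant selects which of the four splicing options is used."""
        options = {
            "all": (M, I, S),   # first sample feature matrix
            "M+I": (M, I),      # second sample feature matrix
            "M+S": (M, S),      # third sample feature matrix
            "I+S": (I, S),      # fourth sample feature matrix
        }
        return torch.cat(options[variant], dim=-1)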
Further, in an optional implementation manner of the present application, importance quantization processing may be performed on each element in the sample feature matrix according to an information theory, so as to further mine the ability of each feature to characterize an emotional state.
Specifically, a Norm-gate module is introduced in the process of quantifying the importance of each element in the sample feature matrix; the Norm-gate module enables each element in the sample feature matrix to be adjusted adaptively. The Norm-gate module has no additional learnable parameters, so using it avoids the overfitting of training samples that adaptive operations might otherwise cause during model training. The Norm-gate module is based on the assumption that all elements in a data set follow a normal distribution, so the farther an element deviates from the mean of the distribution, the smaller its probability of occurrence. Based on this, it is known from information theory in the field of machine learning that, when a training sample set is provided to a convolutional neural network for learning, the network learns less from the samples that occur with smaller probability in the data set; in other words, those low-probability samples are treated as less important than the others. For a convolutional neural network, however, every training sample in the set should be treated without discrimination in order to improve learning efficiency and learning accuracy.
Furthermore, the sample feature matrix serving as the training sample is formed by fusing feature vectors of multiple modalities in different forms, so the emotional state of the sample object is naturally embodied in every feature vector element. It can be understood that, when people express emotion, there may be subtle and barely perceptible ways of revealing it; in the process of communication, a subtle action of a person may reflect that person's current emotional state. These subtle actions are all reflected in the elements of the sample feature matrix, yet only the number of occurrences of an element determines its importance, and this frequency-based importance should not be a factor in the training of the initial emotion analysis model.
Therefore, the first embodiment of the present application adopts a Norm-gate module to determine the probability of occurrence of each element in the sample feature matrix, and assigns a higher weight to an element with a lower probability of occurrence, so as to adaptively adjust the importance degree of the element and enhance the ability of the element to characterize an emotional state.
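A Norm-gate-style re-weighting consistent with the description above could be sketched as follows; the exact weighting formula is an assumption made for the example (elements that deviate more from the mean, i.e. that are less probable under the normal-distribution assumption, receive larger weights, and no learnable parameters are introduced):

    import torch

    def norm_gate(F0, eps=1e-6):
        """Adaptively re-weight each element of the sample feature matrix F0.
        Larger deviation from the mean -> lower assumed probability -> higher weight."""
        mean = F0.mean(dim=-1, keepdim=True)
        std = F0.std(dim=-1, keepdim=True) + eps
        z = (F0 - mean) / std          # standardized deviation from the mean
        weight = 1.0 + z.abs()         # assumed monotone weighting of improbable elements
        return F0 * weight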
Step S105-2, training the initial emotion analysis model by taking the first sample feature matrix and the emotion state labeled by the sample object as training samples to obtain the emotion analysis model.
The initial emotion analysis model is an initial machine learning model, such as an initial convolutional neural network model. In a specific application process, training samples corresponding to a plurality of sample objects are obtained to train the initial machine learning model, so that its internal parameters are adjusted and the model can output the current emotional state of a target user according to an input emotion fusion feature vector of that user.
Specifically, the step S105-2 trains the initial emotion analysis model in the following manner.
Firstly, splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a splicing result containing a sample feature matrix.
Secondly, inputting the sample characteristic matrix into the initial emotion analysis model, and obtaining a predicted value aiming at the emotional state of the sample object through the initial emotion analysis model.
And finally, performing classification task training and regression prediction task training on the initial emotion analysis model according to the emotion state predicted value and the emotion state labeled by the sample object, and taking the trained initial emotion analysis model as the emotion analysis model.
In other words, in the process of model training, the difference between the emotional state of the sample object predicted by the model and the emotional state labeled by the sample object is continuously detected, and the internal parameters of the model are continuously adjusted through a classification training task and a regression task.
In the field of machine learning, the classification task refers to an approximation task of a mapping function of input variables to discrete output variables, where the mapping function is used to predict a class for a given observation. In the first embodiment of the present application, the input variable refers to a sample feature matrix of the sample object, the output variable refers to an emotional state of the sample object predicted by a model according to the sample feature matrix, and a category given by the mapping function is a category of a preset emotional state, for example: sad, happy, angry, etc. emotion categories.
In the process of model training, along with the expansion of training samples and the continuous learning of the models, the precision of the trained models is gradually improved until the error rate of the trained emotion recognition models for emotion classification is smaller than a preset threshold.
For example, using a simple calculation and assuming that the preset threshold is 2%: if the trained emotion recognition model outputs a wrong emotional state for 1 of the sample feature matrices of 100 sample objects, the error rate of the classification model is 1%, and the classification accuracy of the prediction model is determined to satisfy the preset threshold.
In an alternative embodiment of the present application, a cross-entropy loss function L_CE is employed to determine whether the classification task of the training model is completed, wherein the error of the emotion recognition model is calculated through the cross-entropy loss function. Specifically, the cross-entropy loss function is embodied by the following formula (3):
L_CE = -(1/N) Σ_{i=1}^{N} y_i · log(ŷ_i)    (3)
where y_i is the true value of the emotional state labeled on the ith sample object, ŷ_i is the predicted value of the emotional state of the ith sample object output by the emotion recognition model, and N is the number of samples used to train the emotion recognition model.
In the field of machine learning, the regression prediction task, based on the principle of predictive correlation, finds the factors that influence a prediction target, derives approximate expressions of the functional relations between those factors and the prediction target, and estimates the parameters of these expressions mathematically from the training samples; the resulting model is then error-tested. Once the model is determined, it can be used to predict the target from changes in the values of the factors.
In an alternative embodiment of the present application, a mean-square-error loss function L_MSE is used to perform error detection on the model. Specifically, the mean-square-error loss function is represented by the following formula (4):
L_MSE = (1/N) Σ_{i=1}^{N} (y_i - ŷ_i)²    (4)
where, likewise, y_i is the true value of the emotional state labeled on the ith sample object, ŷ_i is the predicted value of the emotional state of the ith sample object output by the emotion recognition model, and N is the number of samples used to train the emotion recognition model.
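A compact training-step sketch combining the classification loss of formula (3) and the regression loss of formula (4) might look like the following; the two-head layout, the number of emotion categories and the equal weighting of the two losses are assumptions for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmotionModel(nn.Module):
        """Initial emotion analysis model: a small MLP with a classification head
        (discrete emotion categories) and a regression head (emotion score)."""
        def __init__(self, in_dim, num_classes=6):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
            self.cls_head = nn.Linear(128, num_classes)
            self.reg_head = nn.Linear(128, 1)

        def forward(self, x):
            h = self.backbone(x)
            return self.cls_head(h), self.reg_head(h).squeeze(-1)

    def train_step(model, optimizer, sample, cls_label, reg_label, alpha=1.0):
        logits, score = model(sample)
        loss = F.cross_entropy(logits, cls_label)            # classification task, cf. formula (3)
        loss = loss + alpha * F.mse_loss(score, reg_label)   # regression task, cf. formula (4)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()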
The emotion analysis model for determining the emotion of the user can be obtained by adopting the methods of the steps S101 to S105.
Therefore, according to the method for generating the emotion analysis model, the collected multi-modal feature vectors are decoupled and fused from different perspectives, the multi-modal features obtained in the different processing branches are fused, and the modal features obtained in these different ways are further spliced to obtain training samples for training the neural network, thereby obtaining an emotion analysis model capable of analyzing emotion. The method fully considers the relations and differences among the modal feature vectors of the sample object and enhances the emotion characterization capability of the training samples, so the emotion analysis model obtained in this way can sufficiently and effectively analyze multi-modal information.
In addition, in the process of testing the emotion analysis model obtained by training, if the extraction of modal features is removed, the number of network parameters generated in the use process of the model is only about 0.3M, and the calculated amount is about 2 MFLOPs. Therefore, the emotion analysis model obtained by the method of the first embodiment of the application can save a large amount of calculation cost and machine cost while ensuring accurate analysis of emotion, and the distance between a person and a machine can be shortened by analyzing the emotion state of the person by using the model, so that a user can obtain better user experience.
Further, for further understanding of the method for generating an emotion analysis model provided in the first embodiment of the present application, a detailed description is provided below with reference to fig. 2, where fig. 2 is a logic diagram of emotion analysis model training provided in another embodiment of the present application.
The process of obtaining an emotion analysis model for determining the emotion of a user by acquiring a voice communication video of a person as a sample, extracting a multi-modal feature set from the sample, and training an initial emotion analysis model based on the multi-modal feature set is described in fig. 2.
Fig. 2 includes: a modality extraction module 201, a modality processing module 202, a sample acquisition module 203, a sample processing module 204, and a training module 205.
As can be seen from the person's speech text (e.g., "That's kind of crazy" in FIG. 2) and the expression information in the person's image in the modality extraction module 201, the person's emotional state should be happy. On this basis, after the modality extraction module 201 obtains the voice communication video of the person, the video is first cut into continuous video clips, modal analysis is performed on each clip to obtain the facial expression image information, audio information and speech text information of the person in the video, and a modal feature extraction model is used to obtain the feature vector corresponding to each modality.
Specifically, as shown in FIG. 2, the encoder symbols in FIG. 2 can be understood as convolutional neural networks that obtain the feature vector of each modality from the multi-modal information set, where a denotes audio information, t denotes speech text information, v denotes facial expression image information, and l denotes the serial number of the video clip processed by the convolutional neural network. According to the input modal information, the modal feature extraction model outputs the audio feature vector h_a, the speech text feature vector h_t and the facial expression feature vector h_v.
After obtaining the modal feature vectors of the modalities, the modality processing module 202 performs fusion processing on the three feature vectors to obtain the first fusion feature vector M, obtains the shared feature vector S among the three feature vectors by averaging, and determines the private feature vector of each feature vector based on the shared feature vector S (the private feature vector of the facial expression feature vector h_v is i_v, the private feature vector of the audio feature vector h_a is i_a, and the private feature vector of the speech text feature vector h_t is i_t).
After the private feature vectors are determined, the three private feature vectors are fused to obtain the second fusion feature vector I. Before the fusion, the three private feature vectors first need to be dimension-reduced in order to lower the similarity between the private features. Specifically, the dimensionality reduction can be achieved by introducing a pooling layer, as shown in FIG. 2, where the pooled vectors shown in FIG. 2 are the feature vectors obtained after performing dimensionality reduction on the private feature vectors i_v, i_a and i_t, respectively.
After the first fusion feature vector M, the second fusion feature vector I and the shared feature vector S are determined, the sample acquisition module 203 splices the first fusion feature vector M, the second fusion feature vector I and the shared feature vector S to obtain an initial sample matrix F0 and sends the initial sample matrix F0 to the sample processing module 204. The sample processing module 204 uses the Norm-gate module to process the importance degree of each element in the initial sample matrix F0, and the training module 205 then takes the sample matrix after the importance processing as a training sample to perform classification task training and regression prediction task training on the initial emotion analysis model, finally obtaining the emotion analysis model for determining the emotion of the user.
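Tying the modules of FIG. 2 together, one forward pass over a batch of video clips could be assembled from the sketches above as follows (all names and dimensions are illustrative, and the placement of the pooling steps is likewise an assumption):

    # Uses encoders, decouple, reduce_dim, outer_product_fusion, build_sample_feature,
    # norm_gate and EmotionModel from the sketches above; all names are illustrative.
    h_a, h_t, h_v = encoders(audio, text_tokens, image)            # modality extraction (201)
    S, i_a, i_t, i_v = decouple(h_a, h_t, h_v)                     # shared/private split (202)
    small = lambda x: reduce_dim(x, 16)                            # keep outer products compact
    M = outer_product_fusion(small(h_a), small(h_t), small(h_v))   # first fusion feature
    I = outer_product_fusion(small(i_a), small(i_t), small(i_v))   # second fusion feature
    F0 = build_sample_feature(M, I, reduce_dim(S, 16), "all")      # initial sample matrix (203)
    F0 = norm_gate(F0)                                             # Norm-gate weighting (204)
    logits, score = emotion_model(F0)                              # classification + regression (205)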
Similar to the first embodiment, the second embodiment of the present application provides another emotion analysis model generation method, which is basically similar to the first embodiment, so that the description is simple, and the relevant points can be found in the partial description of the first embodiment.
Please refer to fig. 3, which is a flowchart illustrating a method for generating an emotion analysis model according to another embodiment of the present application, the method includes steps S301 to S304.
Step S301, obtaining a multi-modal feature vector of the sample object marked with the emotional state.
Step S301 is substantially the same as step S101 in the first embodiment of the present application; for the specific explanation of this step, reference may be made to step S101 in the first embodiment of the present application, and it is only briefly described here.
Optionally, the obtaining the multi-modal feature vector of the sample object marked with the emotional state includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Step S302, obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality.
Step S302 is substantially the same as step S102 in the first embodiment of the present application; for the specific explanation of this step, reference may be made to step S102 in the first embodiment of the present application, and it is only briefly described here.
Optionally, the step S302 specifically includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Step S303, performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object.
Step S303 is substantially the same as step S104 in the first embodiment of the present application, and for the specific explanation of this step, reference may be made to step S104 in the first embodiment of the present application, which is only briefly described here.
Optionally, in the step S303, the private feature vectors of the respective modalities are fused in an outer product manner, so as to obtain a second emotion fused feature vector of the sample object.
Step S304, taking one of the second emotion fusion feature and the shared feature and the emotion state marked by the sample object as a training sample to train a preset initial emotion analysis model, and obtaining an emotion analysis model for determining the emotion of the user.
The step S304 is substantially the same as the step S105 in the first embodiment of the present application, except that in the second embodiment of the present application, any one of the second emotion fusion feature or the shared feature is adopted as the training sample to train the preset initial emotion analysis model.
Optionally, the specific training process of the model is as follows:
inputting the second emotion fusion feature or the shared feature into the initial emotion analysis model, and obtaining a predicted value for the emotional state of the sample object through the initial emotion analysis model;
and performing classification task training and regression prediction task training on the initial emotion analysis model according to the emotion state predicted value and the emotion state labeled by the sample object, and taking the trained initial emotion analysis model as the emotion analysis model.
The first embodiment and the second embodiment of the present application respectively describe a method for generating an emotion analysis model, and a third embodiment of the present application provides an emotion analysis method corresponding to the first embodiment, please refer to fig. 4, which is a flowchart of an emotion analysis method provided in another embodiment of the present application.
As shown in fig. 4, the emotion analyzing method provided in the third embodiment of the present application includes steps S401 to S405.
Step S401, obtaining a multi-modal feature vector of the target object.
The emotion analysis method provided by the third embodiment of the application is used for identifying the emotion state of a target user in the human-computer interaction process, and specifically, the emotion of a target object is analyzed through an emotion analysis model obtained by the first method embodiment provided by the application.
The target object can be understood as the user participating in the human-computer interaction process, and obtaining the multi-modal feature vector set of the target object means obtaining it on the basis of the current interaction behavior of the target object.
Specifically, the obtaining manner of the multi-modal feature vector set is similar to the process of obtaining the multi-modal feature vector set of the sample object in step S101 of the first method embodiment of the present application, that is, the obtaining the multi-modal feature vector set of the target object includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Step S402, obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality.
Optionally, the step S402 includes:
processing the modal feature vector corresponding to each modal in the multi-modal feature vector in a mode of solving a vector mean value to determine the shared feature vector;
and decoupling the modal feature vectors corresponding to each mode in the multi-mode feature vectors based on the shared feature vectors, and determining the private feature vectors of each mode.
Step S403, performing fusion processing on the modal feature vectors corresponding to the respective modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the target object.
Optionally, the step S403 includes: performing fusion processing on the modal feature vectors of each modality in the multi-modal feature vector set by means of an outer product to obtain the first emotion fusion feature vector of the target object.
Step S404, the private characteristic vectors of all the modes are subjected to fusion processing, and a second emotion fusion characteristic vector of the target object is obtained.
Optionally, the step S404 includes: performing fusion processing on the private feature vectors of the respective modalities by means of an outer product to obtain the second emotion fusion feature vector of the target object.
Step S405, splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and inputting a splicing result into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user.
Specifically, the first emotion fusion feature vector and the second emotion fusion feature vector may be spliced and the splicing result input into an emotion analysis model for determining user emotion, to obtain an emotion analysis result of the target user;
or, the first emotion fusion feature vector and the shared feature vector may be spliced and the splicing result input into the emotion analysis model, to obtain the emotion analysis result of the target user;
or, the second emotion fusion feature vector and the shared feature vector may be spliced and the splicing result input into the emotion analysis model, to obtain the emotion analysis result of the target user. A sketch of these alternatives is given below.
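The three alternatives differ only in which vectors are concatenated before inference. A minimal sketch, assuming the two-head model from the training sketch above and hypothetical helper names, is:

```python
import torch

def analyze_emotion(model, first_fusion=None, second_fusion=None, shared=None):
    """Concatenate whichever two (or more) of the three vectors are supplied,
    matching the combination the model was trained with, and run inference."""
    parts = [v for v in (first_fusion, second_fusion, shared) if v is not None]
    if len(parts) < 2:
        raise ValueError("at least two of the three feature vectors are required")
    spliced = torch.cat(parts, dim=-1)            # (batch, sum of dims)
    with torch.no_grad():
        logits, intensity = model(spliced)        # heads as in the training sketch
    return logits.argmax(dim=-1), intensity
```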
A fourth embodiment of the present application provides an emotion analysis method corresponding to the second embodiment, please refer to fig. 5, which is a flowchart of an emotion analysis method provided in another embodiment of the present application. Since the method is basically similar to the third embodiment, the description is simple, and reference may be made to the partial description of the third embodiment.
As shown in fig. 5, an emotion analysis method according to another embodiment of the present application includes the following steps S501 to S504.
Step S501, multi-modal feature vectors of the target object are obtained.
Optionally, the obtaining the multi-modal feature vector of the target object includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Step S502, obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality.
Optionally, step S502 includes:
processing the modal feature vector corresponding to each modal in the multi-modal feature vector in a mode of solving a vector mean value to determine the shared feature vector;
and decoupling the modal feature vectors corresponding to each mode in the multi-mode feature vectors based on the shared feature vectors, and determining the private feature vectors of each mode.
Step S503, performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object.
Optionally, step S503 includes: performing fusion processing on the private feature vectors of the respective modalities by means of an outer product to obtain the second emotion fusion feature vector of the target object.
Step S504, inputting one of the second emotion fusion feature or the shared feature into an emotion analysis model for determining user emotion, and obtaining an emotion analysis result of the target user.
Specifically, the second emotion fusion feature may be input into the emotion analysis model for determining user emotion to obtain the emotion analysis result of the target user;
or, the shared feature may be input into the emotion analysis model for determining user emotion to obtain the emotion analysis result of the target user.
Corresponding to the first method embodiment, another embodiment of the present application provides an apparatus for generating an emotion analysis model. Since the apparatus is basically similar to the first method embodiment of the present application, the description is simple, and the relevant point can be found in the partial description of the first method embodiment.
Please refer to fig. 6, which is a schematic structural diagram of a device for generating an emotion analysis model according to another embodiment of the present application.
The emotion analysis model generation device comprises:
a first obtaining module 601, configured to obtain a multi-modal feature vector of a sample object labeled with an emotional state;
A second obtaining module 602, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality;
a first fusion module 603, configured to perform fusion processing on the modal feature vectors corresponding to the modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the sample object;
a second fusion module 604, configured to perform fusion processing on the private feature vectors of the modalities to obtain a second emotion fusion feature vector of the sample object;
the first training module 605 is configured to splice at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, train a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as training samples, and obtain an emotion analysis model for determining an emotion of the user.
The obtaining of the shared feature vectors among the multi-modal feature vectors and the private feature vectors corresponding to the respective modalities includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Optionally, the obtaining the multi-modal feature vector of the sample object marked with the emotional state includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Optionally, the performing fusion processing on the modal feature vectors corresponding to the modalities in the multi-modal feature vector to obtain the first emotion fusion feature vector of the sample object includes:
and performing fusion processing on the modal feature vectors of each mode in the multi-mode feature vector set in an outer product mode to obtain a first emotion fusion feature vector of the sample object.
Optionally, the fusing the private feature vectors of the modalities to obtain a second emotion fused feature vector of the sample object includes:
and carrying out fusion processing on the private characteristic vectors of all the modes in an outer product mode to obtain a second emotion fusion characteristic vector of the sample object.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a first sample feature matrix;
and taking the first sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector and the second emotion fusion feature vector to obtain a second sample feature matrix;
and taking the second sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the first emotion fusion feature vector and the shared feature vector to obtain a third sample feature matrix;
and taking the third sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample to obtain an emotion analysis model for determining user emotion, including:
splicing the second emotion fusion feature vector and the shared feature vector to obtain a fourth sample feature matrix;
and taking the fourth sample feature matrix and the emotional state labeled by the sample object as training samples to train the initial emotion analysis model, so as to obtain the emotion analysis model.
Optionally, the splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and training a preset initial emotion analysis model by using a splicing result and an emotion state labeled by the sample object as a training sample, include:
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a splicing result containing a sample feature matrix;
inputting the sample feature matrix into the initial emotion analysis model, and obtaining a predicted value aiming at the emotional state of the sample object through the initial emotion analysis model;
and performing classification task training and regression prediction task training on the initial emotion analysis model according to the predicted emotional state and the emotional state labeled by the sample object, and taking the trained initial emotion analysis model as the emotion analysis model. An end-to-end sketch combining these steps is given below.
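Pulling the earlier sketches together, one possible end-to-end training step for this device might look as follows. All helper names (extract_multimodal_features, split_shared_private, outer_product_fusion, train_step, EmotionAnalysisModel) are the assumed sketches introduced above, not functions defined by the patent.

```python
import torch

# Hypothetical glue code; every helper below is one of the sketches
# introduced earlier and is not part of the patent text itself.
def training_step(batch_inputs, class_labels, intensity_labels, model, optimizer):
    # 1. Multi-modal feature vectors via the per-modality CNN encoders.
    modal_feats = extract_multimodal_features(batch_inputs)
    # 2. Shared feature vector (mean) and private feature vectors (decoupled).
    shared, private = split_shared_private(modal_feats)
    # 3. First emotion fusion vector: outer-product fusion of the modal vectors.
    first_fusion = outer_product_fusion(list(modal_feats.values()))
    # 4. Second emotion fusion vector: outer-product fusion of the private vectors.
    second_fusion = outer_product_fusion(list(private.values()))
    # 5. Splice at least two of the three vectors; here all three are used.
    sample_matrix = torch.cat([first_fusion, second_fusion, shared], dim=-1)
    # 6. Classification + regression training against the labeled emotional states.
    return train_step(model, optimizer, sample_matrix, class_labels, intensity_labels)
```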
Optionally, the multi-modality comprises at least two of audio, text and image.
Optionally, the apparatus further comprises:
the sample weight adjusting module is used for determining the probability of occurrence of each matrix element in the sample feature matrix, and assigning a weight to each matrix element according to its probability of occurrence;
adjusting the sample feature matrix based on the weight of each matrix element to obtain a weight-adjusted sample feature matrix; and taking the weight-adjusted sample feature matrix and the emotional state labeled by the sample object as the training sample. A sketch of this adjustment is given below.
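The specification does not state how element probabilities are estimated or how weights are derived from them. The sketch below assumes a histogram-based probability estimate and inverse-frequency weights, which emphasizes rarely occurring feature interactions; both choices are illustrative.

```python
import torch

def reweight_sample_matrix(sample_matrix: torch.Tensor, num_bins: int = 32):
    """Assign each element of the sample feature matrix a weight based on the
    estimated probability of its (binned) value, then rescale the matrix.

    Assumptions (not specified in the patent): probabilities come from a value
    histogram, and rarer elements receive larger weights (inverse frequency)."""
    flat = sample_matrix.flatten()
    # Histogram-based probability estimate for each element's value bin.
    hist = torch.histc(flat, bins=num_bins, min=float(flat.min()), max=float(flat.max()))
    probs = hist / hist.sum()
    bin_width = (flat.max() - flat.min()) / num_bins
    idx = ((flat - flat.min()) / (bin_width + 1e-12)).long().clamp(max=num_bins - 1)
    elem_prob = probs[idx].reshape(sample_matrix.shape)
    weights = 1.0 / (elem_prob + 1e-6)            # rarer value -> larger weight
    weights = weights / weights.mean()            # keep the overall scale stable
    return sample_matrix * weights
```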
A sixth embodiment of the present application provides an emotion analysis model generation device corresponding to the second embodiment. Since the device is basically similar to the second embodiment, the description is simple, and reference may be made to the partial description of the second embodiment.
Please refer to fig. 7, which is a schematic structural diagram of an apparatus for generating an emotion analysis model according to another embodiment of the present application.
The emotion analysis model generation device comprises:
a third obtaining module 701, configured to obtain a multi-modal feature vector of a sample object labeled with an emotional state;
a fourth obtaining module 702, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality;
a third fusion module 703, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object;
a second training module 704, configured to train a preset initial emotion analysis model by using one of the second emotion fusion feature and the shared feature and an emotion state labeled by the sample object as a training sample, so as to obtain an emotion analysis model for determining user emotion.
Optionally, the obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Optionally, the obtaining the multi-modal feature vector of the sample object marked with the emotional state includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Optionally, the fusing the private feature vectors of the modalities to obtain a second emotion fused feature vector of the sample object includes:
and carrying out fusion processing on the private characteristic vectors of all the modes in an outer product mode to obtain a second emotion fusion characteristic vector of the sample object.
Optionally, the multi-modality includes at least two of audio, text, and images.
Optionally, the apparatus further comprises:
the sample weight adjusting module is used for determining the probability of each matrix element in the sample characteristic matrix; according to the probability of the occurrence of each matrix element, giving a weight to each element; and adjusting the sample feature matrix based on the weight of each element to obtain a sample feature matrix with the adjusted weight as the training sample.
A seventh embodiment of the present application provides an emotion analyzing apparatus corresponding to the third embodiment. Since the device is basically similar to the third embodiment, the description is simple, and reference may be made to the partial description of the third embodiment.
Please refer to fig. 8, which is a schematic structural diagram of an emotion analyzing apparatus according to another embodiment of the present application.
The device includes:
a fifth obtaining module 801, configured to obtain a multi-modal feature vector of a target object;
a sixth obtaining module 802, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality;
a fourth fusion module 803, configured to perform fusion processing on the modal feature vectors corresponding to the respective modalities in the multi-modal feature vectors to obtain a first emotion fusion feature vector of the target object;
a fifth fusion module 804, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
a first analysis module 805, configured to splice at least two of the first emotion fusion feature vector, the second emotion fusion feature vector, and the shared feature vector, and input a splicing result into an emotion analysis model for determining user emotion, so as to obtain an emotion analysis result of the target user.
The emotion analysis model is obtained according to the emotion analysis model generation method provided by the first embodiment of the application.
Optionally, the obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Optionally, the obtaining the multi-modal feature vector of the target object includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Optionally, the performing fusion processing on the modal feature vectors corresponding to the modalities in the multi-modal feature vector to obtain the first emotion fusion feature vector of the target object includes:
and performing fusion processing on the modal feature vectors of each modality in the multi-modal feature vector set by means of an outer product to obtain the first emotion fusion feature vector of the target object.
Optionally, the fusing the private feature vectors of the modalities to obtain a second emotion fused feature vector of the target object includes:
and performing fusion processing on the private feature vectors of the respective modalities by means of an outer product to obtain the second emotion fusion feature vector of the target object.
An eighth embodiment of the present application provides an emotion analyzing apparatus corresponding to the fourth embodiment. Since the device is basically similar to the fourth embodiment, the description is simple, and reference may be made to the partial description of the fourth embodiment.
Please refer to fig. 9, which is a schematic structural diagram of an emotion analyzing apparatus according to another embodiment of the present application.
As shown in fig. 9, the apparatus includes:
a seventh obtaining module 901, configured to obtain a multi-modal feature vector of the target object;
an eighth obtaining module 902, configured to obtain shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality;
a sixth fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
and the second analysis module is used for inputting one of the second emotion fusion characteristics or the shared characteristics into an emotion analysis model used for determining user emotion to obtain an emotion analysis result of the target user.
The emotion analysis model is obtained by the emotion analysis model generation method provided by the second embodiment of the application.
Optionally, the obtaining the multi-modal feature vector of the target object includes:
and obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.
Optionally, the obtaining shared feature vectors among the multi-modal feature vectors and private feature vectors corresponding to each modality includes:
obtaining the shared feature vector according to the mean value of the multi-modal feature vectors;
and decoupling each modal feature vector in the multi-modal feature vectors according to the shared features to obtain the private feature vector corresponding to each modal.
Optionally, the fusing the private feature vectors of the modalities to obtain a second emotion fused feature vector of the target object includes:
and performing fusion processing on the private feature vectors of the respective modalities by means of an outer product to obtain the second emotion fusion feature vector of the target object.
Please refer to fig. 10, which is a schematic structural diagram of an electronic device according to another embodiment of the present application.
The electronic device comprises: a processor 1001; and
a memory 1002, configured to store a program of the method, which, when read and executed by the processor 1001, performs any one of the methods of the above embodiments.
Another embodiment of the present application also provides a computer storage medium storing a computer program that, when executed, performs any one of the methods of the embodiments.
It should be noted that, for the detailed description of the electronic device and the computer storage medium provided in the embodiments of the present application, reference may be made to the related description of the foregoing method embodiments provided in the present application, and details are not repeated here.
Although the present application has been described with reference to the preferred embodiments, these are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a system or an electronic device. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (21)

1. A method for generating an emotion analysis model, comprising:
obtaining a multi-modal feature vector of a sample object labeled with an emotional state;
obtaining a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
performing fusion processing on the modal feature vectors corresponding to each modality in the multi-modal feature vector to obtain a first emotion fusion feature vector of the sample object;
performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object;
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training a preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples, to obtain an emotion analysis model for determining user emotion.

2. The method according to claim 1, wherein obtaining the shared feature vector among the multi-modal feature vectors and the private feature vector corresponding to each modality comprises:
obtaining the shared feature vector according to the mean of the multi-modal feature vectors;
decoupling each modal feature vector in the multi-modal feature vectors according to the shared feature to obtain the private feature vector corresponding to each modality.

3. The method according to claim 1, wherein obtaining the multi-modal feature vector of the sample object labeled with an emotional state comprises:
obtaining the multi-modal feature vector through a convolutional neural network and multi-modal information.

4. The method according to claim 1, wherein performing fusion processing on the modal feature vectors corresponding to each modality in the multi-modal feature vector to obtain the first emotion fusion feature vector of the sample object comprises:
performing fusion processing on the modal feature vectors of each modality in the multi-modal feature vector set by means of an outer product to obtain the first emotion fusion feature vector of the sample object.

5. The method according to claim 1, wherein performing fusion processing on the private feature vectors of the respective modalities to obtain the second emotion fusion feature vector of the sample object comprises:
performing fusion processing on the private feature vectors of the respective modalities by means of an outer product to obtain the second emotion fusion feature vector of the sample object.

6. The method according to claim 1, wherein splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training the preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model for determining user emotion comprises:
splicing the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a first sample feature matrix;
training the initial emotion analysis model with the first sample feature matrix and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model.

7. The method according to claim 1, wherein splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training the preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model for determining user emotion comprises:
splicing the first emotion fusion feature vector and the second emotion fusion feature vector to obtain a second sample feature matrix;
training the initial emotion analysis model with the second sample feature matrix and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model.

8. The method according to claim 1, wherein splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training the preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model for determining user emotion comprises:
splicing the first emotion fusion feature vector and the shared feature vector to obtain a third sample feature matrix;
training the initial emotion analysis model with the third sample feature matrix and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model.

9. The method according to claim 1, wherein splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training the preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model for determining user emotion comprises:
splicing the second emotion fusion feature vector and the shared feature vector to obtain a fourth sample feature matrix;
training the initial emotion analysis model with the fourth sample feature matrix and the emotional state labeled by the sample object as training samples to obtain the emotion analysis model.

10. The method according to claim 1, wherein splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and training the preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples comprises:
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector to obtain a splicing result comprising a sample feature matrix;
inputting the sample feature matrix into the initial emotion analysis model, and obtaining, through the initial emotion analysis model, a predicted value for the emotional state of the sample object;
performing classification task training and regression prediction task training on the initial emotion analysis model according to the predicted emotional state and the emotional state labeled by the sample object, and taking the trained initial emotion analysis model as the emotion analysis model.

11. The method according to claim 10, further comprising:
determining the probability of occurrence of each matrix element in the sample feature matrix;
assigning a weight to each matrix element according to its probability of occurrence;
adjusting the sample feature matrix based on the weights of the matrix elements to obtain a weight-adjusted sample feature matrix;
taking the weight-adjusted sample feature matrix and the emotional state labeled by the sample object as the training sample.

12. The method according to claim 1, wherein the multiple modalities comprise at least two of audio, text and image.

13. A method for generating an emotion analysis model, comprising:
obtaining a multi-modal feature vector of a sample object labeled with an emotional state;
obtaining a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object;
training a preset initial emotion analysis model with one of the second emotion fusion feature and the shared feature, together with the emotional state labeled by the sample object, as a training sample, to obtain an emotion analysis model for determining user emotion.

14. An emotion analysis method, comprising:
obtaining a multi-modal feature vector of a target object;
obtaining a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
performing fusion processing on the modal feature vectors corresponding to each modality in the multi-modal feature vector to obtain a first emotion fusion feature vector of the target object;
performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
splicing at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and inputting the splicing result into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to the method of any one of claims 1-12.

15. An emotion analysis method, comprising:
obtaining a multi-modal feature vector of a target object;
obtaining a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
performing fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
inputting one of the second emotion fusion feature or the shared feature into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to the method of claim 13.

16. A device for generating an emotion analysis model, comprising:
a first obtaining module, configured to obtain a multi-modal feature vector of a sample object labeled with an emotional state;
a second obtaining module, configured to obtain a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
a first fusion module, configured to perform fusion processing on the modal feature vectors corresponding to each modality in the multi-modal feature vector to obtain a first emotion fusion feature vector of the sample object;
a second fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object;
a first training module, configured to splice at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and train a preset initial emotion analysis model with the splicing result and the emotional state labeled by the sample object as training samples, to obtain an emotion analysis model for determining user emotion.

17. A device for generating an emotion analysis model, comprising:
a third obtaining module, configured to obtain a multi-modal feature vector of a sample object labeled with an emotional state;
a fourth obtaining module, configured to obtain a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
a third fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the sample object;
a second training module, configured to train a preset initial emotion analysis model with one of the second emotion fusion feature and the shared feature, together with the emotional state labeled by the sample object, as a training sample, to obtain an emotion analysis model for determining user emotion.

18. An emotion analysis device, comprising:
a fifth obtaining module, configured to obtain a multi-modal feature vector of a target object;
a sixth obtaining module, configured to obtain a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
a fourth fusion module, configured to perform fusion processing on the modal feature vectors corresponding to each modality in the multi-modal feature vector to obtain a first emotion fusion feature vector of the target object;
a fifth fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
a first analysis module, configured to splice at least two of the first emotion fusion feature vector, the second emotion fusion feature vector and the shared feature vector, and input the splicing result into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to claim 14.

19. An emotion analysis device, comprising:
a seventh obtaining module, configured to obtain a multi-modal feature vector of a target object;
an eighth obtaining module, configured to obtain a shared feature vector among the multi-modal feature vectors and a private feature vector corresponding to each modality;
a sixth fusion module, configured to perform fusion processing on the private feature vectors of the respective modalities to obtain a second emotion fusion feature vector of the target object;
a second analysis module, configured to input one of the second emotion fusion feature or the shared feature into an emotion analysis model for determining user emotion to obtain an emotion analysis result of the target user;
wherein the emotion analysis model is obtained according to claim 15.

20. An electronic device, comprising:
a processor;
a memory for storing a program of the method, the program, when read and executed by the processor, performing the method of any one of claims 1-15.

21. A computer storage medium storing a computer program which, when executed, performs the method of any one of claims 1-15.


