WO2021135457A1 - Emotion recognition method based on a recurrent neural network, apparatus, and storage medium - Google Patents

Emotion recognition method based on a recurrent neural network, apparatus, and storage medium

Info

Publication number
WO2021135457A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
speaker
feature
text
emotion recognition
Prior art date
Application number
PCT/CN2020/118498
Other languages
English (en)
Chinese (zh)
Inventor
王彦
张加语
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021135457A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of artificial intelligence, and in particular to an emotion recognition method, device, and storage medium based on a recurrent neural network.
  • In the prior art, models such as DialogueRNN, KET, and DialogueGCN feed contextual sentence vectors into a recurrent neural network or a Transformer to model the mutual influence between speakers.
  • DialogueRNN and DialogueGCN use recurrent neural networks and graph convolutional networks, respectively, to capture a speaker's self-dependence, modeling the interaction among all sentences that belong to the same speaker.
  • However, the prior-art methods still have the following problems: on the one hand, they ignore the speaker-switching information in the dialogue and cannot detect whether the speaker has changed, which impairs the model's understanding of the dependence between speakers and of each speaker's self-dependence and limits the achievable emotion recognition accuracy; on the other hand, the models these methods use to capture a speaker's self-dependence are complex and difficult to implement, which hurts computational efficiency.
  • The purpose of this application is to provide an emotion recognition method, device, and storage medium based on a recurrent neural network, so as to solve the prior art's technical problems of low emotion recognition accuracy and low computational efficiency.
  • One technical solution of the present application is to provide a method for emotion recognition based on a recurrent neural network, the method including:
  • Acquiring the text feature of each sentence in the dialogue content, and encoding the text features to obtain the context feature of each sentence;
  • For each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all previous sentences of that speaker;
  • For each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence;
  • For each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of the present application is to provide an emotion recognition device based on a recurrent neural network, the device including:
  • a sentence encoder, used to obtain the text feature of each sentence in the dialogue content;
  • a context encoder, used to encode the text features of the sentences to obtain the context feature of each sentence;
  • a speaker encoder, used to update, for each sentence, the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all previous sentences of that speaker;
  • a speaker conversion module, used to determine, for each sentence, the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence;
  • an emotion recognition module, used to obtain, for each sentence, the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of the present application is to provide an emotion recognition device based on a recurrent neural network, the device including a processor and a memory coupled to the processor, the memory storing program instructions for implementing the above recurrent neural network-based emotion recognition method; when the processor executes the program instructions stored in the memory, the following steps are implemented:
  • Acquiring the text feature of each sentence in the dialogue content, and encoding the text features to obtain the context feature of each sentence;
  • For each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence;
  • For each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence;
  • For each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of the present application is to provide a storage medium storing program instructions capable of implementing the above recurrent neural network-based emotion recognition method; when the program instructions are executed by a processor, the following steps are implemented:
  • Acquiring the text feature of each sentence in the dialogue content, and encoding the text features to obtain the context feature of each sentence;
  • For each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence;
  • For each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence;
  • For each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • The emotion recognition method, device, and storage medium of the present application obtain the text feature of each sentence in the dialogue content and encode the text features to obtain the context feature of each sentence; for each sentence, the speaker state feature of the sentence's speaker is updated based on the text feature of the sentence; the speaker switching state of the sentence is then determined based on the speaker of the sentence and the speaker of the previous sentence; finally, the emotion recognition result of the sentence is obtained based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence. In this way, when the probabilities of the emotion labels are computed, the switch embedding formed from the speaker switching state reinforces the sentence's context feature and speaker state feature, improving the accuracy of emotion recognition.
  • FIG. 1 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of the present application;
  • FIG. 2 is a schematic diagram of the model in the method for emotion recognition based on a recurrent neural network according to the first embodiment of the present application;
  • FIG. 3 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a second embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a third embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a fourth embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • The terms “first”, “second”, and “third” in this application are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features qualified by “first”, “second”, or “third” may explicitly or implicitly include at least one such feature.
  • In this application, “a plurality of” means at least two, such as two, three, etc., unless otherwise specifically defined. All directional indicators in the embodiments of this application (such as up, down, left, right, front, back, etc.) are only used to explain the relative positional relationship, movement state, etc. of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly.
  • FIG. 1 is a schematic flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the sequence shown in FIG. 1. As shown in FIG. 1 and FIG. 2, the method includes the following steps:
  • S101: Acquire the text feature of each sentence in the dialogue content.
  • In step S101, a natural language tool, such as a word segmentation tool provided by a deep learning framework, is first used to segment each sentence in the dialogue content to obtain the word sequence of each sentence.
  • An appropriate vector transformation model is then selected, for example the GloVe model, which represents words as real-valued vectors, and each word in the sentence's word sequence is converted into a word vector.
  • Here V is the dimension of the word vector; for example, V may be 300. The resulting word sequence of a sentence u_i is {x_1, x_2, ..., x_t}, where t is the length of sentence u_i and x_j is the V-dimensional word vector corresponding to the j-th word of sentence u_i.
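  • As a non-limiting illustration, the following Python sketch shows this preprocessing step: tokenizing a sentence and looking up pre-trained GloVe vectors to build the t x V input matrix. The GloVe file path, the whitespace tokenizer, and the zero-vector handling of unknown words are illustrative assumptions, not details fixed by this application.

```python
# Hedged sketch of step S101's preprocessing: sentence -> t x V word-vector matrix.
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into {word: np.ndarray of shape (V,)}."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def sentence_to_matrix(sentence, glove, dim=300):
    """Map a sentence to the matrix {x_1, ..., x_t}; unknown words map to zeros."""
    tokens = sentence.lower().split()  # stand-in for a real word segmenter
    rows = [glove.get(tok, np.zeros(dim, dtype=np.float32)) for tok in tokens]
    return np.stack(rows) if rows else np.zeros((0, dim), dtype=np.float32)
```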
  • A convolutional neural network is used as the sentence encoder of this embodiment, extracting the text feature of a sentence from its word vectors.
  • The convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer.
  • The convolutional layer uses several convolution filters of different sizes to extract n-gram features from the sentence's word sequence. Let U_i ∈ R^(t×V) denote the input for a sentence, where t is the sentence length, V is the dimension of the word vector, and x_j is the V-dimensional word vector of the j-th word; W_a ∈ R^(K1×V) is a convolution filter, where K1 is the n-gram length, i.e., the length of the window slid over the sentence to extract features at different positions.
  • In this embodiment, three kinds of convolution filters with heights 3, 4, and 5 are used, with 100 filters of each kind, each set producing 100 feature maps.
  • The feature maps output by the convolutional layer are input to the pooling layer, where a max-pooling operation extracts the strongest feature of each feature map and reduces the number of parameters; a rectified linear unit (ReLU) activation then processes the pooled values, and the result is passed to the fully connected layer, which outputs the text feature of the sentence, i.e., the sentence vector u_t.
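  • A minimal PyTorch sketch of such a sentence encoder follows. The filter heights (3, 4, 5) and the 100 filters per height come from the text above; the output dimension and the choice of PyTorch are assumptions, and the input must contain at least 5 words for the height-5 filter to apply.

```python
# Hedged sketch of the CNN sentence encoder: n-gram convolutions,
# max-over-time pooling, ReLU, then a fully connected layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    def __init__(self, emb_dim=300, n_filters=100, heights=(3, 4, 5), out_dim=100):
        super().__init__()
        # one Conv2d per filter height; each filter spans the full word-vector width
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_filters, (k, emb_dim)) for k in heights)
        self.fc = nn.Linear(n_filters * len(heights), out_dim)

    def forward(self, x):                       # x: (batch, t, emb_dim), t >= 5
        x = x.unsqueeze(1)                      # (batch, 1, t, emb_dim)
        # conv -> (batch, n_filters, t-k+1, 1); keep the strongest feature per map
        pooled = [F.relu(c(x).squeeze(3).max(dim=2).values) for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # sentence vector u_t
```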
  • S102: Encode the text feature of each sentence to obtain the context feature of each sentence.
  • In step S102, a long short-term memory (LSTM) network is used as the context encoder.
  • The input of the encoder is the sequence of sentence vectors of all sentences in the dialogue, generated by the sentence encoder, and the output is the sentence encoding fused with context information, i.e., the context feature c_t of the current sentence.
  • Specifically, a bidirectional LSTM is used: the forward LSTM feature vector and the backward LSTM feature vector of the sentence are obtained, and their concatenation yields the context feature of the sentence.
  • More specifically, the text feature vector of the first sentence is obtained and input into the first long short-term memory network model to obtain the first output result; the text feature vector of the second sentence, adjacent to the first, is obtained, and the first output result together with this text feature vector is input into the first long short-term memory network model to obtain the second output result; in general, for each sentence, the output result of the previous round and the text feature vector of the sentence are input into the first long short-term memory network model to obtain the context feature vector of the sentence. These steps are repeated until the context feature of every sentence is obtained.
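  • The recurrence above is what a standard bidirectional LSTM computes over the dialogue; a hedged PyTorch sketch, with input and hidden sizes chosen as assumptions, is:

```python
# Hedged sketch of the context encoder (S102): a bidirectional LSTM over
# the sequence of sentence vectors produced by the sentence encoder.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, in_dim=100, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, u):          # u: (batch, n_sentences, in_dim)
        c, _ = self.lstm(u)        # c: (batch, n_sentences, 2 * hidden)
        return c                   # c[:, t] is the context feature c_t of sentence t
```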
  • This embodiment adopts a context encoder built from the first long short-term memory (LSTM) model, where the LSTM consists of three gates: a forget gate, an input gate, and an output gate.
  • The forget gate decides what information is allowed to pass through the cell (the neuron of the LSTM), the input gate decides how much new information is added to the cell, and the output gate decides what value to output.
  • The input of each gate is the input x_t at the current moment and the output h_{t-1} of the previous moment.
  • The formula of the forget gate is as follows:

    f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

  • where f_t is the forget gate activation, indicating how much information the network forgets at time step t; σ is the activation function (sigmoid), which restricts values to the range (0, 1); W_f is the input weight of the forget gate; and b_f is the bias of the forget gate.
  • The formula of the input gate is as follows:

    i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

  • where i_t is the input gate activation, indicating how much information is input into the network at time step t; σ is the sigmoid activation function; W_i is the input weight of the input gate; and b_i is the bias of the input gate.
  • The cell candidate is computed as:

    C_t' = tanh(W_c · [h_{t-1}, x_t] + b_c)

  • where C_t' is the cell candidate, W_c is the input weight of the cell candidate, x_t is the input at the current moment, h_{t-1} is the output of the previous moment, b_c is the bias of the cell candidate, and tanh is a hyperbolic tangent function that restricts values to the range (-1, 1).
  • The cell state is then updated: the new cell state is computed from selective forgetting of the old cell state plus the gated candidate:

    C_t = f_t * C_{t-1} + i_t * C_t'

  • where C_t is the new cell state at time step t, storing the network's current long-term memory; f_t is the forget gate activation; C_{t-1} is the cell state of the previous moment, storing the long-term memory before time step t; i_t is the input gate activation, indicating how much information enters the network at time t; and C_t' is the cell candidate at the current moment, i.e., the update, indicating how much information the network updates at time t.
  • The output gate determines the output vector h_t of the hidden layer at the current moment:

    o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

  • where o_t is the output gate activation, indicating how much information the network outputs at time t; σ is the sigmoid activation function; W_o is the connection weight of the output gate; b_o is the bias of the output gate; x_t is the input at the current moment; and h_{t-1} is the output of the network at time t-1, storing the short-term memory before time t.
  • The output of the hidden layer at the current moment is the activated cell state, emitted through the output gate:

    h_t = o_t * tanh(C_t)

  • where C_t is the updated cell state at the current moment, h_t is the output at time step t, storing the network's current short-term memory, and tanh restricts values to the range (-1, 1).
  • W_f, W_i, W_c, W_o, b_f, b_i, b_c, and b_o are the parameters of the network, which are trained to improve its performance.
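  • For concreteness, a direct NumPy transcription of these gate equations for a single time step is sketched below; the weight matrices act on the concatenation [h_{t-1}, x_t] as in the formulas above, while the argument order and naming are illustrative assumptions.

```python
# Hedged sketch: one LSTM step following the gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    hx = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ hx + b_f)           # forget gate
    i_t = sigmoid(W_i @ hx + b_i)           # input gate
    c_cand = np.tanh(W_c @ hx + b_c)        # cell candidate C_t'
    c_t = f_t * c_prev + i_t * c_cand       # new cell state (long-term memory)
    o_t = sigmoid(W_o @ hx + b_o)           # output gate
    h_t = o_t * np.tanh(c_t)                # hidden output (short-term memory)
    return h_t, c_t
```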
  • In step S103, in order to model the self-dependence of the speakers in the dialogue, this embodiment uses another long short-term memory network as the speaker encoder and maintains a corresponding speaker state for each participant (speaker) in the dialogue; the speaker state of each participant is updated only by the sentences said by that participant.
  • In other approaches, the historical sentences of each speaker are modeled separately as memory units, and the memory of each speaker is merged with the representation of the current sentence through an attention mechanism, thereby simulating the speaker's state. In this embodiment, for each sentence in the dialogue content, the sentence is used to update the speaker state of the speaker corresponding to that sentence.
  • A sentence and its sentence vector are denoted by the same symbol. The state feature of the current sentence's speaker is updated from that speaker's state feature at the previous moment and the text feature u_t of the current sentence; the state features of the other speakers are not updated.
  • The state feature s_{q,t} at time t is updated by the following formula, where s_{q,0} is initialized as a zero vector and LSTM_s denotes the speaker encoder:

    s_{q,t} = LSTM_s(s_{q,t-1}, u_t)

  • Compared with the memory-and-attention schemes above, the speaker encoder of this embodiment is simpler to implement while its effect remains excellent.
  • The speaker state can be generated in the following manner: multiple sentences of a speaker in the dialogue content are obtained, and the text features of those sentences are input into the second long short-term memory network model to extract the speaker's state feature. Specifically, for the speaker's first sentence, its text feature vector is obtained, and the speaker's initialization feature together with this text feature vector is input into the second long short-term memory network model to obtain the first output feature; the text feature vector of the second sentence, adjacent to the first, is obtained, and the first output feature together with this text feature vector is input into the second long short-term memory network model to obtain the second output feature; these steps are repeated up to the speaker's current sentence, where the previous round's output feature and the current sentence's text feature vector are input into the second long short-term memory network model to obtain the speaker's state feature s_t.
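  • A hedged sketch of this per-speaker update follows, using a single shared PyTorch LSTMCell with an independent (h, c) state per speaker so that s_{q,t} depends only on that speaker's own previous sentences; the dictionary-based state store and the dimensions are assumptions.

```python
# Hedged sketch of the speaker encoder (S103): each speaker's state is advanced
# only by that speaker's sentences; s_{q,0} is a zero vector, as stated above.
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    def __init__(self, in_dim=100, hidden=100):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.hidden = hidden
        self.states = {}                      # speaker id -> (h, c)

    def update(self, speaker, u_t):           # u_t: (1, in_dim) text feature
        h, c = self.states.get(
            speaker,
            (torch.zeros(1, self.hidden), torch.zeros(1, self.hidden)))
        h, c = self.cell(u_t, (h, c))         # s_{q,t} from s_{q,t-1} and u_t
        self.states[speaker] = (h, c)
        return h                              # speaker state feature s_{q,t}
```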
  • The speaker state feature contains the speaker's utterance emotion information, and the speaker's emotional changes are perceived through the speaker state feature, which benefits the emotion recognition of the speaker's sentences in the dialogue content.
  • Furthermore, the speaker state feature may include not only the emotion information of the utterances but also attribute information of the speaker.
  • The attribute information includes one or more of age, gender, hobbies, speaking style, place of origin, and educational level.
  • S104: For each sentence, determine the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence.
  • In step S104, in order to model the dependence between speakers (step S102) and the self-dependence of each speaker (step S103) more accurately, the model must perceive speaker switching.
  • For this purpose, the concept of a speaker switching state is introduced in this embodiment.
  • The speaker switching state depends on the speaker q(u_t) at time t (the t-th sentence) and the speaker at time t-1 (the (t-1)-th sentence): when the speaker changes, the switching state takes a first value (e.g., 1); otherwise it takes a second value (e.g., 0).
  • An embedding layer G is used to embed the speaker switching state into a 100-dimensional space, and the parameters of the embedding layer G are updated during model training.
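  • A hedged PyTorch sketch of the switching state and the embedding layer G follows, using the first/second values (1 for a change of speaker, 0 otherwise) described in this application; the speaker identifiers are illustrative.

```python
# Hedged sketch of S104: binary speaker-switching state plus a trainable
# 100-dimensional switch embedding (the embedding layer G).
import torch
import torch.nn as nn

switch_embedding = nn.Embedding(2, 100)        # the embedding layer G

def switch_state(speaker_t, speaker_prev):
    """1 if the speaker changed between consecutive sentences, else 0."""
    return torch.tensor([1 if speaker_t != speaker_prev else 0])

# e.g. the switch embedding for a sentence whose speaker differs from the last:
g_t = switch_embedding(switch_state("A", "B"))  # shape (1, 100)
```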
  • S105: For each sentence, obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • The emotion label categories include happiness, sadness, neutrality, excitement, anger, and frustration.
  • In step S105, the context feature c_t of the current sentence, the state feature s_t of the current sentence's speaker, and the speaker switching embedding are concatenated into a new vector; the new vector is input into a fully connected layer, and a normalized exponential (softmax) function outputs the probability of each emotion label category for the current sentence, finally yielding the emotion label probability distribution of every sentence in the dialogue content.
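  • A hedged sketch of this classification head is shown below; the feature dimensions (200-dimensional c_t from the bidirectional context encoder, 100-dimensional s_t, 100-dimensional switch embedding) are assumptions carried over from the earlier sketches.

```python
# Hedged sketch of S105: concatenate c_t, s_t, and the switch embedding g_t,
# then apply a fully connected layer and softmax over the six emotion labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_labels = 6            # happiness, sadness, neutrality, excitement, anger, frustration
classifier = nn.Linear(200 + 100 + 100, n_labels)

def emotion_probs(c_t, s_t, g_t):
    z = torch.cat([c_t, s_t, g_t], dim=-1)     # fused sentence representation
    return F.softmax(classifier(z), dim=-1)    # probability of each emotion label
```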
  • FIG. 3 is a schematic flowchart of a method for emotion recognition based on a recurrent neural network according to a second embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the sequence shown in FIG. 3. As shown in FIG. 3, the method includes the following steps:
  • In step S205, corresponding summary information is obtained based on the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switching feature of the sentence. Specifically, the summary information is obtained by hashing these features, for example with the SHA-256 algorithm.
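  • A minimal sketch of such a digest, assuming the four feature sets are serialized as JSON-compatible lists of floats (the serialization scheme is not specified by this application), is:

```python
# Hedged sketch of S205's digest: serialize the sentence's features and
# hash them with SHA-256 to produce the summary information.
import hashlib
import json

def summary_digest(text_feat, context_feat, speaker_feat, switch_feat):
    payload = json.dumps(
        {"text": text_feat, "context": context_feat,
         "speaker": speaker_feat, "switch": switch_feat},
        sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()  # summary info to upload
```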
  • Uploading the summary information to a blockchain ensures its security as well as fairness and transparency toward users.
  • A user device can download the summary information from the blockchain to verify whether the text feature, context feature, speaker state feature, and speaker switching feature of the sentence have been tampered with.
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 4 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a third embodiment of the present application.
  • the device 30 includes a sentence encoder 31, a context encoder 32, a speaker encoder 33, a speaker conversion module 34, and an emotion recognition module 35.
  • The sentence encoder 31 is used to obtain the text feature of each sentence in the dialogue content.
  • Specifically, the sentence encoder 31 uses natural language tools to segment each sentence in the dialogue content to obtain the word sequence of each sentence; uses the GloVe model to convert each word in the sentence's word sequence into a corresponding word vector; and inputs the word sequence of the sentence into a convolutional neural network to obtain the sentence vector of each sentence, using the sentence vector as the text feature.
  • The context encoder 32 is configured to process the text feature of each sentence with the first long short-term memory model to obtain the context feature of each sentence.
  • Specifically, the context encoder 32 obtains the text feature of the first sentence in the dialogue content and inputs it into the first long short-term memory network model to obtain the first output result; obtains the text feature of the second sentence, adjacent to the first, and inputs the first output result and the text feature of the second sentence into the first long short-term memory network model to obtain the second output result; and, for each sentence up to the current one, inputs the output result of the previous round and the text feature of the current sentence into the first long short-term memory network model to obtain the context feature of the current sentence, repeating these steps until the context feature of every sentence is obtained.
  • The speaker encoder 33 is also used to obtain multiple sentences of a speaker in the dialogue content, input the text features of those sentences into the second long short-term memory network model, and extract the speaker's state feature. Furthermore, the speaker encoder 33 is configured to obtain the text feature vector of the speaker's first sentence and input the speaker's initialization feature together with this text feature vector into the second long short-term memory network model to obtain the first output feature; obtain the text feature vector of the second sentence, adjacent to the first, and input the first output feature and this text feature vector into the second long short-term memory network model to obtain the second output feature; and repeat these steps until the speaker's state feature is obtained.
  • The speaker conversion module 34 is used to determine the speaker switching state: when the speaker of the sentence is different from the speaker of the previous sentence, a first value is used as the speaker switching state of the sentence; when the speaker of the sentence is the same as the speaker of the previous sentence, a second value is used as the speaker switching state of the sentence.
  • For example, the first value may be 1 and the second value may be 0.
  • FIG. 5 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a fourth embodiment of the present application.
  • The emotion recognition device 40 based on the recurrent neural network includes a processor 41 and a memory 42 coupled to the processor 41.
  • The memory 42 stores program instructions for implementing the recurrent neural network-based emotion recognition method of any of the above embodiments.
  • The processor 41 is configured to execute the program instructions stored in the memory 42 to perform emotion recognition based on the recurrent neural network.
  • the processor 41 may also be referred to as a CPU (Central Processing Unit, central processing unit).
  • the processor 41 may be an integrated circuit chip with signal processing capabilities.
  • The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the application.
  • The storage medium of the embodiment of the present application stores program instructions 51 capable of implementing all of the above methods; the storage medium may be non-volatile or volatile.
  • The program instructions 51 may be stored in the storage medium in the form of a software product and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code, as well as terminal devices such as computers, servers, mobile phones, and tablets.
  • the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional unit. The above are only implementations of this application and do not limit the scope of this application; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are an emotion recognition method based on a recurrent neural network, an apparatus, and a storage medium. The method comprises: obtaining the text features of each sentence in dialogue content (S101); encoding the text features of each of said sentences to obtain the context features of each sentence (S102); for each sentence, updating a speaker state feature on the basis of the text feature of the sentence, the pre-update speaker state feature being obtained on the basis of the text features of all previous sentences of that speaker (S103); for each sentence, determining a speaker switching state on the basis of the speaker of the sentence and the speaker of the previous sentence (S104); and, for each sentence, obtaining an emotion recognition result of the sentence on the basis of the context features of the sentence, the speaker state features of the sentence's speaker, and the speaker switching state of the sentence (S105). By the described means, the accuracy of emotion recognition is increased, and the dependency relationships between speakers and within each speaker are modeled more accurately, which simplifies the computation process and improves computational efficiency without affecting emotion recognition accuracy.
PCT/CN2020/118498 2020-08-06 2020-09-28 Emotion recognition method based on a recurrent neural network, apparatus, and storage medium WO2021135457A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010785309.1 2020-08-06
CN202010785309.1A CN111950275B (zh) 2020-08-06 2020-08-06 基于循环神经网络的情绪识别方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2021135457A1 (fr)

Family

ID=73331721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118498 WO2021135457A1 (fr) 2020-08-06 2020-09-28 Emotion recognition method based on a recurrent neural network, apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN111950275B (fr)
WO (1) WO2021135457A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114153973A (zh) * 2021-12-07 2022-03-08 内蒙古工业大学 基于t-m bert预训练模型的蒙古语多模态情感分析方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989822B (zh) * 2021-04-16 2021-08-27 北京世纪好未来教育科技有限公司 识别对话中句子类别的方法、装置、电子设备和存储介质
CN113609289A (zh) * 2021-07-06 2021-11-05 河南工业大学 一种基于多模态对话文本的情感识别方法
CN114020897A (zh) * 2021-12-31 2022-02-08 苏州浪潮智能科技有限公司 一种对话情感识别方法及相关装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153642A (zh) * 2017-05-16 2017-09-12 华北电力大学 一种基于神经网络识别文本评论情感倾向的分析方法
CN107609009A (zh) * 2017-07-26 2018-01-19 北京大学深圳研究院 文本情感分析方法、装置、存储介质和计算机设备
CN109299267A (zh) * 2018-10-16 2019-02-01 山西大学 一种文本对话的情绪识别与预测方法
WO2019100350A1 (fr) * 2017-11-24 2019-05-31 Microsoft Technology Licensing, Llc Fourniture d'un résumé d'un document multimédia dans une session
CN110162636A (zh) * 2019-05-30 2019-08-23 中森云链(成都)科技有限责任公司 基于d-lstm的文本情绪原因识别方法
CN110704588A (zh) * 2019-09-04 2020-01-17 平安科技(深圳)有限公司 基于长短期记忆网络的多轮对话语义分析方法和系统

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI221574B (en) * 2000-09-13 2004-10-01 Agi Inc Sentiment sensing method, perception generation method and device thereof and software
DE60213195T8 (de) * 2002-02-13 2007-10-04 Sony Deutschland Gmbh Verfahren, System und Computerprogramm zur Sprach-/Sprechererkennung unter Verwendung einer Emotionszustandsänderung für die unüberwachte Anpassung des Erkennungsverfahrens
CN101930735B (zh) * 2009-06-23 2012-11-21 富士通株式会社 语音情感识别设备和进行语音情感识别的方法
CN104036776A (zh) * 2014-05-22 2014-09-10 毛峡 一种应用于移动终端的语音情感识别方法
JP6671020B2 (ja) * 2016-06-23 2020-03-25 パナソニックIpマネジメント株式会社 対話行為推定方法、対話行為推定装置及びプログラム
JP6910002B2 (ja) * 2016-06-23 2021-07-28 パナソニックIpマネジメント株式会社 対話行為推定方法、対話行為推定装置及びプログラム
CN110956953B (zh) * 2019-11-29 2023-03-10 中山大学 基于音频分析与深度学习的争吵识别方法
WO2021134417A1 (fr) * 2019-12-31 2021-07-08 深圳市优必选科技股份有限公司 Procédé de prédiction de comportement interactif, dispositif intelligent et support d'informations lisible par ordinateur
CN111274390B (zh) * 2020-01-15 2023-10-27 深圳前海微众银行股份有限公司 一种基于对话数据的情感原因确定方法及装置
CN111339440B (zh) * 2020-02-19 2024-01-23 东南大学 面向新闻文本基于层级状态神经网络的社会情绪排序方法


Also Published As

Publication number Publication date
CN111950275A (zh) 2020-11-17
CN111950275B (zh) 2023-01-17

Similar Documents

Publication Publication Date Title
CN109241255B (zh) 一种基于深度学习的意图识别方法
WO2021135457A1 (fr) Procédé de reconnaissance d'émotions basé sur un réseau neuronal récurrent, appareil et support d'enregistrement
WO2018133761A1 (fr) Procédé et dispositif de dialogue homme-machine
KR102668530B1 (ko) 음성 인식 방법, 장치 및 디바이스, 및 저장 매체
CN111897933B (zh) 情感对话生成方法、装置及情感对话模型训练方法、装置
CN110083693B (zh) 机器人对话回复方法及装置
JP6951712B2 (ja) 対話装置、対話システム、対話方法、およびプログラム
CN111966800B (zh) 情感对话生成方法、装置及情感对话模型训练方法、装置
CN110390017B (zh) 基于注意力门控卷积网络的目标情感分析方法及系统
CN106448670A (zh) 基于深度学习和强化学习的自动回复对话系统
CN109887484A (zh) 一种基于对偶学习的语音识别与语音合成方法及装置
CN112528637B (zh) 文本处理模型训练方法、装置、计算机设备和存储介质
CN109271493A (zh) 一种语言文本处理方法、装置和存储介质
CN108228576B (zh) 文本翻译方法及装置
CN109344242B (zh) 一种对话问答方法、装置、设备及存储介质
CN112214585B (zh) 回复消息生成方法、系统、计算机设备及存储介质
CN113094478B (zh) 表情回复方法、装置、设备及存储介质
CN112632244A (zh) 一种人机通话的优化方法、装置、计算机设备及存储介质
CN111598979A (zh) 虚拟角色的面部动画生成方法、装置、设备及存储介质
CN111368066B (zh) 获取对话摘要的方法、装置和计算机可读存储介质
CN113590078A (zh) 虚拟形象合成方法、装置、计算设备及存储介质
CN110597968A (zh) 一种回复选择方法及装置
CN113299277A (zh) 一种语音语义识别方法及系统
CN116049387A (zh) 一种基于图卷积的短文本分类方法、装置、介质
CN112632248A (zh) 问答方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20911130

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20911130

Country of ref document: EP

Kind code of ref document: A1