WO2021135457A1 - Recurrent neural network-based emotion recognition method, apparatus, and storage medium - Google Patents

Recurrent neural network-based emotion recognition method, apparatus, and storage medium

Info

Publication number
WO2021135457A1
WO2021135457A1 (PCT Application No. PCT/CN2020/118498)
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
speaker
feature
text
emotion recognition
Prior art date
Application number
PCT/CN2020/118498
Other languages
French (fr)
Chinese (zh)
Inventor
王彦
张加语
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021135457A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of artificial intelligence recognition, and in particular to a method, device, and storage medium for emotion recognition based on a recurrent neural network.
  • To address the neglect of context information, prior-art models such as DialogueRNN, KET, and DialogueGCN feed contextual sentence vectors into a recurrent neural network or Transformer to model the mutual influence between speakers.
  • DialogueRNN and DialogueGCN further use recurrent neural networks and graph convolutional networks, respectively, to capture each speaker's self-dependence, modeling the interaction among all sentences belonging to the same speaker.
  • However, the prior-art methods still have the following problems: on the one hand, they ignore speaker-switching information in the dialogue and cannot detect whether the speaker has changed, which impairs the model's understanding of the dependence between speakers and of each speaker's self-dependence, limiting the achievable emotion recognition accuracy; on the other hand, the models these methods use to capture a speaker's self-dependence are relatively complex, difficult to implement, and hurt computational efficiency.
  • The purpose of this application is to provide an emotion recognition method, device, and storage medium based on a recurrent neural network, so as to solve the technical problems of low emotion recognition accuracy and low computational efficiency in the prior art.
  • One technical solution of this application provides a method for emotion recognition based on a recurrent neural network, including:
  • obtaining the text feature of each sentence in the dialogue content;
  • encoding the text feature of each sentence to obtain the context feature of each sentence;
  • for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;
  • for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
  • for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of this application provides an emotion recognition device based on a recurrent neural network, the device including:
  • a sentence encoder, used to obtain the text feature of each sentence in the dialogue content;
  • a context encoder, used to encode the text feature of each sentence to obtain the context feature of each sentence;
  • a speaker encoder, used to update, for each sentence, the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;
  • a speaker conversion module, used to determine, for each sentence, the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
  • an emotion recognition module, used to obtain, for each sentence, the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of this application provides an emotion recognition device based on a recurrent neural network, the device including a processor and a memory coupled to the processor, the memory storing program instructions for implementing the above recurrent neural network-based emotion recognition method; when the processor executes the program instructions stored in the memory, the following steps are implemented:
  • obtaining the text feature of each sentence in the dialogue content;
  • encoding the text feature of each sentence to obtain the context feature of each sentence;
  • for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence;
  • for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
  • for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • Another technical solution of this application provides a storage medium storing program instructions capable of implementing the above recurrent neural network-based emotion recognition method; when the program instructions are executed by a processor, the following steps are implemented:
  • obtaining the text feature of each sentence in the dialogue content;
  • encoding the text feature of each sentence to obtain the context feature of each sentence;
  • for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence;
  • for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
  • for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • The recurrent neural network-based emotion recognition method, device, and storage medium of this application obtain the text feature of each sentence in the dialogue content and encode the sentences' text features to obtain each sentence's context feature; then, for each sentence, the speaker state feature of the sentence's speaker is updated based on the sentence's text feature; then, for each sentence, the speaker switching state is determined based on the speaker of the sentence and the speaker of the previous sentence; finally, for each sentence, the emotion recognition result is obtained based on the sentence's context feature, the speaker state feature of its speaker, and its speaker switching state. In this way, when the emotion label probabilities are computed, the switch embedding formed from the speaker switching state reinforces the sentence's context feature and speaker state feature, improving the accuracy of emotion recognition; at the same time, by perceiving speaker switching in the dialogue, the dependence between speakers and each speaker's self-dependence can be modeled more accurately, and modeling the speaker state features from the speaker's own sentences simplifies the calculation process, improving computational efficiency without affecting accuracy.
  • FIG. 1 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of this application;
  • FIG. 2 is a schematic diagram of the model in the method for emotion recognition based on a recurrent neural network according to the first embodiment of this application;
  • FIG. 3 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a second embodiment of this application;
  • FIG. 4 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a third embodiment of this application;
  • FIG. 5 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a fourth embodiment of this application;
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of this application.
  • The terms “first”, “second”, and “third” in this application are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to; thus, a feature qualified by “first”, “second”, or “third” may explicitly or implicitly include at least one such feature.
  • In this application, “a plurality of” means at least two, for example two or three, unless otherwise specifically defined. All directional indications in the embodiments of this application (such as up, down, left, right, front, back, etc.) are used only to explain the relative positional relationship, movement, and so on between components in a particular posture (as shown in the drawings); if that particular posture changes, the directional indication changes accordingly.
  • FIG. 1 is a schematic flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the order of the flow shown in FIG. 1. As shown in FIG. 1 and FIG. 2, the recurrent neural network-based emotion recognition method includes the following steps:
  • S101: Acquire the text feature of each sentence in the dialogue content.
  • In step S101, a natural language tool, such as a word segmentation tool provided by a deep learning framework, is first used to segment each sentence in the dialogue content, obtaining each sentence's word sequence.
  • An appropriate vector conversion model is then selected, for example the GloVe model, which represents words as real-valued vectors, and each word in the sentence's word sequence is converted into a real-valued vector.
  • V is the dimension of the word vector; for example, V can be 300.
  • The resulting word sequence of sentence u_i is {x_1, x_2, …, x_t}, where t is the length of sentence u_i and x_j is the V-dimensional word vector corresponding to the j-th word of u_i.
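  • As a minimal illustration of this step, the following sketch maps one segmented sentence to its (t × V) matrix of GloVe vectors; the file format assumed here is the standard GloVe text file, and the helper names are hypothetical rather than from the patent:

```python
# Sketch only: convert a segmented sentence into a (t x V) GloVe matrix.
import numpy as np

def load_glove(path, dim=300):
    """Parse a GloVe text file: one word followed by `dim` floats per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def sentence_to_matrix(words, vectors, dim=300):
    """Map the word sequence {x_1, ..., x_t} to word vectors (zeros if unknown)."""
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32))
                     for w in words])
```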
  • The word sequence is then input into a convolutional neural network, which serves as the sentence encoder of this embodiment and extracts the sentence's text feature from its word vectors.
  • The convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer.
  • The convolutional layer uses several convolution filters of different sizes to extract n-gram features from the sentence's word sequence. Let U_i ∈ R^(t×V) denote the input sentence, where t is the sentence length, V is the word-vector dimension, and x_j is the V-dimensional word vector corresponding to the j-th word; W_a ∈ R^(K1×V) is a convolution filter, where K1 is the n-gram length, i.e. the length of the sliding window over the sentence, used to extract features at different positions of the sentence.
  • In an optional implementation, three kinds of convolution filters with heights 3, 4, and 5 are used, with 100 filters of each kind, each filter corresponding to one feature map.
  • The feature maps output by the convolutional layer are input to the pooling layer: a max-pooling operation (taking the maximum) first extracts the strongest feature of each feature map and minimizes the number of parameters; the result is then processed by a rectified linear unit (ReLU) activation function and output to the fully connected layer, which outputs the sentence's text feature, i.e. the sentence vector u_t.
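  • A minimal sketch of such a sentence encoder, assuming PyTorch; the dimensions follow the text (300-dimensional word vectors, filter heights 3/4/5 with 100 filters each), while the output dimension is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """TextCNN sketch: n-gram convolutions, max-over-time pooling, ReLU, FC."""
    def __init__(self, embed_dim=300, num_filters=100, heights=(3, 4, 5),
                 out_dim=100):
        super().__init__()
        # one Conv1d per filter height K1, sliding over word positions
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in heights])
        self.fc = nn.Linear(num_filters * len(heights), out_dim)

    def forward(self, x):       # x: (batch, t, V), t >= max filter height
        x = x.transpose(1, 2)   # -> (batch, V, t) as Conv1d expects
        # max pooling keeps the strongest feature of each feature map;
        # ReLU and max-over-time commute, so this matches the text's order
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))   # sentence vector u_t
```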
  • S102: Encode the text feature of each sentence to obtain the context feature of each sentence.
  • In step S102, to model the influence of the dialogue context on the current sentence, i.e. the dependence between speakers, a long short-term memory network is used as the context encoder.
  • The encoder's input is the sentence vectors of all sentences in the dialogue, produced by the sentence encoder; its output is a sentence encoding fused with context information, i.e. the context feature c_t of the current sentence.
  • In an optional implementation, the text feature vector of each sentence is processed by the long short-term memory model to obtain its context feature vector.
  • First, from the sentence's text feature vector, the long short-term memory model produces the sentence's forward and backward long short-term memory feature vectors; then the forward and backward feature vectors are concatenated to give the sentence's context feature.
  • Specifically, for the first sentence, its text feature vector is obtained and input into the first long short-term memory network model to obtain the first output result; the text feature vector of the adjacent second sentence is obtained, and the first output result and the second sentence's text feature vector are input into the first long short-term memory network model to obtain the second output result; for each sentence, the previous round's output result and the sentence's text feature vector are input into the first long short-term memory network model to obtain the sentence's context feature vector; the above steps are repeated until the context feature of every sentence is obtained.
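  • A sketch of this context encoder under the same assumptions (PyTorch; the hidden size is illustrative), with the forward and backward LSTM outputs concatenated as described:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Sketch: bidirectional LSTM over a dialogue's sentence vectors; the
    forward/backward hidden states are concatenated into context feature c_t."""
    def __init__(self, in_dim=100, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, u):        # u: (batch, num_sentences, in_dim)
        c, _ = self.lstm(u)      # c: (batch, num_sentences, 2 * hidden)
        return c                 # c[:, t] is the context feature c_t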
  • This embodiment adopts a context encoder composed of the first long short-term memory network (LSTM) model. The LSTM consists of three gates: a forget gate, an input gate, and an output gate. The forget gate decides which information is allowed through a unit (also called a cell), the input gate decides how much new information is added to the cell, and the output gate decides what value to output.
  • Specifically, when the LSTM receives information from the previous time step at time t, the cell (the LSTM's neuron) first decides which information to forget; the forget gate controls this forgetting. The gate's inputs are the current input x_t and the previous output h_{t-1}, and the forget gate is computed as:
  • f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
  • where f_t is the forget gate's activation, indicating how much information the network forgets at time step t; σ is the sigmoid activation function, which constrains values to the range 0 to 1; W_f is the forget gate's input weight; and b_f is the forget gate's bias.
  • After discarding useless information, the cell decides which new input information to absorb. The input gate is computed as:
  • i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
  • where i_t is the input gate's activation, indicating how much information is input into the network at time step t; σ is the sigmoid activation function, which constrains values to the range 0 to 1; W_i is the input gate's input weight; and b_i is the input gate's bias.
  • The cell candidate at the current moment is:
  • C_t' = tanh(W_c · [h_{t-1}, x_t] + b_c)
  • where C_t' is the cell candidate, W_c is the cell candidate's input weight, x_t is the current input, h_{t-1} is the previous output, b_c is the cell candidate's bias, and tanh is the hyperbolic tangent function, which constrains values to the range -1 to 1.
  • The cell state is then updated; the new cell state is computed from selective forgetting of the old cell state plus the candidate cell state:
  • C_t = f_t * C_{t-1} + i_t * C_t'
  • where C_t is the new cell state value at time step t, storing the network's current long-term memory; f_t is the forget gate's activation; C_{t-1} is the cell state at the previous time step, storing the long-term memory before time step t; i_t is the input gate's activation, indicating how much information is input into the network at time t; and C_t' is the current cell candidate (the update), indicating how much information the network updates at time t.
  • The output gate determines the output vector h_t of the hidden layer at the current time step:
  • o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
  • where o_t is the output gate's activation, indicating how much information the network outputs at time t; σ is the sigmoid activation function; W_o is the output gate's connection weight; b_o is the output gate's bias; x_t is the current input; and h_{t-1} is the network's output at time t-1, which stores the short-term memory before time t.
  • The output of the hidden layer at the current time step is the activated cell state, emitted through the output gate:
  • h_t = o_t * tanh(C_t)
  • where C_t is the updated cell state at the current time step; h_t is the output at time step t, which stores the network's current short-term memory; and tanh is the hyperbolic tangent function, which constrains values to the range -1 to 1.
  • W_f, W_i, W_c, W_o, b_f, b_i, b_c, and b_o are the parameters of the network, which are trained to improve its performance.
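  • The gate equations above can be checked with a small NumPy sketch; the weight shapes are illustrative, with W[g] mapping the concatenation [h_{t-1}, x_t] for gate g:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One time step implementing the gate equations above."""
    hx = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ hx + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])          # input gate
    C_cand = np.tanh(W["c"] @ hx + b["c"])       # cell candidate C_t'
    C_t = f_t * C_prev + i_t * C_cand            # new cell state
    o_t = sigmoid(W["o"] @ hx + b["o"])          # output gate
    h_t = o_t * np.tanh(C_t)                     # hidden output
    return h_t, C_t
```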
  • In step S103, to model each speaker's self-dependence within the dialogue, this embodiment uses another long short-term memory network as the speaker encoder and maintains a corresponding speaker state for each dialogue participant (speaker); the speaker state of each participant is updated only by the sentences uttered by that participant.
  • In some existing approaches, each speaker's historical sentences are modeled separately as memory units, and each speaker's memory is then merged with the representation of the current sentence through an attention mechanism to simulate the speaker state. In this embodiment, by contrast, for each sentence in the dialogue content the sentence itself is used to update the speaker state of that sentence's speaker.
  • Denoting a sentence and its sentence vector by the same symbol, the state feature of the current sentence's speaker is updated from that speaker's state feature at the previous time step and the text feature u_t of the current sentence.
  • The state feature s_{q,t} at time t is updated by the speaker-encoder LSTM from s_{q,t-1} and u_t, i.e. s_{q,t} = LSTM(s_{q,t-1}, u_t); s_{q,0} is initialized as a zero vector.
  • Compared with such attention-based designs, the speaker encoder of this embodiment is simpler to implement while the effect remains excellent. The speaker state features of the other speakers are not updated.
  • In an optional implementation, the speaker state can be generated in the following manner (see the sketch below): obtain the speaker's multiple sentences in the dialogue content and input the text features of these sentences into the second long short-term memory network model to extract the speaker's state feature. Specifically, for the speaker's first sentence, its text feature vector is obtained; the speaker's initialization feature and the first sentence's text feature vector are input into the second LSTM network model to obtain the first output feature; the text feature vector of the adjacent second sentence is obtained, and the first output feature and the second sentence's text feature vector are input into the second LSTM network model to obtain the second output feature; these steps are repeated up to the speaker's current sentence, where the previous round's output feature and the current sentence's text feature vector are input into the second LSTM network model to obtain the speaker's state feature s_t.
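  • A sketch of this speaker encoder (PyTorch; sharing a single LSTM cell across speakers is an implementation assumption), where each speaker's state is advanced only by that speaker's own sentences and s_{q,0} starts at zero:

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Sketch: per-speaker state s_q updated only by that speaker's sentences."""
    def __init__(self, in_dim=100, hidden=100):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.hidden = hidden

    def forward(self, sentence_vecs, speakers):
        # sentence_vecs: (num_sentences, in_dim); speakers: list of ids
        states, out = {}, []
        for u_t, q in zip(sentence_vecs, speakers):
            h_prev, c_prev = states.get(
                q, (u_t.new_zeros(self.hidden), u_t.new_zeros(self.hidden)))
            h, c = self.cell(u_t.unsqueeze(0),
                             (h_prev.unsqueeze(0), c_prev.unsqueeze(0)))
            states[q] = (h.squeeze(0), c.squeeze(0))
            out.append(states[q][0])      # s_{q,t} for the current sentence
        return torch.stack(out)
```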
  • The speaker state feature includes the speaker's utterance emotion information; perceiving the speaker's emotional changes through the speaker state feature benefits emotion recognition of the speaker's sentences in the dialogue content.
  • In other embodiments, the speaker state feature includes not only the emotion information of the utterances but also the speaker's attribute information.
  • The attribute information includes one or more of age, gender, hobbies, speaking style, place of origin, and education level.
  • S104: For each sentence, determine the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence.
  • In step S104, to model the dependence between speakers (step S102) and each speaker's self-dependence (step S103) more accurately, the model must be able to perceive speaker switching.
  • For this purpose, this embodiment introduces the concept of a speaker switching state.
  • The speaker switching state depends on the speaker q(u_t) at time t (the t-th sentence) and the speaker at time t-1 (the (t-1)-th sentence).
  • An embedding layer G is used to embed the speaker switching state into a 100-dimensional space, and the parameters of the embedding layer G are updated during model training.
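  • A sketch of the switching state and embedding layer G; how the dialogue's first sentence, which has no previous speaker, is handled is an assumption here:

```python
import torch
import torch.nn as nn

# trainable embedding layer G: 2 switching states -> 100-dimensional space
G = nn.Embedding(2, 100)

def switch_states(speakers):
    """1 if the speaker changed relative to the previous sentence, else 0.
    The first sentence is treated as a switch (an assumption, not from the
    patent text)."""
    s = [1] + [int(speakers[t] != speakers[t - 1])
               for t in range(1, len(speakers))]
    return torch.tensor(s)

switch_emb = G(switch_states(["A", "B", "B", "A"]))   # shape: (4, 100)
```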
  • S105: For each sentence, obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.
  • The emotion label categories include happiness, sadness, neutrality, excitement, anger, and frustration.
  • The context feature c_t of the current sentence, the state feature s_t of the current sentence's speaker, and the speaker switch embedding are concatenated into a new vector; the new vector is input into a fully connected layer, and a normalized exponential (softmax) function outputs the probability of each emotion label category for the current sentence, finally yielding the probability distribution over emotion label categories for each sentence in the dialogue content.
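  • A sketch of this classification head; the input dimensions are illustrative and follow the earlier sketches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionClassifier(nn.Module):
    """Sketch: concatenate c_t, s_t, and the switch embedding, then a fully
    connected layer with softmax over the six emotion labels."""
    def __init__(self, c_dim=200, s_dim=100, sw_dim=100, num_labels=6):
        super().__init__()
        self.fc = nn.Linear(c_dim + s_dim + sw_dim, num_labels)

    def forward(self, c_t, s_t, sw_t):
        logits = self.fc(torch.cat([c_t, s_t, sw_t], dim=-1))
        return F.softmax(logits, dim=-1)   # per-label probabilities
```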
  • FIG. 3 is a schematic flowchart of a method for emotion recognition based on a recurrent neural network according to the second embodiment of the present application. It should be noted that, provided substantially the same results are obtained, the method of the present application is not limited to the order of the flow shown in FIG. 3. As shown in FIG. 3, the recurrent neural network-based emotion recognition method includes the following steps:
  • In step S205, corresponding summary information is obtained based on the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switching feature of the sentence. Specifically, the summary information is obtained by hashing these features, for example with the SHA-256 algorithm.
  • Uploading the summary information to the blockchain ensures its security and its fairness and transparency to users.
  • The user equipment can download the summary information from the blockchain to verify whether the text feature, the context feature, the speaker state feature, or the speaker switching feature of the sentence has been tampered with.
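  • A sketch of the digest computation; the feature serialization format is an assumption, and only the use of SHA-256 comes from the text:

```python
import hashlib
import json

def summary_digest(text_feat, context_feat, speaker_feat, switch_feat):
    """SHA-256 digest of the four per-sentence features; the hex digest is the
    summary information uploaded to the blockchain for later tamper checks."""
    payload = json.dumps(
        {"text": text_feat, "context": context_feat,
         "speaker": speaker_feat, "switch": switch_feat},
        sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# e.g. summary_digest([0.1, 0.2], [0.3], [0.4], [1]) -> 64-char hex string
```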
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • A blockchain can comprise a blockchain underlying platform, a platform product service layer, and an application service layer.
  • FIG. 4 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a third embodiment of the present application.
  • The device 30 includes a sentence encoder 31, a context encoder 32, a speaker encoder 33, a speaker conversion module 34, and an emotion recognition module 35.
  • The sentence encoder 31 is used to obtain the text feature of each sentence in the dialogue content.
  • Specifically, the sentence encoder 31 is used to segment each sentence in the dialogue content with natural language tools to obtain each sentence's word sequence; to convert each word in the sentence's word sequence into a corresponding word vector using the GloVe model; and to input the sentence's word sequence into a convolutional neural network to obtain each sentence's sentence vector, which serves as the text feature.
  • The context encoder 32 is configured to process the text feature of each sentence with the first long short-term memory model to obtain the context feature of each sentence.
  • Specifically, the context encoder 32 is used to obtain the text feature of the first sentence in the dialogue content and input it into the first long short-term memory network model to obtain the first output result; to obtain the text feature of the adjacent second sentence and input the first output result together with the second sentence's text feature into the first LSTM network model to obtain the second output result; and, for each sentence up to the current one, to input the previous round's output result and the current sentence's text feature into the first LSTM network model to obtain the current sentence's context feature, repeating these steps until the context feature of every sentence is obtained.
  • The speaker encoder 33 is further used to obtain a speaker's multiple sentences in the dialogue content, input the text features of these sentences into the second long short-term memory network model, and extract the speaker's state feature. Furthermore, the speaker encoder 33 is configured to obtain the text feature vector of the speaker's first sentence and input the speaker's initialization feature together with it into the second LSTM network model to obtain the first output feature; to obtain the text feature vector of the adjacent second sentence and input the first output feature together with it into the second LSTM network model to obtain the second output feature; and to repeat these steps until the speaker's state feature is obtained.
  • The speaker conversion module 34 is used to take a first value as the speaker switching state of the current sentence when the speaker of the sentence differs from the speaker of the previous sentence, and a second value as the speaker switching state of the sentence when the speakers are the same.
  • For example, the first value may be 1 and the second value may be 0.
  • FIG. 5 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a fourth embodiment of the present application.
  • The emotion recognition device 40 based on the recurrent neural network includes a processor 41 and a memory 42 coupled to the processor 41.
  • the memory 42 stores program instructions for realizing the emotion recognition based on the recurrent neural network in any of the above embodiments.
  • the processor 41 is configured to execute program instructions stored in the memory 42 to perform emotion recognition based on the recurrent neural network.
  • The processor 41 may also be referred to as a CPU (Central Processing Unit).
  • the processor 41 may be an integrated circuit chip with signal processing capabilities.
  • The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of the application.
  • The storage medium of the embodiment of the present application stores program instructions 51 capable of implementing all of the above methods; the storage medium may be non-volatile or volatile.
  • The program instructions 51 may be stored in the above storage medium in the form of a software product, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage devices include media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical disks, as well as terminal devices such as computers, servers, mobile phones, and tablets.
  • In the embodiments provided in this application, the disclosed system, device, and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The functional units in the various embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or of a software functional unit. The above are only implementations of this application and do not limit the scope of this application; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

Provided are a recurrent neural network-based emotion recognition method, apparatus, and storage medium. The method comprises: obtaining the text feature of each sentence in the content of a conversation (S101); encoding the text feature of each sentence to obtain its context feature (S102); for each sentence, updating the speaker state feature of its speaker on the basis of the sentence's text feature, the pre-update speaker state feature having been obtained from the text features of all of that speaker's previous sentences (S103); for each sentence, determining a speaker switching state on the basis of the speaker of the sentence and the speaker of the previous sentence (S104); and for each sentence, obtaining an emotion recognition result on the basis of the sentence's context feature, the speaker state feature of its speaker, and its speaker switching state (S105). By the described means, the accuracy of emotion recognition is increased, the dependency relationships between speakers and of each speaker on itself are modeled more accurately, and the calculation process is simplified, increasing calculation efficiency without affecting the accuracy of emotion recognition.

Description

Emotion recognition method, device and storage medium based on recurrent neural network

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 6, 2020, with application number 202010785309.1 and the invention title "Recurrent Neural Network-based Emotion Recognition Method, Apparatus, and Storage Medium", the entire content of which is incorporated into this application by reference.

[Technical Field]

This application relates to the field of artificial intelligence recognition, and in particular to a method, device, and storage medium for emotion recognition based on a recurrent neural network.

[Background]

In the field of natural language processing, dialogue emotion recognition technology has received increasing attention. With the popularity of social media platforms and conversational agents, dialogue corpora keep growing, making it feasible to mine a speaker's overall dynamic emotional changes from conversations. Dialogue emotion recognition is widely applied in scenarios such as opinion mining, healthcare, and call centers; it can not only mine speakers' emotional opinions but also help build intelligent dialogue robot systems with emotion.

Early research on dialogue emotion recognition was mainly based on call-center dialogue corpora, using dictionary-based methods and audio features to recognize emotions. In recent years, research on dialogue emotion recognition has mainly been based on deep learning algorithms such as convolutional neural networks, recurrent neural networks, graph convolutional networks, and Transformers, training emotion recognition models on plain-text corpora or on multimodal data including text, audio, and video. Among these methods, dictionary-based approaches recognize emotions only from single sentences in the dialogue. Some deep learning methods use convolutional neural networks or other models as sentence encoders to generate a sentence vector for the current sentence, feed it directly into a fully connected network, and finally use a softmax function to output the probability distribution of emotion labels. However, these methods ignore dialogue context information and cannot model the dependencies between sentences and speakers from a global perspective, which limits the improvement of emotion classification accuracy.

To solve the problem of ignoring context information, prior-art models such as DialogueRNN, KET, and DialogueGCN feed contextual sentence vectors into a recurrent neural network or Transformer to model the mutual influence between speakers. Moreover, DialogueRNN and DialogueGCN use recurrent neural networks and graph convolutional networks, respectively, to capture each speaker's self-dependence, modeling the interaction among all sentences belonging to the same speaker. However, the inventor realized that the prior-art methods still have the following problems: on the one hand, they ignore speaker-switching information in the dialogue and cannot detect whether the speaker has changed, which impairs the model's understanding of the dependence between speakers and of each speaker's self-dependence, limiting the achievable emotion recognition accuracy; on the other hand, the models these methods use to capture a speaker's self-dependence are relatively complex, difficult to implement, and hurt computational efficiency.

Therefore, it is necessary to provide a new emotion recognition method, device, and storage medium based on a recurrent neural network.
[Summary of the Invention]

The purpose of this application is to provide an emotion recognition method, device, and storage medium based on a recurrent neural network, so as to solve the technical problems of low emotion recognition accuracy and low computational efficiency in the prior art.

One technical solution of this application provides a method for emotion recognition based on a recurrent neural network, including:

obtaining the text feature of each sentence in the dialogue content;

encoding the text feature of each sentence to obtain the context feature of each sentence;

for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;

for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and

for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.

Another technical solution of this application provides an emotion recognition device based on a recurrent neural network, the device including:

a sentence encoder, used to obtain the text feature of each sentence in the dialogue content;

a context encoder, used to encode the text feature of each sentence to obtain the context feature of each sentence;

a speaker encoder, used to update, for each sentence, the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;

a speaker conversion module, used to determine, for each sentence, the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and

an emotion recognition module, used to obtain, for each sentence, the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.

Another technical solution of this application provides an emotion recognition device based on a recurrent neural network, the device including a processor and a memory coupled to the processor, the memory storing program instructions for implementing the above recurrent neural network-based emotion recognition method; when the processor executes the program instructions stored in the memory, the following steps are implemented:

obtaining the text feature of each sentence in the dialogue content;

encoding the text feature of each sentence to obtain the context feature of each sentence;

for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;

for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and

for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.

Another technical solution of this application provides a storage medium storing program instructions capable of implementing the above recurrent neural network-based emotion recognition method; when the program instructions are executed by a processor, the following steps are implemented:

obtaining the text feature of each sentence in the dialogue content;

encoding the text feature of each sentence to obtain the context feature of each sentence;

for each sentence, updating the speaker state feature of the sentence's speaker based on the text feature of the sentence, where the pre-update speaker state feature is obtained from the text features of all of that speaker's previous sentences;

for each sentence, determining the speaker switching state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and

for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the sentence's speaker, and the speaker switching state of the sentence.

The recurrent neural network-based emotion recognition method, device, and storage medium of this application obtain the text feature of each sentence in the dialogue content and encode the sentences' text features to obtain each sentence's context feature; then, for each sentence, the speaker state feature of the sentence's speaker is updated based on the sentence's text feature; then, for each sentence, the speaker switching state is determined based on the speaker of the sentence and the speaker of the previous sentence; finally, for each sentence, the emotion recognition result is obtained based on the sentence's context feature, the speaker state feature of its speaker, and its speaker switching state. In this way, when the emotion label probabilities are computed, the switch embedding formed from the speaker switching state reinforces the sentence's context feature and speaker state feature, improving emotion recognition accuracy; at the same time, by perceiving speaker switching in the dialogue, the dependence between speakers and each speaker's self-dependence can be modeled more accurately; and modeling the speaker state features from the speaker's own sentences simplifies the calculation process, improving computational efficiency without affecting emotion recognition accuracy.
[Brief Description of the Drawings]

FIG. 1 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of this application;

FIG. 2 is a schematic diagram of the model in the method for emotion recognition based on a recurrent neural network according to the first embodiment of this application;

FIG. 3 is a flowchart of a method for emotion recognition based on a recurrent neural network according to a second embodiment of this application;

FIG. 4 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a third embodiment of this application;

FIG. 5 is a schematic structural diagram of an emotion recognition device based on a recurrent neural network according to a fourth embodiment of this application;

FIG. 6 is a schematic structural diagram of a storage medium according to an embodiment of this application.

[Detailed Description]

The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

The terms "first", "second", and "third" in this application are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to; thus, a feature qualified by "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of this application, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined. All directional indications in the embodiments of this application (such as up, down, left, right, front, back, etc.) are used only to explain the relative positional relationship, movement, and so on between components in a particular posture (as shown in the drawings); if that particular posture changes, the directional indication changes accordingly. In addition, the terms "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion: a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of this application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

FIG. 1 is a schematic flowchart of a method for emotion recognition based on a recurrent neural network according to a first embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the order of the flow shown in FIG. 1. As shown in FIG. 1 and FIG. 2, the recurrent neural network-based emotion recognition method includes the following steps:
S101: Acquire the text feature of each sentence in the dialogue content.

In step S101, a natural language tool, such as a word segmentation tool provided by a deep learning framework, is first used to segment each sentence in the dialogue content, obtaining each sentence's word sequence. An appropriate vector conversion model is then selected, for example the GloVe model, which represents words as real-valued vectors, and each word in the sentence's word sequence is converted into a real-valued vector; V is the dimension of the word vector, and V can be, for example, 300. Specifically, the resulting word sequence of sentence u_i is {x_1, x_2, …, x_t}, where t is the length of sentence u_i and x_j is the V-dimensional word vector corresponding to the j-th word of u_i.

The word sequence of sentence u_i is then input into a convolutional neural network, which serves as the sentence encoder of this embodiment and extracts the sentence's text feature from its word vectors. The convolutional neural network includes a convolutional layer, a pooling layer, and a fully connected layer. The convolutional layer uses several convolution filters of different sizes to extract n-gram features from the sentence's word sequence. Let U_i ∈ R^(t×V) denote the input sentence, where t is the sentence length, V is the word-vector dimension, and x_j is the V-dimensional word vector corresponding to the j-th word; W_a ∈ R^(K1×V) is a convolution filter, where K1 is the n-gram length, i.e. the length of the sliding window over the sentence, used to extract features at different positions of the sentence. In an optional implementation, three kinds of convolution filters with heights 3, 4, and 5 are used, with 100 filters of each kind, each filter corresponding to one feature map. The feature maps output by the convolutional layer are input to the pooling layer: a max-pooling operation (taking the maximum) first extracts the strongest feature of each feature map and minimizes the number of parameters; the result is then processed by a rectified linear unit (ReLU) activation function and output to the fully connected layer, which outputs the sentence's text feature, i.e. the sentence vector u_t.

S102: Encode the text feature of each sentence to obtain the context feature of each sentence.

In step S102, to model the influence of the dialogue context on the current sentence, i.e. the dependence between speakers, a long short-term memory network is used as the context encoder. The encoder's input is the sentence vectors of all sentences in the dialogue, produced by the sentence encoder; its output is a sentence encoding fused with context information, i.e. the context feature c_t of the current sentence. Experiments show that a context encoder composed of long short-term memory networks can effectively capture context information, modeling the dependence between the current sentence and the other sentences in the dialogue. That is, in step S102, the sentence vectors of the dialogue content are context-encoded, and the resulting context feature vectors are text representation vectors containing the semantic relationships between sentences.

In an optional implementation, the text feature vector of each sentence is processed by a long short-term memory model to obtain its context feature vector. First, from the sentence's text feature vector, the long short-term memory model produces the sentence's forward and backward long short-term memory feature vectors; then the forward and backward feature vectors are concatenated to give the sentence's context feature.

Specifically, for the first sentence of the dialogue content, its text feature vector is obtained and input into the first long short-term memory network model to obtain the first output result; the text feature vector of the adjacent second sentence is obtained, and the first output result and the second sentence's text feature vector are input into the first long short-term memory network model to obtain the second output result; for each sentence, the previous round's output result and the sentence's text feature vector are input into the first long short-term memory network model to obtain the sentence's context feature vector; the above steps are repeated until the context feature of every sentence is obtained.
This embodiment uses a context encoder built from the first long short-term memory (LSTM) network model. An LSTM consists of three gates, namely a forget gate, an input gate, and an output gate: the forget gate decides which information may pass through a cell, the input gate decides how much new information is added to the cell, and the output gate decides what value to output. Specifically, when the LSTM receives information from the previous time step at time t, the cell (an LSTM neuron) first decides which part of the information to forget; the forget gate controls this forgetting. The inputs of this gate are the input x_t at the current time step and the output h_{t-1} of the previous time step. The forget gate is given by:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
where f_t is the forget gate activation, indicating how much information the network forgets at time step t; σ is the sigmoid activation function, which constrains values to the range 0 to 1; W_f is the input weight of the forget gate, and b_f is the bias of the forget gate.
After discarding useless information, the cell must decide which newly input information to absorb. The input gate is given by:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
where i_t is the input gate activation, indicating how much information is fed into the network at time step t; σ is the sigmoid activation function, constraining values to the range 0 to 1; W_i is the input weight of the input gate, and b_i is the bias of the input gate.
The candidate cell state at the current time step is:
C_t' = tanh(W_c · [h_{t-1}, x_t] + b_c), where C_t' is the candidate cell state, W_c is the input weight of the candidate cell state, x_t is the input at the current time step, h_{t-1} is the output of the previous time step, and b_c is the bias of the candidate cell state; tanh is the hyperbolic tangent function, which constrains values to the range -1 to 1.
The cell state is then updated: the new cell state is computed from the selectively forgotten old cell state and the candidate cell state:
C_t = f_t * C_{t-1} + i_t * C_t', where C_t is the new cell state, i.e. the output of the network at time step t, which stores the network's long-term memory; f_t is the forget gate activation; C_{t-1} is the cell state of the previous time step, i.e. the output of the network at time step t-1, which stores the long-term memory before time step t; i_t is the input gate activation, indicating how much information is fed into the network at time step t; and C_t' is the candidate cell state at the current time step, indicating how much information the network updates at time step t.
Finally, the output gate determines the output vector h_t of the hidden layer at the current time step. The output gate is defined as:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
where o_t is the output gate activation; σ is the sigmoid activation function; W_o is the connection weight of the output gate; b_o is the bias of the output gate; x_t is the input at the current time step, i.e. the network input at time step t; and h_{t-1} is the output of the previous time step, i.e. the network output at time step t-1, which stores the short-term memory before time step t.
The output of the hidden layer at the current time step is the activated cell state, emitted through the output gate:
h_t = o_t * tanh(C_t)
where o_t is the output gate activation, indicating how much information the network outputs at time step t; C_t is the updated cell state at the current time step; h_t is the output at the current time step, i.e. the network output at time step t, which stores the network's short-term memory; and tanh is the hyperbolic tangent function, constraining values to the range -1 to 1.
Here W_f, W_i, W_c, W_o, b_f, b_i, b_c, and b_o are the parameters of the network; the network is trained on these parameters to improve its performance.
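Written as code, the gate equations above map one-to-one onto the following step function (a sketch for illustration; torch is used only for the tensor arithmetic, and each weight matrix is assumed to act on the concatenation [h_{t-1}, x_t] as in the formulas):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM step written directly from the gate equations above.

    Each W has shape (hidden, hidden + input) and maps [h_{t-1}, x_t]
    to the hidden dimension; each b has shape (hidden,).
    """
    hx = torch.cat([h_prev, x_t], dim=-1)
    f_t = torch.sigmoid(hx @ W_f.T + b_f)      # forget gate
    i_t = torch.sigmoid(hx @ W_i.T + b_i)      # input gate
    c_cand = torch.tanh(hx @ W_c.T + b_c)      # candidate cell state C_t'
    c_t = f_t * c_prev + i_t * c_cand          # new cell state (long-term memory)
    o_t = torch.sigmoid(hx @ W_o.T + b_o)      # output gate
    h_t = o_t * torch.tanh(c_t)                # hidden-layer output (short-term memory)
    return h_t, c_t
```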
S103: For each sentence, update the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of that speaker's previous sentences.
In step S103, in order to model the speakers' self-dependencies within the dialogue, this embodiment uses another long short-term memory network as a speaker encoder and maintains a speaker state for each participant (speaker) in the dialogue; the speaker state of each participant is updated only by the sentences that this participant utters. In this embodiment, each speaker's historical sentences are modeled separately as a memory unit, and each speaker's memory is then fused with the representation of the current sentence through an attention mechanism, thereby simulating the speaker's state. For each sentence in the dialogue content, the speaker state of the speaker of that sentence is updated using the sentence. Specifically, for the current sentence u_t (for simplicity of description, the sentence and its sentence vector are denoted by the same symbol), the state feature of the current speaker is updated from that speaker's state feature at the previous time step and the text feature u_t of the current sentence. Let the speaker of sentence u_t be q = q(u_t); then the state feature s_{q,t} of speaker q at time t is updated by the following formula:
s_{q,t} = LSTM(s_{q,t-1}, u_t)
where s_{q,0} is initialized as the zero vector. Unlike the relatively complex speaker encoders in DialogueRNN and DialogueGCN, which must take into account the sentences spoken by others, the speaker encoder of this embodiment is simpler to implement and performs equally well. The speaker state features of the other speakers are not updated.
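A speaker encoder of this kind can be sketched as follows (a PyTorch sketch; the per-speaker dictionary of LSTM states and the class name are illustrative assumptions, and the attention-based fusion mentioned above is omitted for brevity):

```python
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Keeps one LSTM state per speaker; a speaker's state is updated only by
    the sentences that this speaker utters, with s_{q,0} the zero vector."""
    def __init__(self, in_dim=100, hidden=100):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.hidden = hidden

    def forward(self, sent_vecs, speaker_ids):
        # sent_vecs: (num_sentences, in_dim); speaker_ids: one label per sentence
        states, out = {}, []
        for u_t, q in zip(sent_vecs, speaker_ids):
            h, c = states.get(q, (torch.zeros(1, self.hidden),
                                  torch.zeros(1, self.hidden)))
            h, c = self.cell(u_t.unsqueeze(0), (h, c))  # update speaker q only
            states[q] = (h, c)                          # other speakers unchanged
            out.append(h.squeeze(0))                    # s_{q(u_t), t}
        return torch.stack(out)
```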
For a given speaker, the speaker state can be generated as follows: obtain the speaker's multiple sentences in the dialogue content, input the text features of these sentences into the second long short-term memory network model, and extract the speaker's state feature. Specifically, for the speaker's first sentence, its text feature vector is obtained; the speaker's initialization feature and the text feature vector of the first sentence are input into the second LSTM network model to obtain a first output feature; the text feature vector of the second sentence, adjacent to the first, is obtained, and the first output feature together with the text feature vector of the second sentence is input into the second LSTM network model to obtain a second output feature; these steps are repeated until the speaker's current sentence, at which point the output feature of the previous round and the text feature vector of the current sentence are input into the second LSTM network model to obtain the speaker's state feature s_t.
In an optional embodiment, the speaker's state feature contains the emotional information of the speaker's utterances; sensing the speaker's emotional changes through the state feature benefits the emotion recognition of that speaker's sentences in the dialogue content.
In another optional embodiment, in addition to utterance emotion information, the speaker's state feature may also include attribute information of the speaker; for example, the attribute information includes one or more of age, gender, hobbies, speaking style, place of origin, and educational level.
S104: For each sentence, determine the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence.
In step S104, in order to model the dependencies between speakers (step S102) and the speakers' self-dependencies (step S103) more accurately, the model must be able to perceive speaker switches. For this purpose, this embodiment introduces the concept of a speaker switch state. For the current sentence u_t in the dialogue content, i.e. the t-th sentence, the speaker switch state depends on the speaker q(u_t) at time t (the t-th sentence) and the speaker q(u_{t-1}) at time t-1 (the (t-1)-th sentence): if the two speakers are the same, the speaker switch state at time t is a first value; otherwise it is a second value. Specifically, the first value may be 1 and the second value 0. The following formula describes the computation of the speaker switch state b_t at time t:
b_t = 1 if q(u_t) = q(u_{t-1}); b_t = 0 if q(u_t) ≠ q(u_{t-1})
In this embodiment, an embedding layer G embeds the speaker switch state into a 100-dimensional space; the parameters of the embedding layer G are updated during model training.
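The switch state and its embedding can be sketched as follows (illustrative PyTorch; treating the first sentence of a dialogue, which has no predecessor, as b_1 = 0 is an assumption made here, since the embodiment does not specify that case):

```python
import torch
import torch.nn as nn

# Embedding layer G: maps the binary switch state into a 100-dimensional
# space; its parameters are learned during model training, as described above.
G = nn.Embedding(num_embeddings=2, embedding_dim=100)

def switch_embeddings(speaker_ids):
    """b_t = 1 if the speaker is unchanged from the previous sentence, else 0.
    The first sentence has no predecessor; b_1 = 0 is an assumption here."""
    b = [0] + [int(cur == prev) for prev, cur in zip(speaker_ids, speaker_ids[1:])]
    return G(torch.tensor(b))            # (num_sentences, 100)
```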
S105: For each sentence, obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
In step S105, the emotion label categories are happy, sad, neutral, excited, angry, and frustrated. For the current sentence, the context feature c_t of the current sentence, the state feature s_t of its speaker, and its speaker switch state are concatenated to form a new vector, which is input into a fully connected layer; a normalized exponential (softmax) function then outputs the probability of each emotion label category for the current sentence, and finally the probability distribution over emotion label categories is output for every sentence in the dialogue content.
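The classification head can be sketched as follows (illustrative PyTorch; the input dimensions assume the 2×100-dimensional bidirectional context feature, the 100-dimensional speaker state, and the 100-dimensional switch embedding from the sketches above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMOTIONS = ["happy", "sad", "neutral", "excited", "angry", "frustrated"]

class EmotionClassifier(nn.Module):
    """Concatenates context feature c_t, speaker state s_t and the embedded
    switch state, then applies a fully connected layer followed by softmax."""
    def __init__(self, ctx_dim=200, spk_dim=100, switch_dim=100):
        super().__init__()
        self.fc = nn.Linear(ctx_dim + spk_dim + switch_dim, len(EMOTIONS))

    def forward(self, c_t, s_t, switch_emb):
        logits = self.fc(torch.cat([c_t, s_t, switch_emb], dim=-1))
        return F.softmax(logits, dim=-1)  # per-class probabilities per sentence
```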
FIG. 3 is a schematic flowchart of a recurrent neural network-based emotion recognition method according to the second embodiment of the present application. It should be noted that, provided substantially the same results are obtained, the method of the present application is not limited to the sequence shown in FIG. 3. As shown in FIG. 3, the recurrent neural network-based emotion recognition method includes the following steps:
S201: Acquire the text feature of each sentence in the dialogue content.
S202: Encode the text feature of each sentence to obtain the context feature of each sentence.
S203: For each sentence, update the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of that speaker's previous sentences.
S204: For each sentence, determine the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence.
S205: Upload the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence to a blockchain, so that the blockchain stores the text feature, context feature, speaker state feature, and speaker switch feature of the sentence in encrypted form.
S206: For each sentence, obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
In step S205, corresponding summary information is obtained from the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence. Specifically, the summary information is obtained by hashing these features, for example with the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to users. A user device can download the summary information from the blockchain to verify whether the text feature, context feature, speaker state feature, or speaker switch feature of the sentence has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise an underlying blockchain platform, a platform product service layer, and an application service layer.
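The hashing step can be sketched as follows (illustrative Python; the embodiment specifies only that a SHA-256-style hash of the four features is computed, so the serialization via pickle is an assumption made here):

```python
import hashlib
import pickle

def feature_digest(text_feat, ctx_feat, speaker_feat, switch_feat):
    """Serialize the four feature tensors and hash them with SHA-256 to
    obtain the summary information to be uploaded to the blockchain.
    (The pickle serialization format is an assumption for illustration.)"""
    payload = pickle.dumps((text_feat, ctx_feat, speaker_feat, switch_feat))
    return hashlib.sha256(payload).hexdigest()
```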
For details of the other steps, refer to the description of the first embodiment; they are not repeated here.
FIG. 4 is a schematic structural diagram of a recurrent neural network-based emotion recognition apparatus according to the third embodiment of the present application. As shown in FIG. 4, the apparatus 30 includes a sentence encoder 31, a context encoder 32, a speaker encoder 33, a speaker conversion module 34, and an emotion recognition module 35. The sentence encoder 31 is configured to acquire the text feature of each sentence in the dialogue content; the context encoder 32 is configured to encode the text feature of each sentence to obtain the context feature of each sentence; the speaker encoder 33 is configured, for each sentence, to update the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of that speaker's previous sentences; the speaker conversion module 34 is configured, for each sentence, to determine the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and the emotion recognition module 35 is configured, for each sentence, to obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
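Assuming the earlier sketches are instantiated, the five modules of apparatus 30 could be wired together roughly as follows (a hypothetical usage sketch; recognize_emotions and glove_vectors are illustrative names, not part of the embodiment, and glove_vectors is assumed to be a (num_sentences, t, 300) tensor of looked-up GloVe word vectors):

```python
# Hypothetical end-to-end wiring, reusing the earlier sketches
# (SentenceEncoder, ContextEncoder, SpeakerEncoder, switch_embeddings,
# EmotionClassifier) defined above in this document.
sentence_encoder = SentenceEncoder()
context_encoder = ContextEncoder()
speaker_encoder = SpeakerEncoder()
classifier = EmotionClassifier()

def recognize_emotions(glove_vectors, speaker_ids):
    u = sentence_encoder(glove_vectors)               # text features      (module 31)
    c = context_encoder(u.unsqueeze(0)).squeeze(0)    # context features   (module 32)
    s = speaker_encoder(u, speaker_ids)               # speaker states     (module 33)
    g = switch_embeddings(speaker_ids)                # switch embeddings  (module 34)
    return classifier(c, s, g)                        # emotion probs      (module 35)
```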
Further, the sentence encoder 31 is configured to segment each sentence in the dialogue content into words using a natural language tool to obtain the word sequence of each sentence; to convert each word in the word sequence of the sentence into a corresponding word vector using the GloVe model; and to input the word sequence of the sentence into a convolutional neural network to obtain the sentence vector of each sentence, the sentence vector being used as the text feature. Further, the context encoder 32 is configured to process the text feature of each sentence through the first long short-term memory model to obtain the context feature of each sentence. Still further, the context encoder 32 is configured to acquire the text feature of the first sentence in the dialogue content and input it into the first LSTM network model to obtain a first output result; to acquire the text feature of the second sentence, adjacent to the first, and input the first output result together with the text feature of the second sentence into the first LSTM network model to obtain a second output result; and, for each subsequent sentence, to input the output result of the previous round together with the text feature of the current sentence into the first LSTM network model to obtain the context feature of the current sentence, repeating these steps until the context feature of every sentence has been obtained.
Further, the speaker encoder 33 is also configured to acquire the speaker's multiple sentences in the dialogue content, input the text features of these sentences into the second long short-term memory network model, and extract the speaker's state feature. Still further, the speaker encoder 33 is configured to acquire the text feature vector of the speaker's first sentence and input the speaker's initialization feature together with the text feature vector of the first sentence into the second LSTM network model to obtain a first output feature; to acquire the text feature vector of the second sentence, adjacent to the first, and input the first output feature together with the text feature vector of the second sentence into the second LSTM network model to obtain a second output feature; and to repeat these steps up to the speaker's current sentence, inputting the output feature of the previous round together with the text feature vector of the current sentence into the second LSTM network model to obtain the speaker's state feature.
Further, the speaker conversion module 34 is configured to use a first value as the speaker switch state of the current sentence when the speaker of the sentence is the same as the speaker of the previous sentence, and to use a second value as the speaker switch state of the sentence when the speaker of the sentence differs from the speaker of the previous sentence. Still further, the first value may be 1 and the second value may be 0.
FIG. 5 is a schematic structural diagram of a recurrent neural network-based emotion recognition apparatus according to the fourth embodiment of the present application. As shown in FIG. 5, the recurrent neural network-based emotion recognition apparatus 40 includes a processor 41 and a memory 42 coupled to the processor 41.
The memory 42 stores program instructions for implementing the recurrent neural network-based emotion recognition of any of the above embodiments.
The processor 41 is configured to execute the program instructions stored in the memory 42 to perform recurrent neural network-based emotion recognition.
The processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capabilities. The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Refer to FIG. 6, which is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium of this embodiment stores program instructions 51 capable of implementing all of the above methods; the storage medium may be non-volatile or volatile. The program instructions 51 may be stored in the storage medium in the form of a software product and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or terminal devices such as computers, servers, mobile phones, and tablets.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. The above are merely embodiments of this application and do not thereby limit the patent scope of this application; any equivalent structural or process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.
The above are merely embodiments of this application. It should be noted that those of ordinary skill in the art may make improvements without departing from the inventive concept of this application, and such improvements all fall within the protection scope of this application.

Claims (20)

  1. A recurrent neural network-based emotion recognition method, comprising:
    acquiring the text feature of each sentence in dialogue content;
    encoding the text feature of each sentence to obtain the context feature of each sentence;
    for each sentence, updating the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of the speaker's previous sentences;
    for each sentence, determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
    for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
  2. The recurrent neural network-based emotion recognition method according to claim 1, wherein acquiring the text feature of each sentence in the dialogue content comprises:
    segmenting each sentence in the dialogue content into words using a natural language tool to obtain the word sequence of each sentence;
    converting each word in the word sequence of the sentence into a corresponding word vector using the GloVe model; and
    inputting the word sequence of the sentence into a convolutional neural network to obtain the sentence vector of each sentence, the sentence vector being used as the text feature.
  3. The recurrent neural network-based emotion recognition method according to claim 1, wherein encoding the text feature of each sentence to obtain the context feature of each sentence comprises:
    processing the text feature of each sentence through a first long short-term memory model to obtain the context feature of each sentence.
  4. The recurrent neural network-based emotion recognition method according to claim 3, wherein processing the text feature of each sentence through the first long short-term memory model to obtain the context feature of each sentence comprises:
    acquiring the text feature of the first sentence in the dialogue content, and inputting the text feature of the first sentence into the first long short-term memory network model to obtain a first output result;
    acquiring the text feature of the second sentence adjacent to the first sentence, and inputting the first output result and the text feature of the second sentence into the first long short-term memory network model to obtain a second output result;
    for each sentence, inputting the output result of the previous sentence and the text feature of the sentence into the first long short-term memory network model to obtain the context feature of the sentence; and
    repeating the above steps until the context feature of each sentence is obtained.
  5. The recurrent neural network-based emotion recognition method according to claim 1, wherein the speaker state feature is obtained through the following steps:
    acquiring multiple sentences of the speaker in the dialogue content, inputting the text features of the multiple sentences into a second long short-term memory network model, and extracting the state feature of the speaker.
  6. The recurrent neural network-based emotion recognition method according to claim 5, wherein acquiring multiple sentences of the speaker in the dialogue content, inputting the text features of the multiple sentences into the second long short-term memory network model, and extracting the state feature of the speaker comprises:
    acquiring the text feature vector of the first sentence of the speaker, and inputting the initialization feature of the speaker and the text feature vector of the first sentence into the second long short-term memory network model to obtain a first output feature;
    acquiring the text feature vector of the second sentence adjacent to the first sentence, and inputting the first output feature and the text feature vector of the second sentence into the second long short-term memory network model to obtain a second output feature; and
    repeating the above steps until the current sentence of the speaker, and inputting the output feature of the previous round and the text feature vector of the current sentence into the second long short-term memory network model to obtain the state feature of the speaker.
  7. The recurrent neural network-based emotion recognition method according to claim 1, wherein determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence comprises:
    when the speaker of the sentence is the same as the speaker of the previous sentence, using a first value as the speaker switch state of the sentence; and
    when the speaker of the sentence is different from the speaker of the previous sentence, using a second value as the speaker switch state of the sentence;
    and wherein, after determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence, the method further comprises:
    uploading the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence to a blockchain, so that the blockchain stores the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence in encrypted form.
  8. A recurrent neural network-based emotion recognition apparatus, comprising:
    a sentence encoder configured to acquire the text feature of each sentence in dialogue content;
    a context encoder configured to encode the text feature of each sentence to obtain the context feature of each sentence;
    a speaker encoder configured, for each sentence, to update the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of the speaker's previous sentences;
    a speaker conversion module configured, for each sentence, to determine the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
    an emotion recognition module configured, for each sentence, to obtain the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
  9. A recurrent neural network-based emotion recognition apparatus, comprising a processor and a memory coupled to the processor, the memory storing program instructions executable by the processor, wherein the processor, when executing the program instructions stored in the memory, implements the following steps:
    acquiring the text feature of each sentence in dialogue content;
    encoding the text feature of each sentence to obtain the context feature of each sentence;
    for each sentence, updating the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of the speaker's previous sentences;
    for each sentence, determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
    for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
  10. The recurrent neural network-based emotion recognition apparatus according to claim 9, wherein acquiring the text feature of each sentence in the dialogue content comprises:
    segmenting each sentence in the dialogue content into words using a natural language tool to obtain the word sequence of each sentence;
    converting each word in the word sequence of the sentence into a corresponding word vector using the GloVe model; and
    inputting the word sequence of the sentence into a convolutional neural network to obtain the sentence vector of each sentence, the sentence vector being used as the text feature.
  11. The recurrent neural network-based emotion recognition apparatus according to claim 9, wherein encoding the text feature of each sentence to obtain the context feature of each sentence comprises:
    processing the text feature of each sentence through a first long short-term memory model to obtain the context feature of each sentence.
  12. The recurrent neural network-based emotion recognition apparatus according to claim 11, wherein processing the text feature of each sentence through the first long short-term memory model to obtain the context feature of each sentence comprises:
    acquiring the text feature of the first sentence in the dialogue content, and inputting the text feature of the first sentence into the first long short-term memory network model to obtain a first output result;
    acquiring the text feature of the second sentence adjacent to the first sentence, and inputting the first output result and the text feature of the second sentence into the first long short-term memory network model to obtain a second output result;
    for each sentence, inputting the output result of the previous sentence and the text feature of the sentence into the first long short-term memory network model to obtain the context feature of the sentence; and
    repeating the above steps until the context feature of each sentence is obtained.
  13. The recurrent neural network-based emotion recognition apparatus according to claim 9, wherein the speaker state feature is obtained through the following steps:
    acquiring multiple sentences of the speaker in the dialogue content, inputting the text features of the multiple sentences into a second long short-term memory network model, and extracting the state feature of the speaker.
  14. The recurrent neural network-based emotion recognition apparatus according to claim 13, wherein acquiring multiple sentences of the speaker in the dialogue content, inputting the text features of the multiple sentences into the second long short-term memory network model, and extracting the state feature of the speaker comprises:
    acquiring the text feature vector of the first sentence of the speaker, and inputting the initialization feature of the speaker and the text feature vector of the first sentence into the second long short-term memory network model to obtain a first output feature;
    acquiring the text feature vector of the second sentence adjacent to the first sentence, and inputting the first output feature and the text feature vector of the second sentence into the second long short-term memory network model to obtain a second output feature; and
    repeating the above steps until the current sentence of the speaker, and inputting the output feature of the previous round and the text feature vector of the current sentence into the second long short-term memory network model to obtain the state feature of the speaker.
  15. The recurrent neural network-based emotion recognition apparatus according to claim 9, wherein determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence comprises:
    when the speaker of the sentence is the same as the speaker of the previous sentence, using a first value as the speaker switch state of the sentence; and
    when the speaker of the sentence is different from the speaker of the previous sentence, using a second value as the speaker switch state of the sentence;
    and wherein, after determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence, the steps further comprise:
    uploading the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence to a blockchain, so that the blockchain stores the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence in encrypted form.
  16. A storage medium storing program instructions, wherein the program instructions, when executed by a processor, implement the following steps:
    acquiring the text feature of each sentence in dialogue content;
    encoding the text feature of each sentence to obtain the context feature of each sentence;
    for each sentence, updating the speaker state feature of the speaker of the sentence based on the text feature of the sentence, wherein the speaker state feature before the update was obtained from the text features of all of the speaker's previous sentences;
    for each sentence, determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence; and
    for each sentence, obtaining the emotion recognition result of the sentence based on the context feature of the sentence, the speaker state feature of the speaker of the sentence, and the speaker switch state of the sentence.
  17. The storage medium according to claim 16, wherein acquiring the text feature of each sentence in the dialogue content comprises:
    segmenting each sentence in the dialogue content into words using a natural language tool to obtain the word sequence of each sentence;
    converting each word in the word sequence of the sentence into a corresponding word vector using the GloVe model; and
    inputting the word sequence of the sentence into a convolutional neural network to obtain the sentence vector of each sentence, the sentence vector being used as the text feature.
  18. The storage medium according to claim 16, wherein encoding the text feature of each sentence to obtain the context feature of each sentence comprises:
    processing the text feature of each sentence through a first long short-term memory model to obtain the context feature of each sentence.
  19. The storage medium according to claim 16, wherein the speaker state feature is obtained through the following steps:
    acquiring multiple sentences of the speaker in the dialogue content, inputting the text features of the multiple sentences into a second long short-term memory network model, and extracting the state feature of the speaker.
  20. The storage medium according to claim 16, wherein determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence comprises:
    when the speaker of the sentence is the same as the speaker of the previous sentence, using a first value as the speaker switch state of the sentence; and
    when the speaker of the sentence is different from the speaker of the previous sentence, using a second value as the speaker switch state of the sentence;
    and wherein, after determining the speaker switch state of the sentence based on the speaker of the sentence and the speaker of the previous sentence, the steps further comprise:
    uploading the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence to a blockchain, so that the blockchain stores the text feature of the sentence, the context feature of the sentence, the speaker state feature of the sentence, and the speaker switch feature of the sentence in encrypted form.
PCT/CN2020/118498 2020-08-06 2020-09-28 Recurrent neural network-based emotion recognition method, apparatus, and storage medium WO2021135457A1 (en)

Applications Claiming Priority (2)

Application Number / Priority Date / Filing Date / Title
CN202010785309.1A (granted as CN111950275B), priority 2020-08-06, filed 2020-08-06: Emotion recognition method and device based on recurrent neural network and storage medium
CN202010785309.1, priority 2020-08-06

Publications (1)

Publication Number: WO2021135457A1 (en)



Also published as: CN111950275A (2020-11-17); CN111950275B (2023-01-17)
