WO2021139107A1 - Intelligent emotion recognition method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2021139107A1
WO2021139107A1 · PCT/CN2020/098962 · CN2020098962W
Authority
WO
WIPO (PCT)
Prior art keywords
word
words
text
state
probability
Prior art date
Application number
PCT/CN2020/098962
Other languages
French (fr)
Chinese (zh)
Inventor
蒋江涛
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139107A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic equipment, and computer-readable storage medium for emotional intelligent recognition in natural language processing.
  • Emotion recognition is an important task in natural language processing and an important field of artificial intelligence, with a wide range of industrial applications such as product review emotion recognition, text-based intelligent customer service emotion recognition, and user emotion recognition on discussion topics. Therefore, improving the accuracy of emotion recognition is of great significance.
  • The inventor realizes that common emotion recognition methods rarely consider the context of the user's expression, so the user's true emotions cannot be correctly recognized during emotion recognition, especially in text-based dialogue scenarios, where the intelligent customer service and the user typically engage in multi-turn dialogue and the user's emotions are influenced by the topic under discussion, the customer service's manner of expression, and other factors.
  • This application provides an emotional intelligent recognition method, device, electronic equipment, and computer-readable storage medium.
  • An emotional intelligent recognition method provided by this application includes:
  • a pre-trained intelligent dialogue emotion recognition model is used to encode and decode the topic vector set and sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
  • the present application also provides an electronic device that includes a memory and a processor.
  • the memory stores an emotional intelligence recognition program that can run on the processor.
  • when the emotional intelligence recognition program is executed by the processor, the following steps are implemented:
  • a pre-trained intelligent dialogue emotion recognition model is used to encode and decode the topic vector set and sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
  • the present application also provides a computer-readable storage medium having an emotional intelligence recognition program stored on the computer-readable storage medium, and the emotional intelligence recognition program can be executed by one or more processors to implement the following steps:
  • a pre-trained intelligent dialogue emotion recognition model is used to encode and decode the topic vector set and sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
  • this application also provides an emotional intelligent recognition device, including:
  • the deduplication word segmentation module is used to obtain a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
  • the calculation conversion module is used to calculate the importance scores of the words in the word set, select corresponding words in a preset manner, according to their importance scores, as the topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
  • the sentence vector prediction module is used to convert the word set into a word vector set and input it into a pre-trained sentence vector prediction model, and output a sentence vector set corresponding to the word vector set.
  • the emotion recognition module is used to encode and decode the topic vector set and sentence vector set by using a pre-trained intelligent dialogue emotion recognition model to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
  • FIG. 1 is a schematic flowchart of an emotional intelligence recognition method provided by an embodiment of this application.
  • FIG. 2 is a schematic diagram of the internal structure of an electronic device provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of modules of an emotional intelligent recognition device provided by an embodiment of the application.
  • This application provides an emotional intelligent recognition method.
  • Referring to FIG. 1, it is a schematic flowchart of an emotional intelligence recognition method provided by an embodiment of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the emotional intelligence recognition method includes:
  • the text dialogue set is formed by recording dialogues between a customer service agent and a user
  • the customer service may be a front-desk, sales, or after-sales customer service of Ping An of China, etc.
  • the standard text dialogue set is obtained by first performing a deduplication operation on the text dialogue set.
  • the deduplication operation includes:
  • n represents the number of text dialogues in the text dialogue set
  • w_1j and w_2j represent any two text dialogues in the text dialogue set
  • d represents the distance between any two text dialogues.
  • the preset threshold value is 0.1.
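The distance formula itself is rendered as an image in the source, so only the variable glossary survives (n dialogues, a pairwise distance d, a deletion threshold of 0.1). A minimal sketch under the assumption that each text dialogue has already been embedded as a numeric vector and that d is the Euclidean distance (both assumptions, not confirmed by the source):

```python
import math

def euclidean(w1, w2):
    # d: distance between two dialogue vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))

def deduplicate(dialog_vecs, threshold=0.1):
    # When two dialogues are closer than the preset threshold (0.1),
    # one of the pair is dropped; the survivors form the standard set.
    kept = []
    for vec in dialog_vecs:
        if all(euclidean(vec, k) >= threshold for k in kept):
            kept.append(vec)
    return kept

vecs = [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0]]
print(len(deduplicate(vecs)))  # → 2 (the first two are near-duplicates)
```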
  • the word segmentation operation in this application includes: matching the words in the text dialogue set with the words in a preset dictionary through a preset strategy to obtain multiple words, and separating those words with space symbols to obtain the word set.
  • the preset dictionary may include a statistical dictionary and a prefix dictionary.
  • the statistical dictionary is a dictionary constructed from all possible word segments obtained by statistical methods.
  • the statistical dictionary counts the co-occurrence frequency of adjacent characters in the corpus and calculates their mutual information; when the mutual information of adjacent characters is greater than a preset threshold, they are recognized as forming a word, wherein the threshold is 0.6.
  • the prefix dictionary includes the prefix of each word segment in the statistical dictionary.
  • for example, the prefixes of the word "北京大学" (Peking University) in the statistical dictionary are "北", "北京", and "北京大";
  • the prefix of the word "大学" (university) is "大", and so on.
  • This application uses the statistical dictionary to obtain possible word segmentation results of the text dialogue set, and obtains the final segmentation form according to the segmentation position of the word through the prefix dictionary, thereby obtaining the word set of the text dialogue set.
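As a concrete illustration of the two dictionaries working together, here is a hedged sketch; the source does not specify the exact matching strategy, so forward maximum matching over a prefix dictionary is assumed:

```python
def build_prefix_dict(words):
    # Map every prefix of every dictionary word to whether that prefix
    # is itself a complete word in the statistical dictionary.
    prefixes = {}
    for w in words:
        for i in range(1, len(w) + 1):
            prefixes.setdefault(w[:i], False)
        prefixes[w] = True
    return prefixes

def forward_max_match(text, prefixes):
    # Greedily take the longest dictionary word starting at each position;
    # unknown characters fall through as single-character words.
    out, i = [], 0
    while i < len(text):
        best, j = text[i], i + 1
        while j <= len(text) and text[i:j] in prefixes:
            if prefixes[text[i:j]]:
                best = text[i:j]
            j += 1
        out.append(best)
        i += len(best)
    return " ".join(out)  # words separated by space symbols, as described

dic = ["北京", "北京大学", "大学"]
print(forward_max_match("北京大学大学", build_prefix_dict(dic)))  # → 北京大学 大学
```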
  • the calculation of the importance score of the words in the word set includes:
  • Dep(W_i, W_j) represents the degree of the dependency relationship between the words W_i and W_j
  • len(W_i, W_j) represents the length of the dependency path between the words W_i and W_j
  • b is a hyperparameter
  • f_grav(W_i, W_j) represents the gravitational force between the words W_i and W_j
  • tfidf(W_i) represents the TF-IDF value of the word W_i, and tfidf(W_j) represents the TF-IDF value of the word W_j, where TF means term frequency and IDF means inverse document frequency
  • d is the Euclidean distance between the word vectors of the words W_i and W_j.
  • the preset method in the present application is to select, according to the importance scores of the words, the t words with the highest scores as the topic sequence set of the standard text dialogue set.
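The scoring formulas themselves are images in the source. From the variable glossary, the gravitational force plausibly takes the word-attraction form f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d², which is assumed in this sketch, together with the top-t selection the text does describe:

```python
def f_grav(tfidf_i, tfidf_j, d):
    # Gravitational force between two words: product of their TF-IDF
    # values over the squared Euclidean distance of their word vectors.
    # (Reconstructed from the variable glossary; the formula is an
    # image in the source, so this form is an assumption.)
    return tfidf_i * tfidf_j / (d ** 2)

def top_t_words(scores, t):
    # Pick the t highest-scoring words as the topic sequence set.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:t]]

scores = {"refund": 3.2, "hello": 0.4, "delay": 2.1, "thanks": 0.2}
print(top_t_words(scores, 2))  # → ['refund', 'delay']
```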
  • the preferred embodiment of the present application converts the topic sequence set into a topic vector set (sent_topic_vec_i) through the word2vec model, where i represents the index of a sentence within one complete dialogue.
  • the word2vec model is an efficient algorithm model that represents words as real-valued vectors. Using ideas from deep learning, it reduces, through training, the processing of text content to vector operations in a K-dimensional vector space, where similarity in the vector space can represent the semantic similarity of the text.
  • the word2vec model training process includes: selecting a window of appropriate size as the context; the input layer of the word2vec model reads the words in the window and adds their (K-dimensional, initially random) vectors together to form the K nodes of the hidden layer; the output layer of the word2vec model is a huge binary tree whose leaf nodes represent all the words in the corpus (if the corpus contains V independent words, the binary tree has V leaf nodes).
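The input-to-hidden step just described (summing the K-dimensional context vectors into K hidden nodes) can be sketched as follows; the toy vocabulary is an illustrative assumption, and the hierarchical-softmax binary tree over the V leaf nodes is omitted:

```python
import random

K = 4  # embedding dimension
vocab = ["I", "like", "natural", "language", "processing"]
# One initially random K-dimensional vector per word, as described.
random.seed(0)
emb = {w: [random.uniform(-0.5, 0.5) for _ in range(K)] for w in vocab}

def hidden_layer(window_words):
    # CBOW-style input layer: add the vectors of the words in the
    # context window element-wise to form the K hidden-layer nodes.
    h = [0.0] * K
    for w in window_words:
        for i, v in enumerate(emb[w]):
            h[i] += v
    return h

h = hidden_layer(["I", "like", "language"])
print(len(h))  # → 4 (K hidden nodes)
```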
  • the word set is converted into a word vector set and then input into a pre-trained sentence vector prediction model, and a sentence vector set corresponding to the word vector set is output.
  • the sentence vector prediction model is a bidirectional Transformer model.
  • the Transformer model processes all word vectors in the input sequence in parallel and, at the same time, uses a self-attention mechanism to incorporate context from distant word vectors, outputting the corresponding sentence vectors.
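A minimal sketch of the self-attention computation described above, using identity Q/K/V projections for illustration (real Transformers learn separate projection matrices, which are omitted here):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X for
    # illustration: every output vector is a context-weighted mix of
    # ALL input vectors, whether nearby or distant in the sequence.
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
print(len(Y), len(Y[0]))  # → 3 2
```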
  • this application trains the bidirectional Transformer model through a fine-tuning method and a random masking method.
  • the fine-tuning method reuses the shallow features of an available neural network, modifies the parameters of the deeper layers, and builds a new neural network model, reducing the number of training iterations.
  • the random masking method does not predict every word as the word2vec model does; instead, it randomly masks part of the input word vectors, and its goal is to predict the original vocabulary of each masked word vector from its context.
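The random masking objective can be sketched as follows; the mask token, the 15% masking rate, and the tokenized example sentence are illustrative assumptions, not values from the source:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=42):
    # Randomly hide a fraction of the input tokens; the training target
    # is to predict each hidden token's original value from its context.
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # original vocabulary to recover
        else:
            masked.append(tok)
    return masked, targets

tokens = "the user is unhappy with the delayed refund".split()
masked, targets = mask_tokens(tokens)
print(masked.count(MASK) == len(targets))  # → True
```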
  • the pre-built intelligent dialogue emotion recognition model includes an input layer, a hidden layer and an output layer.
  • the training process of the intelligent dialogue emotion recognition model in this application is as follows:
  • the gradient descent algorithm is the most commonly used optimization algorithm for neural network model training.
  • This application updates the variable y along the direction opposite to the gradient vector, that is, along −dL/dy, so that the loss decreases fastest until it converges to its minimum value.
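The update rule described above can be sketched on a one-dimensional toy loss; the quadratic loss, learning rate, and step count are illustrative choices, not from the source:

```python
def gradient_descent(dL_dy, y0, lr=0.1, steps=200):
    # Repeatedly step y opposite to the gradient dL/dy
    # until the loss converges toward its minimum.
    y = y0
    for _ in range(steps):
        y -= lr * dL_dy(y)
    return y

# Minimize L(y) = (y - 3)^2, whose gradient is dL/dy = 2*(y - 3).
y_min = gradient_descent(lambda y: 2 * (y - 3), y0=0.0)
print(round(y_min, 4))  # → 3.0
```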
  • the encoding and decoding operations include: receiving the topic vector set and the sentence vector set through the input layer; using the hidden layer to encode the topic vector set and the sentence vector set to obtain their feature sequence set; using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set; and outputting the state sequence with the maximum probability in that set as the corresponding emotion expression.
  • the dynamic programming algorithm is used to find the Viterbi path, that is, the hidden state sequence most likely to generate the observed event sequence.
  • the dynamic programming algorithm includes:
  • V(1,k) = p(y_1 | k) · π_k
  • V(t,k) = p(y_t | k) · max_{x∈S} ( a_{x,k} · V(t−1,x) )
  • where V(1,k) represents the probability that the first state is k; p represents a probability value; y_1 represents the output of the first state; k represents a state; π_k represents the initial probability of state k; V(t,k) represents the probability of the most likely state sequence whose t-th state is k; y_t represents the output of the t-th state; S represents the state space; a_{x,k} represents the transition probability from state x to state k; and V(t−1,x) represents the probability of the most likely state sequence whose (t−1)-th state is x.
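A runnable sketch of Viterbi decoding matching the recurrences above; the emotion states, observations, and probability tables are invented toy values, not from the source:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][k]: probability of the most likely state sequence ending in
    # state k that explains the first t+1 observations.
    V = [{k: start_p[k] * emit_p[k][obs[0]] for k in states}]
    path = {k: [k] for k in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for k in states:
            # max over previous states x of a_{x,k} * V(t-1, x)
            prob, x = max((V[t - 1][x] * trans_p[x][k], x) for x in states)
            V[t][k] = prob * emit_p[k][obs[t]]
            new_path[k] = path[x] + [k]
        path = new_path
    best_prob, best_last = max((V[-1][k], k) for k in states)
    return path[best_last], best_prob

states = ["calm", "angry"]
start_p = {"calm": 0.7, "angry": 0.3}
trans_p = {"calm": {"calm": 0.8, "angry": 0.2},
           "angry": {"calm": 0.4, "angry": 0.6}}
emit_p = {"calm": {"ok": 0.6, "complaint": 0.4},
          "angry": {"ok": 0.1, "complaint": 0.9}}
seq, p = viterbi(["ok", "complaint", "complaint"], states, start_p, trans_p, emit_p)
print(seq)  # → ['calm', 'calm', 'calm']
```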
  • the application also provides an electronic device.
  • Referring to FIG. 2, it is a schematic diagram of the internal structure of an electronic device provided by an embodiment of this application.
  • the electronic device 1 may be a PC (personal computer), a terminal device such as a smartphone, tablet computer, or portable computer, or a server.
  • the electronic device 1 at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of the emotional intelligence recognition program 01, etc., but also to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the emotional intelligence recognition program 01.
  • the communication bus 13 is used to realize the connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • the optional user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • Figure 2 only shows the electronic device 1 with the components 11-14 and the emotional intelligence recognition program 01.
  • This does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, or combine certain components, or arrange the components differently.
  • the memory 11 stores the emotional intelligence recognition program 01; when the processor 12 executes the emotional intelligence recognition program 01 stored in the memory 11, the following steps are implemented:
  • Step 1: Obtain a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
  • the text dialogue set is formed by recording dialogues between a customer service agent and a user
  • the customer service may be a front-desk, sales, or after-sales customer service of Ping An of China, etc.
  • the standard text dialogue set is obtained by first performing a deduplication operation on the text dialogue set.
  • the deduplication operation includes:
  • n represents the number of text dialogues in the text dialogue set
  • w_1j and w_2j represent any two text dialogues in the text dialogue set
  • d represents the distance between any two text dialogues.
  • any one of the two text dialogs is deleted.
  • the preset threshold value is 0.1.
  • the word segmentation operation in this application includes: matching the words in the text dialogue set with the words in a preset dictionary through a preset strategy to obtain multiple words, and separating those words with space symbols to obtain the word set.
  • the preset dictionary may include a statistical dictionary and a prefix dictionary.
  • the statistical dictionary is a dictionary constructed from all possible word segments obtained by statistical methods.
  • the statistical dictionary counts the co-occurrence frequency of adjacent characters in the corpus and calculates their mutual information; when the mutual information of adjacent characters is greater than a preset threshold, they are recognized as forming a word, wherein the threshold is 0.6.
  • the prefix dictionary includes the prefix of each word segment in the statistical dictionary.
  • for example, the prefixes of the word "北京大学" (Peking University) in the statistical dictionary are "北", "北京", and "北京大";
  • the prefix of the word "大学" (university) is "大", and so on.
  • This application uses the statistical dictionary to obtain possible word segmentation results of the text dialogue set, and obtains the final segmentation form according to the segmentation position of the word through the prefix dictionary, thereby obtaining the word set of the text dialogue set.
  • Step 2: Calculate the importance scores of the words in the word set, select words from the word set, in a preset manner and according to their importance scores, as the topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
  • the calculation of the importance score of the words in the word set includes:
  • Dep(W_i, W_j) represents the degree of the dependency relationship between the words W_i and W_j
  • len(W_i, W_j) represents the length of the dependency path between the words W_i and W_j
  • b is a hyperparameter
  • f_grav(W_i, W_j) represents the gravitational force between the words W_i and W_j
  • tfidf(W_i) represents the TF-IDF value of the word W_i, and tfidf(W_j) represents the TF-IDF value of the word W_j, where TF means term frequency and IDF means inverse document frequency
  • d is the Euclidean distance between the word vectors of the words W_i and W_j.
  • the preset method in this application is to select, according to the importance scores of the words, the t words with the highest scores as the topic sequence set of the standard text dialogue set.
  • the preferred embodiment of the present application converts the topic sequence set into a topic vector set (sent_topic_vec_i) through the word2vec model, where i represents the index of a sentence within one complete dialogue.
  • the word2vec model is an efficient algorithm model that represents words as real-valued vectors. Using ideas from deep learning, it reduces, through training, the processing of text content to vector operations in a K-dimensional vector space, where similarity in the vector space can represent the semantic similarity of the text.
  • the word2vec model training process includes: selecting a window of appropriate size as the context; the input layer of the word2vec model reads the words in the window and adds their (K-dimensional, initially random) vectors together to form the K nodes of the hidden layer; the output layer of the word2vec model is a huge binary tree whose leaf nodes represent all the words in the corpus (if the corpus contains V independent words, the binary tree has V leaf nodes).
  • Step 3: Convert the word set into a word vector set, input it into a pre-trained sentence vector prediction model, and output the sentence vector set corresponding to the word vector set.
  • the sentence vector prediction model is a bidirectional Transformer model.
  • the Transformer model processes all word vectors in the input sequence in parallel and, at the same time, uses a self-attention mechanism to incorporate context from distant word vectors, outputting the corresponding sentence vectors.
  • this application trains the bidirectional Transformer model through a fine-tuning method and a random masking method.
  • the fine-tuning method reuses the shallow features of an available neural network, modifies the parameters of the deeper layers, and builds a new neural network model, reducing the number of training iterations.
  • the random masking method does not predict every word as the word2vec model does; instead, it randomly masks part of the input word vectors, and its goal is to predict the original vocabulary of each masked word vector from its context.
  • Step 4: Use a pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
  • the pre-built intelligent dialogue emotion recognition model includes an input layer, a hidden layer and an output layer.
  • the training process of the intelligent dialogue emotion recognition model in this application is as follows:
  • the gradient descent algorithm is the most commonly used optimization algorithm for neural network model training.
  • This application updates the variable y along the direction opposite to the gradient vector, that is, along −dL/dy, so that the loss decreases fastest until it converges to its minimum value.
  • the encoding and decoding operations include: receiving the topic vector set and the sentence vector set through the input layer; using the hidden layer to encode the topic vector set and the sentence vector set to obtain their feature sequence set; using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set; and outputting the state sequence with the maximum probability in that set as the corresponding emotion expression.
  • the dynamic programming algorithm is used to find the Viterbi path, that is, the hidden state sequence most likely to generate the observed event sequence.
  • the dynamic programming algorithm includes:
  • V(1,k) = p(y_1 | k) · π_k
  • V(t,k) = p(y_t | k) · max_{x∈S} ( a_{x,k} · V(t−1,x) )
  • where V(1,k) represents the probability that the first state is k; p represents a probability value; y_1 represents the output of the first state; k represents a state; π_k represents the initial probability of state k; V(t,k) represents the probability of the most likely state sequence whose t-th state is k; y_t represents the output of the t-th state; S represents the state space; a_{x,k} represents the transition probability from state x to state k; and V(t−1,x) represents the probability of the most likely state sequence whose (t−1)-th state is x.
  • the emotional intelligence recognition device 100 includes a deduplication word segmentation module 10, a calculation conversion module 20, a sentence vector prediction module 30, and an emotion recognition module 40.
  • the deduplication word segmentation module 10 is used to obtain a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
  • the calculation conversion module 20 is configured to calculate the importance scores of the words in the word set, select corresponding words in a preset manner, according to their importance scores, as the topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
  • the sentence vector prediction module 30 is configured to: convert the word set into a word vector set and input it into a pre-trained sentence vector prediction model, and output a sentence vector set corresponding to the word vector set.
  • the emotion recognition module 40 is configured to use a pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and sentence vector set to obtain corresponding emotion expressions, thereby completing the emotion recognition of the text dialogue.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores an emotional intelligence recognition program.
  • the emotional intelligence recognition program can be executed by one or more processors to achieve the following operations:
  • a pre-trained intelligent dialogue emotion recognition model is used to encode and decode the topic vector set and sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.

Abstract

An intelligent emotion recognition method and apparatus, an electronic device, and a computer-readable storage medium, which relate to artificial intelligence technology. The method comprises: acquiring a text conversation set, and performing de-duplication and word segmentation operations on the text conversation set to obtain a word set; calculating an importance score of a word in the word set, selecting, according to a preset manner, words in the word set as a topic sequence set of the text conversation set, and converting the topic sequence set into a topic vector set; converting the word set into a word vector set, and then inputting the word vector set into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set; and performing coding and decoding operations on the topic vector set and the sentence vector set by using a pre-trained intelligent conversation emotion recognition model to obtain a corresponding emotion expression, thereby completing emotion recognition of a text conversation. The method realizes more accurate emotion recognition for text conversations.

Description

Emotional intelligent recognition method, device, electronic device, and storage medium

This application claims priority to a Chinese patent application filed with the Chinese Patent Office on January 10, 2020, with application number CN202010034202.3 and invention title "Emotional Intelligent Recognition Method, Device, and Computer-readable Storage Medium", the entire contents of which are incorporated into this application by reference.
Technical field

This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and computer-readable storage medium for emotional intelligent recognition in natural language processing.
Background

Emotion recognition is an important task in natural language processing and an important field of artificial intelligence, with a wide range of industrial applications such as product review emotion recognition, text-based intelligent customer service emotion recognition, and user emotion recognition on discussion topics. Therefore, improving the accuracy of emotion recognition is of great significance. The inventor realizes that common emotion recognition methods rarely consider the context of the user's expression, so the user's true emotions cannot be correctly recognized during emotion recognition, especially in text-based dialogue scenarios, where the intelligent customer service and the user typically engage in multi-turn dialogue and the user's emotions are influenced by the topic under discussion, the customer service's manner of expression, and other factors.
Summary of the invention

This application provides an emotional intelligent recognition method, device, electronic device, and computer-readable storage medium.

An emotional intelligent recognition method provided by this application includes:

acquiring a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;

calculating the importance scores of the words in the word set, selecting words from the word set, in a preset manner and according to their importance scores, as the topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;

converting the word set into a word vector set, inputting it into a pre-trained sentence vector prediction model, and outputting the sentence vector set corresponding to the word vector set;

using a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain the corresponding emotion expression, thereby completing the emotion recognition of the text dialogue.
This application also provides an electronic device. The device includes a memory and a processor, the memory storing an intelligent emotion recognition program executable on the processor, and the intelligent emotion recognition program, when executed by the processor, implementing the following steps:
acquiring a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;

calculating an importance score for each word in the word set, selecting words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;

converting the word set into a word vector set, inputting the word vector set into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;

using a pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and the sentence vector set to obtain corresponding emotion expressions, thereby completing the emotion recognition of the text dialogue.
This application also provides a computer-readable storage medium on which an intelligent emotion recognition program is stored, the intelligent emotion recognition program being executable by one or more processors to implement the following steps:
acquiring a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;

calculating an importance score for each word in the word set, selecting words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;

converting the word set into a word vector set, inputting the word vector set into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;

using a pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and the sentence vector set to obtain corresponding emotion expressions, thereby completing the emotion recognition of the text dialogue.
In addition, to achieve the above objectives, this application also provides an intelligent emotion recognition apparatus, including:
a deduplication and word segmentation module, configured to acquire a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set;

a calculation and conversion module, configured to calculate an importance score for each word in the word set, select corresponding words in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set;

a sentence vector prediction module, configured to convert the word set into a word vector set, input the word vector set into a pre-trained sentence vector prediction model, and output a sentence vector set corresponding to the word vector set;

an emotion recognition module, configured to use a pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and the sentence vector set to obtain corresponding emotion expressions, thereby completing the emotion recognition of the text dialogue.
Description of the drawings
FIG. 1 is a schematic flowchart of an intelligent emotion recognition method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of the internal structure of an electronic device provided by an embodiment of this application;

FIG. 3 is a schematic diagram of the modules of an intelligent emotion recognition apparatus provided by an embodiment of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides an intelligent emotion recognition method. FIG. 1 is a schematic flowchart of the intelligent emotion recognition method provided by an embodiment of this application. The method may be executed by an apparatus, and the apparatus may be implemented in software and/or hardware.
In this embodiment, the intelligent emotion recognition method includes:
S1. Acquire a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
In a preferred embodiment of this application, the text dialogue set is a set of texts recording dialogues between customer service agents and users; the customer service may be, for example, the front desk, sales, or after-sales customer service of Ping An of China.
Further, since a dialogue text set usually contains duplicate dialogue texts, and a large amount of duplicate data affects classification accuracy, the embodiment of this application first performs a deduplication operation on the dialogue text set to obtain the standard text dialogue set. The deduplication operation includes computing:
    d(w_1, w_2) = sqrt( Σ_{j=1..n} (w_1j − w_2j)^2 )

where n denotes the number of text dialogues in the text dialogue set, w_1j and w_2j denote any two text dialogues in the text dialogue set, and d denotes the distance between any two text dialogues.
Preferably, in this application, if the distance between any two text dialogues is less than a preset threshold, either one of the two text dialogues is deleted. The preset threshold is 0.1.
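As an illustration only, the deduplication step above can be sketched as follows, assuming each text dialogue has already been represented as a fixed-length numeric vector and that the distance is the Euclidean distance (the function names and toy data are hypothetical, not part of this application):

```python
import math

def euclidean(v1, v2):
    # Distance d between the vector representations of two text dialogues.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def deduplicate(dialogues, vectors, threshold=0.1):
    # Keep a dialogue only if it is not closer than the threshold
    # to any dialogue that has already been kept.
    kept, kept_vecs = [], []
    for text, vec in zip(dialogues, vectors):
        if all(euclidean(vec, kv) >= threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```

With the preset threshold of 0.1, two near-identical dialogues collapse to a single entry while clearly distinct dialogues are all preserved.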
Preferably, the word segmentation operation in this application includes: matching the words in the text dialogue set against the entries of a preset dictionary according to a preset strategy to obtain multiple words, and separating the words with space symbols to obtain the word set.
In a preferred embodiment of this application, the preset dictionary may include a statistical dictionary and a prefix dictionary.
The statistical dictionary is a dictionary constructed from all possible word segments obtained by statistical methods. The statistical dictionary counts the co-occurrence frequency of adjacent characters in a corpus and computes their mutual information; when the mutual information of adjacent characters is greater than a preset threshold, the adjacent characters are regarded as forming a word, where the threshold is 0.6.
The prefix dictionary contains the prefixes of every word segment in the statistical dictionary. For example, the prefixes of the word "北京大学" (Peking University) in the statistical dictionary are "北", "北京", and "北京大", and the prefix of the word "大学" (university) is "大", and so on. This application uses the statistical dictionary to obtain the possible segmentation results of the text dialogue set, and uses the prefix dictionary to determine the final segmentation form according to the cut positions of the word segments, thereby obtaining the word set of the text dialogue set.
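A minimal sketch in the spirit of the dictionary matching described above, using forward maximum matching with a prefix set to stop scanning early (the dictionary contents and the function name are illustrative assumptions):

```python
def segment(text, dictionary):
    # Build the prefix dictionary: every prefix of every dictionary word.
    prefixes = {w[:k] for w in dictionary for k in range(1, len(w) + 1)}
    words, i = [], 0
    while i < len(text):
        best = text[i]  # fall back to a single character
        j = i + 1
        while j <= len(text) and text[i:j] in prefixes:
            if text[i:j] in dictionary:
                best = text[i:j]  # longest dictionary word seen so far
            j += 1
        words.append(best)
        i += len(best)
    return " ".join(words)  # words separated by space symbols
```

Because the prefix set contains every prefix of every dictionary word, the inner scan can safely stop as soon as the current substring is no longer a known prefix.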
S2. Calculate an importance score for each word in the word set, select words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
In a preferred embodiment of this application, calculating the importance scores of the words in the word set includes:
calculating the dependency association degree of any two words W_i and W_j in the word set:
    Dep(W_i, W_j) = b / len(W_i, W_j)

where Dep(W_i, W_j) denotes the dependency association degree of the words W_i and W_j, len(W_i, W_j) denotes the length of the dependency path between W_i and W_j, and b is a hyperparameter;
calculating the attraction between the words W_i and W_j according to the dependency association degree:
    f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d^2

where f_grav(W_i, W_j) denotes the attraction between the words W_i and W_j, tfidf(W_i) denotes the TF-IDF value of the word W_i, tfidf(W_j) denotes the TF-IDF value of the word W_j (TF denotes term frequency, IDF denotes the inverse document frequency index), and d is the Euclidean distance between the word vectors of W_i and W_j;
obtaining the association strength between the words W_i and W_j as:
    weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
calculating the importance score of the word W_i according to the association strength:
    WS(W_i) = (1 − η) + η · Σ_{W_j ∈ C(W_i)} [ weight(W_j, W_i) / Σ_{W_k ∈ C(W_j)} weight(W_j, W_k) ] · WS(W_j)

where C(W_i) is the set of vertices connected to the vertex W_i, and η is the damping coefficient.
Preferably, the preset manner in this application is to select, according to the word importance scores, the t highest-scoring words as the topic sequence set of the standard text dialogue set.
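The scoring pipeline above (dependency association combined with TF-IDF gravity into an edge weight, an iterative damped score, then top-t selection) can be sketched as follows. The concrete form of Dep as b divided by the dependency-path length, and all toy TF-IDF values, word-vector coordinates, and path lengths, are illustrative assumptions:

```python
import math

def importance_scores(words, tfidf, vec, dep_len, b=1.0, eta=0.85, iters=100):
    # Edge weight: dependency association times gravitational attraction.
    def weight(wi, wj):
        dep = b / dep_len[frozenset((wi, wj))]   # Dep(Wi, Wj), assumed form
        d = math.dist(vec[wi], vec[wj])          # Euclidean distance of word vectors
        grav = tfidf[wi] * tfidf[wj] / (d * d)   # f_grav(Wi, Wj)
        return dep * grav

    ws = {w: 1.0 for w in words}                 # initial scores
    for _ in range(iters):
        new = {}
        for wi in words:
            total = 0.0
            for wj in words:
                if wj == wi:
                    continue
                out = sum(weight(wj, wk) for wk in words if wk != wj)
                total += weight(wj, wi) / out * ws[wj]
            new[wi] = (1 - eta) + eta * total    # damped update WS(Wi)
        ws = new
    return ws

def top_t_words(ws, t):
    # Select the t highest-scoring words as the topic sequence.
    return sorted(ws, key=ws.get, reverse=True)[:t]
```

Because each word's outgoing weights are normalized before being propagated, the total score mass is conserved across iterations, which keeps the iteration stable.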
Further, the preferred embodiment of this application converts the topic sequence set into a topic vector set (sent_topic_vec_i) through a word2vec model, where i denotes the index of the sentence within one complete dialogue. The word2vec model is an efficient algorithm model that represents words as real-valued vectors. Drawing on ideas from deep learning, it can, through training, reduce the processing of text content to vector operations in a K-dimensional vector space, and similarity in the vector space can be used to represent the semantic similarity of text.

In detail, the word2vec training process includes: selecting a window of appropriate size as the context; the input layer of the word2vec model reads in the words within the window and sums their vectors (K-dimensional, randomly initialized) to form the K nodes of the hidden layer. The output layer of the word2vec model is a large binary tree whose leaf nodes represent all the words in the corpus (if the corpus contains V distinct words, the binary tree has |V| leaf nodes). Each word at a leaf node has a globally unique code, such as "010011", where a left subtree may be denoted 1 and a right subtree 0. Since every node of the hidden layer is connected to the internal nodes of the binary tree, each internal node of the binary tree has K incident edges, and each edge carries a weight. Hence, after training, the word2vec model can convert the above topic sequences into topic vectors.
S3. Convert the word set into a word vector set, input the word vector set into the pre-trained sentence vector prediction model, and output the sentence vector set corresponding to the word vector set.
In a preferred embodiment of this application, the sentence vector prediction model is a bidirectional Transformer model. The Transformer model processes all word vectors of the input sequence in parallel while using a self-attention mechanism to combine the context with more distant word vectors, and outputs the corresponding sentence vectors. Further, this application trains the bidirectional Transformer model by fine-tuning and by random masking. The fine-tuning method extracts the shallow features of an available neural network and modifies the parameters of the deep neural network to build a new neural network model, so as to reduce the number of iterations. The random masking method does not predict every word as the word2vec model does; instead, it randomly masks some of the input word vectors, and its goal is to predict the original words of the masked word vectors based on their context.
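The random masking objective can be illustrated with a small token-level sketch (the mask token string, the masking ratio, and the function name are assumptions for illustration; in practice the masking is applied to the model's input representations):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", ratio=0.15, seed=0):
    # Randomly choose positions to mask and remember the originals;
    # the training target is to predict the originals from context.
    rng = random.Random(seed)
    n = max(1, int(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets
```

The returned `targets` mapping plays the role of the prediction labels: the model sees `masked` and is scored on recovering the original word at each masked position.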
S4. Use the pre-trained intelligent dialogue emotion recognition model to perform encoding and decoding operations on the topic vector set and the sentence vector set to obtain the corresponding emotion expressions, thereby completing the emotion recognition of the text dialogue.
In a preferred embodiment of this application, the pre-built intelligent dialogue emotion recognition model includes an input layer, a hidden layer, and an output layer.
Preferably, the training process of the intelligent dialogue emotion recognition model in this application is as follows:
a. Construct the loss function. According to the basic formulas of deep learning, the input and output of each layer are

    z_j^(l+1) = Σ_i W_ij^l · C_i^l,    C_i = f(z_i)

where z_i^l is the input of the i-th neuron in the l-th layer, W_ij^l is the weight of the link from the i-th neuron in the l-th layer to the j-th neuron in the (l+1)-th layer, and C_j is the output value of each unit of the output layer. From these, the loss function is constructed as

    L = (1/2) · Σ_j (C_j − y_j)^2

where y_j denotes the expected output of the j-th output unit.
b. Update the parameter values of the loss function with a gradient descent algorithm. The gradient descent algorithm is the most commonly used optimization algorithm for training neural network models. In the embodiment of this application, to find the minimum of the loss function L, this application updates the variable y along the direction opposite to the gradient vector, −dL/dy, which decreases the loss fastest until it converges to the minimum. The parameter update formula is

    y = y − α · dL/dy

where α denotes the learning rate. The trained intelligent dialogue emotion recognition model can then be obtained and used for the analysis of emotional states.
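As a minimal illustration of the update rule y = y − α·dL/dy (the quadratic loss below is a hypothetical stand-in for the network's loss function):

```python
def gradient_descent(dL_dy, y0, alpha=0.1, steps=200):
    # Repeatedly step against the gradient until (approximately) converged.
    y = y0
    for _ in range(steps):
        y = y - alpha * dL_dy(y)
    return y

# Example: L(y) = (y - 2)^2 has gradient dL/dy = 2 * (y - 2),
# so gradient descent should drive y toward the minimizer y = 2.
```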
Preferably, the encoding and decoding operations include: receiving the topic vector set and the sentence vector set through the input layer; using the hidden layer to encode the topic vector set and the sentence vector set to obtain the feature sequence set of the topic vector set and the sentence vector set; using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set; and taking the state sequence corresponding to the maximum probability in the probability set as the output result of the corresponding emotion expression.
In a preferred embodiment of this application, the dynamic programming algorithm is used to find the Viterbi path, that is, the hidden state sequence most likely to have produced the observed event sequence. The dynamic programming algorithm includes:
    V_(1,k) = p(y_1 | k) · π_k
    V_(t,k) = p(y_t | k) · max_{x ∈ S}( a_(x,k) · V_(t−1,x) )
where V_(1,k) denotes the probability of the output state sequence when the first state is k, p denotes a probability value, y_1 denotes the output value of the first state, k denotes a state, and π_k denotes the initial probability of state k; V_(t,k) denotes the probability of the output state sequence when the t-th state is k, y_t denotes the output value of the t-th state, S denotes the state space, a_(x,k) denotes the transition probability from state x to state k, and V_(t−1,x) denotes the probability of the output state sequence when the (t−1)-th state is x.
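The recursion above is the standard Viterbi algorithm; a compact sketch with a toy two-state hidden Markov model follows (the state names, observations, and probabilities are illustrative assumptions, not data from this application):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][k] = (best probability of any state sequence ending in state k
    #            at step t, together with that sequence itself)
    V = [{k: (emit_p[k][obs[0]] * start_p[k], [k]) for k in states}]
    for t in range(1, len(obs)):
        layer = {}
        for k in states:
            # V[t][k] = p(y_t | k) * max_x( a[x][k] * V[t-1][x] )
            prob, path = max(
                ((emit_p[k][obs[t]] * trans_p[x][k] * V[t - 1][x][0],
                  V[t - 1][x][1] + [k]) for x in states),
                key=lambda item: item[0])
            layer[k] = (prob, path)
        V.append(layer)
    # The maximum-probability state sequence is the decoded output.
    return max(V[-1].values(), key=lambda item: item[0])
```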
This application also provides an electronic device. FIG. 2 is a schematic diagram of the internal structure of an electronic device provided by an embodiment of this application.
In this embodiment, the electronic device 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer, or a portable computer, or a server. The electronic device 1 includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memories (for example, SD or DX memories), magnetic memories, magnetic disks, optical disks, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the code of the intelligent emotion recognition program 01, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run program code stored in the memory 11 or to process data, for example to execute the intelligent emotion recognition program 01.
The communication bus 13 is used to realize connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be appropriately called a display screen or display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.
FIG. 2 only shows the electronic device 1 with the components 11-14 and the intelligent emotion recognition program 01. Those skilled in the art will understand that the structure shown in FIG. 2 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, or combine certain components, or have a different arrangement of components.
In the embodiment of the electronic device 1 shown in FIG. 2, the memory 11 stores an intelligent emotion recognition program 01, and the processor 12 implements the following steps when executing the intelligent emotion recognition program 01 stored in the memory 11:
Step 1. Acquire a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
In a preferred embodiment of this application, the text dialogue set is a set of texts recording dialogues between customer service agents and users; the customer service may be, for example, the front desk, sales, or after-sales customer service of Ping An of China.
Further, since a dialogue text set usually contains duplicate dialogue texts, and a large amount of duplicate data affects classification accuracy, the embodiment of this application first performs a deduplication operation on the dialogue text set to obtain the standard text dialogue set. The deduplication operation includes computing:
    d(w_1, w_2) = sqrt( Σ_{j=1..n} (w_1j − w_2j)^2 )

where n denotes the number of text dialogues in the text dialogue set, w_1j and w_2j denote any two text dialogues in the text dialogue set, and d denotes the distance between any two text dialogues.
Preferably, in this application, if the distance between any two text dialogues is less than a preset threshold, either one of the two text dialogues is deleted. The preset threshold is 0.1.
Preferably, the word segmentation operation in this application includes: matching the words in the text dialogue set against the entries of a preset dictionary according to a preset strategy to obtain multiple words, and separating the words with space symbols to obtain the word set.
In a preferred embodiment of this application, the preset dictionary may include a statistical dictionary and a prefix dictionary.
The statistical dictionary is a dictionary constructed from all possible word segments obtained by statistical methods. The statistical dictionary counts the co-occurrence frequency of adjacent characters in a corpus and computes their mutual information; when the mutual information of adjacent characters is greater than a preset threshold, the adjacent characters are regarded as forming a word, where the threshold is 0.6.
The prefix dictionary contains the prefixes of every word segment in the statistical dictionary. For example, the prefixes of the word "北京大学" (Peking University) in the statistical dictionary are "北", "北京", and "北京大", and the prefix of the word "大学" (university) is "大", and so on. This application uses the statistical dictionary to obtain the possible segmentation results of the text dialogue set, and uses the prefix dictionary to determine the final segmentation form according to the cut positions of the word segments, thereby obtaining the word set of the text dialogue set.
Step 2. Calculate an importance score for each word in the word set, select words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
In a preferred embodiment of this application, calculating the importance scores of the words in the word set includes:
calculating the dependency association degree of any two words W_i and W_j in the word set:
    Dep(W_i, W_j) = b / len(W_i, W_j)

where Dep(W_i, W_j) denotes the dependency association degree of the words W_i and W_j, len(W_i, W_j) denotes the length of the dependency path between W_i and W_j, and b is a hyperparameter;
calculating the attraction between the words W_i and W_j according to the dependency association degree:
    f_grav(W_i, W_j) = tfidf(W_i) · tfidf(W_j) / d^2

where f_grav(W_i, W_j) denotes the attraction between the words W_i and W_j, tfidf(W_i) denotes the TF-IDF value of the word W_i, tfidf(W_j) denotes the TF-IDF value of the word W_j (TF denotes term frequency, IDF denotes the inverse document frequency index), and d is the Euclidean distance between the word vectors of W_i and W_j;
obtaining the association strength between the words W_i and W_j as:
    weight(W_i, W_j) = Dep(W_i, W_j) · f_grav(W_i, W_j)
calculating the importance score of the word W_i according to the association strength:
    WS(W_i) = (1 − η) + η · Σ_{W_j ∈ C(W_i)} [ weight(W_j, W_i) / Σ_{W_k ∈ C(W_j)} weight(W_j, W_k) ] · WS(W_j)

where C(W_i) is the set of vertices connected to the vertex W_i, and η is the damping coefficient.
Preferably, the preset manner in this application is to select, according to the word importance scores, the t highest-scoring words as the topic sequence set of the standard text dialogue set.
进一步地,本申请较佳实施例通过word2vec模型将所述主题序列集转换为主题向量集(sent_topic_vec_i),其中,i表示该句话在一次完成对话的索引。所述word2vec模型指的是将词表征为实数值向量的一种高效的算法模型,其利用深度学习的思想,可以通过训练,把对文本内容的处理简化为K维向量空间中的向量运算,而向量空间上的相似度可以用来表示文本语义上的相似。详细地,所述word2vec模型训练过程包括:选取一个适当大小的窗口当做语境,所述word2vec模型的输入层读入窗口内的词,将所述窗口内词的向量(K维,初始随机)加和在一起,形成隐藏层K个节点;所述word2vec模型的输出层是一个巨大的二叉树,叶节点代表语料里所有的词(语料含有V个独立的词,则二叉树有|V|个叶节点);对于叶节点的每一个词,就会有一个全局唯一的编码,形如“010011”,可以表示左子树为1,右子树为0;由于,所述word2vec模型的隐层中每一个节点都会跟二叉树的内节点有连边,于是对于二叉树的每一个内节点都会有K条连边,每条边上也会有权值。于是,通过训练后word2vec模型可以将上述主题序列转换为主题向量。Further, the preferred embodiment of the present application converts the topic sequence set into a topic vector set (sent_topic_vec_i) through the word2vec model, where i represents the index of the sentence that completes the conversation at one time. The word2vec model refers to an efficient algorithm model that represents words as real-valued vectors. It uses the idea of deep learning to simplify the processing of text content to vector operations in a K-dimensional vector space through training. The similarity in the vector space can be used to represent the semantic similarity of the text. 
In detail, the word2vec training process includes: selecting a window of appropriate size as the context; the input layer of the word2vec model reads the words in the window and sums their K-dimensional (initially random) vectors to form the K nodes of the hidden layer. The output layer of the word2vec model is a large binary tree whose leaf nodes represent all the words in the corpus (if the corpus contains V distinct words, the binary tree has |V| leaf nodes). Each leaf word therefore has a globally unique code, such as "010011", where 1 can denote the left subtree and 0 the right subtree. Since every node of the hidden layer is connected to the internal nodes of the binary tree, each internal node of the binary tree has K incident edges, and each edge carries a weight. After training, the word2vec model can thus convert the above topic sequences into topic vectors.
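Once a word2vec model has been trained (for instance with gensim's `Word2Vec`, whose hierarchical-softmax output layer with `hs=1` corresponds to the binary tree described above), each topic sequence can be pooled into a single topic vector. Averaging the word vectors, as sketched below, is one common pooling choice and is an assumption here, not necessarily the patent's exact conversion; `wv` stands in for the trained model's word-to-vector mapping.

```python
def topic_vector(topic_words, wv):
    """Pool a topic sequence into one K-dimensional topic vector by
    averaging the word2vec vectors of its words (assumed pooling).
    Words missing from the vocabulary are skipped."""
    dim = len(next(iter(wv.values())))
    acc = [0.0] * dim
    known = [w for w in topic_words if w in wv]
    for w in known:
        for k, x in enumerate(wv[w]):
            acc[k] += x
    return [x / len(known) for x in acc] if known else acc
```

With `wv = {"refund": [1.0, 0.0], "delay": [0.0, 1.0]}`, `topic_vector(["refund", "delay"], wv)` yields `[0.5, 0.5]`.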
Step 3: Convert the word set into a word vector set, input it into the pre-trained sentence vector prediction model, and output the sentence vector set corresponding to the word vector set.
In a preferred embodiment of this application, the sentence vector prediction model is a bidirectional Transformer model. The Transformer model processes all word vectors in the input sequence in parallel and uses a self-attention mechanism to combine context with distant word vectors, outputting the corresponding sentence vectors. Further, this application trains the bidirectional Transformer model with fine-tuning and random masking. The fine-tuning method extracts shallow features from an available neural network and modifies the parameters of the deep neural network to build a new neural network model, reducing the number of training iterations. The random masking method does not predict every word as the word2vec model does; instead, it randomly masks some of the input word vectors, and its objective is to predict the original word of each masked vector from its context.
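The random masking objective can be sketched as follows; the 15% masking rate and the `[MASK]` placeholder are common defaults borrowed from BERT-style training and are assumptions here, not values stated in the source:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly replace a fraction of input tokens with a mask symbol.
    The model is then trained to predict, for each masked position,
    the original token from its surrounding context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # remember the original word to be predicted
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets
```

The returned `targets` dictionary pairs each masked position with its original word, which serves as the prediction label during training.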
Step 4: Use the pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain the corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
In a preferred embodiment of this application, the pre-built intelligent dialogue emotion recognition model includes an input layer, a hidden layer, and an output layer.
Preferably, the training process of the intelligent dialogue emotion recognition model in this application is as follows:
a. Construct the loss function. From the basic feed-forward formulas of deep learning, the input and output of each layer are
Figure PCTCN2020098962-appb-000015
and C_i = f(z_i), where
Figure PCTCN2020098962-appb-000016
is the input of the i-th neuron in layer l, W_{ij} is the connection weight from the i-th neuron in layer l to the j-th neuron in layer l+1, and C_j is the output value of each unit of the output layer; from these the loss function is constructed:
Figure PCTCN2020098962-appb-000017
b. Update the parameter values of the loss function with the gradient descent algorithm. The gradient descent algorithm is the most commonly used optimization algorithm for training neural network models. In this embodiment, to find the minimum of the loss function
Figure PCTCN2020098962-appb-000018
this application updates the variable y along the direction opposite to the gradient vector, −dL/dy, which decreases the loss fastest, until the loss converges to the minimum. The parameter update formula is y = y − α·dL/dy, where α is the learning rate; the trained intelligent dialogue emotion recognition model can then be obtained for analysis of emotional states.
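A minimal sketch of the update rule y ← y − α·dL/dy on a one-variable loss (the real model applies the same rule to every network weight via backpropagation; the quadratic toy loss and step count here are assumptions for illustration):

```python
def gradient_descent(dL_dy, y0, alpha=0.1, steps=200):
    """Repeatedly step against the gradient for `steps` iterations;
    with a suitable learning rate alpha the loss converges to its minimum."""
    y = y0
    for _ in range(steps):
        y -= alpha * dL_dy(y)  # y <- y - alpha * dL/dy
    return y

# minimising the toy loss L(y) = (y - 3)^2, whose gradient is dL/dy = 2(y - 3)
y_min = gradient_descent(lambda y: 2 * (y - 3.0), y0=0.0)
```

`y_min` converges to 3, the minimiser of the toy loss.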
Preferably, the encoding and decoding operations include: receiving the topic vector set and the sentence vector set through the input layer; using the hidden layer to encode the topic vector set and the sentence vector set to obtain their feature sequence set; using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set; and taking the state sequence corresponding to the maximum probability in that probability set as the output result of the corresponding emotion expression.
In a preferred embodiment of this application, the dynamic programming algorithm is used to find the Viterbi path, i.e., the hidden state sequence most likely to have produced the sequence of observed events. The dynamic programming algorithm includes:
V_{1,k} = p(y_1|k) · π_k
V_{t,k} = p(y_t|k) · max_{x∈S}(a_{x,k} · V_{t-1,x})
where V_{1,k} is the probability of the state sequence whose first state is k, p denotes a probability value, y_1 is the output at the first step, k denotes a state, and π_k is the initial probability of state k; V_{t,k} is the probability of the state sequence whose t-th state is k, y_t is the output at step t, S is the state space, a_{x,k} is the transition probability from state x to state k, and V_{t-1,x} is the probability of the state sequence whose (t-1)-th state is x.
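The two recurrences above translate directly into a Viterbi decoder over the feature sequence; the emotion states, observations, and probability tables in the example below are toy values assumed for illustration, not the model's trained parameters:

```python
def viterbi(obs, states, pi, trans, emit):
    """Dynamic program from the recurrences above:
    V[0][k] = p(y1|k) * pi_k;  V[t][k] = p(yt|k) * max_x(a_{x,k} * V[t-1][x]).
    Returns the state sequence with maximum probability (the Viterbi path)."""
    V = [{k: emit[k][obs[0]] * pi[k] for k in states}]  # initialisation
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for k in states:
            x_best = max(states, key=lambda x: trans[x][k] * V[t - 1][x])
            back[t][k] = x_best
            V[t][k] = emit[k][obs[t]] * trans[x_best][k] * V[t - 1][x_best]
    last = max(states, key=lambda k: V[-1][k])  # most probable final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):        # backtrace
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

With toy states `["calm", "angry"]` and observations `["ok", "meh", "bad"]`, the decoder returns the most probable emotion-state sequence `["calm", "calm", "angry"]`.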
Referring to FIG. 3, which is a schematic diagram of the emotional intelligence recognition apparatus 100 of this application, in this embodiment the emotional intelligence recognition apparatus 100 includes a deduplication and word segmentation module 10, a calculation and conversion module 20, a sentence vector prediction module 30, and an emotion recognition module 40. Illustratively:
The deduplication and word segmentation module 10 is configured to: obtain a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set.
The calculation and conversion module 20 is configured to: calculate the importance scores of the words in the word set, select corresponding words in a preset manner according to their importance scores as the topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set.
The sentence vector prediction module 30 is configured to: convert the word set into a word vector set, input it into a pre-trained sentence vector prediction model, and output the sentence vector set corresponding to the word vector set.
The emotion recognition module 40 is configured to: use a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain the corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
The functions or operation steps implemented when the above program modules — the deduplication and word segmentation module 10, the calculation and conversion module 20, the sentence vector prediction module 30, and the emotion recognition module 40 — are executed are substantially the same as those of the foregoing embodiments and are not repeated here.
In addition, an embodiment of this application further proposes a computer-readable storage medium, which may be non-volatile or volatile, on which an emotional intelligence recognition program is stored; the emotional intelligence recognition program can be executed by one or more processors to implement the following operations:
obtaining a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;
calculating importance scores of the words in the word set, selecting corresponding words in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;
converting the word set into a word vector set and inputting it into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;
using a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain a corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
The specific implementation of the computer-readable storage medium of this application is substantially the same as the embodiments of the electronic device and method above and is not repeated here.
It should be noted that the serial numbers of the above embodiments of this application are for description only and do not indicate the superiority or inferiority of the embodiments. The terms "include", "comprise", or any other variant thereof herein are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes it.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored on a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structure or equivalent process transformation made using the content of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. An emotional intelligence recognition method, wherein the method comprises:
    obtaining a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;
    calculating importance scores of the words in the word set, selecting words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;
    converting the word set into a word vector set and inputting it into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;
    using a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain a corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
  2. The emotional intelligence recognition method of claim 1, wherein the deduplication operation comprises:
    Figure PCTCN2020098962-appb-100001
    where n is the number of text dialogues in the text dialogue set, w_1j and w_2j denote any two text dialogues in the text dialogue set, and d is the distance between any two text dialogues.
  3. The emotional intelligence recognition method of claim 1, wherein calculating the importance scores of the words in the word set comprises:
    calculating the dependency association degree of any two words W_i and W_j in the word set:
    Figure PCTCN2020098962-appb-100002
    where Dep(W_i, W_j) denotes the dependency association degree of the words W_i and W_j, len(W_i, W_j) denotes the length of the dependency path between W_i and W_j, and b is a hyperparameter;
    calculating the gravitational force of the words W_i and W_j according to the dependency association degree:
    Figure PCTCN2020098962-appb-100003
    where f_grav(W_i, W_j) denotes the gravitational force between the words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of the words W_i and W_j (TF is term frequency, IDF is the inverse document frequency), and d is the Euclidean distance between the word vectors of W_i and W_j;
    obtaining the association strength between the words W_i and W_j from the dependency association degree and the gravitational force:
    weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
    calculating the importance score of the word W_i according to the association strength:
    Figure PCTCN2020098962-appb-100004
    where
    Figure PCTCN2020098962-appb-100005
    is the set of vertices related to the vertex W_i, and η is the damping coefficient.
  4. The emotional intelligence recognition method according to any one of claims 1 to 3, wherein the encoding and decoding operations comprise:
    encoding the topic vector set and the sentence vector set to obtain a feature sequence set of the topic vector set and the sentence vector set;
    using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set, and taking the state sequence corresponding to the maximum probability in the probability set as the output result of the corresponding emotion expression.
  5. The emotional intelligence recognition method of claim 4, wherein the dynamic programming algorithm comprises:
    V_{1,k} = p(y_1|k) · π_k
    V_{t,k} = p(y_t|k) · max_{x∈S}(a_{x,k} · V_{t-1,x})
    where V_{1,k} is the probability of the state sequence whose first state is k, p denotes a probability value, y_1 is the output at the first step, k denotes a state, and π_k is the initial probability of state k; V_{t,k} is the probability of the state sequence whose t-th state is k, y_t is the output at step t, S is the state space, a_{x,k} is the transition probability from state x to state k, and V_{t-1,x} is the probability of the state sequence whose (t-1)-th state is x.
  6. The emotional intelligence recognition method according to any one of claims 1 to 3, wherein the word segmentation operation comprises: matching words in the text dialogue set with entries in a preset dictionary through a preset strategy to obtain a plurality of words, and separating the words with space characters to obtain the word set.
  7. The emotional intelligence recognition method of claim 6, wherein the preset dictionary includes a statistical dictionary and a prefix dictionary.
  8. An electronic device, wherein the device comprises a memory and a processor, the memory storing an emotional intelligence recognition program executable on the processor, and the emotional intelligence recognition program, when executed by the processor, implements the following steps:
    obtaining a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;
    calculating importance scores of the words in the word set, selecting words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;
    converting the word set into a word vector set and inputting it into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;
    using a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain a corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
  9. The electronic device of claim 8, wherein the deduplication operation comprises:
    Figure PCTCN2020098962-appb-100006
    where n is the number of text dialogues in the text dialogue set, w_1j and w_2j denote any two text dialogues in the text dialogue set, and d is the distance between any two text dialogues.
  10. The electronic device of claim 8, wherein calculating the importance scores of the words in the word set comprises:
    calculating the dependency association degree of any two words W_i and W_j in the word set:
    Figure PCTCN2020098962-appb-100007
    where Dep(W_i, W_j) denotes the dependency association degree of the words W_i and W_j, len(W_i, W_j) denotes the length of the dependency path between W_i and W_j, and b is a hyperparameter;
    calculating the gravitational force of the words W_i and W_j according to the dependency association degree:
    Figure PCTCN2020098962-appb-100008
    where f_grav(W_i, W_j) denotes the gravitational force between the words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of the words W_i and W_j (TF is term frequency, IDF is the inverse document frequency), and d is the Euclidean distance between the word vectors of W_i and W_j;
    obtaining the association strength between the words W_i and W_j from the dependency association degree and the gravitational force:
    weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
    calculating the importance score of the word W_i according to the association strength:
    Figure PCTCN2020098962-appb-100009
    where
    Figure PCTCN2020098962-appb-100010
    is the set of vertices related to the vertex W_i, and η is the damping coefficient.
  11. The electronic device according to any one of claims 8 to 10, wherein the encoding and decoding operations comprise:
    encoding the topic vector set and the sentence vector set to obtain a feature sequence set of the topic vector set and the sentence vector set;
    using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set, and taking the state sequence corresponding to the maximum probability in the probability set as the output result of the corresponding emotion expression.
  12. The electronic device of claim 11, wherein the dynamic programming algorithm comprises:
    V_{1,k} = p(y_1|k) · π_k
    V_{t,k} = p(y_t|k) · max_{x∈S}(a_{x,k} · V_{t-1,x})
    where V_{1,k} is the probability of the state sequence whose first state is k, p denotes a probability value, y_1 is the output at the first step, k denotes a state, and π_k is the initial probability of state k; V_{t,k} is the probability of the state sequence whose t-th state is k, y_t is the output at step t, S is the state space, a_{x,k} is the transition probability from state x to state k, and V_{t-1,x} is the probability of the state sequence whose (t-1)-th state is x.
  13. The electronic device according to any one of claims 8 to 10, wherein the word segmentation operation comprises: matching words in the text dialogue set with entries in a preset dictionary through a preset strategy to obtain a plurality of words, and separating the words with space characters to obtain the word set.
  14. The electronic device of claim 13, wherein the preset dictionary includes a statistical dictionary and a prefix dictionary.
  15. A computer-readable storage medium, wherein an emotional intelligence recognition program is stored on the computer-readable storage medium, and the emotional intelligence recognition program can be executed by one or more processors to implement the following steps:
    obtaining a text dialogue set, performing a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and performing a word segmentation operation on the standard text dialogue set to obtain a word set;
    calculating importance scores of the words in the word set, selecting words from the word set in a preset manner according to their importance scores as a topic sequence set of the standard text dialogue set, and converting the topic sequence set into a topic vector set;
    converting the word set into a word vector set and inputting it into a pre-trained sentence vector prediction model, and outputting a sentence vector set corresponding to the word vector set;
    using a pre-trained intelligent dialogue emotion recognition model to encode and decode the topic vector set and the sentence vector set to obtain a corresponding emotion expression, thereby completing emotion recognition of the text dialogue.
  16. The computer-readable storage medium of claim 15, wherein the deduplication operation comprises:
    Figure PCTCN2020098962-appb-100011
    where n is the number of text dialogues in the text dialogue set, w_1j and w_2j denote any two text dialogues in the text dialogue set, and d is the distance between any two text dialogues.
  17. The computer-readable storage medium of claim 15, wherein calculating the importance scores of the words in the word set comprises:
    calculating the dependency association degree of any two words W_i and W_j in the word set:
    Figure PCTCN2020098962-appb-100012
    where Dep(W_i, W_j) denotes the dependency association degree of the words W_i and W_j, len(W_i, W_j) denotes the length of the dependency path between W_i and W_j, and b is a hyperparameter;
    calculating the gravitational force of the words W_i and W_j according to the dependency association degree:
    Figure PCTCN2020098962-appb-100013
    where f_grav(W_i, W_j) denotes the gravitational force between the words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of the words W_i and W_j (TF is term frequency, IDF is the inverse document frequency), and d is the Euclidean distance between the word vectors of W_i and W_j;
    obtaining the association strength between the words W_i and W_j from the dependency association degree and the gravitational force:
    weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
    calculating the importance score of the word W_i according to the association strength:
    Figure PCTCN2020098962-appb-100014
    where
    Figure PCTCN2020098962-appb-100015
    is the set of vertices related to the vertex W_i, and η is the damping coefficient.
  18. The computer-readable storage medium according to any one of claims 15 to 17, wherein the encoding and decoding operations comprise:
    encoding the topic vector set and the sentence vector set to obtain a feature sequence set of the topic vector set and the sentence vector set;
    using a dynamic programming algorithm to decode the probability set of the state sequences corresponding to the feature sequence set, and taking the state sequence corresponding to the maximum probability in the probability set as the output result of the corresponding emotion expression.
  19. The computer-readable storage medium of claim 18, wherein the dynamic programming algorithm comprises:
    V_{1,k} = p(y_1 | k) · π_k
    V_{t,k} = p(y_t | k) · max_{x∈S}(a_{x,k} · V_{t-1,x})
    where V_{1,k} represents the probability of the state sequence whose first state is k given the corresponding output, p represents a probability value, y_1 represents the output value at the first step, k represents a state, and π_k represents the initial probability of state k; V_{t,k} represents the probability of the state sequence whose t-th state is k given the corresponding output, y_t represents the output value at step t, S represents the state space, a_{x,k} represents the transition probability from state x to state k, and V_{t-1,x} represents the probability of the state sequence whose (t−1)-th state is x given the corresponding output.
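The recurrence in claim 19 is the classic Viterbi algorithm for hidden Markov models. A minimal sketch (the variable names and the toy model in the usage example are illustrative, not taken from the patent):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Decode the most likely hidden state sequence for `obs`.

    V[t][k] holds the probability of the best state path ending in
    state k at step t, following the recurrence:
        V[1,k] = p(y_1 | k) * pi_k
        V[t,k] = p(y_t | k) * max_x(a_{x,k} * V[t-1,x])
    """
    V = [{}]
    back = [{}]
    for k in states:
        V[0][k] = emit_p[k][obs[0]] * start_p[k]
        back[0][k] = None
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for k in states:
            # Best predecessor x maximising transition * previous score.
            prob, prev = max(
                (V[t - 1][x] * trans_p[x][k], x) for x in states
            )
            V[t][k] = emit_p[k][obs[t]] * prob
            back[t][k] = prev
    # Backtrack from the most probable final state.
    best_last = max(V[-1], key=V[-1].get)
    path = [best_last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return V[-1][best_last], list(reversed(path))
```

Because the max over predecessors is taken at every step, the decode runs in O(T·|S|²) time rather than enumerating all |S|^T state sequences, which is what makes the "maximum probability state sequence" selection in claim 18 tractable.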
  20. An intelligent emotion recognition apparatus, comprising:
    a deduplication and word segmentation module, configured to obtain a text dialogue set, perform a deduplication operation on the text dialogue set to obtain a standard text dialogue set, and perform a word segmentation operation on the standard text dialogue set to obtain a word set;
    a calculation and conversion module, configured to calculate importance scores of the words in the word set, select corresponding words in a preset manner according to the importance scores as a topic sequence set of the standard text dialogue set, and convert the topic sequence set into a topic vector set;
    a sentence vector prediction module, configured to convert the word set into a word vector set, input the word vector set into a pre-trained sentence vector prediction model, and output a sentence vector set corresponding to the word vector set;
    an emotion recognition module, configured to perform encoding and decoding operations on the topic vector set and the sentence vector set using a pre-trained intelligent dialogue emotion recognition model to obtain the corresponding emotional expression, thereby completing emotion recognition of the text dialogue.
PCT/CN2020/098962 2020-01-10 2020-06-29 Intelligent emotion recognition method and apparatus, electronic device, and storage medium WO2021139107A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010034202.3A CN111241828A (en) 2020-01-10 2020-01-10 Intelligent emotion recognition method and device and computer readable storage medium
CN202010034202.3 2020-01-10

Publications (1)

Publication Number Publication Date
WO2021139107A1 true WO2021139107A1 (en) 2021-07-15

Family

ID=70866114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098962 WO2021139107A1 (en) 2020-01-10 2020-06-29 Intelligent emotion recognition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111241828A (en)
WO (1) WO2021139107A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241828A (en) * 2020-01-10 2020-06-05 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium
CN111881257B (en) * 2020-07-24 2022-06-03 广州大学 Automatic matching method, system and storage medium based on subject word and sentence subject matter
CN114005467A (en) * 2020-07-28 2022-02-01 中移(苏州)软件技术有限公司 Speech emotion recognition method, device, equipment and storage medium
CN112000793B (en) * 2020-08-28 2022-08-09 哈尔滨工业大学 Man-machine interaction oriented dialogue target planning method
CN113095076B (en) * 2021-04-20 2023-08-22 平安银行股份有限公司 Sensitive word recognition method and device, electronic equipment and storage medium
CN113268562B (en) * 2021-05-24 2022-05-13 平安科技(深圳)有限公司 Text emotion recognition method, device and equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639824A (en) * 2009-08-27 2010-02-03 北京理工大学 Text filtering method based on emotional orientation analysis against malicious information
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103226576A (en) * 2013-04-01 2013-07-31 杭州电子科技大学 Comment spam filtering method based on semantic similarity
CN103268350A (en) * 2013-05-29 2013-08-28 安徽雷越网络科技有限公司 Internet public opinion information monitoring system and monitoring method
CN111241828A (en) * 2020-01-10 2020-06-05 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258134A (en) * 2023-04-24 2023-06-13 中国科学技术大学 Dialogue emotion recognition method based on convolution joint model
CN116258134B (en) * 2023-04-24 2023-08-29 中国科学技术大学 Dialogue emotion recognition method based on convolution joint model

Also Published As

Publication number Publication date
CN111241828A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
WO2021139107A1 (en) Intelligent emotion recognition method and apparatus, electronic device, and storage medium
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
Zhang et al. LSTM-CNN hybrid model for text classification
WO2020224097A1 (en) Intelligent semantic document recommendation method and device, and computer-readable storage medium
WO2023065544A1 (en) Intention classification method and apparatus, electronic device, and computer-readable storage medium
WO2020082560A1 (en) Method, apparatus and device for extracting text keyword, as well as computer readable storage medium
WO2021068339A1 (en) Text classification method and device, and computer readable storage medium
WO2020107878A1 (en) Method and apparatus for generating text summary, computer device and storage medium
CN110516253B (en) Chinese spoken language semantic understanding method and system
KR101715118B1 (en) Deep Learning Encoding Device and Method for Sentiment Classification of Document
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
Altowayan et al. Improving Arabic sentiment analysis with sentiment-specific embeddings
CN112395385B (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN110879834B (en) Viewpoint retrieval system based on cyclic convolution network and viewpoint retrieval method thereof
CN110502748B (en) Text topic extraction method, device and computer readable storage medium
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
US11783179B2 (en) System and method for domain- and language-independent definition extraction using deep neural networks
CN113177412A (en) Named entity identification method and system based on bert, electronic equipment and storage medium
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
CN113178193A (en) Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip
CN113704416A (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
CN109815497B (en) Character attribute extraction method based on syntactic dependency
Jaech et al. What your username says about you

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911576

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911576

Country of ref document: EP

Kind code of ref document: A1