CN111694935A - Multi-turn question and answer emotion determining method and device, computer equipment and storage medium - Google Patents

Multi-turn question and answer emotion determining method and device, computer equipment and storage medium

Info

Publication number
CN111694935A
Authority
CN
China
Prior art keywords: question, answer, emotion, text, matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010340290.XA
Other languages
Chinese (zh)
Inventor
邓悦
金戈
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010340290.XA priority Critical patent/CN111694935A/en
Publication of CN111694935A publication Critical patent/CN111694935A/en
Priority to PCT/CN2020/118464 priority patent/WO2021218023A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods

Abstract

The embodiments of the application belong to the field of semantic recognition in artificial intelligence and relate to a method for determining emotion in multi-turn question answering. The method comprises the steps of: dividing a question-and-answer text into a plurality of question text segments and a plurality of answer text segments; inputting all the question text segments and answer text segments into a trained model, and encoding each question text segment and answer text segment into a matrix in which each word is represented by a row or a column; combining every question text segment with every answer text segment to form question-answer pairs; updating all the data of the matrices according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair; splicing all the updated matrices and applying a linear transformation to determine the probability of matching each preset emotion category; and taking the emotion category with the highest probability as the emotion of the respondent, and storing the emotion of the respondent in a blockchain network.

Description

Multi-turn question and answer emotion determining method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of semantic recognition technology, and in particular, to a method and an apparatus for determining emotion in multiple rounds of question answering, a computer device, and a storage medium.
Background
In commercial activity, many decisions can only be made after negotiation, and reaching agreement between two parties requires a large number of conversations. This can cause physical and mental fatigue for the personnel involved and reduce work efficiency. One typical scenario is recruitment: an enterprise recruiter needs to conduct interviews and, within a period of time, must communicate continuously with a number of interviewees, so the heavy workload may make recruitment inefficient. Intelligent interviewing schemes have therefore been proposed: drawing on the results of artificial-intelligence semantic recognition, the communication between the candidate and an AI interviewer is collected and its semantics are judged to determine the candidate's emotion, assisting the interviewer in judging whether the candidate can carry out the relevant work.
In existing schemes for judging a user's emotion during a question-and-answer session, the entire passage of speech content is fed into a model for emotion recognition. A very large block of data therefore has to be processed at once to obtain the semantics, which makes the computation heavy, the hardware cost high and the efficiency low; moreover, the semantic relations between the contexts of different segments are not considered, so the accuracy of emotion judgment still needs to be improved.
Disclosure of Invention
The embodiments of the application aim to provide a method for accurately and quickly judging the respondent's emotion by combining the context of a question-and-answer text.
In order to solve the above technical problem, an embodiment of the present application provides a method for determining emotion of multiple rounds of questions and answers, which adopts the following technical solutions:
a method for determining emotion of multi-turn question answering comprises the following steps: receiving a question and answer text, and dividing the question and answer text into at least two question text segments and at least two answer text segments; inputting all the question text segments and answer text segments into a trained model, and respectively coding each group of the question text segments and answer text segments into a matrix, wherein words in the question text segments and the answer text segments are represented by lines or columns in the array; combining all the coded question text segments with each answer text segment in pairs to form question-answer pairs; updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair; splicing all the updated question text matrixes and answer text matrixes into a spliced matrix, converting rows or columns of the spliced matrix into a number which is adaptive to a preset emotion type through linear transformation, and determining the probability of meeting one preset emotion type through each row or each column of the spliced matrix; and taking the emotion class with the highest probability as the emotion of the respondent and outputting the emotion of the respondent.
Further, the step of updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment of the question-answer pair specifically includes: taking the dot product, through an attention mechanism, of a word of the question text segment in a question-answer pair with each word of the corresponding answer text segment, to obtain a set of relationship values between that word and each word of the corresponding answer text segment; or taking the dot product of a word of the answer text segment with each word of the corresponding question text segment, to obtain a set of relationship values between that word and each word of the corresponding question text segment; mapping the relationship values according to the proportion each relationship value contributes for the word; performing a weighted summation over all the words of the answer text segment according to the mapped set of relationship values, so as to update the question-text word to which the set of relationship values belongs; or performing a weighted summation over all the words of the question text segment according to the mapped set of relationship values, so as to update the answer-text word to which the set of relationship values belongs; and looping over all words in all question-answer pairs to update all the words of all the question text segments and answer text segments.
Further, mapping the relationship value according to the proportion of the relationship value corresponding to each word specifically includes: the set of relationship values is processed by softmax mapping such that the set of relationship values falls between 0 and 1 and sums to 1.
Further, after the step of inputting all the question text segments and answer text segments into the trained model and encoding each question text segment and answer text segment into a matrix in which each word is represented by a row or a column, and before combining all the question text segments with each answer text segment to form question-answer pairs, the method further comprises: in the trained model, obtaining a forward hidden state and a backward hidden state for each word through a bidirectional LSTM model, and replacing the corresponding word in the question text segment or answer text segment with the concatenation of the forward and backward hidden states.
Further, the step of splicing all the updated matrices and performing a linear transformation to determine the probability of matching the preset emotion categories specifically includes: splicing the matrices corresponding to all the updated question-answer pairs to obtain a spliced matrix in which the vectors corresponding to the words are arranged in order; multiplying the spliced matrix by a preset first matrix to obtain an emotion matrix whose number of rows or columns corresponds to the number of emotion categories; adding a preset second matrix to the emotion matrix to offset and update the emotion matrix, the numbers of rows and columns of the second matrix being adapted to the first matrix; and, according to the correspondence between each row of the emotion matrix and the emotion categories, adding all the elements of a row to determine the probability of the corresponding emotion; or, according to the correspondence between each column of the emotion matrix and the emotion categories, adding all the elements of a column to determine the probability of the corresponding emotion.
Further, the training method of the model specifically includes: inputting a number of question text segments, answer text segments and the corresponding emotion classifications into an initial model; determining a forward hidden state and a backward hidden state for each word in the question text segments and answer text segments through a bidirectional LSTM model layer, and replacing each word with the concatenation of its forward and backward hidden states; combining every question text segment in which the words have been replaced with every answer text segment to form question-answer pairs; updating, through an attention mechanism layer, the vectors representing all the words of the question text matrix or answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair; splicing all the updated matrices and performing a linear transformation to determine the probability of matching the preset emotion categories; taking the emotion classification as the true value and calculating the cross-entropy loss of the predicted emotion-category probabilities; and iteratively updating the initial model according to the cross-entropy loss until the cross-entropy loss converges, and taking the updated initial model as the trained model.
Further, the step of encoding each group of question text segments and answer text segments into a matrix includes: processing each word of each group of the question text segment and the answer text segment into a low-dimensional vector through embedding; combining vectors of words in a question text segment or an answer text segment into a matrix expressing the question text segment or the answer text segment.
In order to solve the above technical problem, an embodiment of the present application further provides an emotion determining apparatus for multiple rounds of question answering, which adopts the following technical solutions:
an emotion determining apparatus for multiple rounds of question answering, comprising:
the segmentation module is used for receiving the question and answer text and segmenting the question and answer text into at least two question text segments and at least two answer text segments;
the coding module is used for inputting all the question text segments and the answer text segments into a trained model and encoding each question text segment and answer text segment into a matrix, wherein the words in the question text segments and answer text segments are represented by rows or columns of the matrix;
the combined module is used for combining all the coded question text segments with each answer text segment pairwise to form question-answer pairs;
the updating module is used for updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair;
the probability module is used for splicing all the updated question text matrixes and answer text matrixes into a spliced matrix, converting rows or columns of the spliced matrix into a number which is adaptive to the number of the preset emotion types through linear transformation, and determining the probability of meeting one preset emotion type through each row or each column of the spliced matrix; and
and the selection module is used for taking the emotion type with the highest probability as the emotion of the respondent and outputting the emotion of the respondent.
A computer device comprising a memory in which a computer program is stored, and a processor which, when executing the computer program, implements the steps of the emotion determining method for multiple rounds of question answering described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of a method for emotion determination in multiple question answering rounds as described above.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: the text is segmented into question text segments and answer text segments, and, based on the correlation between the words of the question text segments and the words of the answer text segments across the question-and-answer text, the contents of all the question text segments and answer text segments are repeatedly updated from the original data. The updated data are spliced and linearly transformed to determine, for each of several dimensions, the likelihood that the user is in the corresponding emotion, and the user's current emotion is determined from the dimension with the greatest likelihood. This scheme can improve the efficiency of recognizing the respondent's emotion in interviews.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flow diagram of one embodiment of a method for emotion determination in multiple rounds of question answering according to the present application;
FIG. 2 is a flowchart of one embodiment of step S500 of FIG. 1;
FIG. 3 is a flowchart of one embodiment of step S600 of FIG. 1;
FIG. 4 is a flow diagram of one embodiment of model training in a multi-turn question-and-answer emotion determination method of the application;
FIG. 5 is a schematic diagram of an embodiment of an emotion determining device for multiple rounds of question answering according to the application;
FIG. 6 is a schematic block diagram of one embodiment of the update module 500 shown in FIG. 5;
FIG. 7 is a schematic diagram of an embodiment of an emotion determining device for multiple rounds of question answering according to the application;
FIG. 8 is a schematic block diagram of one embodiment of the model training module 800 shown in FIG. 7;
FIG. 9 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals:
100-segmentation module, 200-coding module, 300-word adjustment module, 400-combination module, 500-update module, 501-relation value sub-module, 502-mapping sub-module, 503-update sub-module, 600-probability module, 700-selection module, 800-model training module, 801-information input sub-module, 802-context processing sub-module, 803-combination text sub-module, 804-update replacement sub-module, 805-probability analysis sub-module, 806-parameter acquisition sub-module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
Referring to fig. 1, a flow diagram of one embodiment of a method for emotion determination for multiple rounds of question answering according to the present application is shown. The emotion determining method for the multiple rounds of question answering comprises the following steps:
step S100: receiving a question and answer text, and dividing the question and answer text into at least two question text segments and at least two answer text segments.
In existing schemes, the question-and-answer text is input as a whole and the model learns from and judges the data, which requires processing a large amount of data. In the present scheme, the questions and the answers are treated as two kinds of data, output by the interview system and by the user respectively, with an interactive relationship between them. The whole question-and-answer text is divided into a number of question text segments and a number of answer text segments according to the questions the system puts to the user and the answers the user gives to those questions, where one question text segment corresponds to one group of questions sent by the system and one answer text segment corresponds to one answer returned by the user. Separating the question content from the answer content, and splitting the questions and answers by content, makes it possible to subsequently process the correlation between different questions and different answers.
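By way of illustration only, the following Python sketch shows how this segmentation step might be carried out; the speaker-role tags, the helper name and the sample dialogue are assumptions made for the example and are not part of the disclosure.

```python
def split_question_answer_text(dialogue):
    """Split a multi-turn transcript into question segments and answer segments.

    `dialogue` is assumed to be a list of (role, utterance) tuples, where the
    role is "Q" for a system question and "A" for the respondent's answer.
    """
    question_segments, answer_segments = [], []
    for role, utterance in dialogue:
        if role == "Q":
            question_segments.append(utterance)
        else:
            answer_segments.append(utterance)
    return question_segments, answer_segments

# Example: two rounds of question answering
dialogue = [
    ("Q", "Please introduce your last project."),
    ("A", "I led a team of three building a payment service."),
    ("Q", "What was the hardest problem you solved?"),
    ("A", "Debugging an intermittent race condition under load."),
]
questions, answers = split_question_answer_text(dialogue)
```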
Step S200: inputting all the question text segments and the answer text segments into a trained model, and encoding each question text segment and answer text segment into a matrix, wherein the words in the question text segments and answer text segments are represented by rows or columns of the matrix.
Specifically, a computer cannot directly process natural language. To handle natural language, its basic units, such as words, are represented by numeric encodings, and during encoding a word can be represented by a set of numerical values. In the embodiments provided by the application a word is represented by a vector, which facilitates determining the correlation between words later. In this embodiment, each word of each question text segment and answer text segment is processed into a low-dimensional vector through embedding, and the vectors of the words of a question text segment or answer text segment are combined into a matrix that expresses that segment.
The method determines the probability that the user belongs to each emotion category by judging the correlation between the question text segments and the answer text segments. Before that, the data must be standardized into a fixed format and encoding so that the subsequent operations can be performed; the question text segments and answer text segments therefore need to be encoded.
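As an illustrative sketch of this encoding step, the following uses a PyTorch embedding layer; the vocabulary, the embedding size and the tokenization are placeholder assumptions, not values given in the application.

```python
import torch
import torch.nn as nn

vocab = {"<pad>": 0, "please": 1, "introduce": 2, "your": 3, "last": 4, "project": 5}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)  # low-dimensional word vectors

def encode_segment(text):
    """Encode one question or answer segment as a matrix:
    one row per word, one column per embedding dimension."""
    token_ids = torch.tensor([vocab[w] for w in text.lower().split() if w in vocab])
    return embedding(token_ids)          # shape: (num_words, embedding_dim)

question_matrix = encode_segment("Please introduce your last project")
print(question_matrix.shape)             # torch.Size([5, 8])
```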
Step S400: combining every encoded question text segment with every answer text segment to form question-answer pairs.
The correlation between a question text segment and the answers is embodied in the combination of its correlations with every answer text segment. For a multi-turn question-and-answer text, each question text segment is therefore combined with each answer text segment to form question-answer pairs, and the correlation between the question text segment and the answer text segment of each pair is judged separately in the subsequent steps.
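A minimal sketch of this pairing step, assuming the segments are kept in plain Python lists; every question segment is paired with every answer segment.

```python
from itertools import product

def build_question_answer_pairs(question_segments, answer_segments):
    """Combine every question segment with every answer segment into pairs."""
    return list(product(question_segments, answer_segments))

pairs = build_question_answer_pairs(
    ["Q1 text", "Q2 text"],
    ["A1 text", "A2 text", "A3 text"],
)
print(len(pairs))   # 2 questions x 3 answers = 6 question-answer pairs
```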
Step S500: updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair.
In each question-answer pair, the rows or columns of the matrices corresponding to the words are adjusted with weights, according to the correlation of each word in the question text segment with every word in the answer text segment and the correlation of each word in the answer text segment with every word in the question text segment, so as to update all the vectors representing words in the matrices of the question text segment and the answer text segment.
Specifically, a set of correlation values is generated from the relation between one word in the question text and each word in the corresponding answer text; these values are combined to obtain the relation value between that word and the corresponding answer text, and the relation value between each word and the corresponding answer text is used as a weight multiplied by the vector corresponding to the word, so as to adjust the vectors representing all the words of the question text. Similarly, the relation value between each word in the answer text and the corresponding question text is obtained and used as a weight to adjust the vectors representing all the words of the answer text.
Step S600: splicing all the updated question text matrices and answer text matrices into a spliced matrix, and converting the rows or columns of the spliced matrix by a linear transformation into a number matching the number of preset emotion categories, so that each row or column of the transformed matrix determines the probability of matching one preset emotion category.
By updating the matrices corresponding to the question texts and answer texts of all question-answer pairs, a number of question-text-segment matrices reflecting their relation to the answer text segments and a number of answer-text-segment matrices reflecting their relation to the question text segments are obtained. Each question text segment of the original text has several matrices, one per answer text, and each answer text segment likewise has several matrices, one per question text. Splicing these matrices yields a complete set of data for analyzing the user's emotion, which fully reflects the relation between every question text segment and every answer text segment and focuses attention on the key words. A linear transformation with parameters obtained by training is then applied to this set of matrices so that the dimensionality of the transformed matrix corresponds to the number of preset emotion categories, and the probability that the received question-and-answer text belongs to each preset emotion category is determined from the corresponding row or column of the matrix.
Step S700: taking the emotion category with the highest probability as the emotion of the respondent and outputting it.
According to the probabilities of matching the preset emotion categories reflected by the linearly transformed matrix, the emotion category with the greatest probability is determined as the user's emotion classification.
In one embodiment, after the emotion category with the highest probability is taken as the emotion of the respondent, the question-and-answer text and the respondent's emotion are stored correspondingly in nodes of a blockchain network; sharing the data among different platforms through blockchain storage prevents the data from being tampered with.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (tamper resistance) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer and an application services layer.
The method steps of steps S200 to S700 are implemented in the trained model. In this scheme, the questions and the answers are extracted paragraph by paragraph, the data are updated according to the correlation between each question and all the answers and between each answer and all the questions, and the emotion judgment is then performed. All the text data are considered, together with the correlation between questions and answers; compared with learning directly from the whole passage, the computation is smaller and the emotion determination is more accurate. This scheme can improve the efficiency of recognizing the respondent's emotion in interviews.
Further, as shown in fig. 2, updating all the data representing the matrices according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair specifically comprises the following steps:
step S501: respectively performing dot product on the words of the question text segment in one question-answer pair and each word in the opposite answer text segment through an attention mechanism to obtain a group of relationship values of each word in the question text segment and each word in the opposite answer text segment; or respectively dot-product the words of one answer text segment with each word in the opposite question text segment to obtain a set of relationship values of each word in the answer text segment and each word in the corresponding question text segment.
And multiplying the vectors corresponding to the words by the vectors of all the words in the corresponding question text segment or answer text segment to obtain a group of numerical values, wherein the numerical values reflect the relationship between the words and all the words of the question text or answer text, and are the relationship values.
Step S502: mapping the relationship values according to the proportion each relationship value contributes for the word.
The relationship values are mapped to values between 0 and 1 such that the sum of the relationship values between a word and the words of the corresponding question text segment or answer text segment is 1. In this way the correlation between every word and the words of the corresponding question and answer texts can be determined accurately and effectively. Here softmax is a classification function that maps a set of input values to probabilities between 0 and 1 whose sum is 1, so that every class obtains a weight.
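A minimal NumPy sketch of the softmax mapping described above; the relation values used here are illustrative only.

```python
import numpy as np

def softmax(values):
    """Map a set of relation values to weights between 0 and 1 that sum to 1."""
    shifted = values - np.max(values)   # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

relation_values = np.array([2.1, 0.3, -1.0, 0.7])
weights = softmax(relation_values)
print(weights, weights.sum())            # the mapped weights sum to 1.0
```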
Step S503: performing a weighted summation over all the words of the answer text segment according to the mapped set of relationship values, so as to update the question-text word to which the set of relationship values belongs; or performing a weighted summation over all the words of the question text segment according to the mapped set of relationship values, so as to update the answer-text word to which the set of relationship values belongs.
Specifically, in one embodiment, a word q1 is taken from the question text, and the answer text comprises the words (a1, a2, a3, a4, a5, a6). The word q1 of the question text is dot-multiplied with each word of the answer text separately, giving the correlation values S1, S2, S3, S4, S5 and S6, corresponding one to one with a1 ... a6. Then q1' = S1·a1 + S2·a2 + S3·a3 + S4·a4 + S5·a5 + S6·a6, and q1' is used to replace q1.
Using the relationship values as weights, a weighted summation is performed over all the words of the corresponding answer text segment or question text segment to determine the relevance of the word to all those words, and the word is updated accordingly, which reflects the attention between the answer text and the question text. After each word is weighted and summed according to its relation with every word of the other text, a vector is obtained whose length is the same as that of the vector representing the word, and this vector replaces the original vector representing the word.
Step S504: looping over all the words in all the question-answer pairs to update all the words of all the question text segments and answer text segments.
The above operations are performed on all the words of all the question-answer pairs to update every word and determine the points of attention of the whole text. With this scheme, the word data can be adjusted and updated according to the relevance of each word to the question and answer texts of its question-answer pairs, which strengthens the relevance of the question-and-answer text to the interview discussion and improves the efficiency of emotion judgment.
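The following sketch illustrates the attention update for one question-answer pair, following the q1/a1...a6 example above; the vectors are random placeholders and the code assumes each segment is already a NumPy matrix with one word vector per row.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_question_words(question_matrix, answer_matrix):
    """Replace each question word vector by the attention-weighted sum of the
    answer word vectors, using dot-product relation values mapped by softmax."""
    updated = np.empty_like(question_matrix)
    for i, q_vec in enumerate(question_matrix):
        relation_values = answer_matrix @ q_vec   # one relation value per answer word
        weights = softmax(relation_values)        # mapped so the values sum to 1
        updated[i] = weights @ answer_matrix      # weighted sum replaces the word vector
    return updated

rng = np.random.default_rng(0)
question_matrix = rng.normal(size=(4, 8))         # 4 question words, 8-dimensional vectors
answer_matrix = rng.normal(size=(6, 8))           # 6 answer words (a1 ... a6)
question_updated = update_question_words(question_matrix, answer_matrix)
# The symmetric call update_question_words(answer_matrix, question_matrix)
# updates the answer words against the question words.
```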
In other words, the combined correlation values represent the correlation between a word of the question text and the words of the answer text, and these values are used as weights to weight the words of the question text so as to embody the correlation between that word and the answer text. Similarly, the words of the answer text are weighted according to their correlation with the question text, so as to embody the correlation between the words of the answer text and the question text.
Further, after the step of inputting all the question text segments and answer text segments into the trained model and encoding each question text segment and answer text segment into a matrix in which each word is represented by a row or a column, and before combining all the question text segments with each answer text segment to form question-answer pairs, the method further comprises:
Step S300: in the trained model, obtaining a forward hidden state and a backward hidden state for each word through a bidirectional LSTM model, and replacing the corresponding word in the question text segment or answer text segment with the concatenation of the forward and backward hidden states.
in the process of emotion judgment through the question text segment and the answer text segment, the context relation in the corresponding matrix cannot be considered, the matrixes in the question text segment and the answer text segment are correspondingly processed through the LSTM model, part of interference information can be forgotten in the vector corresponding to each word, and part of useful information is added, so that the influence caused by other vectors in the matrix can be recorded by the vector corresponding to the word.
Compared with the general cyclic neural network, in the process of processing semantic problems, the LSTM can carry out semantic recognition by combining semantic contents of contexts, particularly on the semantic recognition of long texts, the correlation between questions before and after can be mined to control the semantic recognition precision, and other neural networks can generate errors due to the fact that the contexts cannot be associated in the process of processing texts and voices, which is more obvious for the use environment of continuous question answering. The bidirectional LSTM model used in the embodiment of the application can output the antecedent hidden layer and the consequent hidden layer corresponding to each word, and the outputted antecedent hidden layer and the outputted consequent hidden layer are spliced to replace the original text for subsequent processing, so that the relation between the word and the context can be further considered. The scheme can record word data by combining the context of the question text or the answer text, and improves the accuracy of judging the emotion of the interviewer through the words.
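A sketch of this step with PyTorch's bidirectional LSTM; the dimensions are placeholders. `nn.LSTM(..., bidirectional=True)` returns the forward and backward hidden states already concatenated along the last dimension, which corresponds to the splicing described here.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim = 8, 16
bilstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim,
                 bidirectional=True, batch_first=True)

segment = torch.randn(1, 5, embedding_dim)   # one segment of 5 word vectors
outputs, _ = bilstm(segment)                 # forward and backward states concatenated
print(outputs.shape)                         # torch.Size([1, 5, 32]); each word is replaced
                                             # by its spliced 2 * hidden_dim representation
```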
The trained model comprises another layer, an attention mechanism layer, which is used to update the vector corresponding to each word of the question texts and answer texts according to the attention between the question text and the answer text.
Further, as shown in fig. 3, step S600, splicing all the updated question text matrices and answer text matrices into a spliced matrix and converting the rows or columns of the spliced matrix by a linear transformation into a number matching the number of preset emotion categories, so as to determine the probability of matching one preset emotion category through each row or column of the transformed matrix, specifically includes:
step S601 splices all the updated question-answer pairs corresponding to the matrices to obtain a spliced matrix, wherein vectors corresponding to words are sequentially arranged in the spliced matrix.
In this embodiment, after the embedding processing each word is represented by a low-dimensional vector, and the question text and the answer text each form a matrix from the vectors of their words. The matrices processed by the LSTM model and the attention mechanism are concatenated to form a matrix whose number of rows or columns equals the dimension of the word vectors, i.e. the part of the matrix corresponding to each word occupies an entire row or an entire column.
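A small NumPy sketch of this splicing step, assuming each updated segment matrix has one row per word; the shapes are placeholders chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Updated matrices from the question-answer pairs: one row per word, same vector dimension.
updated_matrices = [rng.normal(size=(n_words, 32)) for n_words in (5, 7, 4, 6)]

spliced_matrix = np.concatenate(updated_matrices, axis=0)   # word vectors arranged in order
print(spliced_matrix.shape)                                 # (22, 32): every word occupies a row
```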
Step S602: multiplying the spliced matrix by a preset first matrix to obtain an emotion matrix whose number of rows or columns corresponds to the number of emotion categories.
In the linear transformation, multiplying the first matrix by the spliced matrix reduces the value range of the matrix. In one embodiment, to reduce the size of the product as much as possible, the first matrix comprises only a vector whose dimension corresponds to the number of emotion categories, and each row or column of the resulting emotion matrix corresponds to one preset emotion.
Step S603: adding a preset second matrix to the emotion matrix to offset and update it, the numbers of rows and columns of the second matrix being adapted to the first matrix.
Adding the second matrix to the emotion matrix offsets and adjusts it, making the emotion matrix smoother.
Step S604: according to the correspondence between each row of the emotion matrix and the emotion categories, adding all the elements of a row to determine the probability of the corresponding emotion; or, according to the correspondence between each column of the emotion matrix and the emotion categories, adding all the elements of a column to determine the probability of the corresponding emotion.
Each preset emotion category corresponds to one row or one column of the emotion matrix, and adding the elements of the row or column for an emotion gives the probability that the question-and-answer text belongs to that emotion. This scheme integrates the correlation between the question texts and answer texts of the question-and-answer text and improves the accuracy of determining the emotion probabilities.
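A NumPy sketch of this linear transformation and per-category summation; the matrix sizes, the number of emotion categories and the values of the trained first and second matrices are placeholders, not values given in the application.

```python
import numpy as np

rng = np.random.default_rng(0)
num_words, dim, num_emotions = 20, 32, 4

spliced_matrix = rng.normal(size=(num_words, dim))            # concatenated segment matrices
first_matrix = rng.normal(size=(dim, num_emotions))           # learned projection ("first matrix")
second_matrix = rng.normal(size=(num_words, num_emotions))    # learned offset ("second matrix")

emotion_matrix = spliced_matrix @ first_matrix + second_matrix  # one column per emotion category
class_scores = emotion_matrix.sum(axis=0)   # add all the elements of each column; the text
                                            # treats these summed values as the class probabilities
predicted_emotion = int(np.argmax(class_scores))   # emotion with the highest value
```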
Further, as shown in fig. 4, the training method of the model specifically includes:
step Sa: a number of question text segments, answer text segments and corresponding emotion classifications are entered into the initial model.
The training process of the model takes several groups of text segments and emotion classifications as input. The model is trained with a prepared set of question texts, answer texts and emotion categories whose correct emotion categories are known: the question texts and answer texts are fed into the model, the computed emotion category is compared with the known category, the parameters of the model are adjusted, and the question texts and answer texts are computed and compared with the known category again, until the output of the model is consistent with the actual situation, at which point the model can be put into practical use. Several groups of question texts, answer texts and known emotion categories are preset for the training process, and training on these groups of data makes the parameters of the model converge to reasonable values and guarantees the training precision of the model.
Step Sb: determining a forward hidden state and a backward hidden state for each word in the question text segments and answer text segments through a bidirectional LSTM (Long Short-Term Memory network) model layer, and replacing each word with the concatenation of its forward and backward hidden states.
The LSTM, a layer of the model used in the embodiments of the application, is a nonlinear recurrent neural network over time. Compared with a general recurrent neural network, it can combine the semantic content of the context when processing semantic problems; particularly for the semantic recognition of long texts, it can mine the correlation between earlier and later questions to control the precision of semantic recognition, whereas other neural networks produce errors because they cannot associate the context when processing text and speech, which is even more evident in a continuous question-and-answer setting. The bidirectional LSTM model used in the embodiments of the application outputs a forward hidden state and a backward hidden state for each word, and the two are concatenated to replace the original text for subsequent processing, so that the relation between a word and its context is further taken into account.
Step Sc: combining every question text segment in which the words have been replaced with every answer text segment to form question-answer pairs.
Each question text segment is combined with each answer text segment to form question-answer pairs, and the correlation between the question text segment and the answer text segment of each pair is judged separately in the subsequent steps.
Step Sd: updating, through the attention mechanism layer, the vectors representing all the words in the matrices according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair.
Attention is a mechanism (attention mechanism) whose aim is to combine the correlation between each part of the data and the other parts of the whole in order to determine the focus of attention of the data analysis. The idea is to determine the correlation between one part of the whole data (text, speech or image) and the other parts, express that correlation as weights, and then weight that part of the data so as to extract its most "noticeable" portion; there are many concrete ways of implementing the mechanism.
For this scheme, in each question-answer pair the rows or columns of the matrices corresponding to the words are adjusted with weights, according to the correlation of each word in the question text segment with every word in the answer text segment and the correlation of each word in the answer text segment with every word in the question text segment, so as to update the vectors representing all the words in the matrices of the question text segment and the answer text segment. Specifically, after the preceding operations, the question text segment and the answer text segment of each question-answer pair are each represented by a matrix, and the two matrices have the same number of columns or rows. If the number of columns is the same, each row of a matrix represents one word of the question text or answer text (specifically, the concatenation of the forward and backward hidden states output by the bidirectional LSTM for that word), and vice versa. The vector representing a word of the question text is dot-multiplied with the vector representing each word of the answer text; if the answer text has n words, n values are obtained, representing the correlation between that question-text word and each answer-text word. To unify the operations between words and between texts, this embodiment maps the values by softmax into values whose sum equals 1.
Then, using these n values as weights, the corresponding answer-text words are weighted and the n weighted vectors are added to obtain a new vector, which replaces the above "vector representing one word of the question text". By updating the vector representing a word in this way, through the correlation between the question text and the answer text, and performing the replacement for every vector representing a word of the question text and answer text, the method completes the replacement of all the data in the matrices.
Step Se: splicing all the updated matrices and performing a linear transformation to determine the probability of matching each emotion category.
By updating the matrices corresponding to the question texts and answer texts of all question-answer pairs, a number of question-text-segment matrices reflecting their relation to the answer text segments and a number of answer-text-segment matrices reflecting their relation to the question text segments are obtained; each question text segment of the original text corresponds to several matrices, and so does each answer text segment. Splicing these matrices forms a complete set of data for analyzing the user's emotion, which fully reflects the relation between every question text segment and every answer text segment and focuses attention on the key words. On this basis, the spliced matrix is converted by a linear transformation into a dimensionality matching the number of preset emotion categories. The linear transformation of the spliced matrix multiplies it by a preset matrix whose dimensionality corresponds to the number of emotion categories, and then adds a preset offset to obtain the matrix that finally reflects the emotion categories.
Specifically, each word represented in a matrix is represented by a vector, so the matrices corresponding to the question text and the answer text formed by these vectors have the same number of rows or columns; with each vector representing a word taken as a row or a column, the two groups of matrices are combined so that the matrix of the question text and the matrix of the answer text merge into one complete matrix.
Because the complete matrix is very large, performing the subsequent computation on it directly would be expensive; to reduce the amount of computation without affecting its precision, a linear transformation is applied to the complete matrix. Specifically, its size is greatly reduced by multiplying it by a preset first matrix whose rank corresponds to the number of emotion categories, after which a preset second matrix is added as an offset, the numbers of rows and columns of the second matrix being matched to the first. The first matrix and the second matrix are preset parameters of the model, determined by pre-training: after each round of training they are adjusted, and the training iterates until the linear transformation of the spliced matrix by the first and second matrices accurately reflects the emotion category corresponding to a group of question text segments and answer text segments.
Step Sf: taking the emotion classification as the true value, and calculating the cross-entropy loss of the predicted emotion-category probabilities.
The linearly transformed spliced matrix reflects the emotion judgment of one round of training. The true value of the emotion classification corresponding to the text is 1 and the true values of the other emotions are 0; the cross-entropy loss is calculated to measure how accurate the judgment of this round is, and the parameter adjustment is determined from the value of the cross-entropy loss.
H(p, q) = -∑_{x∈X} p(x) · log q(x)
Please refer to the above formula, in which p(x) is the true value of the text for emotion x and q(x) is the probability with which the model predicts the text to be of emotion x. Since there are several preset emotions, X is the set of all preset emotion categories and x is one element of that set. The cross-entropy loss is computed from the true value of each emotion for the text and the corresponding predicted probability, and the cross-entropy losses of the individual emotion judgments are added to obtain the cross-entropy loss of the model for this emotion judgment of the text.
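A small NumPy sketch of the cross-entropy calculation described by the formula above, with an illustrative one-hot true label and an illustrative predicted distribution.

```python
import numpy as np

def cross_entropy_loss(true_distribution, predicted_distribution):
    """H(p, q) = -sum over emotions x of p(x) * log q(x)."""
    eps = 1e-12                            # guard against log(0)
    return -np.sum(true_distribution * np.log(predicted_distribution + eps))

p = np.array([0.0, 1.0, 0.0, 0.0])   # true emotion: one category set to 100%, the rest to 0%
q = np.array([0.1, 0.7, 0.1, 0.1])   # the model's predicted probabilities
print(cross_entropy_loss(p, q))       # about 0.357; smaller when q(x) for the true class is larger
```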
Step Sg: iteratively updating the initial model according to the cross-entropy loss until the cross-entropy loss converges, and taking the updated initial model as the trained model.
The preset parameters of the model, including the parameter values of the LSTM model and the preset matrix and offset applied to the spliced matrix, are adjusted according to the value of the cross-entropy loss, and the next round of training is performed, until the cross-entropy loss converges. Ideally, when the cross-entropy loss is infinitely close to 0 the model's emotion judgment is most accurate; as the model's parameters are adjusted, its judgment of the emotion category approaches the true value, i.e. the cross-entropy loss approaches zero. After many rounds of training the cross-entropy loss no longer decreases, at which point it is determined to have converged and the current model has been trained to the best precision it can provide. This scheme trains on the relation between words and their context and on the relation between question texts and answer texts, so the precision of the model parameters is high.
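A schematic PyTorch training loop for the iteration described here; the stand-in model, the random data, the optimizer and the convergence threshold are placeholders chosen for illustration, not the configuration of the application.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 4))  # stand-in for the full model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(64, 32)          # stand-in for the encoded question-answer data
labels = torch.randint(0, 4, (64,))     # known emotion categories used as true values

previous_loss = float("inf")
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # cross-entropy between prediction and true emotion
    loss.backward()
    optimizer.step()                         # adjust the model parameters (in the application:
                                             # the LSTM, attention and linear-transform parameters)
    if abs(previous_loss - loss.item()) < 1e-6:   # stop once the loss no longer decreases
        break
    previous_loss = loss.item()
```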
Further, the emotion classification comprises a number of preset emotion categories, where the probability of one emotion category is set to one hundred percent and the probabilities of the remaining categories are set to zero percent. This scheme trains the model on several preset emotions; by assigning the full probability to the one emotion that corresponds to the question text segments and answer text segments, the parameters of the model converge as far as possible, so the precision of the model parameters is high.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, the steps are not bound to a strict order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise several sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily in sequence; they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 5, as an implementation of the method shown in fig. 1, the present application provides an embodiment of an emotion determining apparatus for multiple rounds of question and answer, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the emotion determining apparatus for multiple rounds of question answering according to the present embodiment includes: a segmentation module 100, an encoding module 200, a combination module 400, an update module 500, a probability module 600 and a selection module 700. Wherein:
the segmentation module 100 is configured to receive a question and answer text, and segment the question and answer text into at least two question text segments and at least two answer text segments.
And the encoding module 200, configured to input all the question text segments and answer text segments into the trained model and encode each question text segment and answer text segment into a matrix, wherein the words in the question text segments and answer text segments are represented by rows or columns of the matrix.
And the combination module 400 is used for combining all the coded question text segments with each answer text segment to form a question-answer pair.
And the updating module 500, configured to update the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment of each question-answer pair.
And a probability module 600, configured to splice all the updated question text matrices and answer text matrices into a spliced matrix and convert the rows or columns of the spliced matrix by a linear transformation into a number matching the number of preset emotion categories, so as to determine the probability of matching one preset emotion category through each row or column of the spliced matrix; and
and the selection module 700 is used for taking the emotion with the highest probability as the emotion of the respondent and outputting the emotion of the respondent.
Specifically, the segmentation module segments the text by question and answer, distinguishing question text segments from answer text segments; the update module repeatedly updates the contents of all the question text segments and answer text segments from the original data according to the correlation between the words of the question text segments and answer text segments across the question-and-answer text. The probability module splices the updated data and applies a linear transformation to determine, in each of several dimensions, the likelihood that the user is in the corresponding emotion, and the selection module determines the user's current emotion from the dimension with the greatest likelihood. This scheme can improve the efficiency of recognizing the respondent's emotion in interviews.
Further, as shown in fig. 6, the update module 500 further specifically includes:
the relationship value submodule 501 is configured to perform a dot product between a word in the question text segment of a question-answer pair and each word in the corresponding answer text segment through an attention mechanism, so as to obtain a set of relationship values between that word in the question text segment and each word in the corresponding answer text segment; or to perform a dot product between a word of an answer text segment and each word in the corresponding question text segment, so as to obtain a set of relationship values between that word in the answer text segment and each word in the corresponding question text segment.
The mapping submodule 502 is configured to map each relationship value according to its proportion among the set of relationship values corresponding to the word. And
the updating submodule 503 is configured to perform a weighted summation over all the words in the answer text segment according to the mapped set of relationship values, so as to update the word in the question text segment corresponding to that set of relationship values; or to perform a weighted summation over all the words in the question text segment according to the mapped set of relationship values, so as to update the word in the answer text segment corresponding to that set of relationship values; and is also configured to loop through all the words in all the question-answer pairs so as to update all the words in all the corresponding question text segments or answer text segments.
According to this scheme, word data can be adjusted and updated according to the relevance of the words within the question-answer pairs, which improves the relevance of the question and answer text to the interview discussion and improves the efficiency of emotion determination.
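For ease of understanding only, a minimal Python sketch of the attention-based update performed by the relationship value, mapping, and updating submodules is given below; the softmax mapping and all function and variable names are illustrative assumptions, the embodiment only requiring that each set of relationship values be mapped proportionally before the weighted summation.

import numpy as np

def softmax(x):
    # Map a set of relationship values according to their proportions.
    e = np.exp(x - x.max())
    return e / e.sum()

def update_question_words(question_mat, answer_mat):
    # Update each question-word vector as a weighted summation of the
    # answer-word vectors, weighted by the mapped relationship values.
    # question_mat: (num_question_words, dim); answer_mat: (num_answer_words, dim)
    updated = np.empty_like(question_mat)
    for i, q_word in enumerate(question_mat):
        relation_values = answer_mat @ q_word  # dot product with every answer word
        weights = softmax(relation_values)     # proportional mapping of the relationship values
        updated[i] = weights @ answer_mat      # weighted summation over all answer words
    return updated

The symmetric case, updating the answer-word vectors from the question text segment, follows by swapping the two arguments.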
Further, the emotion determining apparatus for multiple rounds of question and answer further includes a word adjustment module 300, configured to: in the trained model, obtain a front item hidden layer and a back item hidden layer of each word through a bidirectional LSTM model, and replace the corresponding words in the question text segment and the answer text segment with the spliced front item hidden layer and back item hidden layer.
The scheme can represent word data in combination with the context of the question text or the answer text, which improves the accuracy of judging the emotion of the interviewer through the words.
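For ease of understanding only, a minimal Python (PyTorch) sketch of replacing each word vector with its spliced front item and back item hidden layers from a bidirectional LSTM is given below; the class name, dimensions, and the use of the torch library are illustrative assumptions and not limitations of this embodiment.

import torch
import torch.nn as nn

class ContextualWordEncoder(nn.Module):
    # Replace each word vector with the concatenation (splice) of its
    # forward (front item) and backward (back item) LSTM hidden states.
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_vectors):
        # word_vectors: (batch, num_words, embed_dim)
        contextual, _ = self.bilstm(word_vectors)
        # contextual: (batch, num_words, 2 * hidden_dim); the forward and
        # backward hidden states of each word are already concatenated
        return contextual

# Toy usage: a question text segment of 6 words with 8-dimensional embeddings
encoder = ContextualWordEncoder(embed_dim=8, hidden_dim=16)
segment = torch.randn(1, 6, 8)
replaced_words = encoder(segment)  # shape (1, 6, 32)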
Further, as shown in fig. 7 and 8, the emotion determining apparatus for multiple rounds of question and answer further includes a model training module 800, which specifically includes:
the information input sub-module 801 is used for inputting a plurality of question text segments, answer text segments and corresponding emotion classifications into the initial model.
The context processing sub-module 802 is used for determining a front item hidden layer and a back item hidden layer of each word in the question text segment and the answer text segment through a bidirectional LSTM (Long Short-Term Memory network) model, and replacing the word by splicing the front item hidden layer and the back item hidden layer.
And the combined text submodule 803 is used for combining all the question text segments for which the word replacement has been completed with any answer text segment pairwise to form question-answer pairs.
An update replacement sub-module 804, configured to update, through the attention mechanism layer, the vectors representing the words in the matrices according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair.
And the probability analysis submodule 805 is used for splicing all the updated matrixes and performing linear transformation to determine the probability of meeting any emotion type. And
a parameter obtaining submodule 806, configured to calculate cross entropy loss of the probability of the emotion type by using the emotion classification as a true value; and according to the cross entropy loss, carrying out iterative updating on the initial model until the cross entropy loss is converged, and taking the updated initial model as the trained model.
The scheme can be trained according to the relation between the words and their context and the relation between the question text and the answer text, so that the trained model parameters have high precision.
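For ease of understanding only, a minimal Python (PyTorch) sketch of iteratively updating the initial model with a cross entropy loss until convergence is given below; the optimizer choice, the convergence tolerance, and all function and variable names are illustrative assumptions and not limitations of this embodiment.

import torch
import torch.nn as nn

def train_initial_model(model, batches, num_epochs=50, lr=1e-3, tol=1e-4):
    # batches: list of (features, emotion_label) pairs, where emotion_label is the
    # true emotion classification used as the target of the cross entropy loss.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    previous_loss = float("inf")
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        for features, emotion_label in batches:
            logits = model(features)                 # scores over the preset emotion types
            loss = criterion(logits, emotion_label)  # cross entropy against the true emotion
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(previous_loss - epoch_loss) < tol:    # treat the loss as converged
            break
        previous_loss = epoch_loss
    return model                                     # the updated initial model is the trained model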
Referring to fig. 9, in order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the emotion determining method for multiple rounds of question and answer as described above when executing the computer program. The scheme can improve the emotion recognition efficiency of interviewers.
The computer device 6 comprises a memory 61, a processor 62, and a network interface 63, which are communicatively connected to each other via a system bus. It is noted that only the computer device 6 having components 61-63 is shown, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as the program codes of the emotion determining method for multiple rounds of question and answer. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the program code stored in the memory 61 or process data, for example, the program code of the emotion determining method for multiple question answering rounds.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing a program of the emotion determining method for multiple rounds of question and answer, the program being executable by at least one processor to cause the at least one processor to perform the steps of the emotion determining method for multiple rounds of question and answer as described above. The scheme can improve the emotion recognition efficiency of interviewers.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms, and these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of the features thereof may be replaced with equivalents. All equivalent structures made by using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method for determining emotion of multi-turn question answering is characterized by comprising the following steps:
receiving a question and answer text, and dividing the question and answer text into at least two question text segments and at least two answer text segments;
inputting all the question text segments and answer text segments into a trained model, and respectively coding each group of the question text segments and answer text segments into a matrix, wherein words in the question text segments and the answer text segments are represented by rows or columns in the matrix;
combining all the coded question text segments with each answer text segment in pairs to form question-answer pairs;
updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair;
splicing all the updated question text matrixes and answer text matrixes into a spliced matrix, converting rows or columns of the spliced matrix into a number corresponding to the number of preset emotion categories through linear transformation, and determining the probability of meeting one preset emotion category through each row or each column of the spliced matrix;
and taking the emotion class with the highest probability as the emotion of the respondent and outputting the emotion of the respondent.
2. The emotion determining method for multiple rounds of question and answer according to claim 1, wherein the step of updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair specifically comprises:
respectively performing dot product on the words of the question text segment in one question-answer pair and each word in the corresponding answer text segment through an attention mechanism to obtain a group of relationship values of each word in the question text segment and each word in the corresponding answer text segment; or performing dot product on the words of one answer text segment and each word in the corresponding question text segment to obtain a group of relationship values of each word in the answer text segment and each word in the corresponding question text segment;
mapping the relation value according to the proportion of the relation value corresponding to each word;
according to the mapped set of relation values, carrying out weighted summation on all words of the answer text segment so as to update words in the question text segment relative to the set of relation values; or according to the mapped set of relationship values, carrying out weighted summation on all words in the question text segment so as to update words in the answer text segment relative to the set of relationship values;
all words in all question-answer pairs are traversed in a loop to update all words in all relative question text segments or answer text segments.
3. The emotion determining method for multiple rounds of question and answer according to claim 1, wherein after the emotion category with the highest probability is taken as the emotion of the respondent, the method further comprises: storing the question and answer text and the respondent emotion into a blockchain network node.
4. The emotion determining method for multiple rounds of question and answer according to claim 1, wherein after the step of inputting all the question text segments and answer text segments into the trained model and respectively coding each group of the question text segments and answer text segments into a matrix, in which words in the question text segments and answer text segments are represented by rows or columns in the matrix, and before the step of combining all the question text segments with each answer text segment to form question-answer pairs, the method further comprises:
in the trained model, a front item hidden layer and a back item hidden layer of each word are obtained through a bidirectional LSTM model, and the corresponding words in the question text segment and the answer text segment are replaced through the spliced front item hidden layer and back item hidden layer.
5. The method for determining emotion of multiple rounds of questions and answers as claimed in claim 1, wherein said concatenating all the updated question text matrices and answer text matrices into a concatenated matrix, and converting rows or columns of the concatenated matrix into a number corresponding to a preset emotion category through linear transformation, so as to determine a probability of meeting a preset emotion category through each row or each column of the concatenated matrix, specifically comprising:
splicing all the updated question-answer pairs corresponding to the matrixes to obtain spliced matrixes, wherein vectors corresponding to the words are sequentially arranged in the spliced matrixes;
obtaining an emotion matrix with the number of rows or columns corresponding to the emotion types by multiplying a preset first matrix by the splicing matrix;
adding a preset second matrix to the emotion matrix to offset and update the emotion matrix, wherein the numbers of rows and columns of the second matrix are adapted to the first matrix;
adding all the in-line elements according to the corresponding relation between each line of the emotion matrix and the emotion types to determine the probability of the corresponding emotion; or according to the corresponding relation between each column of the emotion matrix and the emotion types, adding all the elements in the columns to determine the probability of the corresponding emotion.
6. The method for determining emotion of multiple rounds of questions and answers as claimed in any one of claims 1 to 5, wherein the method for training the model specifically comprises:
inputting a plurality of question text segments, answer text segments and corresponding emotion classifications into an initial model;
determining a front item hidden layer and a back item hidden layer of each word in the question text segment and the answer text segment through a bidirectional LSTM model layer, and splicing the front item hidden layers and the back item hidden layers to replace the words;
combining all the question text segments which finish the word replacement with any answer text segment in pairs to form question-answer pairs;
updating vectors representing all representative words of a question text matrix or an answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair through an attention mechanism layer;
splicing all the updated matrixes, and performing linear transformation to determine the probability of meeting the preset emotion types;
taking the emotion classification as a true value, and calculating the cross entropy loss of the probability of the emotion type;
and iteratively updating the initial model according to the cross entropy loss until the cross entropy loss is converged, and taking the updated initial model as the trained model.
7. The method for determining emotion of multiple rounds of questions and answers as claimed in any one of claims 1 to 5, wherein said step of encoding each set of said question text segments and answer text segments respectively as a matrix specifically comprises:
processing each word of each group of the question text segment and the answer text segment into a low-dimensional vector through embedding;
combining vectors of words in a question text segment or an answer text segment into a matrix expressing the question text segment or the answer text segment.
8. An emotion determining apparatus for multiple rounds of question answering, comprising:
the segmentation module is used for receiving the question and answer text and segmenting the question and answer text into at least two question text segments and at least two answer text segments;
the coding module is used for inputting all the question text segments and answer text segments into a trained model and respectively coding each group of the question text segments and answer text segments into a matrix, wherein words in the question text segments and the answer text segments are represented by rows or columns in the matrix;
the combined module is used for combining all the coded question text segments with each answer text segment pairwise to form question-answer pairs;
the updating module is used for updating the words in the question text matrix or the answer text matrix according to the correlation between the words in the question text segment and the words in the answer text segment in the question-answer pair;
the probability module is used for splicing all the updated question text matrixes and answer text matrixes into a spliced matrix, converting rows or columns of the spliced matrix into a number which is adaptive to the number of the preset emotion types through linear transformation, and determining the probability of meeting one preset emotion type through each row or each column of the spliced matrix; and
and the selection module is used for taking the emotion type with the highest probability as the emotion of the respondent and outputting the emotion of the respondent.
9. A computer device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the emotion determining method for multiple rounds of question and answer according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the emotion determining method for multiple rounds of question and answer according to any one of claims 1 to 7.
CN202010340290.XA 2020-04-26 2020-04-26 Multi-turn question and answer emotion determining method and device, computer equipment and storage medium Pending CN111694935A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010340290.XA CN111694935A (en) 2020-04-26 2020-04-26 Multi-turn question and answer emotion determining method and device, computer equipment and storage medium
PCT/CN2020/118464 WO2021218023A1 (en) 2020-04-26 2020-09-28 Emotion determining method and apparatus for multiple rounds of questions and answers, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010340290.XA CN111694935A (en) 2020-04-26 2020-04-26 Multi-turn question and answer emotion determining method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111694935A true CN111694935A (en) 2020-09-22

Family

ID=72476677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010340290.XA Pending CN111694935A (en) 2020-04-26 2020-04-26 Multi-turn question and answer emotion determining method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111694935A (en)
WO (1) WO2021218023A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593054B (en) * 2013-11-25 2018-04-20 北京光年无限科技有限公司 A kind of combination Emotion identification and the question answering system of output
WO2018123139A1 (en) * 2016-12-27 2018-07-05 シャープ株式会社 Answering device, control method for answering device, and control program
CN108536681B (en) * 2018-04-16 2023-05-30 腾讯科技(深圳)有限公司 Intelligent question-answering method, device, equipment and storage medium based on emotion analysis
CN108874972B (en) * 2018-06-08 2021-10-19 合肥工业大学 Multi-turn emotion conversation method based on deep learning
CN109408621B (en) * 2018-10-29 2021-04-02 苏州派维斯信息科技有限公司 Dialogue emotion analysis method and system
CN111694935A (en) * 2020-04-26 2020-09-22 平安科技(深圳)有限公司 Multi-turn question and answer emotion determining method and device, computer equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218023A1 (en) * 2020-04-26 2021-11-04 平安科技(深圳)有限公司 Emotion determining method and apparatus for multiple rounds of questions and answers, computer device, and storage medium
CN112966085A (en) * 2021-03-11 2021-06-15 平安国际智慧城市科技股份有限公司 Intelligent control method and device for human-computer conversation, electronic equipment and storage medium
CN112966085B (en) * 2021-03-11 2023-10-13 平安国际智慧城市科技股份有限公司 Man-machine conversation intelligent control method and device, electronic equipment and storage medium
CN114547282A (en) * 2022-02-25 2022-05-27 广州中医药大学(广州中医药研究院) Plant classification dynamic retrieval method
CN116975301A (en) * 2023-09-22 2023-10-31 腾讯科技(深圳)有限公司 Text clustering method, text clustering device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2021218023A1 (en) 2021-11-04

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40031282; Country of ref document: HK
SE01 Entry into force of request for substantive examination