
CN113326359A - Training method and device for dialogue response and response strategy matching model

Info

Publication number: CN113326359A
Application number: CN202010130939.5A
Authority: CN
Inventors: 徐胜全
Original assignee: Zhejiang Dasou Vehicle Software Technology Co Ltd
Current assignee: Zhejiang Dasou Vehicle Software Technology Co Ltd
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2021-08-31

Abstract

The application provides a dialogue response method and a training method and device for a response strategy matching model, wherein the dialogue response method comprises the following steps: vectorizing the dialogue information sent by the consultant in the current dialogue; parsing the dialogue information vector to determine a state information vector containing multi-dimensional features; inputting the state information vector and the historical dialogue information vector of the historical dialogue information corresponding to the dialogue information in the current round of dialogue into a response strategy matching model; determining a plurality of response strategies corresponding to the feature vectors extracted by the response strategy matching model and the confidence corresponding to each response strategy; and taking the response strategy with the highest confidence among the plurality of response strategies as the response information corresponding to the dialogue information. The technical solution of the application addresses the low efficiency caused by recognition based on fixed flow templates and helps ensure the accuracy of the matched response information.

Description

Training method and device for dialogue response and response strategy matching model

Technical Field

The application relates to the technical field of networks, in particular to a method and a device for training a dialogue response and response strategy matching model.

Background

Dialogue systems (also called conversational agents) are agents with human-machine interfaces for accessing, processing, managing and communicating information, which enable conversations with people through computer systems that simulate human beings. With the development of electronic technology, dialogue systems have gradually penetrated many aspects of social life and bring convenience to people's work and life.

In the related art, question information sent by a consultant is matched to response information through script flows and script templates that are manually configured in advance. However, the content of such manually configured script flows or templates is usually fixed, so the matched response content is rigid and may even lack language understanding, and the accuracy of the response information cannot be ensured. Moreover, where no template has been configured, the consultant's inquiry cannot be answered, which causes problems such as low response efficiency.

Disclosure of Invention

In view of the above, the present application provides a method and an apparatus for training a dialog response and a response strategy matching model to solve the problems in the related art.

In order to achieve the above purpose, the present application provides the following technical solutions:

according to a first aspect of the present application, a dialog response method is presented, the method comprising:

in the process of carrying out the current round of conversation with a consultation party, vectorizing the conversation information sent by the consultation party in the current conversation to obtain a conversation information vector corresponding to the conversation information;

parsing the dialog information vector to determine a state information vector containing multidimensional features;

inputting the state information vector and a historical dialogue information vector of historical dialogue information corresponding to the dialogue information in the current round of dialogue into a response strategy matching model, wherein the response strategy matching model is trained and completed by adopting a dialogue information sample containing response strategy marking information and the historical dialogue information corresponding to the dialogue information sample in the same round of dialogue in advance;

determining a plurality of response strategies corresponding to the characteristic vectors extracted by the response strategy matching model and the confidence degrees corresponding to the response strategies;

and taking the response strategy with the highest confidence level in the plurality of response strategies as the response information corresponding to the dialogue information.

Optionally, the vectorizing the session information sent by the counselor in the current session to obtain a session information vector corresponding to the session information includes:

performing word segmentation processing on the dialogue information sent by the consultant in the current dialogue to determine a plurality of non-overlapping words obtained after the dialogue information is lexicalized;

determining word vectors corresponding to the non-overlapping words according to a preset WordEmbedding matrix, so as to determine the word vectors as dialogue information vectors corresponding to the dialogue information.

Optionally, the parsing the dialog information vector to determine a state information vector containing multidimensional features includes:

performing feature extraction on the word vector through a neural network model comprising a BERT optimizer, a bidirectional long-short term memory network and a conditional random field;

and determining a state information vector containing multi-dimensional features corresponding to the dialogue information vector according to the feature vector extracted by the neural network model.

Optionally, the historical dialogue information vector of the current round of dialogue includes the dialogue information vector itself.

Optionally, before inputting the state information vector and the dialog information vector of the current round of historical dialog into the response policy matching model, the method further includes:

determining a script template that matches the state information vector;

filling information into word slots in the script template based on the state information vector;

if the filled script template has an empty word slot, constructing a question response for script clarification according to the script information corresponding to the empty word slot;

and receiving reply information of the consultant to the question response to add the reply information to the state information vector.

Optionally, the determining a plurality of response policies corresponding to the feature vectors extracted by the response policy matching model and the confidence degrees corresponding to the respective response policies includes:

mapping the feature vectors to a preset number of response strategies, wherein the value of each response strategy comprises the feature value of the feature vector after probability normalization processing, and the probability normalization function is as follows:

$$\sigma_i(z) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$

wherein $\sigma_i(z)$ is the feature value of the feature vector after probability normalization, $e^{z_i}$ is the natural constant raised to the power of the $i$-th value of the feature vector $z$, and $k$ is the total number of categories of the dialogue strategy;

and determining the characteristic value of the characteristic vector after probability normalization processing as the confidence degree corresponding to the response strategy.

Optionally, the response strategy matching model is a learning model in reinforcement learning or a recurrent neural network model in deep learning.

According to a second aspect of the present application, there is provided a training method of a response strategy matching model for dialog responses, the method comprising:

determining a dialogue information sample set serving as training samples, wherein the dialogue information sample set comprises dialogue information samples containing response strategy marking information and the historical dialogue information corresponding to the dialogue information samples in the same round of dialogue;

vectorizing the dialogue information sample to obtain a dialogue information sample vector corresponding to the dialogue information sample;

parsing the dialog information sample vector to determine a state information vector that includes multi-dimensional features;

inputting the state information vector and the dialogue information vector of the historical dialogue information into a response strategy matching model so as to perform feature extraction on the state information vector and the dialogue information vector of the historical dialogue information by the response strategy matching model;

determining a response strategy prediction information vector corresponding to the dialogue information sample according to the extracted features;

adjusting model parameters of the response policy matching model based on a difference between the response policy prediction information vector and the information vector of the response policy annotation information;

and analyzing the dialogue information input by the consultant according to the trained response strategy matching model, and determining a response strategy matched with the dialogue information so as to take the response strategy as the response information corresponding to the dialogue information.

According to a third aspect of the present application, there is provided a dialogue response apparatus, the apparatus comprising:

the processing unit is used for vectorizing the dialogue information sent by the consultation party in the current dialogue to obtain a dialogue information vector corresponding to the dialogue information in the current dialogue process;

the analysis unit is used for analyzing the dialogue information vector to determine a state information vector containing multi-dimensional features;

the input unit is used for inputting the state information vector and the historical dialogue information vector of the historical dialogue information corresponding to the dialogue information in the current round of dialogue into a response strategy matching model, wherein the response strategy matching model is trained by adopting a dialogue information sample containing response strategy marking information and the historical dialogue information corresponding to the dialogue information sample in the same round of dialogue in advance;

a first determining unit that determines a plurality of response policies corresponding to the feature vectors extracted by the response policy matching model and confidence degrees corresponding to the respective response policies;

and a second determining unit configured to use a response policy with a highest confidence level among the plurality of response policies as the response information corresponding to the session information.

According to a fourth aspect of the present application, a training apparatus for a response strategy matching model for dialog responses is presented, the apparatus comprising:

a first determining unit, configured to determine a dialogue information sample set serving as training samples, wherein the dialogue information sample set comprises dialogue information samples containing response strategy marking information and the historical dialogue information corresponding to the dialogue information samples in the same round of dialogue;

the processing unit is used for vectorizing the dialogue information samples to obtain dialogue information sample vectors corresponding to the dialogue information samples;

the analysis unit is used for analyzing the dialogue information sample vector to determine a state information vector containing multi-dimensional features;

the input unit is used for inputting the state information vector and the dialogue information vector of the historical dialogue information into a response strategy matching model so as to extract the characteristics of the state information vector and the dialogue information vector of the historical dialogue information by the response strategy matching model;

the second determining unit is used for determining a response strategy prediction information vector corresponding to the dialogue information sample according to the extracted features;

a parameter adjusting unit that adjusts a model parameter of the response policy matching model based on a difference between the response policy prediction information vector and the information vector of the response policy labeling information;

and the information response unit analyzes the dialogue information input by the consultant according to the trained response strategy matching model, determines a response strategy matched with the dialogue information, and takes the response strategy as the response information corresponding to the dialogue information.

According to a fifth aspect of the present application, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute instructions to implement the method of the first and second aspects.

According to a sixth aspect of the present application, a computer-readable storage medium is proposed, on which computer instructions are stored, which instructions, when executed by a processor, implement the steps of the method according to the first and second aspects.

According to the above embodiments, a state information vector containing the multi-dimensional features of the dialogue information sent by the consultant in the current dialogue is determined, a plurality of response strategies corresponding to the state information vector are matched by the pre-trained response strategy matching model, and the response strategy with the highest confidence among them is taken as the response information corresponding to the dialogue information. Because the response strategy matching model performs feature extraction on the state information vector obtained by comprehensively analyzing the dialogue information, the low efficiency caused by recognition based on fixed flow templates is avoided. Because the input of the response strategy matching model is a state information vector containing multi-dimensional feature information, prediction is based on fully mined feature information, which helps ensure the accuracy of the matched response information.

Drawings

FIG. 1 is a flow chart of a dialog response method in accordance with an exemplary embodiment of the present application;

FIG. 2 is a flow diagram of a method for training a response strategy matching model for dialog responses in an exemplary embodiment according to the present application;

FIG. 3 is a flow chart of another dialog response method in accordance with an exemplary embodiment of the present application;

FIG. 4 is a schematic diagram of a neural network model structure for determining a state information vector in an exemplary embodiment according to the present application;

FIG. 5 is a schematic diagram of a mechanism for adding rewards after a fully connected layer in an exemplary embodiment according to the application;

FIG. 6 is a schematic illustration of a dialog state tracking process in an exemplary embodiment according to the present application;

FIG. 7 is a flow chart of another method for training a response strategy matching model for dialog responses in an exemplary embodiment according to the present application;

FIG. 8 is a schematic block diagram of an electronic device in an exemplary embodiment in accordance with the subject application;

FIG. 9 is a block diagram of a dialogue response apparatus in an example embodiment according to the application;

FIG. 10 is a schematic block diagram of another electronic device in an exemplary embodiment in accordance with the subject application;

FIG. 11 is a block diagram of a training apparatus for response strategy matching models for dialog responses in an exemplary embodiment according to the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

Dialogue systems (also called conversational agents) are agents with human-machine interfaces for accessing, processing, managing and communicating information, which enable conversations with people through computer systems that simulate human beings. With the development of electronic technology, dialogue systems have gradually penetrated many aspects of social life and bring convenience to people's lives.

In the related art, a dialogue system matches the dialogue information sent by the consultant against pre-configured script flows and script templates. Because these script flows and templates are fixed and inflexible, the response information matched to the dialogue information is overly rigid. Where the script flows and templates contain no corresponding response, the dialogue system can hardly respond effectively, since the answer does not exist in the pre-configured flow templates. As a result, the response information matched by the dialogue system in the related art has low accuracy and the response efficiency is poor.

In view of the above, the present application provides a dialogue response method and a training method and apparatus for a response strategy matching model. Dialogue information sent by a consultant is comprehensively analyzed to determine a state information vector containing the multi-dimensional features of that dialogue information. A pre-trained response strategy matching model then determines the response information that matches the state information vector and the historical dialogue information vector of the historical dialogue information corresponding to the dialogue information in the current round of dialogue. This addresses the low accuracy and poor response efficiency caused by rigid response information, optimizes how response information is matched in a dialogue system, and improves the matching efficiency of the response information.

FIG. 1 is a flowchart of a dialogue response method according to an exemplary embodiment of the present application. As shown in FIG. 1, the method may include the following steps:

Step 101, in the process of the current round of dialogue with the consultant, vectorizing the dialogue information sent by the consultant in the current dialogue to obtain a dialogue information vector corresponding to the dialogue information.

In an embodiment, word segmentation processing may be performed on the dialogue information sent by the consultant in the current dialogue to determine a plurality of non-overlapping words obtained after the dialogue information is lexicalized, and word vectors corresponding to the non-overlapping words are then determined according to a preset WordEmbedding matrix, so that the word vectors are taken as the dialogue information vector corresponding to the dialogue information.
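The vectorization in step 101 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the vocabulary `VOCAB`, the three-dimensional `EMBEDDING_MATRIX`, and the whitespace-based `segment` function are all hypothetical stand-ins for a real word segmenter and a preset WordEmbedding matrix.

```python
from typing import Dict, List

# Hypothetical toy vocabulary and embedding matrix (assumptions for illustration);
# a real system would load a preset WordEmbedding matrix learned from a corpus.
VOCAB: Dict[str, int] = {"<unk>": 0, "latest": 1, "audi": 2, "model": 3, "price": 4}
EMBEDDING_MATRIX: List[List[float]] = [
    [0.0, 0.0, 0.0],   # <unk>
    [0.1, 0.3, -0.2],  # latest
    [0.7, -0.1, 0.4],  # audi
    [0.2, 0.5, 0.1],   # model
    [-0.3, 0.2, 0.6],  # price
]


def segment(text: str) -> List[str]:
    """Whitespace segmentation, standing in for a real word segmenter."""
    return text.lower().split()


def vectorize(text: str) -> List[List[float]]:
    """Look up each segmented word's row in the embedding matrix."""
    unk = VOCAB["<unk>"]
    return [EMBEDDING_MATRIX[VOCAB.get(word, unk)] for word in segment(text)]


# One word vector per non-overlapping word of the dialogue information.
dialogue_vector = vectorize("latest Audi model")
```

Words missing from the vocabulary fall back to the `<unk>` row, which is one common convention and not something the application specifies.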

Step 102, parsing the dialog information vector to determine a state information vector comprising multidimensional features.

In one embodiment, the word vector may be feature extracted through a neural network model including a BERT optimizer, a bidirectional long-short term memory network, and a conditional random field, and a state information vector including multidimensional features corresponding to the dialogue information vector is determined according to the feature vector extracted by the neural network. Further, the dimensional features involved in the state information vector may be of various types, and in an exemplary but non-limiting embodiment, the features involved in the state information vector may be entity features, intention features, and emotion features, and accordingly, the dialog body, the dialog intention, and the emotion state involved in the dialog information are used as features for identifying the received dialog information.
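To make the idea of a state information vector with several feature dimensions concrete, the following Python sketch concatenates entity, intention and emotion encodings into one vector. It assumes the feature extractor has already produced a label for each dimension and does not reproduce the neural network model itself; the label inventories and the one-hot encoding are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label inventories (assumptions, not taken from the application).
ENTITIES = ["vehicle", "weather", "none"]
INTENTS = ["ask_price", "ask_info", "greet"]
EMOTIONS = ["positive", "neutral", "negative"]


def one_hot(label: str, labels: List[str]) -> List[float]:
    """Encode a label as a one-hot list over its inventory."""
    return [1.0 if label == candidate else 0.0 for candidate in labels]


@dataclass
class DialogueState:
    entity: str
    intent: str
    emotion: str

    def to_vector(self) -> List[float]:
        """Concatenate the per-dimension encodings into one state vector."""
        return (one_hot(self.entity, ENTITIES)
                + one_hot(self.intent, INTENTS)
                + one_hot(self.emotion, EMOTIONS))


state = DialogueState(entity="vehicle", intent="ask_info", emotion="neutral")
state_vector = state.to_vector()
```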

Step 103, inputting the state information vector and the historical dialogue information vector of the historical dialogue information corresponding to the dialogue information in the current round of dialogue into a response strategy matching model, wherein the response strategy matching model is trained by adopting a dialogue information sample containing response strategy marking information and the historical dialogue information corresponding to the dialogue information sample in the same round of dialogue in advance.

In an embodiment, before the state information vector and the historical dialogue information vector of the current round of dialogue are input into the response strategy matching model, a script template matching the state information vector may be determined, and the word slots in the script template are filled with information based on the state information vector. If the filled script template still has an empty word slot, a question response for script clarification is constructed according to the script information corresponding to the empty word slot, and the consultant's reply to the question response is received and added to the state information vector.

In this embodiment, the state information vector determined from the dialogue information is used to fill the script template corresponding to the state information, and the filled script template shows whether clarification is needed. A question response for script clarification is constructed according to the script information corresponding to each empty word slot in the script template, and the consultant's reply is received and added to the state information vector. The clarification step thus supplements the dialogue information to be answered a second time and avoids the low response accuracy caused by missing feature information in the dialogue information.
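The slot-filling and clarification flow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the state is represented as a plain dictionary rather than a vector, and the slot names, the questions in `TEMPLATE_SLOTS`, and the helper functions are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical template: slot name -> clarification question for that slot.
TEMPLATE_SLOTS: Dict[str, str] = {
    "brand": "Which brand are you interested in?",
    "model": "Which model would you like to know about?",
    "budget": "What is your budget?",
}


def fill_slots(state: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill each template slot from the state; missing slots stay None."""
    return {slot: state.get(slot) for slot in TEMPLATE_SLOTS}


def clarification_questions(filled: Dict[str, Optional[str]]) -> List[str]:
    """Build one clarification question per empty slot."""
    return [TEMPLATE_SLOTS[slot] for slot, value in filled.items() if value is None]


def merge_reply(state: Dict[str, str], slot: str, reply: str) -> Dict[str, str]:
    """Add the consultant's reply for a slot to the state information."""
    updated = dict(state)
    updated[slot] = reply
    return updated


state = {"brand": "Audi"}
questions = clarification_questions(fill_slots(state))
# After the consultant answers the first question, the state is supplemented.
state = merge_reply(state, "model", "A4")
remaining = clarification_questions(fill_slots(state))
```

Each reply shrinks the list of open questions, so the loop ends once every slot in the template is filled.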

Further, the dialog information vector of the historical dialog of the current round may include the dialog information vector itself. The response strategy matching model can be a learning model in reinforcement learning or a recurrent neural network model in deep learning.

Step 104, determining a plurality of response strategies corresponding to the feature vectors extracted by the response strategy matching model and the confidence corresponding to each response strategy.

In an embodiment, the feature vectors may be mapped to a preset number of response strategies. In the mapping process, the feature value of the feature vector after probability normalization processing may be determined as the value of the response strategy, and that feature value is then determined as the confidence corresponding to the response strategy. The probability normalization function for determining the feature value corresponding to the feature vector is

$$\sigma_i(z) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$

In the probability normalization function, $e^{z_i}$ is the natural constant raised to the power of the $i$-th value of the feature vector $z$, and $k$ is the total number of categories of the dialogue strategy.

Step 105, taking the response strategy with the highest confidence among the plurality of response strategies as the response information corresponding to the dialogue information.
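Steps 104 and 105 can be sketched in Python as a probability normalization followed by selection of the most confident strategy. The sketch assumes the probability normalization function is the standard softmax, which matches the description of a natural-constant exponential normalized over k categories; the strategy names and the input values are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(z: List[float]) -> List[float]:
    """Probability normalization: sigma_i(z) = exp(z_i) / sum_j exp(z_j)."""
    shift = max(z)  # subtracting the maximum leaves the result unchanged and avoids overflow
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [value / total for value in exps]


def select_strategy(z: List[float], strategies: List[str]) -> Tuple[str, float]:
    """Return the response strategy with the highest confidence."""
    confidences = softmax(z)
    best = max(range(len(strategies)), key=lambda i: confidences[i])
    return strategies[best], confidences[best]


# Hypothetical strategy categories (k = 3) and feature values.
STRATEGIES = ["answer_question", "ask_clarification", "greet"]
strategy, confidence = select_strategy([2.0, 1.0, 0.1], STRATEGIES)
```

Because softmax preserves the ordering of its inputs, the selected strategy is the one with the largest feature value; the normalization matters when the confidence itself is reported or compared against a threshold.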

According to the above embodiment, a state information vector containing the multi-dimensional features of the dialogue information sent by the consultant in the current dialogue is determined, a plurality of response strategies corresponding to the state information vector are matched by the pre-trained response strategy matching model, and the response strategy with the highest confidence among them is taken as the response information corresponding to the dialogue information. Because the response strategy matching model performs feature extraction on the state information vector obtained by comprehensively analyzing the dialogue information, the low efficiency caused by recognition based on fixed flow templates is avoided. Because the input of the response strategy matching model is a state information vector containing multi-dimensional feature information, prediction is based on fully mined feature information, which helps ensure the accuracy of the matched response information.

FIG. 2 is a flowchart of a training method for a response strategy matching model for dialogue responses according to an exemplary embodiment of the present application. As shown in FIG. 2, the method may include the following steps:

Step 201, determining a dialogue information sample set serving as training samples, wherein the dialogue information sample set comprises dialogue information samples containing response strategy marking information and the historical dialogue information corresponding to the dialogue information samples in the same round of dialogue.

Step 202, performing vectorization processing on the session information samples to obtain a session information sample vector corresponding to the session information samples.

In an embodiment, word segmentation processing may be performed on the dialogue information sample to determine a plurality of non-overlapping words obtained after the dialogue information sample is lexicalized, and word vectors corresponding to the non-overlapping words are then determined according to a preset WordEmbedding matrix, so as to complete vectorization of the dialogue information sample and obtain the dialogue information sample vector corresponding to the dialogue information sample. In this embodiment, the preset WordEmbedding matrix can represent the features contained in the dialogue information and the semantic relationship information between those features as a dense, lower-dimensional matrix, which improves the efficiency of feature extraction from the dialogue information.

Step 203, parse the dialog information sample vector to determine a state information vector containing multi-dimensional features.

In one embodiment, the word vector may be feature extracted through a neural network model including a BERT optimizer, a bidirectional long-short term memory network, and a conditional random field, and a state information vector including multidimensional features corresponding to the dialogue information vector is determined according to the feature vector extracted by the neural network. Further, the dimensional features involved in the state information vector may be of various types, and in an exemplary but non-limiting embodiment, the features involved in the state information vector may be entity features, intention features, and emotion features, and accordingly, the dialog body, the dialog intention, and the emotion state involved in the dialog information are used as the features for identifying the received dialog information.

Step 204, inputting the state information vector and the dialogue information vector of the historical dialogue information into a response strategy matching model, so that the response strategy matching model performs feature extraction on the state information vector and the dialogue information vector of the historical dialogue information.

In an embodiment, before the state information vector and the dialogue information vector of the historical dialogue information are input into the response strategy matching model, a script template matching the state information vector may be determined, and the word slots in the script template are filled with information based on the state information vector. If the filled script template has an empty word slot, prompt information for script clarification is constructed according to the dialogue information corresponding to the empty word slot, so that the system re-parses the dialogue information sample vector based on the prompt information, or alarm information related to the prompt information is sent so that an administrator supplements the feature information corresponding to the empty word slot indicated by the prompt information.

Step 205, determining a response strategy prediction information vector corresponding to the dialogue information sample according to the extracted features.

In an embodiment, a plurality of dialog strategy prediction information corresponding to the dialog information samples are determined according to the extracted features, wherein the value of the confidence coefficient of each dialog strategy prediction information may include a feature value obtained by subjecting the features to probability normalization processing, and further, the dialog strategy prediction information with the highest value of the confidence coefficient among the plurality of dialog strategy prediction information is determined as the dialog strategy prediction information vector corresponding to the dialog information samples.

Step 206, adjusting model parameters of the response strategy matching model based on a difference between the response strategy prediction information vector and the information vector of the response strategy tagging information.

In an embodiment, a loss function corresponding to the response policy matching model may be determined, and in a case that it is determined based on the loss function that an error between the dialog policy prediction information vector and the dialog policy annotation information vector is greater than a preset threshold, a parameter of the response policy matching model is updated based on an error back propagation algorithm.
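The parameter adjustment in step 206 can be illustrated with a minimal Python sketch. It assumes a single linear layer followed by softmax as a stand-in for the response strategy matching model and cross-entropy as the loss function; the application does not fix either choice. The learning rate, threshold, and sample values are hypothetical.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def predict(weights: List[List[float]], x: List[float]) -> List[float]:
    """Single linear layer followed by softmax (stand-in for the real model)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def train_step(weights: List[List[float]], x: List[float], label: int,
               lr: float = 0.5, threshold: float = 0.05) -> float:
    """Update the weights in place only when the loss exceeds the threshold."""
    pred = predict(weights, x)
    loss = -math.log(pred[label])  # cross-entropy against the annotated strategy
    if loss > threshold:
        for k, row in enumerate(weights):
            # Gradient of the loss with respect to logit k: pred[k] - 1[k == label].
            grad_logit = pred[k] - (1.0 if k == label else 0.0)
            for j in range(len(row)):
                row[j] -= lr * grad_logit * x[j]
    return loss


weights = [[0.0, 0.0, 0.0] for _ in range(2)]  # 2 strategies, 3 input features
sample, annotated_strategy = [1.0, 0.0, 1.0], 1
losses = [train_step(weights, sample, annotated_strategy) for _ in range(20)]
```

Once the loss falls to or below the threshold, `train_step` leaves the weights untouched, which mirrors the condition that parameters are updated only while the error is greater than the preset threshold.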

Step 207, analyzing the dialogue information input by the consultant according to the trained response strategy matching model, and determining a response strategy matched with the dialogue information, so as to take the response strategy as the response information corresponding to the dialogue information.

As can be seen from the above embodiments, the response strategy matching model can be trained with a dialog information sample set that includes dialog information samples containing response strategy labeling information and the historical dialog information corresponding to those samples in the same round of dialogue. During training, the vectorized dialog information sample is parsed to determine the state information vector containing the multi-dimensional features of the sample, and the model parameters are adjusted according to the difference between the response strategy prediction information vector determined from the extracted features and the information vector of the response strategy labeling information. The trained response strategy matching model can therefore respond intelligently to the dialog information input by the consultant, improving the processing efficiency of the dialog information while ensuring the accuracy of the response information.

To further explain the technical solution of the present application, it is described in detail below through the embodiments corresponding to FIG. 3 and FIG. 4:

FIG. 3 is a flow chart of another dialog response method according to an exemplary embodiment of the present application. As shown in FIG. 3, the method may include the following steps:

Step 301, receiving the dialog information sent by the consultant in the current dialog.

The consultant can compose and send dialogue information according to the needs of the actual scenario, and the dialogue information may be a question, a greeting, a statement that provides information, and the like. For example, the dialogue information may be a question such as "I would like to ask about the latest Audi models" or "what is the weather like today"; a greeting such as "intelligent dialogue system" or "happy to talk with you"; or a statement that gives feedback, such as "I know" or "I do not know about this activity".

Furthermore, the system can conduct multiple rounds of dialogue with the consultant. Each round may contain several pieces of dialogue information sent by the consultant and the response information for them. In practice, the dialogue information sent by the consultant and the corresponding response information can be monitored, so that the dialogue information and response information of each round are maintained.

Specifically, dialogue information received or sent after a preset time period has elapsed may be treated as the first message of a new round of dialogue, or dialogue information identified as a greeting, such as "hello" or "consult a question", may be treated as the first message of a new round. After the first message of a new round is determined, the response information for that message is assigned to the same round. For convenience of description, a piece of dialogue information sent by the consultant together with its response information is referred to as one dialogue, and one round of dialogue may contain a plurality of dialogues.
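The round-splitting rule described above can be sketched as follows. This is only an illustration under stated assumptions: the timeout value, the greeting list, and the names `Message` and `split_into_rounds` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Hypothetical settings: the application names neither a timeout nor a greeting list.
ROUND_TIMEOUT_SECONDS = 300.0
GREETINGS = {"hello", "consult a question"}


@dataclass
class Message:
    sender: str       # "consultant" or "system"
    text: str
    timestamp: float  # seconds


def split_into_rounds(messages):
    """Group chronologically ordered messages into rounds of dialogue.

    A new round starts when the gap since the previous message exceeds the
    timeout, or when the consultant sends a greeting.
    """
    rounds = []
    last_time = None
    for msg in messages:
        timed_out = (last_time is not None
                     and msg.timestamp - last_time > ROUND_TIMEOUT_SECONDS)
        is_greeting = (msg.sender == "consultant"
                       and msg.text.strip().lower() in GREETINGS)
        if not rounds or timed_out or is_greeting:
            rounds.append([])
        rounds[-1].append(msg)
        last_time = msg.timestamp
    return rounds
```

Each inner list then holds the dialogue information and response information of one round, which is what is later supplied as historical dialogue information.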

The consultant can send the dialogue information through a front-end interface, such as a web page on the terminal or the interface of an application program, where the application program may be software pre-installed on the terminal or third-party software installed later by the user. When the consultant is a user, the user can enter the dialogue information directly in the front-end interface, or select a question in the front-end interface, in which case the device takes the selected question as the dialogue information sent by the user. When the consultant is a virtual user composed of program code, the consultant can automatically generate and send dialogue information from a pre-configured dialogue information set, where the dialogue information to be sent can be chosen according to how frequently each piece of dialogue information in the set is asked.

Step 302, processing the received dialog information to determine a state information vector containing multi-dimensional features corresponding to the dialog information.

In one embodiment, entity recognition, intent recognition, and emotion recognition may be performed on the dialog information to determine a multi-dimensional state information vector containing entity features, intent features, and emotion features. In practice, the entity feature determined by entity recognition may be the subject consulted about in the dialog information, the intent feature determined by intent recognition may correspond to the consultation intention of the consultant, and the emotion feature determined by emotion recognition corresponds to the emotion of the consultant in the dialog information. For example, if the received dialog information is "I would like to ask what activities the Audi A4 has had recently", the entity feature may be "Audi A4", the intent feature may be "acquire activity information", and the emotion feature may be neutral.

In other application scenarios, the emotion features may also be positive or negative; for example, the dialog information "I would like to know" corresponds to a more positive emotion feature than "let's talk about it next time". The emotion features can thus further reveal the behavioral tendency in the dialog information sent by the consultant. For example, when the entity features are the same and the intent features all indicate an intention to learn more, the emotion features can distinguish the emotional tendency behind that intention, such as active interest or passive acceptance. In this embodiment, an emotion recognition process is added to the processing of the dialog information, so that the state information vector determined from the dialog information contains emotion features; the emotional motivation of the consultant is further subdivided according to these features, and the accuracy of dialog information analysis is improved.

Further, the features included in the dialog information are parsed to determine the state information vector containing the multi-dimensional features corresponding to the dialog information. Specifically, feature extraction may be performed on the dialog information vector by a BERT optimizer, a bidirectional long short-term memory network (BiLSTM) and a conditional random field network (CRF), and the state information vector containing the multi-dimensional features is then determined according to the feature vector extracted by the neural network. FIG. 4 is a schematic diagram of a neural network model structure for determining the state information vector according to an exemplary embodiment of the present application. As shown in FIG. 4, an exemplary neural network model includes a plurality of layers of bidirectional long short-term memory networks and one layer of conditional random field network. After the dialog information to be analyzed is converted into word vectors through a dictionary, the word vectors are input into the neural network model for feature extraction. The conditional random field network serves as the decoding layer of the model and further optimizes and adjusts the output of the bidirectional long short-term memory network according to the relationships among the response strategies, thereby obtaining the feature value output corresponding to the input dialog information.

The BERT (Bidirectional Encoder Representations from Transformers) optimizer performs semantic parsing on the dialog information by means of a bidirectionally trained language model. That is, two independent deep attention mechanisms are provided in the BERT optimizer, and the whole text sequence in the dialog information is read at once, unlike prior approaches that read the text sequence from left to right or from right to left. The BERT optimizer thus parses the contextual relationships between words in the dialog information in both directions, so that the text information in the dialog information is extracted efficiently.

In the practical application process, the feature information extracted by the BERT optimizer may be the word vector sequence obtained after word segmentation, and the word vectors corresponding to the dialog information may be determined through the Embedding layer in the BERT optimizer. Specifically, word segmentation is performed on the dialog information sent by the consultant to determine a plurality of non-overlapping words after the dialog information has been mapped through a dictionary, the word vectors corresponding to these words are determined according to a preset WordEmbedding matrix, and the dialog information represented in word vector form is input into the neural network model comprising the BERT optimizer, the bidirectional long short-term memory network and the conditional random field. Expressing the features contained in the dialog information, and the semantic relationships among them, in a dense matrix of lower dimensionality improves the efficiency with which the neural network model extracts those features.

Step 303, determining, based on a preset dialect template, whether dialect clarification of the state information vector is complete; if not, proceeding to step 304; otherwise, proceeding to step 305.

Step 304, sending a question response for dialect clarification to the consultant, extracting a feature vector from the reply information returned by the consultant to supplement and update the state information vector, and returning to step 303 after the update is completed.

In an embodiment, a dialect clarification mechanism may be used to determine whether the state information vector extracted from the dialog information is complete. If the state information vector is incomplete, a question response for dialect clarification is constructed based on the missing features, so that the consultant replies to the question response, and the state information vector is then supplemented and updated according to the reply information.

Specifically, a dialect template for clarifying the state information vector can be determined, information filling is performed on word slots in the dialect template based on the state information vector, a question response for clarifying the dialect is constructed according to empty word slots in the filled dialect template, and the constructed question response is sent to the consultant so as to receive reply information of the consultant on the question response and add the reply information to the state information vector.

For example, when the received dialog information is "what is the Audi A8", the matched dialect templates are determined to be "what is the <vehicle type> of the Audi A8" and "what is the <recent activity> of the Audi A8". Because the empty word slots "vehicle type" and "recent activity" exist in the matched dialect templates, a question response for dialect clarification can be constructed from them, such as "are you asking about the vehicle type of the Audi A8 or its recent activity" or "what information about the Audi A8 would you like to obtain", and the constructed question response is sent to the consultant. Accordingly, when reply information specifying what the consultant wishes to know about the Audi A8 is received, the features in the reply information can be extracted by the neural network model, and the feature vector corresponding to the reply information is added to the state information vector of the original dialog information.
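The slot-filling check in steps 303 and 304 can be illustrated with a minimal sketch. It simplifies the example to a single template with two word slots; the template text, the slot names `topic` and `entity`, and the wording of the clarification question are hypothetical, and the state is represented as a plain dictionary rather than a vector.

```python
import re

# Hypothetical template: slot names are written in angle brackets.
TEMPLATE = "What is the <topic> of the <entity>?"
SLOT_PATTERN = re.compile(r"<([^<>]+)>")


def fill_template(template, state):
    """Fill the word slots that the state can supply.

    Returns the partially filled text and the names of the empty word slots.
    """
    empty = []

    def substitute(match):
        name = match.group(1)
        value = state.get(name)
        if value is None:
            empty.append(name)
            return match.group(0)  # leave the empty slot visible
        return value

    return SLOT_PATTERN.sub(substitute, template), empty


def clarification_question(state, empty_slots):
    """Build a question response for the empty slots, or None if none remain."""
    if not empty_slots:
        return None
    subject = state.get("entity", "this vehicle")
    return f"Could you specify the {' and '.join(empty_slots)} for {subject}?"
```

In a loop corresponding to steps 303 and 304, the features extracted from the consultant's reply would be merged into `state` and `fill_template` called again until no empty word slot remains.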

Further, the dialect template used for dialect clarification may be determined according to the features extracted from the dialog information, or according to the historical dialog information in the same round of dialogue as the dialog information; the present application does not limit how the dialect template is determined.

Step 305, inputting the state information vector for which dialect clarification is complete, together with the historical dialogue information vector of the historical dialogue information in the same round as the current dialogue, into a response strategy matching model, and determining, by the response strategy matching model, the preset number of response strategies to which the extracted feature vectors are mapped.

And analyzing the dialogue information to obtain a state information vector of the multidimensional characteristic, inputting the obtained state information vector of the multidimensional characteristic and a historical dialogue information vector in the same round with the dialogue information into a response strategy matching model trained in advance, and performing characteristic extraction on the input information by the response strategy matching model.

In an embodiment, a reward (activation) function is added to the output layer that follows the fully connected layer in the response strategy matching model, so that the values corresponding to the plurality of response strategies obtained by the fully connected layer are normalized to obtain the prediction probability of each response strategy. FIG. 5 is a diagram illustrating a reward mechanism added after a fully connected layer according to an exemplary embodiment of the present application. As shown in FIG. 5, the pre-trained weight vectors $w_j$ and the feature vector $x$ satisfy the requirements of the vector dot product, and the feature vector is thereby mapped to values for the $k$ categories of response strategies, namely

$$z_j = w_j \cdot x, \quad j = 1, \dots, k.$$

Furthermore, the value of each response strategy can be obtained by adding a bias term $b_j$ to the dot product of the weight vector and the feature vector, that is, $z_j = w_j \cdot x + b_j$. The bias term allows the function to shift left and right, which improves the fitting of the neural network model.

In the practical application process, the added reward mechanism may be a softmax function. Accordingly, in the process of mapping the feature vector to the preset number of response strategies, the feature value obtained after probability normalization may be determined as the value of each response strategy and as the confidence corresponding to that response strategy. Specifically, the probability normalization function for determining the feature value is

$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{i=1}^{k} e^{z_i}},$$

where $z_j$ is the value of the $j$-th response strategy, $e^{z_j}$ is the natural constant raised to the power of that value, and $k$ is the total number of categories of response strategies. As shown in FIG. 5, adding the softmax mechanism to the output layer converts the value of each response strategy into a normalized probability.
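A minimal sketch of the fully connected layer followed by softmax normalization is given below. The function names and the toy weights are hypothetical; subtracting the maximum before exponentiating is a common numerical-stability step that the application does not mention and that does not change the result.

```python
import math


def strategy_values(weights, biases, features):
    """Value of each response strategy: dot product of weights and features plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    """Normalize the values into probabilities that sum to 1."""
    peak = max(values)  # stability shift; softmax is invariant to it
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, `softmax(strategy_values(weights, biases, features))` yields one confidence per response strategy, and a larger value always maps to a larger confidence.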

Furthermore, the historical dialogue information in the same round as the current dialogue can be recorded. When the dialogue information of the current dialogue is input into the response strategy matching model, the historical dialogue information, which contains the state information vector corresponding to that dialogue information, is input as well, and the response strategy matching model performs feature extraction on both to determine the response strategy corresponding to the dialogue information. Similarly, the response information determined according to the response strategy with the highest value may be recorded in the same round of dialogue, so that the dialogue progress in the dialogue state is updated according to both the dialogue information sent by the consultant and the response information returned to the consultant. FIG. 6 is a schematic diagram of a dialogue state tracking process according to an exemplary embodiment of the present application. As shown in FIG. 6, after the dialogue information input by the consultant is parsed, the dialogue progress is updated based on the parsed features; the historical dialogue information vector of the updated historical dialogue information is then input into the response strategy matching model, and the response information is determined based on the matched response strategy. The response information is returned to the consultant and is also used to update the dialogue state, ensuring that the dialogue state contains the latest progress of the dialogue.
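The state tracking loop of FIG. 6 can be sketched as follows, assuming the parser and the response strategy matching model are supplied as callables. `DialogueState`, `run_turn`, `parse` and `match_strategy` are hypothetical names, and plain dictionaries and lists stand in for the vectors described above.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)    # features gathered so far
    history: list = field(default_factory=list)  # (speaker, text) pairs of this round

    def update_from_consultant(self, text, features):
        """Record the consultant's message and merge its parsed features."""
        self.slots.update({k: v for k, v in features.items() if v is not None})
        self.history.append(("consultant", text))

    def update_from_response(self, text):
        """Record the response so the state reflects the latest progress."""
        self.history.append(("system", text))


def run_turn(state, text, parse, match_strategy):
    """One dialogue: parse, update the state, match a strategy, record the reply."""
    state.update_from_consultant(text, parse(text))
    response = match_strategy(state.slots, state.history)
    state.update_from_response(response)
    return response
```

Because the state is updated both before and after matching, features from earlier dialogues in the round remain available to later ones.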

Step 306, sending the response information based on the response strategy with the highest value among the preset number of response strategies.

The correlation between each response strategy and the feature vector, and hence the dialogue information, can be determined from the value of the response strategy. The response strategy with the highest value, being the most strongly correlated, is used to reply to the dialogue information, and the response information is sent based on it.

Take one dialogue in the current round as an example. Suppose the consultant sends the dialogue information "good, I will go to participate in the activity". After the dialogue information and the historical dialogue information in the same round are preprocessed and the response strategy matching model performs feature extraction, it can be determined that the entity feature Slot relates to the Audi A4, the intention feature Intent is willingness to participate in the activity, and the emotion feature Sentiment is positive, as in the following analysis results obtained by the model:

Intent:interest,prob:0.75

Sentiment:pos,prob:0.90

neutral,prob:0.7

Neg,prob:0.2

Slot:A4

Action1:invitation,prob:0.97

Action2:deny,prob:0.3

For the dialogue initiated by the consultant, the determined response strategies include response strategy 1, inviting the consultant, and response strategy 2, rejecting the consultant. The probability value corresponding to the value of response strategy 1 is 0.97 and that of response strategy 2 is 0.3, so the response information is sent based on response strategy 1, that is, information inviting the consultant to participate in the activity is sent.
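Selecting the response strategy in step 306 amounts to taking the entry with the highest confidence. The sketch below uses the two confidence values from the example above; the function name and the dictionary representation are hypothetical.

```python
def select_response_strategy(confidences):
    """Return the (strategy, confidence) pair with the highest confidence."""
    if not confidences:
        raise ValueError("no candidate response strategies")
    return max(confidences.items(), key=lambda item: item[1])


# Confidence values taken from the analysis results above.
strategy, confidence = select_response_strategy({"invitation": 0.97, "deny": 0.3})
```

Here `strategy` is `"invitation"`, so the response information would be an invitation to participate in the activity.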

Through the above embodiment, after the dialogue information received from the consultant is processed, whether dialect clarification of the state information vector is complete is judged based on the preset dialect template, and a question response for dialect clarification is sent to the consultant if it is not. This ensures the feature integrity of the state information vector input into the response strategy matching model, avoids the low accuracy of the matched response strategy caused by missing features, and avoids the rework caused by failing to determine a response strategy, thereby improving the efficiency of determining the response information.

Fig. 7 is a flowchart of another method for training a response strategy matching model for dialog responses according to an exemplary embodiment of the present application, which may include the following steps, as shown in fig. 7:

Step 701, determining a dialog information sample set for training a response strategy matching model.

In one embodiment, the set of dialog information samples used to train the response strategy matching model includes dialog information samples containing response strategy labeling information.

In the practical application process, after a dialogue information sample set containing response strategy labeling information has been established, dialogue information samples may be selected from it according to preset proportions to build a training set, a verification set and a test set. The training set is mainly used to train the parameters of the neural network. After training on the training set, the verification set can be used to compare the performance of candidate models and to tune hyper-parameters that cannot be optimized on the training set, and the test set is used to determine the evaluation indexes of the neural network model. Specifically, 75% of the samples in the dialogue information sample set may be used as the training set, 10% as the verification set and the remaining 15% as the test set; or 80% may be used as the training set, 10% as the verification set and the remaining 10% as the test set, and so on. The application does not limit the specific proportions.
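A minimal sketch of splitting the sample set by preset proportions follows, using the 80% / 10% / 10% example. The function name, the shuffling step and the fixed seed are assumptions made for reproducibility and are not taken from the application.

```python
import random


def split_samples(samples, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle the samples and split them into training, verification and test sets.

    Whatever is left after the training and verification portions becomes the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Taking the test set as the remainder guarantees that every sample lands in exactly one of the three sets, whatever the rounding of the first two portions.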

Step 702, processing the dialog information samples in the dialog information sample set to determine a state information vector containing multi-dimensional features.

In one embodiment, the processing of the dialog information samples in the dialog information sample set may include entity recognition, intention recognition and emotion recognition, and accordingly, entity feature vectors, intention feature vectors and emotion feature vectors corresponding to the dialog information samples may be determined.

In a practical application process, the entity feature determined by entity recognition may be the subject consulted about in the dialogue information, the intention feature determined by intention recognition may correspond to the consultation intention of the consultant, and the emotion feature determined by emotion recognition corresponds to the emotion of the consultant in the dialogue information. For example, if the received dialogue information is "I would like to ask what activities the Audi A4 has had recently", the entity feature may be "Audi A4", the intention feature may be "acquire activity information", and the emotion feature may be neutral.

Furthermore, in the process of analyzing the multidimensional feature vectors contained in the dialogue information sample, feature extraction can be performed on the dialogue information vector through a BERT optimizer, a bidirectional long short-term memory network (BiLSTM) and a conditional random field network (CRF), and the state information vector containing the multidimensional features corresponding to the dialogue information vector is then determined according to the feature vector extracted by the neural network.

In the practical application process, the feature information extracted by the BERT optimizer may be the word vector sequence obtained after word segmentation, and the word vectors corresponding to the dialog information sample may be determined through the Embedding layer in the BERT optimizer. Specifically, word segmentation is performed on the dialog information to determine a plurality of non-overlapping words after the dialog information has been mapped through a dictionary, the word vectors corresponding to these words are determined according to a preset WordEmbedding matrix, and the dialog information represented in word vector form is input into the neural network model comprising the BERT optimizer, the bidirectional long short-term memory network and the conditional random field. Expressing the features contained in the dialog information, and the semantic relationships among them, in a dense matrix of lower dimensionality improves the efficiency with which the neural network model extracts those features.

Step 703, inputting the state information vector and the dialogue information vector of the historical dialogue information in the same round as the dialogue information sample into a response strategy matching model.

Step 704, determining a response strategy prediction information vector corresponding to the dialog information sample according to the features extracted by the response strategy matching model.

In an embodiment, a reward (activation) function is added to the output layer that follows the fully connected layer in the response strategy matching model, so that the values corresponding to the plurality of response strategies obtained by the fully connected layer are normalized to obtain the prediction probability of each response strategy.

Specifically, the added reward function may be a softmax function. Accordingly, in the process of mapping the feature vector to the preset number of response strategies, the feature value obtained after probability normalization may be determined as the value of each response strategy and as the confidence corresponding to that response strategy. Further, the piece of response strategy prediction information with the highest confidence among the plurality of pieces may be determined as the response strategy prediction information vector corresponding to the dialog information sample.

Step 705, based on the response strategy prediction information vector, the response strategy labeling information vector and the loss function corresponding to the response strategy matching model, judging whether the loss value of the loss function is lower than a preset threshold value, if so, completing the training of the response strategy matching model, otherwise, adjusting the model parameters of the response strategy matching model based on the difference between the response strategy prediction information vector and the information vector of the response strategy labeling information.

In one embodiment, a loss function corresponding to the response strategy matching model is determined, and when it is determined based on the loss function that the error between the response strategy prediction information vector and the response strategy labeling information vector is greater than a preset threshold value, the parameters of the response strategy matching model are updated based on an error back propagation algorithm.

Specifically, the distance between the probability distribution corresponding to the response strategy prediction information vector and the probability distribution corresponding to the information vector of the response strategy labeling information may be calculated by a cross entropy function. Where the expected probability distribution corresponding to the response strategy prediction information vector is $q(x_i)$ and the expected probability distribution of the information vector of the response strategy labeling information is $p(x_i)$, the cross entropy function is

$$H(p, q) = -\sum_{i} p(x_i) \log q(x_i).$$

For example, when the number of types of response strategies is 3, the expected probability distribution $p(x_i)$ of the information vector of the response strategy labeling information labeled on the dialogue information sample used as a training sample is (1, 0, 0), and the expected probability distribution $q(x_i)$ of the response strategy prediction information vector is (0.5, 0.2, 0.3), the cross entropy function value is

$$H(p, q) = -(1 \cdot \log 0.5 + 0 \cdot \log 0.2 + 0 \cdot \log 0.3) = -\log 0.5,$$

which is approximately 0.693 when the natural logarithm is used.
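The worked example can be checked with a few lines of code. This sketch assumes the natural logarithm, which the application does not specify, and the function name is hypothetical.

```python
import math


def cross_entropy(p, q):
    """Cross entropy between the labeled distribution p and the predicted distribution q.

    Terms with p_i == 0 contribute nothing and are skipped.
    """
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


# Distributions from the example: labeled (1, 0, 0), predicted (0.5, 0.2, 0.3).
loss = cross_entropy([1, 0, 0], [0.5, 0.2, 0.3])
```

Only the predicted probability of the labeled response strategy affects the result, so the loss falls toward 0 as that probability approaches 1.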

Further, under the condition that the loss value determined based on the response strategy prediction information vector, the response strategy labeling information vector and the loss function is lower than the preset threshold value, the training of the corresponding response strategy matching model is determined to be completed, further, the dialogue information input by the consultant can be analyzed according to the trained response strategy matching model, the response strategy matched with the dialogue information is determined, and the response strategy is used as the response information corresponding to the dialogue information.
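The stopping rule of step 705 can be sketched with a toy model consisting of a single fully connected layer followed by softmax, trained by gradient descent on the cross entropy loss. The real response strategy matching model is a deeper network; the names, learning rate, threshold and epoch limit here are hypothetical.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, biases, x):
    """Confidence of each response strategy for feature vector x."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + b
                    for row, b in zip(weights, biases)])


def train(samples, num_features, num_strategies,
          threshold=0.1, learning_rate=0.5, max_epochs=1000):
    """Update parameters until the mean loss drops below the preset threshold."""
    weights = [[0.0] * num_features for _ in range(num_strategies)]
    biases = [0.0] * num_strategies
    loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [[0.0] * num_features for _ in range(num_strategies)]
        grad_b = [0.0] * num_strategies
        loss = 0.0
        for x, label in samples:
            q = predict(weights, biases, x)
            loss -= math.log(q[label])
            for j in range(num_strategies):
                # Gradient of cross entropy w.r.t. the j-th value: prediction minus label.
                error = q[j] - (1.0 if j == label else 0.0)
                grad_b[j] += error
                for i in range(num_features):
                    grad_w[j][i] += error * x[i]
        loss /= len(samples)
        if loss < threshold:
            break  # training is considered complete
        for j in range(num_strategies):
            biases[j] -= learning_rate * grad_b[j] / len(samples)
            for i in range(num_features):
                weights[j][i] -= learning_rate * grad_w[j][i] / len(samples)
    return weights, biases, loss
```

Each sample is a pair of a feature vector and the index of its labeled response strategy, for example `train([([1.0, 0.0], 0), ([0.0, 1.0], 1)], 2, 2)`.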

FIG. 8 is a schematic block diagram of an electronic device in an exemplary embodiment in accordance with the subject application. Referring to fig. 8, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, but may also include hardware required for other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the dialogue response device on a logic level. Of course, besides the software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or logic devices.

Referring to FIG. 9, which is a block diagram of a dialog response device according to an exemplary embodiment of the present application, in a software implementation the dialog response device may include:

a processing unit 901, configured to vectorize the dialog information sent by a consultant during the current dialog with the consultant, to obtain a dialog information vector corresponding to the dialog information;

a parsing unit 902, configured to parse the dialog information vector to determine a state information vector containing multi-dimensional features;

an input unit 903, configured to input the state information vector and the historical dialog information vector of the historical dialog information corresponding to the dialog information in the current round of dialog into a response strategy matching model, wherein the response strategy matching model is trained in advance using dialog information samples containing response strategy labeling information and the historical dialog information corresponding to those samples in the same round of dialog;

a first determining unit 904, configured to determine a plurality of response strategies corresponding to the feature vectors extracted by the response strategy matching model, and the confidence level corresponding to each response strategy;

a second determining unit 905, configured to take the response strategy with the highest confidence level among the plurality of response strategies as the response information corresponding to the dialog information.
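The cooperation of units 901 to 905 can be sketched as a small pipeline. This is a hedged illustration only: the keyword-count vectorizer, the flag-based state parser, the hand-written scoring function and the strategy names are all hypothetical stand-ins, not the components of the application.

```python
def respond(utterance, history_vectors, vectorize, parse_state, score_strategies):
    """Return the highest-confidence response strategy and all confidences."""
    dialog_vec = vectorize(utterance)                            # processing unit 901
    state_vec = parse_state(dialog_vec)                          # parsing unit 902
    confidences = score_strategies(state_vec, history_vectors)   # units 903 and 904
    best = max(confidences, key=confidences.get)                 # second determining unit 905
    return best, confidences

def toy_vectorize(text):
    """Hypothetical vectorizer: counts of two keywords."""
    words = text.lower().split()
    return [float(words.count("price")), float(words.count("color"))]

def toy_parse_state(vec):
    """Hypothetical state parser: one flag per topic that was mentioned."""
    return [1.0 if v > 0 else 0.0 for v in vec]

def toy_score(state, history):
    """Hypothetical model: raw scores from state and history, normalized to confidences."""
    hist_price = sum(h[0] for h in history)
    hist_color = sum(h[1] for h in history)
    raw = {
        "quote_price": 1.0 + 2.0 * state[0] + hist_price,
        "describe_color": 1.0 + 2.0 * state[1] + hist_color,
        "ask_clarification": 1.0,
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

history = [[0.0, 1.0]]  # an earlier turn of the same dialog mentioned color
best, confidences = respond("what is the price", history,
                            toy_vectorize, toy_parse_state, toy_score)
print(best)  # prints quote_price
```

Passing the three stages in as functions keeps the selection step (the maximum over confidences) independent of how the vectors and scores are produced.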

Optionally, the processing unit 901 is specifically configured to:

Optionally, the parsing unit 902 is specifically configured to:

Optionally, the device further includes:

a third determining unit 906, configured to determine a dialog template matching the state information vector;

an information filling unit 907, configured to fill the word slots in the dialog template based on the state information vector;

a clarification unit 908, configured to, if the filled dialog template still has an empty word slot, construct a clarifying question response according to the script information corresponding to the empty word slot;

an information adding unit 909, configured to receive the consultant's reply to the question response and add the reply information to the state information vector.
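The slot-filling and clarification flow of units 906 to 909 can be sketched as follows. For readability the sketch represents the state information as a dictionary rather than a vector; the slot names, prompts and function names are hypothetical.

```python
def fill_template(slot_names, state):
    """Fill each word slot from the state; slots without a value stay None."""
    return {name: state.get(name) for name in slot_names}

def clarification_questions(filled, prompts):
    """Build one clarifying question per word slot that is still empty."""
    return [prompts[name] for name, value in filled.items() if value is None]

# Hypothetical template with two word slots and a prompt for each slot
slot_names = ["brand", "budget"]
prompts = {
    "brand": "Which brand are you interested in?",
    "budget": "What is your budget?",
}

state = {"brand": "Toyota"}                    # parsed from the consultant's message
filled = fill_template(slot_names, state)
questions = clarification_questions(filled, prompts)
print(questions)                               # prints ['What is your budget?']

# The consultant's reply is added to the state, after which no slot is empty
state["budget"] = "100000"
remaining = clarification_questions(fill_template(slot_names, state), prompts)
```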

Optionally, the first determining unit 904 is specifically configured to:

FIG. 10 is a schematic block diagram of another electronic device according to an exemplary embodiment of the present application. Referring to FIG. 10, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming a training apparatus for the response strategy matching model for dialog responses at the logical level. Of course, in addition to a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.

Referring to FIG. 11, which is a block diagram of a training apparatus for a response strategy matching model for dialog responses according to an exemplary embodiment of the present application, in a software implementation the training apparatus may include:

a first determining unit 1101, configured to determine a dialog information sample set as training samples, wherein the dialog information sample set includes dialog information samples containing response strategy labeling information and the historical dialog information corresponding to the dialog information samples in the same round of dialog;

a processing unit 1102, configured to vectorize the dialog information samples to obtain dialog information sample vectors corresponding to the dialog information samples;

a parsing unit 1103, configured to parse the dialog information sample vectors to determine state information vectors containing multi-dimensional features;

an input unit 1104, configured to input the state information vector and the dialog information vector of the historical dialog information into the response strategy matching model, so that the response strategy matching model performs feature extraction on them;

a second determining unit 1105, configured to determine, according to the extracted features, a response strategy prediction information vector corresponding to the dialog information sample;

a parameter adjusting unit 1106, configured to adjust the model parameters of the response strategy matching model based on the difference between the response strategy prediction information vector and the information vector of the response strategy labeling information;

an information response unit 1107, configured to analyze the dialog information input by a consultant according to the trained response strategy matching model, determine the response strategy matching the dialog information, and use that response strategy as the response information corresponding to the dialog information.

The devices correspond to the methods described above, so identical details are not repeated here.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Since the device embodiments substantially correspond to the method embodiments, refer to the description of the method embodiments for relevant details. The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. A person of ordinary skill in the art can understand and implement it without inventive effort.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims

1. A dialog response method, characterized in that the method comprises:

2. The method of claim 1, wherein vectorizing the dialog information sent by the consultant in the current dialog to obtain the dialog information vector corresponding to the dialog information comprises:

3. The method of claim 1, wherein parsing the dialog information vector to determine a state information vector containing multidimensional features comprises:

4. The method of claim 1, wherein the dialog information vector for the current round of historical dialog comprises the dialog information vector.

5. The method of claim 1, further comprising, before inputting the state information vector and the dialog information vector of the current round of historical dialog into a response policy matching model:

determining a dialog template that matches the state information vector;

6. The method of claim 1, wherein determining the plurality of response strategies corresponding to the feature vectors extracted by the response strategy matching model and the confidence levels corresponding to the respective response strategies comprises:

7. The method of claim 1, wherein the response strategy matching model is a learning model in reinforcement learning or a recurrent neural network model in deep learning.

8. A training method for a response strategy matching model for dialog responses, the training method comprising:

9. A dialog response device, characterized in that the device comprises:

10. A training apparatus for a response strategy matching model for a dialog response, the training apparatus comprising:

11. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor implements the method of any one of claims 1-8 by running the executable instructions.

12. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.