Disclosure of Invention
The invention provides a multi-turn dialogue device and a multi-turn dialogue method for solving the technical problems in the prior art.
In order to achieve the above object, the present invention provides a multi-turn dialogue device, which comprises a data processing module, a representation module, a feature extraction module, a question-answer feature similarity module, and an objective function module, wherein:
the data processing module is used for parsing the multi-turn conversation data of historical chats to obtain the input data: the context dialogue text data (i.e., the preceding dialogue), the question data, and the answer data;
the representation module is used for mapping the input data to obtain a sentence vector set;
the feature extraction module is used for analyzing the sentence vector set to obtain a context feature vector, a question feature vector and an answer feature vector;
the question-answer feature similarity module is used for processing the context feature vectors, the question feature vectors and the answer feature vectors to obtain a scoring matrix;
and the objective function module is used for setting an objective function suitable for the multi-turn dialogue device according to the scoring matrix.
Further, the mapping of the input data by the representation module comprises:
dividing each sentence of the context dialogue text data, the question data and the answer data into words;
representing the position of each word by an ID;
representing each ID by an N-dimensional random vector;
and obtaining a sentence vector set.
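As a non-limiting illustration, the mapping performed by the representation module can be sketched as follows. The function and variable names, the whitespace word segmentation, and the tiny dimensionality are illustrative assumptions, not part of the invention:

```python
import random

def build_sentence_vectors(sentences, n_dims=4, seed=0):
    """Map segmented sentences to a set of sentence vectors.

    Each distinct word is assigned an integer ID (one simple reading of
    representing word positions by IDs), and each ID is represented by a
    random N-dimensional vector. The embodiment mentions e.g. N = 512; a
    tiny N is used here purely for illustration.
    """
    rng = random.Random(seed)
    vocab = {}        # word -> ID
    id_vectors = {}   # ID -> random N-dimensional vector
    sentence_vectors = []
    for sentence in sentences:
        vectors = []
        for word in sentence.split():   # word segmentation (whitespace here)
            if word not in vocab:
                vocab[word] = len(vocab)
                id_vectors[vocab[word]] = [rng.uniform(-1.0, 1.0)
                                           for _ in range(n_dims)]
            vectors.append(id_vectors[vocab[word]])
        sentence_vectors.append(vectors)
    return sentence_vectors, vocab

context = ["how do I reset my password", "click the settings menu"]
vecs, vocab = build_sentence_vectors(context, n_dims=4)
```

The resulting sentence vector set can then be fed to the feature extraction module.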
Further, the processing by the question-answer feature similarity module to obtain the scoring matrix comprises:
splicing and summing the context feature vectors and the question feature vectors;
and performing matrix multiplication on the answer feature vectors and the features obtained after splicing, to obtain the scoring matrix.
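A minimal sketch of this scoring step, under the assumption that the context and question features are pooled vectors of equal dimension and that "splicing and summing" is realized as element-wise summation:

```python
import numpy as np

def scoring_matrix(context_feat, question_feat, answer_feat):
    """Fuse context and question features, then score candidate answers.

    Hypothetical shapes: context_feat and question_feat are pooled (d,)
    feature vectors; answer_feat is (num_candidates, d). The fused
    vector is matrix-multiplied with the answer features to yield one
    score per candidate answer.
    """
    fused = context_feat + question_feat   # (d,)
    return answer_feat @ fused             # (num_candidates,)

ctx = np.array([1.0, 0.0, 2.0])            # context feature vector
q = np.array([0.0, 1.0, 1.0])              # question feature vector
answers = np.array([[1.0, 1.0, 0.0],       # candidate answer features
                    [0.0, 0.0, 1.0]])
scores = scoring_matrix(ctx, q, answers)   # higher score = closer answer
```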
Further, the objective function module obtains the objective function by using softmax as the activation function and cross entropy as the loss function, and deriving accordingly.
Furthermore, the feature extraction module is formed by stacking a plurality of dual-encoder modules, wherein each dual-encoder module consists of a self-attention layer, a normalization layer, a feedforward neural network layer and a normalization layer connected in sequence.
Further, each normalization layer performs normalization processing on the sum of a sub-layer's output vector and its input vector, i.e., after a residual connection.
The invention also discloses a multi-turn dialogue method, which is applied to a multi-turn dialogue device and comprises the following steps:
converting the current input sound of the user into a natural language text;
inputting the historical dialogue state and the current natural language text into the multi-turn dialogue device;
predicting the current conversation state by the multi-turn conversation device according to the historical conversation state and the current natural language text;
outputting corresponding system behaviors according to the current conversation state;
converting the system behavior into natural language text or voice to form a round of conversation;
waiting for the next round of voice input by the user to carry out the next round of conversation;
the multi-turn dialog device is any one of the above multi-turn dialog devices.
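The steps above can be sketched as one dialogue turn. The device interface (`predict_state`, `action_for_state`) and the toy `EchoDevice` are hypothetical stand-ins, and the speech-to-text and text-to-speech conversions are omitted:

```python
class EchoDevice:
    """Toy stand-in for the multi-turn dialogue device (hypothetical)."""

    def predict_state(self, history_state, user_text):
        # Current state = history extended with the new utterance.
        return history_state + [user_text]

    def action_for_state(self, state):
        # Output a system behavior for the current dialogue state.
        return f"answer to: {state[-1]}"

def run_dialogue_turn(device, history_state, user_text):
    """One turn: predict the current state from the history and the
    current natural language text, then render the system behavior as
    a natural-language reply (speech conversion omitted)."""
    current_state = device.predict_state(history_state, user_text)
    system_action = device.action_for_state(current_state)
    reply = f"[system] {system_action}"   # NLG: behavior -> text
    return current_state, reply

state, reply = run_dialogue_turn(EchoDevice(), [], "reset my password")
```

The device then waits for the next user input and repeats the loop with the updated state.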
The invention also discloses a multi-turn dialogue method, which is applied to a multi-turn dialogue device and comprises the following steps:
receiving a natural language text input by a current user;
inputting the historical dialogue state and the current natural language text information into the multi-turn dialogue device;
predicting the current conversation state by the multi-turn conversation device according to the historical conversation state and the current natural language text;
outputting corresponding system behaviors according to the current conversation state;
converting the system behavior into natural language text or voice to form a round of conversation;
waiting for the natural language text input by the user in the next round to carry out the next round of conversation;
the multi-turn dialog device is any one of the above multi-turn dialog devices.
The present invention also discloses an electronic device, comprising: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the multi-turn dialogue method.
The invention also discloses a storage medium, wherein a computer program is stored on the storage medium, and the computer program, when executed by a processor, performs the multi-turn dialogue method.
In practical applications, the modules in the method and system disclosed by the invention can be deployed on one target server, or each module can be deployed on different target servers independently, and particularly, the modules can be deployed on cluster target servers according to needs in order to provide stronger computing processing capacity.
Therefore, even when the sample size is not large, the multi-turn dialogue device can learn good context features, so that the user's questions can be predicted more accurately and answers provided. The network structure is simple, realizing a lightweight, memory-saving and energy-efficient model that can be trained on a small dialogue corpus, further improving the natural language understanding capability of the robot.
In order that the invention may be more clearly and fully understood, specific embodiments thereof are described in detail below with reference to the accompanying drawings.
Detailed Description
Referring to fig. 1, fig. 1 shows a schematic structural diagram of a multi-turn dialog device,
In the multi-turn dialogue device, the context, the current question and the corresponding reply content of a historical multi-turn dialogue are each fed into a shared Transformer encoder to obtain representations of the context, the question and the answer; the representations of the context and the current question are then fused to obtain fusion features; the fusion features are then interacted with the answer representation to obtain a similarity matrix; and finally the turn number of the next dialogue turn is predicted from the similarity matrix to construct the objective function.
The multi-turn dialog device constructed as described above can learn good contextual characteristics even under a condition that the sample size is not large, thereby being able to predict the user's question and provide an answer more accurately.
As an implementation manner, the multi-turn dialogue device in the embodiment of the present application includes a data processing module, a representation module, a feature extraction module, a question-answer feature similarity module, and an objective function module, where:
The data processing module splits and parses the multi-turn conversation data of historical chats into the context (namely the historical conversation content), the human questions and the robot replies, thereby constructing the context dialogue text data, the question data and the answer data, which serve as the input data of the model or are used for training the model.
The human questions refer to questions posed by a user to the chat robot or intelligent customer-service question-answering system, and the answer data refer to the answers given by the chat robot or intelligent customer-service question-answering system in reply to the questions posed by the user.
The representation module maps or converts the input data, namely the context dialogue text data, the question data and the answer data, into a sentence vector set.
As a preferred embodiment, this is realized as follows:
Each sentence of the context dialogue text data, the question data and the answer data is first segmented into words; the position of each word is then represented by an ID, and the ID of each word is represented by an N-dimensional (for example, 512-dimensional) random vector, thereby constructing a sentence vector set of the context dialogue text data, the question data and the answer data, which can then be input into the feature extraction module for feature extraction.
In this embodiment, the position of each word is represented by an ID so that, in the multi-turn dialogue device of the present application, words can be selected for masked predictive training during pre-training. In a specific implementation, a probability scheme can be designed so that each ID is masked (replaced) with a certain probability or selected at random, and the words before and after a masked word are then used to predict what the masked word is.
In addition, the turn number of the current conversation and the masked words are used as sentence-level and word-level labels, respectively.
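A minimal sketch of this masking scheme; the masking probability, the mask marker, and the fixed random seed are illustrative choices, not fixed by the present application:

```python
import random

def mask_ids(token_ids, mask_id=-1, p=0.15, seed=0):
    """Randomly occlude word IDs for masked predictive pre-training.

    Each ID is replaced by `mask_id` with probability `p`; the original
    IDs at occluded positions become the word-level labels.
    """
    rng = random.Random(seed)
    masked, labels = [], {}
    for pos, tid in enumerate(token_ids):
        if rng.random() < p:
            labels[pos] = tid       # remember the occluded word
            masked.append(mask_id)
        else:
            masked.append(tid)
    return masked, labels

ids = [10, 11, 12, 13, 14]
masked, labels = mask_ids(ids, p=0.5)
```

The model is then trained to recover `labels` from the surrounding unmasked IDs.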
The feature extraction module is used for analyzing the sentence vector set to obtain the context feature vector, the question feature vector and the answer feature vector.
As a preferred implementation, the feature extraction module is formed by stacking a plurality of dual-encoder modules: a shared dual encoder with 4 layers is adopted, each layer uses a self-attention mechanism, and each dual encoder consists of a self-attention layer, a normalization layer, a feedforward neural network layer and a normalization layer connected in sequence.
In this embodiment, the feature extraction module uses a self-attention layer based on the self-attention mechanism as part of its technical solution. The self-attention mechanism can fully consider the semantic and grammatical relations between different words in a sentence, and the word vectors obtained by such calculation further take the relations between contexts into account. For example, in the sentence "the bird can fly because it has wings", the machine can link "it" with "bird"; in a multi-turn dialogue system, the semantics of the context can thus be better understood.
A normalization layer is provided after both the self-attention layer and the feedforward neural network layer. The advantage of normalization is that it distributes the features within a smaller, controllable value range and reduces the search range, which not only accelerates training but also improves its stability, allowing the multi-turn dialogue device of the present application to converge faster with higher accuracy. The present application can thereby realize a lightweight design, and the multi-turn dialogue device can understand and predict the context well even when the sample size is not large.
In a more preferred embodiment, normalization is performed after the output vector of a sub-layer is added to its input vector via a residual connection. In this preferred embodiment, the technical effect of the residual connection is to prevent information loss when the number of network layers is large, further improving the accuracy of the present application.
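The dual-encoder block described above (self-attention, normalization, feedforward, normalization, with a residual connection before each normalization) can be sketched as follows. The weight-free attention and feedforward stand-ins are simplifications for illustration, with no learned parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def self_attention(x):
    # Unparameterised self-attention: similarity scores come directly
    # from the inputs (no learned query/key/value projections).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def feed_forward(x):
    # Weight-free stand-in for the feedforward neural network layer.
    return np.maximum(x, 0.0)

def encoder_block(x):
    # Self-attention -> normalization, feedforward -> normalization,
    # each normalization applied after a residual connection.
    x = layer_norm(x + self_attention(x))
    x = layer_norm(x + feed_forward(x))
    return x

def encoder(x, num_layers=4):
    # Stack of dual-encoder blocks (4 shared layers in the embodiment).
    for _ in range(num_layers):
        x = encoder_block(x)
    return x

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, d = 8
features = encoder(tokens)
```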
The question-answer feature similarity module is used for processing the context feature vector, the question feature vector and the answer feature vector to obtain a scoring matrix.
Referring to fig. 2, as a preferred implementation, the question-answer feature similarity module in the embodiment of the present application fuses the context feature vectors and the question feature vectors, that is, performs splicing and summation, so as to obtain richer semantic features, namely the global features of the dialogue context and the local features of the current question.
Matrix multiplication is then performed between the answer feature vector and the spliced context-and-question features to obtain a scoring matrix. The purpose of the scoring matrix is to bring the combined features formed from the spliced context and question features closer in distribution to the answer features, thereby improving the representation capability of the multi-turn dialogue device; the combined features also serve as input features of the cross-entropy loss function with dynamic turn-number labels.
The objective function module is used for setting an objective function suitable for the multi-turn dialogue device according to the scoring matrix. In a preferred embodiment, the objective function module of the present application obtains the objective function by using softmax as the activation function and cross entropy as the loss function.
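A minimal sketch of this objective for a single row of the scoring matrix, using softmax as the activation and cross entropy as the loss; the example scores and target index are arbitrary:

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax activation.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cross_entropy_loss(scores, target_index):
    """Objective for one row of the scoring matrix: softmax over the
    candidate-answer scores, cross entropy against the correct index."""
    probs = softmax(scores)
    return -np.log(probs[target_index])

scores = np.array([2.0, 3.0, 0.5])     # arbitrary example scores
loss = cross_entropy_loss(scores, target_index=1)
```

Minimizing this loss drives the score of the correct answer above the scores of the other candidates.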
Based on the multi-turn dialogue device of the above embodiment, the present application also discloses a multi-turn dialogue method, which includes the steps of:
converting the current input sound of the user into a natural language text;
inputting the historical dialogue state and the current natural language text into the multi-turn dialogue device;
predicting the current conversation state by the multi-turn conversation device according to the historical conversation state and the current natural language text;
outputting corresponding system behaviors according to the current conversation state;
converting the system behavior into natural language text or voice to form a round of conversation;
waiting for the next round of voice input by the user to carry out the next round of conversation;
the multi-turn dialog device used is the multi-turn dialog device of the above-described embodiment.
In addition, based on the above embodiment, a variation of the multi-turn dialog method includes:
receiving a natural language text input by a current user;
inputting the historical dialogue state and the current natural language text information into the multi-turn dialogue device;
predicting the current conversation state by the multi-turn conversation device according to the historical conversation state and the current natural language text;
outputting corresponding system behaviors according to the current conversation state;
converting the system behavior into natural language text or voice to form a round of conversation;
waiting for the natural language text input by the user in the next round to carry out the next round of conversation;
the multi-turn dialog device used is the multi-turn dialog device of the above-described embodiment.
The present application further provides an electronic device, comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the method of the above embodiments.
The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the method as described in the above embodiments.
It should be noted that, all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, which may include, but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.