WO2020177282A1 - Machine dialogue method and apparatus, computer device, and storage medium - Google Patents

Info

Publication number
WO2020177282A1
WO2020177282A1 (PCT/CN2019/103612; CN2019103612W)
Authority
WO
WIPO (PCT)
Prior art keywords
response, value, intention, dialogue, model
Prior art date
Application number
PCT/CN2019/103612
Other languages
French (fr)
Chinese (zh)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020177282A1 publication Critical patent/WO2020177282A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present invention relates to the field of artificial intelligence technology, in particular to a machine dialogue method, device, computer equipment and storage medium.
  • chatbots have gradually emerged.
  • a chatbot is a program used to simulate human conversations or chats. It can be used for practical purposes, such as customer service, consultation and Q&A, and some social robots are used to chat with people.
  • chatbots will be equipped with natural language processing systems, but more often extract keywords from input sentences, and then retrieve answers based on keywords from the database.
  • the answers of these chatbots are usually by-the-book and unemotional, and their chat pattern is formulaic, so people have little interest in chatting with them and the utilization rate of chatbots is also low.
  • the invention provides a machine dialogue method, device, computer equipment and storage medium to address the problem of chatbots giving uniform, formulaic answers.
  • a machine dialogue method includes the following steps:
  • the dialogue intention is input into a preset response decision model, and the response strategy output by the response decision model in response to the dialogue intention is obtained, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
  • the language information is input into a response generation model having a mapping relationship with the response strategy, and the response information output by the response generation model in response to the language information is obtained.
  • a machine dialogue device including:
  • an acquisition module, used to acquire the language information input by the current user;
  • a recognition module, which inputs the language information into a preset intention recognition model and obtains the dialogue intention output by the intention recognition model in response to the language information;
  • a calculation module, which inputs the dialogue intention into a preset response decision model and obtains the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
  • a generating module, which inputs the language information into a response generation model that has a mapping relationship with the response strategy and obtains the response information output by the response generation model in response to the language information.
  • a computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the machine dialogue method described above.
  • a computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the processor executes the steps of the machine dialogue method described above.
  • the beneficial effects of the embodiments of the present invention are achieved by: acquiring the language information input by the current user; inputting the language information into a preset intention recognition model, and acquiring the dialogue intention output by the intention recognition model in response to the language information;
  • inputting the dialogue intention into a preset response decision model, and acquiring the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and inputting the language information into a response generation model that has a mapping relationship with the response strategy, and acquiring the response information output by the response generation model in response to the language information.
  • the response generation model is determined, and the reinforcement learning network model is introduced in the process of determining the response generation model.
  • different response generation models are used to generate different types of responses, so that the dialogue is diversified and more interesting.
  • FIG. 1 is a schematic diagram of the basic flow of a machine dialogue method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a process flow of determining a response strategy using a Q-value matrix in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a process flow of determining a response strategy using a Q value reinforcement learning network according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of the training process of an LSTM-CNN neural network model according to an embodiment of the present invention
  • FIG. 5 is a block diagram of the basic structure of a machine dialogue device according to an embodiment of the present invention.
  • Fig. 6 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
  • "terminal" and "terminal equipment" used herein include both devices that have only a wireless signal receiver without transmitting capability, and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link.
  • such equipment may include: cellular or other communication equipment, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) equipment, which may combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or a GPS (Global Positioning System) receiver; and/or a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • the "terminal" and "terminal equipment" used here may be portable, transportable, installed in vehicles (air, sea and/or land), or suitable and/or configured to operate locally and/or, in distributed form, at any location on earth and/or in space.
  • the "terminal" and "terminal device" used here can also be a communication terminal, an Internet terminal, or a music/video playback terminal, such as a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
  • the terminal in this embodiment is the aforementioned terminal.
  • FIG. 1 is a schematic diagram of the basic flow of a machine dialogue method in this embodiment.
  • a machine dialogue method includes the following steps:
  • the language information input by the user is acquired through the interactive page on the terminal.
  • the received information can be text information or voice information.
  • the voice information is converted into text information through a voice recognition device.
  • the recognition of the dialogue intention can be based on keywords, for example, to determine whether the intent is task-based or chat-type.
  • the task type is the dialogue intention that requires the robot to answer a question. It can be determined by checking whether the input language information contains query keywords, such as "?", "what", "how much", "where", "how" and other interrogative particles. A regular matching algorithm can also be used to determine whether the input language information is a question sentence.
  • a regular expression is a logical formula for string manipulation: predefined specific characters, and combinations of those characters, form a "rule string" that expresses a filtering logic to apply to strings.
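As a hedged illustration of the regular-matching step described above, a question-detection check might look like the sketch below. The patent does not disclose its actual rule string, so this pattern (a few interrogative markers from the examples in the text) is an assumption.

```python
import re

# Hypothetical rule string combining the question markers mentioned above:
# "?", "what", "how much", "where", "how" (Chinese and English variants).
QUESTION_PATTERN = re.compile(
    r"[?？]|什么|多少|哪里|怎么|\b(what|how|where|when|why)\b",
    re.IGNORECASE,
)

def is_question(utterance: str) -> bool:
    """Return True if the input looks like a question (task-type intention)."""
    return QUESTION_PATTERN.search(utterance) is not None
```

A match would classify the input as task type; no match would leave it as chat type.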
  • the dialogue intention is a chat type.
  • dialogue intentions can be subdivided.
  • the chat type can be subdivided into a positive type, covering emotions such as affirmation, praise and thanks, and a negative type, covering emotions such as complaints, grievances and accusations.
  • the subdivided dialogue intentions can be judged against preset keyword lists.
  • a keyword list is preset for each dialogue intention; when keywords extracted from the input language information match words in the keyword list corresponding to a certain dialogue intention, the input language information is considered to correspond to that dialogue intention.
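The keyword-list lookup just described can be sketched as follows. The intent names and keyword lists here are invented for illustration; the patent does not specify them.

```python
# Hypothetical preset keyword lists, one per subdivided dialogue intention.
INTENT_KEYWORDS = {
    "positive": {"thanks", "great", "praise"},
    "negative": {"complaint", "blame", "accusation"},
}

def match_intent(keywords):
    """Return the dialogue intention whose keyword list overlaps the extracted keywords."""
    for intent, vocab in INTENT_KEYWORDS.items():
        if vocab & set(keywords):  # non-empty intersection means a match
            return intent
    return "chat"  # no list matched: fall back to the generic chat intention
```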
  • the dialogue intention recognition is performed through the pre-trained LSTM-CNN neural network model.
  • first perform Chinese word segmentation using a basic word segmentation library, then successively remove stop words, punctuation, etc., obtain the word embedding vectors through a word vector model, and pass them to the LSTM-CNN-based neural network model.
  • the word embedding vectors enter the multi-layer LSTM units to obtain the state vector and output of each stage; then, based on the state vectors of each stage, a convolution operation and a pooling operation (CNN) are performed to obtain an integrated vector index; finally the integrated vector index is fed into a softmax function to obtain the probability of each corresponding intention.
  • the intention with the highest probability is the dialogue intention corresponding to the input language information.
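The final softmax-and-argmax step can be shown in isolation. This is a minimal sketch: the raw scores stand in for the integrated vector index produced by the LSTM-CNN stages, and the intent labels are illustrative.

```python
import math

def softmax(scores):
    """Convert raw scores (the integrated vector index) into intent probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_intent(scores, intents):
    """The intention with the highest probability is the predicted dialogue intention."""
    probs = softmax(scores)
    return intents[probs.index(max(probs))]
```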
  • see Figure 4 for the training process of the LSTM-CNN neural network model.
  • the dialogue intention of the input language information is obtained, and the dialogue intention is input into the response decision model to determine the response strategy for the input language information.
  • different response strategies can be preset for different dialogue intentions, for example, for task-based intentions, the response strategy is question answering, and for negative intentions, the response strategy is emotional resolution.
  • Different response strategies correspond to different response generation models.
  • the Q value is calculated to determine the response strategy to be adopted for the dialogue intention.
  • the Q value measures the value, to the entire chat process, of adopting a certain response strategy for a certain dialogue intention. For example, consider the pleasantness of the chat: pleasantness can be measured by the proportion of sentences input by the user over the whole dialogue process that carry negative intent, and the Q value is then the value of adopting a certain response strategy in a certain round of dialogue for chat pleasantness.
  • a Q-value matrix can be preset through empirical values, the elements of which are q(s,a), s ⁇ S, a ⁇ A, where S is the dialogue intention space and A is the response strategy space.
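A minimal sketch of such a preset Q-value matrix q(s, a), with rows indexed by the dialogue intention space S and columns by the response strategy space A. The intentions, strategies, and empirical values below are invented for illustration.

```python
# Hypothetical Q-value matrix set from empirical values: q(s, a) scores each
# candidate response strategy a for each dialogue intention s.
Q_MATRIX = {
    "task":     {"question_answering": 0.9, "emotional_resolution": 0.1},
    "negative": {"question_answering": 0.2, "emotional_resolution": 0.8},
}

def choose_strategy(intent):
    """Query the Q-value matrix and pick the candidate strategy with the largest q value."""
    row = Q_MATRIX[intent]
    return max(row, key=row.get)
```

This mirrors steps S111/S112 below: query the matrix by intention, then take the argmax over candidate strategies.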
  • the Q value is calculated by a Q value reinforcement learning network model.
  • the input of the Q-value reinforcement learning network model is s, the dialogue intention, and the output is Q(s, a), that is, the expected return obtained by starting from state s and adopting strategy a.
  • the training of the Q-value reinforcement learning network model takes the convergence of the first loss function as the training objective, where:
  • s is the dialogue intention
  • a is the response strategy
  • w is the network parameter of the Q value reinforcement learning network model
  • Q is the true value and Q̂ is the value predicted by the network.
  • w is the network parameter trained by the Q-value reinforcement learning network model.
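A hedged sketch of one training step for this model, assuming the first loss function is the usual squared error between the true value Q(s, a) and the prediction Q̂(s, a; w); the publication itself does not reproduce the formula. A one-parameter linear model stands in for the real network, and the feature value and learning rate are invented.

```python
def train_step(w, feature, q_true, lr=0.1):
    """One gradient-descent step on the assumed loss L(w) = (q_true - w * feature)^2."""
    q_pred = w * feature                       # Q-hat(s, a; w) for a linear stand-in model
    grad = -2 * (q_true - q_pred) * feature    # dL/dw
    return w - lr * grad

# Repeated steps drive the prediction toward the true Q value,
# i.e. the loss converges toward its minimum.
w = 0.0
for _ in range(100):
    w = train_step(w, feature=1.0, q_true=0.7)
```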
  • the response decision model is the aforementioned Q-value matrix or Q-value reinforcement learning network model.
  • S104: Input the language information into a response generation model that has a mapping relationship with the response strategy, and obtain the response information output by the response generation model in response to the language information.
  • a corresponding response generation model is preset.
  • the response strategy is a question answering type
  • the corresponding response generation model includes a question and answer database, and matches the corresponding answer by searching for keywords in the input language information.
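The question-answering strategy just described, retrieving an answer from a Q&A database by keyword search, might look like the sketch below. The database contents and matching rule are invented for illustration.

```python
# Hypothetical preset question-and-answer database: keyword -> canned answer.
QA_DATABASE = {
    "hours": "We are open 9:00-18:00, Monday to Friday.",
    "price": "The basic plan costs $10 per month.",
}

def answer(question: str) -> str:
    """Match the corresponding answer by searching for keywords in the input language information."""
    for keyword, reply in QA_DATABASE.items():
        if keyword in question.lower():
            return reply
    return "Sorry, I don't have an answer for that yet."
```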
  • the corresponding response generation model adopts the trained Seq2Seq model.
  • the specific training process is to prepare the training corpus, that is, the input sequences and their corresponding output sequences; input each input sequence into the Seq2Seq model, calculate the probability of the output sequence, and adjust the parameters of the Seq2Seq model so that, over the entire sample set, the probability that each input sequence yields its corresponding output sequence after Seq2Seq is maximized.
  • the training corpus prepared here requires the sentiment of the input sentence to be negative and the sentiment of the output sentence to be positive.
  • step S103 further includes the following steps:
  • S112 Determine that the candidate response strategy corresponding to the largest q value in the Q value matrix is the response strategy of the dialogue intention.
  • the candidate response strategy with the largest q value is the response strategy corresponding to the dialogue intention.
  • step S103 further includes the following steps:
  • S121 Input the candidate response strategy and the dialogue intention into the Q-value reinforcement learning network model in turn, and obtain the Q value corresponding to each candidate response strategy output by the Q-value reinforcement learning network model;
  • the candidate response strategy and the dialogue intention are input into the Q value reinforcement learning network model to obtain the Q value of the dialogue intention using the response strategy.
  • S122 Determine that the candidate response strategy with the largest Q value is the response strategy of the dialogue intention.
  • the candidate response strategy with the largest Q value is the response strategy that the dialogue intention should adopt.
  • the training of the LSTM-CNN neural network model in the embodiment of the present invention includes the following steps:
  • the training samples are labeled with the category of dialogue intent.
  • the types of training sample marks are task type and chat type.
  • the task type responds to the user's need to have questions answered;
  • the chat type responds to the user's need for small talk.
  • N is the number of training samples.
  • the corresponding label Yi is the final intent recognition result
  • the LSTM-CNN neural network model takes the convergence of the second loss function as the training target, that is, the weight of each node in the neural network model is adjusted so that the second loss function reaches its minimum value.
  • when the value of the loss function no longer decreases but instead increases, the training ends.
  • the second loss function is used to measure whether the dialogue intention of a training sample predicted by the LSTM-CNN neural network model is consistent with the dialogue intention category marked on that training sample. If the second loss function has not converged, the weight of each node in the neural network model is adjusted through gradient descent;
  • training ends when the reference category of dialogue intention predicted by the neural network is consistent with the dialogue intention category marked on the training samples, that is, when continuing to adjust the weights no longer decreases the value of the loss function but instead increases it.
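The stopping rule above, keep training while the loss decreases and stop as soon as it rises, can be sketched in isolation. The loss sequence below is invented; in practice it would come from evaluating the second loss function after each weight update.

```python
def train_until_loss_rises(losses):
    """Return the number of accepted training steps, stopping when the loss increases."""
    steps = 0
    prev = float("inf")
    for loss in losses:
        if loss > prev:  # loss no longer decreases: training ends
            break
        prev = loss
        steps += 1
    return steps
```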
  • FIG. 5 is a block diagram of the basic structure of the machine dialogue device of this embodiment.
  • a machine dialogue device includes: an acquisition module 210, an identification module 220, a calculation module 230, and a generation module 240.
  • the obtaining module 210 is used to obtain the language information input by the current user;
  • the recognition module 220 is used to input the language information into a preset intention recognition model and obtain the dialogue intention output by the intention recognition model in response to the language information;
  • the calculation module 230 is used to input the dialogue intention into a preset response decision model and obtain the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
  • the generation module 240 is used to input the language information into the response generation model that has a mapping relationship with the response strategy and obtain the response information output by the response generation model in response to the language information.
  • the embodiment of the present invention obtains the language information input by the current user; inputs the language information into a preset intention recognition model and obtains the dialogue intention output by the intention recognition model in response to the language information; inputs the dialogue intention into a preset response decision model to obtain the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and inputs the language information into a response generation model that has a mapping relationship with the response strategy, obtaining the response information output by the response generation model in response to the language information.
  • the response generation model is determined, and the reinforcement learning network model is introduced in the process of determining the response generation model. For different intentions, different response generation models are used to generate different types of responses, so that the dialogue is diversified and more interesting.
  • the response decision model in the machine dialogue device is based on a preset Q-value matrix, wherein the element q in the Q-value matrix is used to evaluate the value of each candidate response strategy for each dialogue intention.
  • the machine dialogue device further includes: a first query submodule and a first confirmation submodule, wherein the first query submodule is used for querying the Q-value matrix according to the dialogue intention; the first confirmation submodule is used for It is determined that the candidate response strategy corresponding to the largest q value in the Q value matrix is the response strategy of the dialogue intention.
  • the response decision model in the machine dialogue device is based on a pre-trained Q-value reinforcement learning network model, wherein the Q-value reinforcement learning network model is characterized by the following first loss function:
  • s is the dialogue intention
  • a is the response strategy
  • w is the network parameter of the Q value reinforcement learning network model
  • Q is the true value and Q̂ is the predicted value; the value of the network parameter w of the Q-value reinforcement learning network model is adjusted, and when the first loss function reaches its minimum value, the Q-value reinforcement learning network model defined by that value of w is determined to be the pre-trained Q-value reinforcement learning network model.
  • the machine dialogue device further includes: a first processing submodule and a second confirmation submodule.
  • the first processing sub-module is configured to sequentially input candidate response strategies and the dialogue intention into the Q-value reinforcement learning network model, and obtain Q corresponding to each candidate response strategy output by the Q-value reinforcement learning network model. Value; the second confirmation sub-module is used to determine that the candidate response strategy with the largest Q value is the response strategy of the dialogue intention.
  • the preset intention recognition model in the machine dialogue device uses a pre-trained LSTM-CNN neural network model
  • the machine dialogue device further includes: a first acquisition submodule, a second processing submodule, a first comparison submodule and a first execution submodule, wherein the first acquisition submodule is used to acquire training samples marked with dialogue intention categories, the training samples being language information marked with different dialogue intention categories; the second processing submodule is used to input the training samples into the LSTM-CNN neural network model to obtain the reference category of the dialogue intention of each training sample; and the first comparison submodule is used to compare, through the second loss function, whether the dialogue intention reference category of each training sample is consistent with its marked dialogue intention category, wherein, in the second loss function:
  • N is the number of training samples.
  • the corresponding label Yi is the final intent recognition result
  • the preset intent recognition model in the machine dialogue device adopts a regular matching algorithm, wherein the rule string used by the regular matching algorithm includes at least a question feature string; the machine dialogue device also includes a first matching submodule, which is used to perform a regular matching operation between the language information and the rule string.
  • if the result is a match, it is determined that the dialogue intention is task type; otherwise, it is determined that the dialogue intention is chat type.
  • the response generation model in the machine dialogue device includes at least a pre-trained Seq2Seq model
  • the machine dialogue device further includes a second acquisition submodule and a third processing submodule, wherein the second acquisition submodule is used to obtain the training corpus, which includes input sequences and output sequences, and the third processing submodule is used to input the input sequences into the Seq2Seq model and adjust the parameters of the Seq2Seq model so that the probability that the Seq2Seq model outputs the corresponding output sequence in response to each input is maximized.
  • FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus.
  • the non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • when executed by the processor, the computer-readable instructions can cause the processor to implement the machine dialogue method described in any of the above embodiments.
  • the processor of the computer equipment is used to provide calculation and control capabilities, and supports the operation of the entire computer equipment.
  • the computer-readable instructions may be stored in the memory of the computer device; when the computer-readable instructions are executed by the processor, they cause the processor to execute the machine dialogue method described in any of the foregoing embodiments.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • the processor is used to execute the specific content of the acquisition module 210, the recognition module 220, the calculation module 230, and the generation module 240 in FIG. 5, and the memory stores the program codes and various data required to execute the above modules.
  • the network interface is used for data transmission between user terminals or servers.
  • the memory in this embodiment stores the program codes and data required to execute all the sub-modules in the machine dialogue method, and the server can call the program codes and data of the server to execute the functions of all the sub-modules.
  • the computer device obtains the language information input by the current user; inputs the language information into a preset intention recognition model and obtains the dialogue intention output by the intention recognition model in response to the language information; inputs the dialogue intention into a preset response decision model and obtains the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and inputs the language information into a response generation model that has a mapping relationship with the response strategy, obtaining the response information output by the response generation model in response to the language information.
  • the response generation model is determined, and the reinforcement learning network model is introduced in the process of determining the response generation model.
  • different response generation models are used to generate different types of responses, so that the dialogue is diversified and more interesting.
  • the present invention also provides a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the machine dialogue method described in any of the above embodiments.
  • the computer program can be stored in a computer readable storage medium. When executed, it may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

The embodiment of the present invention relates to the technical field of artificial intelligence. Disclosed are a machine dialogue method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring language information input by a current user; inputting the language information into a preset intention recognition model, and acquiring a dialogue intention output by the intention recognition model in response to the language information; inputting the dialogue intention into a preset response decision model, and acquiring a response strategy output by the response decision model in response to the dialogue intention; and inputting the language information into a response generation model having a mapping relationship with the response strategy, and acquiring response information output by the response generation model in response to the language information. Performing intention recognition, determining a response generation model and generating responses of different types make the dialogue diversified and more interesting.

Description

一种机器对话方法、装置、计算机设备及存储介质Machine dialogue method, device, computer equipment and storage medium
交叉引用cross reference
本申请以2019年3月1日提交的申请号为201910154323.9,名称为“一种机器对话方法、装置、计算机设备及存储介质”的中国发明专利申请为基础,并要求其优先权。This application is based on the Chinese invention patent application filed on March 1, 2019 with the application number 201910154323.9, titled "a machine dialogue method, device, computer equipment and storage medium", and claims its priority.
技术领域Technical field
本发明涉及人工智能技术领域,尤其涉及一种机器对话方法、装置、计算机设备及存储介质。The present invention relates to the field of artificial intelligence technology, in particular to a machine dialogue method, device, computer equipment and storage medium.
背景技术Background technique
随着人工智能技术的发展,聊天机器人也逐渐兴起。聊天机器人是一个用来模拟人类对话或聊天的程序,可以用于实用的目的,例如客户服务、咨询问答,也有一部分的社交机器人,用来与人们聊天。With the development of artificial intelligence technology, chatbots have gradually emerged. A chatbot is a program used to simulate human conversations or chats. It can be used for practical purposes, such as customer service, consultation and Q&A, and some social robots are used to chat with people.
有些聊天机器人会搭载自然语言处理系统,但更多的从输入语句中提取关键字,再从数据库中根据关键字检索答案。这些聊天机器人回答通常中规中矩,不带感情色彩,聊天模式千篇一律,导致人们与之聊天的兴趣不高,聊天机器人的利用率也较低。Some chatbots will be equipped with natural language processing systems, but more often extract keywords from input sentences, and then retrieve answers based on keywords from the database. The answers of these chat bots are usually pretty, non-emotional, and the chat mode is the same, causing people to be less interested in chatting with them, and the utilization rate of chat bots is also low.
Summary of the invention
The present invention provides a machine dialogue method, apparatus, computer device and storage medium, to solve the problem of chatbots giving monotonous, one-size-fits-all answers.
A machine dialogue method comprises the following steps:
acquiring language information input by a current user;
inputting the language information into a preset intention recognition model, and acquiring a dialogue intention output by the intention recognition model in response to the language information;
inputting the dialogue intention into a preset response decision model, and acquiring a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and
inputting the language information into a response generation model having a mapping relationship with the response strategy, and acquiring response information output by the response generation model in response to the language information.
A machine dialogue apparatus comprises:
an acquisition module, configured to acquire language information input by a current user;
a recognition module, configured to input the language information into a preset intention recognition model and acquire a dialogue intention output by the intention recognition model in response to the language information;
a calculation module, configured to input the dialogue intention into a preset response decision model and acquire a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and
a generation module, configured to input the language information into a response generation model having a mapping relationship with the response strategy and acquire response information output by the response generation model in response to the language information.
A computer device comprises a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the machine dialogue method described above.
A computer-readable storage medium stores computer-readable instructions which, when executed by a processor, cause the processor to perform the steps of the machine dialogue method described above.
The beneficial effects of the embodiments of the present invention are as follows: language information input by the current user is acquired; the language information is input into a preset intention recognition model to acquire the dialogue intention output by the model in response to it; the dialogue intention is input into a preset response decision model to acquire the response strategy output by the model, the response decision model being used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and the language information is input into the response generation model having a mapping relationship with that response strategy to acquire the response information output by the model. By recognizing the intention of the input sentence, the response generation model is determined, and a reinforcement learning network model is introduced in the process of determining it; different intentions lead to different response generation models and different types of responses, making the dialogue diversified and more interesting.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings based on them without creative work.
FIG. 1 is a schematic flowchart of a machine dialogue method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of determining a response strategy using a Q-value matrix according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of determining a response strategy using a Q-value reinforcement learning network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the training process of an LSTM-CNN neural network model according to an embodiment of the present invention;
FIG. 5 is a block diagram of the basic structure of a machine dialogue apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of the basic structure of a computer device according to an embodiment of the present invention.
Detailed description
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings.
Some of the processes described in the specification, claims and drawings of the present invention contain multiple operations appearing in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation sequence numbers such as 101 and 102 are only used to distinguish different operations; the numbers themselves do not represent any execution order. In addition, these processes may include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they limit the "first" and "second" to being of different types.
The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present invention.
Embodiment
Those skilled in the art can understand that the terms "terminal" and "terminal device" used herein cover both wireless-signal-receiver devices, which possess only a wireless signal receiver without transmitting capability, and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio-frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio-frequency receiver.
The "terminal" or "terminal device" used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable for and/or configured to operate locally and/or, in a distributed form, at any location on the earth and/or in space. The "terminal" or "terminal device" may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback function, or a device such as a smart TV or set-top box.
The terminal in this embodiment is the aforementioned terminal.
Specifically, please refer to FIG. 1, which is a schematic flowchart of the basic flow of a machine dialogue method in this embodiment.
As shown in FIG. 1, a machine dialogue method includes the following steps:
S101. Acquire language information input by the current user.
The language information input by the user is acquired through an interactive page on the terminal. The received information may be text or speech; speech information is converted into text by a speech recognition device.
S102. Input the language information into a preset intention recognition model, and acquire the dialogue intention output by the intention recognition model in response to the language information.
The textualized language information is input into the preset intention recognition model to recognize the user's dialogue intention. Recognition can be based on keywords, for example to judge whether the intention is task-oriented or chat-oriented; a task-oriented intention means the dialogue requires the robot to answer a question, which can be detected by checking whether the input language information contains interrogative markers such as "?", "什么" (what), "多少" (how much), "哪里" (where) or "怎么" (how). A regular-expression matching algorithm can also be used to judge whether the input is a question. A regular expression is a logical formula for operating on strings: predefined specific characters, and combinations of them, form a "rule string" that expresses a filtering logic over strings.
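As a rough illustration of the keyword and regular-expression approach just described, the following sketch classifies an input as task-oriented when it matches interrogative markers. The marker list and category names are illustrative, not taken from the patent:

```python
import re

# Interrogative markers mentioned above: "?" plus Chinese question words
# such as 什么 (what), 多少 (how much), 哪里 (where), 怎么 (how).
QUESTION_PATTERN = re.compile(r"[?？]|什么|多少|哪里|怎么")

def classify_intention(text: str) -> str:
    """Return 'task' for question-like input, otherwise 'chat'."""
    return "task" if QUESTION_PATTERN.search(text) else "chat"
```

In practice the pattern would be extended with the full preset list of interrogative particles.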
When the input language information is not a question, the dialogue intention is judged to be chat-oriented. Further, dialogue intentions can be subdivided; for example, the chat type can be split into a positive type, covering emotions such as affirmation, praise and thanks, and a negative type, covering emotions such as ranting, complaining and blaming. The subdivided intentions can be judged against preset keyword lists: one keyword list is preset for each dialogue intention, and when a keyword extracted from the input language information matches a word in the list corresponding to some dialogue intention, the input is considered to correspond to that intention.
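A minimal sketch of the keyword-list judgment just described; the keyword lists and category names below are hypothetical stand-ins, not the patent's actual lists:

```python
# One preset keyword list per subdivided chat intention (illustrative only).
INTENT_KEYWORDS = {
    "positive": {"谢谢", "真棒", "喜欢"},
    "negative": {"抱怨", "糟糕", "讨厌"},
}

def subdivide_chat_intention(keywords):
    """Match extracted keywords against each intention's preset list."""
    for intention, preset in INTENT_KEYWORDS.items():
        if preset & set(keywords):  # any extracted keyword appears in the list
            return intention
    return "neutral"
```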
In the embodiment of the present invention, dialogue intention recognition is performed by a pre-trained LSTM-CNN neural network model. Specifically, the input is first segmented into Chinese words using a basic segmentation library, stop words and punctuation are removed, word embedding vectors are obtained through a word vector model, and these are passed into the LSTM-CNN-based neural network model. The word embedding vectors enter multi-layer LSTM units, yielding the state vectors and outputs of each stage; convolution and pooling operations (CNN) are then applied to these state vectors to obtain a combined vector index, which is fed into a softmax function to obtain the probability of each intention. The intention with the highest probability is taken as the dialogue intention of the input language information. For the training process of the LSTM-CNN neural network model, please refer to FIG. 4.
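The final stage of the pipeline above — turning the combined vector index into intention probabilities and picking the argmax — can be sketched as follows. The scores and labels are made up for illustration; the LSTM and CNN stages themselves are omitted:

```python
import math

def softmax(scores):
    """Convert raw per-intention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_intention(scores, labels):
    """Take the intention with the highest softmax probability."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]
```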
S103. Input the dialogue intention into a preset response decision model, and acquire the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention.
After the processing of step S102, the dialogue intention of the input language information is obtained; the dialogue intention is input into the response decision model to determine the response strategy for the input. To give the dialogue emotional color and make it more interesting, different response strategies can be preset for different dialogue intentions: for example, for a task-oriented intention the response strategy is question-answering; for a negative intention it is emotion-soothing; for a positive intention it is emotion-empathizing. Different response strategies correspond to different response generation models.
In the embodiment of the present invention, a Q value is computed to determine the response strategy to adopt for a dialogue intention. The Q value measures the value, to the whole chat process, of adopting a certain response strategy for a certain dialogue intention. For example, suppose we examine the pleasantness of the chat, which can be measured by the proportion of negative-intention sentences among the sentences the user inputs in the current round of dialogue; then the Q value is the value, for chat pleasantness, of adopting a certain response strategy in a certain round of dialogue.
A Q-value matrix can be preset from empirical values, with elements q(s, a), s ∈ S, a ∈ A, where S is the dialogue intention space and A is the response strategy space:

q(1,1) … q(1,a)
  …    …    …
q(s,1) … q(s,a)
In some embodiments, the Q value is computed by a Q-value reinforcement learning network model. The model takes the dialogue intention s as input and outputs Q(s, a), i.e., the expected return obtained by taking strategy a starting from state s. The model is trained with convergence of a first loss function as the objective, the first loss function being the squared error between the true and predicted values:

L(w) = (Q − Q̂(s, a, w))²

where s is the dialogue intention, a is the response strategy, w denotes the network parameters of the Q-value reinforcement learning network model, Q is the true value, and Q̂(s, a, w) is the predicted value. When the first loss function converges, w gives the trained network parameters of the Q-value reinforcement learning network model.
The response decision model is the aforementioned Q-value matrix or Q-value reinforcement learning network model.
S104. Input the language information into the response generation model having a mapping relationship with the response strategy, and acquire the response information output by the response generation model in response to the language information.
For each response strategy, a corresponding response generation model is preset. For example, for the question-answering strategy, the corresponding response generation model contains a question-and-answer database and matches answers by retrieving keywords from the input language information. For the emotion-soothing strategy, the corresponding response generation model is a trained Seq2Seq model. The training process is: prepare the training corpus, i.e., input sequences and corresponding output sequences; feed the input sequences into the Seq2Seq model and compute the probability of the output sequences; and adjust the parameters of the Seq2Seq model so that, over the whole sample, the probability of all input sequences producing their corresponding output sequences through Seq2Seq is maximized. The training corpus prepared here requires the sentiment of the input sentences to be negative and the sentiment of the output sentences to be positive.
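The mapping relationship between response strategies and response generation models can be pictured as a simple dispatch table. The model stubs below are hypothetical placeholders for the retrieval-based and Seq2Seq models described above; only the dispatch structure is the point:

```python
def answer_from_qa_database(text: str) -> str:
    # Placeholder for keyword retrieval against a Q&A database.
    return "answer for: " + text

def soothing_seq2seq(text: str) -> str:
    # Placeholder for the trained Seq2Seq emotion-soothing model.
    return "soothing reply to: " + text

# Strategy names are illustrative; each maps to its generation model.
RESPONSE_MODELS = {
    "question_answering": answer_from_qa_database,
    "emotion_soothing": soothing_seq2seq,
}

def generate_response(strategy: str, text: str) -> str:
    """S104: route the input to the model mapped to the chosen strategy."""
    return RESPONSE_MODELS[strategy](text)
```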
As shown in FIG. 2, when a preset Q-value matrix is used to determine the response strategy corresponding to the dialogue intention, step S103 further includes the following steps:
S111. Query the Q-value matrix according to the dialogue intention.
The q values of the candidate response strategies corresponding to the dialogue intention are looked up in the Q-value matrix.
S112. Determine the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
The candidate response strategy with the largest q value is the response strategy corresponding to the dialogue intention.
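Steps S111–S112 amount to a row lookup followed by an argmax over the row. A toy sketch, with made-up intention names, strategy names, and q values:

```python
# Toy Q-value matrix: rows are dialogue intentions, columns are
# candidate response strategies (all names and values are illustrative).
Q_MATRIX = {
    "task":     {"question_answering": 0.9, "emotion_soothing": 0.1, "emotion_empathy": 0.2},
    "negative": {"question_answering": 0.2, "emotion_soothing": 0.8, "emotion_empathy": 0.5},
    "positive": {"question_answering": 0.1, "emotion_soothing": 0.2, "emotion_empathy": 0.9},
}

def select_strategy(intention: str) -> str:
    """S111: query the row for the intention; S112: take the largest q."""
    row = Q_MATRIX[intention]
    return max(row, key=row.get)
```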
As shown in FIG. 3, when the pre-trained Q-value reinforcement learning network model is used to determine the response strategy corresponding to the dialogue intention, step S103 further includes the following steps:
S121. Input each candidate response strategy together with the dialogue intention into the Q-value reinforcement learning network model in turn, and acquire the Q value output by the model for each candidate response strategy.
To compute the Q value of each candidate response strategy, the candidate response strategy and the dialogue intention are input into the Q-value reinforcement learning network model, yielding the Q value of adopting that response strategy for that dialogue intention.
S122. Determine the candidate response strategy with the largest Q value as the response strategy for the dialogue intention.
The candidate response strategy with the largest Q value is the response strategy that should be adopted for the dialogue intention.
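Steps S121–S122 can be sketched as feeding each (intention, strategy) pair through the network and keeping the argmax. The stub below is a hypothetical stand-in for the trained Q-value network:

```python
def choose_strategy(q_network, intention, candidate_strategies):
    """S121: query the Q network per candidate; S122: keep the argmax."""
    q_values = {a: q_network(intention, a) for a in candidate_strategies}
    return max(q_values, key=q_values.get)

# Hypothetical stub in place of the trained Q-value network.
def stub_q_network(intention, strategy):
    return 0.8 if (intention, strategy) == ("negative", "emotion_soothing") else 0.1
```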
As shown in FIG. 4, training the LSTM-CNN neural network model in the embodiment of the present invention includes the following steps:
S131. Acquire training samples marked with dialogue intention categories, the training samples being language information marked with different dialogue intention categories.
Training samples are prepared and labeled with dialogue intention categories. In the embodiment of the present invention, the labeled categories are task-oriented and chat-oriented: the task type corresponds to a user need for answering questions, and the chat type corresponds to a need for small talk.
S132. Input the training samples into the LSTM-CNN neural network model to acquire the dialogue intention reference categories of the training samples.
The training samples are first segmented into Chinese words using a basic segmentation library; stop words and punctuation are removed; word embedding vectors are obtained through a word vector model and input into the LSTM-CNN neural network model. The word embedding vectors enter multi-layer LSTM units to obtain the state vectors and outputs of each stage; convolution and pooling operations (CNN) are then applied to these state vectors to obtain a combined vector index, which is fed into a softmax function to obtain the probability of each intention.
S133. Compare, by a second loss function, whether the dialogue intention reference category of each sample in the training set is consistent with its marked dialogue intention category, where the second loss function is the mean categorical cross-entropy:

loss = −(1/N) · Σ_{i=1..N} Σ_{c=1..C} Y_{i,c} · log(h_{i,c})

where N is the number of training samples; for the i-th sample, Y_i is its corresponding label, i.e., the final intention recognition result; h = (h_1, h_2, …, h_C) is the prediction result for sample i; and C is the number of categories.
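The second loss function above, assuming one-hot labels Y and softmax predictions h, can be computed as:

```python
import math

def second_loss(Y, H):
    """Mean categorical cross-entropy: -(1/N) * sum_i sum_c Y[i][c] * log(H[i][c]).
    Y[i] is a one-hot label vector; H[i] is the softmax prediction for sample i."""
    n = len(Y)
    total = 0.0
    for Yi, Hi in zip(Y, H):
        for y, h in zip(Yi, Hi):
            if y > 0:  # zero-label terms contribute nothing; skipping avoids log(0)
                total += y * math.log(h)
    return -total / n
```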
In the embodiment of the present invention, the LSTM-CNN neural network model is trained with convergence of the second loss function as the objective: the weights of the nodes in the neural network model are adjusted so that the second loss function reaches its minimum, and when further weight adjustment no longer decreases the value of the loss function but instead increases it, training ends.
S134. When a dialogue intention reference category is inconsistent with the marked dialogue intention category, iteratively update the weights in the LSTM-CNN neural network model until the second loss function reaches its minimum.
Whether the second loss function converges measures whether the dialogue intention predicted by the LSTM-CNN neural network model for a training sample is consistent with the sample's marked dialogue intention category. If the second loss function has not converged, the weights of the nodes in the neural network model are adjusted by gradient descent, ending when the dialogue intention reference categories predicted by the network are consistent with the marked dialogue intention categories — that is, when further weight adjustment no longer decreases the value of the loss function but instead increases it, training ends.
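The weight-update rule described above — gradient descent that stops once the loss no longer decreases — can be sketched generically. The loss and gradient functions passed in are illustrative, not the actual network:

```python
def train_weights(weights, loss_fn, grad_fn, lr=0.1, max_iters=1000):
    """Gradient descent; training ends when the loss no longer decreases."""
    prev_loss = loss_fn(weights)
    for _ in range(max_iters):
        weights = [w - lr * g for w, g in zip(weights, grad_fn(weights))]
        cur_loss = loss_fn(weights)
        if cur_loss >= prev_loss:  # loss stopped decreasing: stop training
            break
        prev_loss = cur_loss
    return weights
```

For example, minimizing the toy loss (w − 3)² from w = 0 drives w toward 3.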
To solve the above technical problem, the embodiment of the present invention further provides a machine dialogue apparatus. For details, please refer to FIG. 5, which is a block diagram of the basic structure of the machine dialogue apparatus of this embodiment.
As shown in FIG. 5, a machine dialogue apparatus includes: an acquisition module 210, a recognition module 220, a calculation module 230 and a generation module 240. The acquisition module 210 is configured to acquire language information input by the current user. The recognition module 220 inputs the language information into a preset intention recognition model and acquires the dialogue intention output by the model in response to the language information. The calculation module 230 inputs the dialogue intention into a preset response decision model and acquires the response strategy output by the model in response to the dialogue intention, the response decision model being used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention. The generation module 240 inputs the language information into the response generation model having a mapping relationship with the response strategy and acquires the response information output by the model in response to the language information.
The embodiment of the present invention acquires the language information input by the current user; inputs it into a preset intention recognition model to acquire the dialogue intention output by the model; inputs the dialogue intention into a preset response decision model to acquire the response strategy output by the model, the response decision model being used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and inputs the language information into the response generation model having a mapping relationship with that response strategy to acquire the response information output by the model. By recognizing the intention of the input sentence, the response generation model is determined, and a reinforcement learning network model is introduced in that process; different intentions lead to different response generation models and different types of responses, making the dialogue diversified and more interesting.
In some embodiments, the response decision model in the machine dialogue apparatus is based on a preset Q-value matrix, in which element q is used to evaluate the value of each candidate response strategy for each dialogue intention. The machine dialogue apparatus further includes a first query submodule and a first confirmation submodule: the first query submodule is configured to query the Q-value matrix according to the dialogue intention, and the first confirmation submodule is configured to determine the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
在一些实施方式中,所述机器对话装置中的应答决策模型基于预先训练的Q值强化学习网络模型,其中,所述Q值强化学习网络模型以下述第一损失函数为特征:In some implementations, the response decision model in the machine dialogue device is based on a pre-trained Q-value reinforcement learning network model, wherein the Q-value reinforcement learning network model is characterized by the following first loss function:
Figure PCTCN2019103612-appb-000004
Figure PCTCN2019103612-appb-000004
where s is the dialogue intention, a is the response strategy, w denotes the network parameters of the Q-value reinforcement learning network model, Q is the true value, and $\hat{Q}(s,a;w)$ is the predicted value. The value of the network parameters w is adjusted until the first loss function reaches its minimum; the Q-value reinforcement learning network model defined by that value of w is then determined to be the pre-trained Q-value reinforcement learning network model.
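A toy illustration of this objective under stated assumptions: the "network" below is a linear stand-in for the real Q-value network, and the feature encodings and training pairs are hypothetical; only the shape of the loss — mean squared error between the true Q value and the prediction Q̂(s, a; w) — mirrors the formula above.

```python
def q_hat(features, w):
    """A toy predictor standing in for the Q-value network: the dot
    product of hypothetical (intention, strategy) features with the
    network parameters w."""
    return sum(f * wi for f, wi in zip(features, w))

def first_loss(samples, w):
    """Mean of (Q - Q_hat)^2 over (features, true Q value) samples,
    mirroring the first loss function's squared-error form."""
    return sum((q_true - q_hat(f, w)) ** 2 for f, q_true in samples) / len(samples)

# Hypothetical training pairs: feature vector -> true Q value.
samples = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.2)]

# With w = [0.9, 0.2] every prediction matches the true value exactly,
# so the loss attains its minimum of 0; a worse w gives a larger loss.
assert first_loss(samples, [0.9, 0.2]) == 0.0
assert first_loss(samples, [0.0, 0.0]) > 0.0
```

Training then amounts to adjusting w (in practice by gradient descent over many sampled (s, a) pairs) until this loss is minimized.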
In some embodiments, the machine dialogue apparatus further includes a first processing submodule and a second confirmation submodule. The first processing submodule sequentially inputs each candidate response strategy together with the dialogue intention into the Q-value reinforcement learning network model and obtains the Q value that the model outputs for each candidate response strategy; the second confirmation submodule determines the candidate response strategy with the largest Q value as the response strategy for the dialogue intention.
In some embodiments, the preset intention recognition model in the machine dialogue apparatus uses a pre-trained LSTM-CNN neural network model, and the machine dialogue apparatus further includes a first acquisition submodule, a second processing submodule, a first comparison submodule, and a first execution submodule. The first acquisition submodule obtains training samples labeled with dialogue intention categories, the training samples being language information labeled with different dialogue intention categories; the second processing submodule inputs the training samples into the LSTM-CNN neural network model to obtain the dialogue intention reference category of each training sample; the first comparison submodule compares, via a second loss function, whether the dialogue intention reference category of each sample in the training set is consistent with its labeled dialogue intention category, where the second loss function is:
$L_2 = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{ic}\,\log h_{ic}$
where N is the number of training samples; for the i-th sample, the corresponding label Y_i is the final intention recognition result, h = (h_1, h_2, ..., h_C) is the prediction result for sample i, and C is the total number of categories. The first execution submodule, when a dialogue intention reference category is inconsistent with the labeled dialogue intention category, iteratively updates the weights of the LSTM-CNN neural network model until the second loss function reaches its minimum.
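The second loss function is the standard averaged categorical cross-entropy, which can be sketched directly; the one-hot labels and prediction vectors below are hypothetical placeholders, not data from the patent.

```python
import math

def second_loss(labels, predictions):
    """Averaged categorical cross-entropy: -1/N * sum_i sum_c Y_ic * log(h_ic)
    over N samples and C classes. `labels` holds one-hot vectors Y_i,
    `predictions` holds per-class probability vectors h for each sample."""
    n = len(labels)
    total = 0.0
    for y, h in zip(labels, predictions):
        # Terms with Y_ic == 0 contribute nothing, so they are skipped
        # (this also avoids evaluating log(0) on non-target classes).
        total += -sum(y_c * math.log(h_c) for y_c, h_c in zip(y, h) if y_c > 0)
    return total / n

labels = [[1, 0], [0, 1]]                 # hypothetical one-hot intention labels
perfect = [[1.0, 0.0], [0.0, 1.0]]        # log(1) = 0, so the loss is 0
assert second_loss(labels, perfect) == 0.0
assert second_loss(labels, [[0.5, 0.5], [0.5, 0.5]]) > 0.0
```

Training the LSTM-CNN model then means repeatedly updating its weights so that this quantity decreases toward its minimum, at which point the predicted categories agree with the labeled intention categories.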
In some embodiments, the preset intention recognition model in the machine dialogue apparatus uses a regular-expression matching algorithm, in which the rule strings used for matching contain at least question feature strings. The machine dialogue apparatus further includes a first matching submodule, which performs a regular-expression match between the language information and the rule strings; when the result is a match, the dialogue intention is determined to be task-oriented, otherwise the dialogue intention is determined to be chat-oriented.
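A minimal sketch of this matching rule: the patent does not list concrete patterns, so the "question feature strings" below are hypothetical English examples, and the category names `"task"` / `"chat"` are placeholders for the task-oriented and chat-oriented intentions.

```python
import re

# Hypothetical question feature strings; a real deployment would use the
# preset rule strings of the system (e.g. Chinese interrogative words).
QUESTION_FEATURES = re.compile(r"(how|what|when|where|why|can you|\?)", re.IGNORECASE)

def classify_intention(utterance: str) -> str:
    """Return 'task' when any question feature matches the language
    information, else 'chat', mirroring the match/no-match rule."""
    return "task" if QUESTION_FEATURES.search(utterance) else "chat"

assert classify_intention("How do I reset my password?") == "task"
assert classify_intention("Nice weather today") == "chat"
```

Because the rule is a plain pattern match, this branch of intention recognition needs no trained model and can serve as a fast pre-filter before the neural classifier.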
In some embodiments, the response generation model in the machine dialogue apparatus contains at least a pre-trained Seq2Seq model, and the machine dialogue apparatus further includes a second acquisition submodule and a third processing submodule. The second acquisition submodule obtains a training corpus containing input sequences and output sequences; the third processing submodule inputs an input sequence into the Seq2Seq model and adjusts the parameters of the Seq2Seq model so that the probability that the Seq2Seq model outputs the corresponding output sequence in response to the input sequence is maximized.
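A drastic simplification of this training objective, not a real Seq2Seq network: "adjusting the parameters" is replaced here by maximum-likelihood counting over a hypothetical toy corpus, so that for each input sequence the paired output sequence becomes the most probable response. The corpus pairs below are invented placeholders.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Fit 'parameters' by maximum likelihood: accumulate how often each
    output sequence follows each input sequence in the training corpus
    of (input_sequence, output_sequence) pairs."""
    counts = defaultdict(Counter)
    for inp, out in corpus:
        counts[inp][out] += 1
    return counts

def respond(model, inp):
    """Return the output sequence with the highest estimated probability
    given the input sequence, analogous to decoding from a Seq2Seq model."""
    return model[inp].most_common(1)[0][0]

corpus = [("hi", "hello"), ("hi", "hello"), ("hi", "hey"), ("bye", "goodbye")]
model = train(corpus)
assert respond(model, "hi") == "hello"      # most probable output for "hi"
assert respond(model, "bye") == "goodbye"
```

A real Seq2Seq model replaces the count table with encoder-decoder networks and maximizes the same conditional probability by gradient descent over token sequences.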
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Refer to FIG. 6, which is a block diagram of the basic structure of the computer device of this embodiment.
As shown in FIG. 6, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store a sequence of control information, and when the computer-readable instructions are executed by the processor, the processor is caused to implement the machine dialogue method described in any of the above embodiments. The processor of the computer device provides computing and control capabilities and supports the operation of the entire device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform the machine dialogue method described in any of the above embodiments. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 6 is merely a block diagram of a part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In this embodiment, the processor executes the specific content of the acquisition module 210, the recognition module 220, the calculation module 230, and the generation module 240 in FIG. 5, and the memory stores the program code and data required by these modules. The network interface is used for data transmission to and from user terminals or servers. The memory in this embodiment stores the program code and data required to execute all submodules of the machine dialogue method, and the server can call this program code and data to execute the functions of all the submodules.
The computer device obtains the language information input by the current user; inputs the language information into a preset intention recognition model and obtains the dialogue intention output by the intention recognition model in response to the language information; inputs the dialogue intention into a preset response decision model and obtains the response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention; and inputs the language information into the response generation model having a mapping relationship with the response strategy, obtaining the response information output by the response generation model in response to the language information. By recognizing the intention of the input sentence, the response generation model is determined, with a reinforcement learning network model introduced in the process; different intentions lead to different response generation models and different types of responses, making the dialogue more varied and engaging.
The present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the machine dialogue method described in any of the above embodiments.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM) or the like.
It should be understood that although the steps in the flowcharts of the drawings are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple substeps or stages that are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the substeps or stages of other steps.
The above are only some of the embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (20)

  1. A machine dialogue method, characterized by comprising the following steps:
    obtaining language information input by a current user;
    inputting the language information into a preset intention recognition model, and obtaining a dialogue intention output by the intention recognition model in response to the language information;
    inputting the dialogue intention into a preset response decision model, and obtaining a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
    inputting the language information into a response generation model having a mapping relationship with the response strategy, and obtaining response information output by the response generation model in response to the language information.
  2. The machine dialogue method according to claim 1, wherein the response decision model is based on a preset Q-value matrix, each element q of which evaluates the value of a candidate response strategy for a given dialogue intention, and the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    querying the Q-value matrix according to the dialogue intention;
    determining the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
  3. The machine dialogue method according to claim 1, wherein the response decision model is based on a pre-trained Q-value reinforcement learning network model, and the Q-value reinforcement learning network model is characterized by the following first loss function:
    $L_1(w) = \mathbb{E}\left[\left(Q(s,a) - \hat{Q}(s,a;w)\right)^2\right]$
    wherein s is the dialogue intention, a is the response strategy, w denotes the network parameters of the Q-value reinforcement learning network model, Q is the true value, and $\hat{Q}(s,a;w)$ is the predicted value;
    adjusting the value of the network parameters w of the Q-value reinforcement learning network model so that the first loss function reaches its minimum, and determining the Q-value reinforcement learning network model defined by that value of w as the pre-trained Q-value reinforcement learning network model.
  4. The machine dialogue method according to claim 3, wherein the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    sequentially inputting each candidate response strategy together with the dialogue intention into the Q-value reinforcement learning network model, and obtaining the Q value output by the Q-value reinforcement learning network model for each candidate response strategy;
    determining the candidate response strategy with the largest Q value as the response strategy for the dialogue intention.
  5. The machine dialogue method according to claim 1, wherein the preset intention recognition model uses a pre-trained LSTM-CNN neural network model, and the LSTM-CNN neural network model is trained through the following steps:
    obtaining training samples labeled with dialogue intention categories, the training samples being language information labeled with different dialogue intention categories;
    inputting the training samples into the LSTM-CNN neural network model to obtain the dialogue intention reference category of each training sample;
    comparing, via a second loss function, whether the dialogue intention reference category of each sample in the training set is consistent with its labeled dialogue intention category, wherein the second loss function is:
    $L_2 = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{ic}\,\log h_{ic}$
    wherein N is the number of training samples; for the i-th sample, the corresponding label Y_i is the final intention recognition result, h = (h_1, h_2, ..., h_C) is the prediction result for sample i, and C is the total number of categories;
    when the dialogue intention reference category is inconsistent with the labeled dialogue intention category, iteratively updating the weights of the LSTM-CNN neural network model until the second loss function reaches its minimum.
  6. The machine dialogue method according to claim 1, wherein the preset intention recognition model uses a regular-expression matching algorithm in which the rule strings used for matching contain at least question feature strings, and the step of inputting the language information into the preset intention recognition model and obtaining the dialogue intention output by the intention recognition model in response to the language information comprises the following step:
    performing a regular-expression match between the language information and the rule strings, and when the result is a match, determining the dialogue intention to be task-oriented; otherwise, determining the dialogue intention to be chat-oriented.
  7. The machine dialogue method according to claim 1, wherein the response generation model contains at least a pre-trained Seq2Seq model, and the Seq2Seq model is trained through the following steps:
    obtaining a training corpus containing input sequences and output sequences;
    inputting an input sequence into the Seq2Seq model and adjusting the parameters of the Seq2Seq model so that the probability that the Seq2Seq model outputs the corresponding output sequence in response to the input sequence is maximized.
  8. A machine dialogue apparatus, characterized by comprising:
    an acquisition module, configured to obtain language information input by a current user;
    a recognition module, configured to input the language information into a preset intention recognition model and obtain a dialogue intention output by the intention recognition model in response to the language information;
    a calculation module, configured to input the dialogue intention into a preset response decision model and obtain a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
    a generation module, configured to input the language information into a response generation model having a mapping relationship with the response strategy and obtain response information output by the response generation model in response to the language information.
  9. The machine dialogue apparatus according to claim 8, wherein the response decision model in the machine dialogue apparatus is based on a preset Q-value matrix, each element q of which evaluates the value of a candidate response strategy for a given dialogue intention, and the machine dialogue apparatus further comprises:
    a first query submodule, configured to query the Q-value matrix according to the dialogue intention;
    a first confirmation submodule, configured to determine the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
  10. The machine dialogue apparatus according to claim 8, wherein the response decision model in the machine dialogue apparatus is based on a pre-trained Q-value reinforcement learning network model, and the Q-value reinforcement learning network model is characterized by the following first loss function:
    $L_1(w) = \mathbb{E}\left[\left(Q(s,a) - \hat{Q}(s,a;w)\right)^2\right]$
    wherein s is the dialogue intention, a is the response strategy, w denotes the network parameters of the Q-value reinforcement learning network model, Q is the true value, and $\hat{Q}(s,a;w)$ is the predicted value;
    adjusting the value of the network parameters w of the Q-value reinforcement learning network model so that the first loss function reaches its minimum, and determining the Q-value reinforcement learning network model defined by that value of w as the pre-trained Q-value reinforcement learning network model.
  11. A computer device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    obtaining language information input by a current user;
    inputting the language information into a preset intention recognition model, and obtaining a dialogue intention output by the intention recognition model in response to the language information;
    inputting the dialogue intention into a preset response decision model, and obtaining a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
    inputting the language information into a response generation model having a mapping relationship with the response strategy, and obtaining response information output by the response generation model in response to the language information.
  12. The computer device according to claim 11, wherein the response decision model is based on a preset Q-value matrix, each element q of which evaluates the value of a candidate response strategy for a given dialogue intention, and the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    querying the Q-value matrix according to the dialogue intention;
    determining the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
  13. The computer device according to claim 11, wherein the response decision model is based on a pre-trained Q-value reinforcement learning network model, and the Q-value reinforcement learning network model is characterized by the following first loss function:
    $L_1(w) = \mathbb{E}\left[\left(Q(s,a) - \hat{Q}(s,a;w)\right)^2\right]$
    wherein s is the dialogue intention, a is the response strategy, w denotes the network parameters of the Q-value reinforcement learning network model, Q is the true value, and $\hat{Q}(s,a;w)$ is the predicted value;
    adjusting the value of the network parameters w of the Q-value reinforcement learning network model so that the first loss function reaches its minimum, and determining the Q-value reinforcement learning network model defined by that value of w as the pre-trained Q-value reinforcement learning network model.
  14. The computer device according to claim 13, wherein the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    sequentially inputting each candidate response strategy together with the dialogue intention into the Q-value reinforcement learning network model, and obtaining the Q value output by the Q-value reinforcement learning network model for each candidate response strategy;
    determining the candidate response strategy with the largest Q value as the response strategy for the dialogue intention.
  15. The computer device according to claim 11, wherein the preset intention recognition model uses a pre-trained LSTM-CNN neural network model, and the LSTM-CNN neural network model is trained through the following steps:
    obtaining training samples labeled with dialogue intention categories, the training samples being language information labeled with different dialogue intention categories;
    inputting the training samples into the LSTM-CNN neural network model to obtain the dialogue intention reference category of each training sample;
    comparing, via a second loss function, whether the dialogue intention reference category of each sample in the training set is consistent with its labeled dialogue intention category, wherein the second loss function is:
    $L_2 = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{ic}\,\log h_{ic}$
    wherein N is the number of training samples; for the i-th sample, the corresponding label Y_i is the final intention recognition result, h = (h_1, h_2, ..., h_C) is the prediction result for sample i, and C is the total number of categories;
    when the dialogue intention reference category is inconsistent with the labeled dialogue intention category, iteratively updating the weights of the LSTM-CNN neural network model until the second loss function reaches its minimum.
  16. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the following steps:
    obtaining language information input by a current user;
    inputting the language information into a preset intention recognition model, and obtaining a dialogue intention output by the intention recognition model in response to the language information;
    inputting the dialogue intention into a preset response decision model, and obtaining a response strategy output by the response decision model in response to the dialogue intention, wherein the response decision model is used to select, from a plurality of preset candidate response strategies, the response strategy corresponding to the dialogue intention;
    inputting the language information into a response generation model having a mapping relationship with the response strategy, and obtaining response information output by the response generation model in response to the language information.
  17. The computer-readable storage medium according to claim 16, wherein the response decision model is based on a preset Q-value matrix, each element q of which evaluates the value of a candidate response strategy for a given dialogue intention, and the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    querying the Q-value matrix according to the dialogue intention;
    determining the candidate response strategy corresponding to the largest q value in the Q-value matrix as the response strategy for the dialogue intention.
  18. The computer-readable storage medium according to claim 16, wherein the response decision model is based on a pre-trained Q-value reinforcement learning network model, and the Q-value reinforcement learning network model is characterized by the following first loss function:
    L1(w) = E[(Q(s, a) − Q̂(s, a, w))²]
    where s is the dialogue intention, a is the response strategy, w is the network parameter of the Q-value reinforcement learning network model, Q is the true value, and Q̂(s, a, w) is the predicted value;
    adjusting the value of the network parameter w of the Q-value reinforcement learning network model until the first loss function reaches its minimum, and determining the Q-value reinforcement learning network model defined by that value of w as the pre-trained Q-value reinforcement learning network model.
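Minimizing the first loss function can be sketched with a deliberately tiny linear Q̂ so the gradient step stays readable. The linear form of Q̂, the learning rate, and the sample values are all assumptions introduced for illustration; the patent's actual network is a reinforcement learning model of unspecified architecture.

```python
# Sketch of minimizing L1(w) = (Q(s, a) - Q_hat(s, a, w))^2 by
# stochastic gradient descent on w, with a toy linear Q_hat.

def q_hat(s: float, a: float, w: list) -> float:
    # Predicted value: a linear function of intention s and strategy a.
    return w[0] * s + w[1] * a

def train(samples, w, lr=0.05, epochs=200):
    """Adjust w over (s, a, Q_true) samples so the squared
    prediction error shrinks toward its minimum."""
    for _ in range(epochs):
        for s, a, q_true in samples:
            err = q_hat(s, a, w) - q_true  # gradient of L1 w.r.t. Q_hat (up to factor 2)
            w[0] -= lr * err * s
            w[1] -= lr * err * a
    return w

# Toy, mutually consistent targets: the minimizer is w = [2.0, 1.0].
samples = [(1.0, 0.0, 2.0), (0.0, 1.0, 1.0), (1.0, 1.0, 3.0)]
w = train(samples, [0.0, 0.0])
```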
  19. The computer device according to claim 18, wherein the step of inputting the dialogue intention into the preset response decision model and obtaining the response strategy output by the response decision model in response to the dialogue intention specifically comprises the following steps:
    inputting each candidate response strategy, in turn, together with the dialogue intention into the Q-value reinforcement learning network model, and obtaining the Q value output by the model for each candidate response strategy;
    determining the candidate response strategy with the largest Q value as the response strategy for the dialogue intention.
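The claim 19 selection loop can be sketched as scoring every candidate strategy through the trained network and keeping the argmax. The `q_network` function below is a fixed-table stand-in for the trained model, and the intention and strategy names are illustrative assumptions.

```python
# Stand-in for the trained Q-value network of claim 18: maps an
# (intention, strategy) pair to a score. Values are illustrative.
def q_network(intention: str, strategy: str) -> float:
    scores = {
        ("complaint", "apologize"): 0.8,
        ("complaint", "direct_answer"): 0.3,
        ("complaint", "handoff_to_agent"): 0.6,
    }
    return scores.get((intention, strategy), 0.0)

def decide(intention: str, candidates: list) -> str:
    """Feed each candidate strategy, together with the dialogue
    intention, through the network in turn, and return the
    candidate with the largest Q value (claim 19)."""
    return max(candidates, key=lambda a: q_network(intention, a))

best = decide("complaint", ["apologize", "direct_answer", "handoff_to_agent"])
```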
  20. The computer device according to claim 16, wherein the preset intention recognition model is a pre-trained LSTM-CNN neural network model, the LSTM-CNN neural network model being trained through the following steps:
    acquiring training samples marked with dialogue intention categories, the training samples being language information marked with different dialogue intention categories;
    inputting the training samples into the LSTM-CNN neural network model to obtain dialogue intention reference categories of the training samples;
    comparing, by means of a second loss function, whether the dialogue intention reference category of each sample in the training samples is consistent with its marked dialogue intention category, the second loss function being:
    L2 = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} 1{Yi = c} · log(hc)
    where N is the number of training samples, Yi is the marked final intention recognition result of the i-th sample, h = (h1, h2, ..., hC) is the prediction result for sample i, and C is the total number of categories;
    when a dialogue intention reference category is inconsistent with the marked dialogue intention category, iteratively updating the weights in the LSTM-CNN neural network model until the second loss function reaches its minimum.
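The second loss function is the standard categorical cross-entropy averaged over the N samples, which can be computed directly. A full LSTM-CNN is out of scope here; the labels and predicted probability vectors below are illustrative values, not training data from the patent.

```python
import math

def cross_entropy(labels: list, predictions: list) -> float:
    """L2 = -(1/N) * sum_i log(h_i[Y_i]): for each sample, take the
    predicted probability assigned to its marked category Y_i.
    The iterative weight updates of claim 20 drive this toward
    its minimum."""
    n = len(labels)
    return -sum(math.log(pred[y]) for y, pred in zip(labels, predictions)) / n

labels = [0, 2]              # marked dialogue intention categories Y_i
predictions = [
    [0.7, 0.2, 0.1],         # h = (h1, h2, h3) for sample 1
    [0.1, 0.1, 0.8],         # h for sample 2
]
loss = cross_entropy(labels, predictions)
```

As the predicted probability on each marked category approaches 1, the loss approaches 0, which is why minimizing L2 aligns the reference categories with the marked categories.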
PCT/CN2019/103612 2019-03-01 2019-08-30 Machine dialogue method and apparatus, computer device, and storage medium WO2020177282A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910154323.9 2019-03-01
CN201910154323.9A CN110046221B (en) 2019-03-01 2019-03-01 Machine dialogue method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020177282A1 true WO2020177282A1 (en) 2020-09-10

Family

ID=67274468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103612 WO2020177282A1 (en) 2019-03-01 2019-08-30 Machine dialogue method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110046221B (en)
WO (1) WO2020177282A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085594A (en) * 2020-09-14 2020-12-15 深圳前海微众银行股份有限公司 Identity verification method, equipment and readable storage medium
CN112131362A (en) * 2020-09-22 2020-12-25 腾讯科技(深圳)有限公司 Dialog statement generation method and device, storage medium and electronic equipment
CN112380875A (en) * 2020-11-18 2021-02-19 杭州大搜车汽车服务有限公司 Conversation label tracking method, device, electronic device and storage medium
CN112528679A (en) * 2020-12-17 2021-03-19 科大讯飞股份有限公司 Intention understanding model training method and device and intention understanding method and device
CN112559714A (en) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN113641806A (en) * 2021-07-28 2021-11-12 北京百度网讯科技有限公司 Dialogue method, dialogue system, electronic device and storage medium
CN114490985A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
EP4020326A1 (en) * 2020-12-25 2022-06-29 Beijing Baidu Netcom Science and Technology Co., Ltd Method and apparatus of training model, device, storage medium, and program product
CN116501852A (en) * 2023-06-29 2023-07-28 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046221B (en) * 2019-03-01 2023-12-22 平安科技(深圳)有限公司 Machine dialogue method, device, computer equipment and storage medium
CN110414005B (en) * 2019-07-31 2023-10-10 达闼机器人股份有限公司 Intention recognition method, electronic device and storage medium
CN110472035A (en) * 2019-08-26 2019-11-19 杭州城市大数据运营有限公司 A kind of intelligent response method, apparatus, computer equipment and storage medium
CN110717022A (en) * 2019-09-18 2020-01-21 平安科技(深圳)有限公司 Robot dialogue generation method and device, readable storage medium and robot
CN111739506B (en) * 2019-11-21 2023-08-04 北京汇钧科技有限公司 Response method, terminal and storage medium
CN110928997A (en) * 2019-12-04 2020-03-27 北京文思海辉金信软件有限公司 Intention recognition method and device, electronic equipment and readable storage medium
CN111209380B (en) * 2019-12-31 2023-07-28 深圳追一科技有限公司 Control method and device for conversation robot, computer equipment and storage medium
CN113132214B (en) * 2019-12-31 2023-07-18 深圳市优必选科技股份有限公司 Dialogue method, dialogue device, dialogue server and dialogue storage medium
CN111341309A (en) * 2020-02-18 2020-06-26 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and computer storage medium
CN111400450B (en) * 2020-03-16 2023-02-03 腾讯科技(深圳)有限公司 Man-machine conversation method, device, equipment and computer readable storage medium
CN111538820A (en) * 2020-04-10 2020-08-14 出门问问信息科技有限公司 Exception reply processing device and computer readable storage medium
CN111681653A (en) * 2020-04-28 2020-09-18 平安科技(深圳)有限公司 Call control method, device, computer equipment and storage medium
CN111611365A (en) * 2020-05-19 2020-09-01 上海鸿翼软件技术股份有限公司 Flow control method, device, equipment and storage medium of dialog system
CN111611350B (en) * 2020-05-26 2024-04-09 北京妙医佳健康科技集团有限公司 Response method and device based on health knowledge and electronic equipment
CN111666396B (en) * 2020-06-05 2023-10-31 北京百度网讯科技有限公司 User intention understanding satisfaction evaluation method, device, equipment and storage medium
CN111881254A (en) * 2020-06-10 2020-11-03 百度在线网络技术(北京)有限公司 Method and device for generating dialogs, electronic equipment and storage medium
CN111797215A (en) * 2020-06-24 2020-10-20 北京小米松果电子有限公司 Dialogue method, dialogue device and storage medium
CN111651582B (en) * 2020-06-24 2023-06-23 支付宝(杭州)信息技术有限公司 Method and system for simulating user speaking
CN112347788A (en) * 2020-11-06 2021-02-09 平安消费金融有限公司 Corpus processing method, apparatus and storage medium
CN112559700A (en) * 2020-11-09 2021-03-26 联想(北京)有限公司 Response processing method, intelligent device and storage medium
CN112733649B (en) * 2020-12-30 2023-06-20 平安科技(深圳)有限公司 Method and related equipment for identifying user intention based on video image
CN112765959A (en) * 2020-12-31 2021-05-07 康佳集团股份有限公司 Intention recognition method, device, equipment and computer readable storage medium
CN112328776A (en) * 2021-01-04 2021-02-05 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN112836028A (en) * 2021-01-13 2021-05-25 国家电网有限公司客户服务中心 Multi-turn dialogue method and system based on machine learning
CN112800204A (en) * 2021-02-24 2021-05-14 浪潮云信息技术股份公司 Construction method of intelligent dialogue system
CN113220856A (en) * 2021-05-28 2021-08-06 天津大学 Multi-round dialogue system based on Chinese pre-training model
CN113360618B (en) * 2021-06-07 2022-03-11 暨南大学 Intelligent robot dialogue method and system based on offline reinforcement learning
CN113282755A (en) * 2021-06-11 2021-08-20 上海寻梦信息技术有限公司 Dialogue type text classification method, system, equipment and storage medium
CN113806503A (en) * 2021-08-25 2021-12-17 北京库睿科技有限公司 Dialog fusion method, device and equipment
CN116521850B (en) * 2023-07-04 2023-12-01 北京红棉小冰科技有限公司 Interaction method and device based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777081A (en) * 2016-12-13 2017-05-31 竹间智能科技(上海)有限公司 Method and device for determining conversational system acknowledgment strategy
CN106934452A (en) * 2017-01-19 2017-07-07 深圳前海勇艺达机器人有限公司 Robot dialogue method and system
CN107146610A (en) * 2017-04-10 2017-09-08 北京猎户星空科技有限公司 A kind of determination method and device of user view
CN107665708A (en) * 2016-07-29 2018-02-06 科大讯飞股份有限公司 Intelligent sound exchange method and system
CN110046221A (en) * 2019-03-01 2019-07-23 平安科技(深圳)有限公司 A kind of machine dialogue method, device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150179170A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Discriminative Policy Training for Dialog Systems
CN108363690A (en) * 2018-02-08 2018-08-03 北京十三科技有限公司 Dialog semantics Intention Anticipation method based on neural network and learning training method
CN108962238B (en) * 2018-04-25 2020-08-07 苏州思必驰信息科技有限公司 Dialogue method, system, equipment and storage medium based on structured neural network
CN109063164A (en) * 2018-08-15 2018-12-21 百卓网络科技有限公司 A kind of intelligent answer method based on deep learning

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085594A (en) * 2020-09-14 2020-12-15 深圳前海微众银行股份有限公司 Identity verification method, equipment and readable storage medium
CN112131362A (en) * 2020-09-22 2020-12-25 腾讯科技(深圳)有限公司 Dialog statement generation method and device, storage medium and electronic equipment
CN112131362B (en) * 2020-09-22 2023-12-12 腾讯科技(深圳)有限公司 Dialogue sentence generation method and device, storage medium and electronic equipment
CN112380875A (en) * 2020-11-18 2021-02-19 杭州大搜车汽车服务有限公司 Conversation label tracking method, device, electronic device and storage medium
CN112528679A (en) * 2020-12-17 2021-03-19 科大讯飞股份有限公司 Intention understanding model training method and device and intention understanding method and device
CN112528679B (en) * 2020-12-17 2024-02-13 科大讯飞股份有限公司 Method and device for training intention understanding model, and method and device for intention understanding
CN112559714A (en) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN112559714B (en) * 2020-12-24 2024-04-12 北京百度网讯科技有限公司 Dialogue generation method and device, electronic equipment and storage medium
EP4020326A1 (en) * 2020-12-25 2022-06-29 Beijing Baidu Netcom Science and Technology Co., Ltd Method and apparatus of training model, device, storage medium, and program product
CN113641806B (en) * 2021-07-28 2023-06-23 北京百度网讯科技有限公司 Dialogue method, dialogue system, electronic equipment and storage medium
CN113641806A (en) * 2021-07-28 2021-11-12 北京百度网讯科技有限公司 Dialogue method, dialogue system, electronic device and storage medium
CN114490985B (en) * 2022-01-25 2023-01-31 北京百度网讯科技有限公司 Dialogue generation method and device, electronic equipment and storage medium
CN114490985A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium
CN116501852A (en) * 2023-06-29 2023-07-28 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment
CN116501852B (en) * 2023-06-29 2023-09-01 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110046221A (en) 2019-07-23
CN110046221B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
WO2020177282A1 (en) Machine dialogue method and apparatus, computer device, and storage medium
CN107846350B (en) Method, computer readable medium and system for context-aware network chat
US20240112008A1 (en) Active Federated Learning for Assistant Systems
WO2020147428A1 (en) Interactive content generation method and apparatus, computer device, and storage medium
US11657371B2 (en) Machine-learning-based application for improving digital content delivery
WO2020155619A1 (en) Method and apparatus for chatting with machine with sentiment, computer device and storage medium
US20210019599A1 (en) Adaptive neural architecture search
WO2018033030A1 (en) Natural language library generation method and device
WO2022022421A1 (en) Language representation model system, pre-training method and apparatus, device and medium
EP3547155A1 (en) Entity representation learning for improving digital content recommendations
CN111428010B (en) Man-machine intelligent question-answering method and device
CN111144124B (en) Training method of machine learning model, intention recognition method, and related device and equipment
WO2022252636A1 (en) Artificial intelligence-based answer generation method and apparatus, device, and storage medium
CN109766418B (en) Method and apparatus for outputting information
US11270082B2 (en) Hybrid natural language understanding
WO2019226375A1 (en) Personalized query formulation for improving searches
CN112417158A (en) Training method, classification method, device and equipment of text data classification model
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
KR20190075277A (en) Method for searching content and electronic device thereof
WO2021063089A1 (en) Rule matching method, rule matching apparatus, storage medium and electronic device
CN110727871A (en) Multi-mode data acquisition and comprehensive analysis platform based on convolution decomposition depth model
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN113420136A (en) Dialogue method, system, electronic equipment, storage medium and program product
CN112307738A (en) Method and device for processing text
CN114676227B (en) Sample generation method, model training method and retrieval method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19918097

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19918097

Country of ref document: EP

Kind code of ref document: A1