CN114416949A - Dialogue generation model training method, dialogue reply generation method, dialogue generation device, dialogue reply generation medium - Google Patents
Dialogue generation model training method, dialogue reply generation method, dialogue generation device, dialogue reply generation medium
- Publication number: CN114416949A
- Application number: CN202210059369.4A
- Authority: CN (China)
- Prior art keywords: information, dialogue, reply, historical, question
- Prior art date: 2022-01-19
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3344—Query execution using natural language analysis
- G06F16/338—Presentation of query results
(All under G—Physics; G06—Computing; G06F—Electric digital data processing; G06F16/00—Information retrieval; G06F16/30—Information retrieval of unstructured textual data; G06F16/33—Querying.)
Abstract
The present disclosure provides a dialogue generation model training method, a dialogue reply generation method, corresponding devices, and a storage medium. The method comprises the following steps: generating a training sample based on historical round question information and current question information; processing the training sample with a dialogue generation model to determine importance scores corresponding to the historical round question information and the current question information, and determining the target word generation probability for the current round of question information according to the importance scores; generating reply prediction information corresponding to the current question information based on the target word generation probability; and determining a loss function according to the reply prediction information and adjusting the dialogue generation model based on the loss function. The method, devices, and storage medium weight the historical round dialogue information with a hierarchical attention mechanism, raising the weight of historical round dialogue information relevant to the question information of the current round and thereby improving the generation quality of the current round's reply.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a dialogue generation model training method, a dialogue reply generation method, corresponding devices, and a storage medium.
Background
The dialogue generation task produces a reply from the dialogue history and the question of the current turn. The dialogue history is typically made up of multiple turns, each consisting of a question and a reply. In the existing training process of dialogue generation models, the questions and replies of all turns in the dialogue history are spliced into one long text as input, without distinguishing between the turns. In practical applications, however, the questions and replies of different turns differ in importance: important ones play a positive role in generating the current reply, while those irrelevant to the current reply are noise and degrade the accuracy of the generated reply.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a dialogue generation model training method, a dialogue reply generation method, corresponding devices, and a storage medium.
According to a first aspect of the present disclosure, there is provided a dialog generation model training method, including: generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information; generating a training sample based on the historical round question information and the current question information; processing the training sample by using a dialogue generating model, and determining importance scores corresponding to the historical round question information and the current question information; determining attention weight according to the importance score, and determining target word generation probability corresponding to the current round of question-asking information by using the dialogue generation model and according to the attention weight; generating reply prediction information corresponding to the current question information based on the target word generation probability; and determining a loss function according to the reply prediction information, and adjusting the dialogue generating model based on the loss function.
Optionally, the processing the training sample by using a dialogue generating model, and the determining the importance score corresponding to the historical turn question information and the current question information includes: using a question separator and a reply separator to perform isolation processing on the question information and the reply information in the training sample; encoding the historical round question information by using an encoder of the dialogue generating model to obtain encoded hidden layer state information corresponding to the historical round question information; and obtaining the importance score between the current question information and the historical round question information according to the coded hidden layer state information.
Optionally, the encoding hidden layer state information includes: questioning the coded hidden layer state information and replying the coded hidden layer state information; the obtaining of the importance score between the current question information and the historical turn question information according to the encoded hidden layer state information includes: obtaining a first excitation function value corresponding to the question coding hidden layer state information by using a first excitation function of the dialogue generating model; obtaining a second excitation function value corresponding to the reply encoding hidden layer state information by using the first excitation function; obtaining the importance score based on the first excitation function value, the second excitation function value, and the second activation function.
Optionally, the determining an attention weight according to the importance score comprises: determining an initial weight of the historical round question information; taking the product of the initial weight and the corresponding importance score as a new weight; and summing new weights of all historical round questioning information to obtain the attention weight.
Optionally, the determining an initial weight of the historical round questioning information includes: obtaining a third excitation function value corresponding to the historical round question information by using a third activation function of the dialogue generating model; obtaining the initial weight based on the third excitation function value and the second activation function.
Optionally, the determining, by using the dialog generation model and according to the attention weight, a target word generation probability corresponding to the current round of question asking information includes: decoding by using a decoder of the dialogue generating model according to the attention weight to obtain decoding hidden layer state information corresponding to the current round question information; and obtaining the target word generation probability based on the decoding hidden layer state information and the second activation function.
Optionally, the generating reply prediction information corresponding to the current question information based on the target word generation probability includes: and selecting the candidate word with the highest target word generation probability as the target word to generate the reply prediction information.
Optionally, the determining a loss function according to the reply prediction information, and the adjusting the dialog generation model based on the loss function includes: and determining a cross entropy loss function according to the reply prediction information, and adjusting the dialogue generating model based on the cross entropy loss function.
According to a second aspect of the present disclosure, there is provided a dialog reply generation method, including: generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information; generating dialogue prediction information based on the historical round question information and the current question information; acquiring a trained dialog generation model, processing the dialog prediction information by using the dialog generation model, and generating reply prediction information corresponding to the current question information; wherein the dialogue generating model is obtained by training through the training method of any one of claims 1 to 8.
According to a third aspect of the present disclosure, there is provided a dialogue generating model training apparatus including: the query turn generation module is used for generating historical turn query information corresponding to the historical dialogue information according to the query information and the reply information in the historical dialogue information; the training sample generating module is used for generating a training sample based on the historical round question information and the current question information; the importance scoring module is used for processing the training samples by using a conversation generation model and determining importance scores corresponding to the historical round question information and the current question information; an attention weight module for determining an attention weight according to the importance score; a generation probability determination module, configured to determine, by using the dialog generation model and according to the attention weight, a generation probability of a target word corresponding to the current round of question asking information; the prediction information generation module is used for generating reply prediction information corresponding to the current question information based on the target word generation probability; and the model adjusting and processing module is used for determining a loss function according to the reply prediction information and adjusting the conversation generation model based on the loss function.
Optionally, the importance scoring module is configured to perform isolation processing on the question information and the reply information in the training sample by using a question separator and a reply separator; encoding the historical round question information by using an encoder of the dialogue generating model to obtain encoded hidden layer state information corresponding to the historical round question information; and obtaining the importance score between the current question information and the historical round question information according to the encoded hidden layer state information.
Optionally, the encoding hidden layer state information includes: questioning the coded hidden layer state information and replying the coded hidden layer state information; the importance scoring module is further configured to obtain a first excitation function value corresponding to the question coding hidden layer state information by using a first excitation function of the dialog generation model; obtaining a second excitation function value corresponding to the reply encoding hidden layer state information by using the first excitation function; obtaining the importance score based on the first excitation function value, the second excitation function value, and the second activation function.
Optionally, the attention weighting module is configured to determine an initial weight of the historical round of question information; taking the product of the initial weight and the corresponding importance score as a new weight; and summing new weights of all historical round questioning information to obtain the attention weight.
Optionally, the attention weighting module is further configured to obtain a third excitation function value corresponding to the historical round question information by using a third activation function of the dialog generation model; obtaining the initial weight based on the third excitation function value and the second activation function.
Optionally, the generation probability determining module is configured to perform decoding processing according to the attention weight by using a decoder of the dialog generation model, and obtain decoding hidden layer state information corresponding to the current round question information; and obtaining the target word generation probability based on the decoding hidden layer state information and the second activation function.
Optionally, the prediction information generation module is specifically configured to select a candidate word with the highest target word generation probability as the target word, and generate the reply prediction information.
Optionally, the model adjustment processing module is specifically configured to determine a cross entropy loss function according to the reply prediction information, and perform adjustment processing on the dialog generation model based on the cross entropy loss function.
According to a fourth aspect of the present disclosure, there is provided a dialog reply generation apparatus, including: the historical information generating module is used for generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information; the dialogue information generation module is used for generating dialogue prediction information based on the historical round question information and the current question information; the prediction information processing module is used for acquiring a trained dialogue generating model, processing the dialogue prediction information by using the dialogue generating model and generating reply prediction information corresponding to the current question information; wherein, the dialogue generating model is obtained by training through the training method.
According to a fifth aspect of the present disclosure, there is provided a dialogue generating model training apparatus, including: a memory; and a processor coupled to the memory, the processor configured to perform the method as described above based on instructions stored in the memory.
According to a sixth aspect of the present disclosure, there is provided a dialog reply generation apparatus, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method as described above based on instructions stored in the memory.
According to a seventh aspect of the present disclosure, there is provided a computer readable storage medium storing computer instructions for execution by a processor to perform the method as described above.
According to the dialogue generation model training and dialogue reply generation methods, devices, and storage medium above, the historical round dialogue information is weighted with a hierarchical attention mechanism: the weight of historical round dialogue information relevant to the question information of the current round is raised, and the weight of historical round dialogue information irrelevant to it is lowered, achieving an information-filtering effect, improving the generation quality and accuracy of the current round's reply, and improving the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive exercise.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a dialog generation model training method according to the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating the determination of importance scores in one embodiment of a dialog generation model training method according to the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating the determination of importance scores in another embodiment of a dialogue generating model training method according to the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating the determination of attention weights in one embodiment of a dialog generation model training method according to the present disclosure;
FIG. 5 is a flow diagram of one embodiment of a dialog reply generation method according to the present disclosure;
FIG. 6 is a block diagram of one embodiment of a dialog generation model training apparatus according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of a dialog reply generation apparatus according to the present disclosure;
FIG. 8 is a block diagram representation of another embodiment of a dialog generation model training apparatus according to the present disclosure;
fig. 9 is a block diagram of another embodiment of a dialog reply generation apparatus according to the present disclosure.
Detailed Description
The present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the disclosure are shown. The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The terms "first", "second", and the like are used hereinafter only for descriptive distinction and have no other special meaning.
Fig. 1 is a schematic flow chart diagram of an embodiment of a dialogue generating model training method according to the present disclosure, as shown in fig. 1:
Step 101, generating historical round question information corresponding to the historical dialogue information according to the question information and the reply information in the historical dialogue information.
Step 102, generating a training sample based on the historical round question information and the current question information. The current question information is the question information in the current turn of dialogue.
Step 103, processing the training sample by using a dialogue generation model, and determining the importance scores corresponding to the historical round question information and the current question information. The dialogue generation model can be any of various models, for example the Transformer model proposed by Google.
Step 104, determining attention weights according to the importance scores, and determining the target word generation probability corresponding to the current round of question information by using the dialogue generation model according to the attention weights.
Step 105, generating reply prediction information corresponding to the current question information based on the target word generation probability.
Step 106, determining a loss function according to the reply prediction information, and adjusting the dialogue generation model based on the loss function. A minimal sketch of one such training step follows.
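Below is a minimal Python sketch of one training step covering steps 101 to 106. Here build_training_sample and the model methods are hypothetical wrappers around the encoder, attention, and decoder pieces sketched later in this description; they are not the patent's reference implementation.

```python
import torch

# Hypothetical end-to-end training step (steps 101-106); the model methods
# used here are illustrative assumptions.
def train_step(model, optimizer, history, question, gold_reply_ids):
    x = build_training_sample(history, question)   # steps 101-102: build sample
    h, beta = model.encode_and_score(x)            # step 103: importance scores
    s = model.initial_decoder_state()
    loss = torch.zeros(())
    for gold_id in gold_reply_ids:                 # steps 104-105: decode reply
        p_vocab, s = model.decode_step(h, beta, s)
        loss = loss - torch.log(p_vocab[gold_id])  # step 106: cross-entropy term
    optimizer.zero_grad()
    loss.backward()                                # adjust the model via the loss
    optimizer.step()
    return loss.item()
```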
The dialogue generation model training method improves the encoder-decoder framework based on the Transformer model: an improved hierarchical attention mechanism weights the text of each historical turn, raising the weight of dialogue turns relevant to the current turn and lowering the weight of turns irrelevant to it, which achieves an information-filtering effect and improves the generation quality of the current turn's reply.
The question and reply text of a historical turn are important when that turn is important to the current turn; otherwise both are noise. First, an importance score between each historical turn and the current question information is obtained; then the importance score is used to regularize the attention over the text of each turn, yielding the attention weights.
If the question of the historical k-th turn of dialogue is very close to the question of the current turn, the k-th turn is highly relevant to the current turn. If the question of the historical k-th turn is not close to the current question, and the current question instead asks again about the k-th turn's reply, then that reply likely did not resolve the user's problem, and the k-th turn is likely noise.
For example, the k-th turn of dialogue: Question: "What material is this watch mirror?" Reply: "Mineral crystal glass." Current turn: Question: "How hard is the watch mirror?" Reply: "Very high hardness, military grade." Here the current question is related to the question of the k-th turn, and the k-th turn facilitates the generation of the current turn's reply.
By contrast, consider a k-th turn: Question: "How hard is the watch mirror material?" Reply: "The mirror material is mineral crystal glass." Current turn: Question: "What is mineral crystal glass?" Reply: "It is a tempered glass with high transparency and excellent abrasion resistance." Here the current question is related to the reply of the k-th turn: the k-th turn did not solve the user's problem, and it does not contribute to generating the current turn's reply.
In one embodiment, given a piece of historical dialogue information comprising K turns of dialogue, each turn consists of a question $q_i$ and a reply $r_i$. The question text of the i-th turn is $q_i=(q_{i1},q_{i2},\ldots,q_{im})$, where $q_{i1}$ denotes a word, and the reply text of the i-th turn is $r_i=(r_{i1},r_{i2},\ldots,r_{in})$, where $r_{i1}$ denotes a word.
When training the dialogue generation model, suppose the training target is the reply of the 5th turn of dialogue, i.e. the output target of the dialogue generation model is $y=(r_{51},r_{52},\ldots,r_{5n})$, and the input is the history information of the previous 4 turns plus the question of the 5th turn (the current question information). Each training sample is therefore:

$x=(q_{11},\ldots,q_{1m},S_{q1},r_{11},\ldots,r_{1n},S_{r1},q_{21},\ldots,q_{2m},S_{q2},r_{21},\ldots,r_{2n},S_{r2},\ldots,q_{51},\ldots,q_{5m})$ (1-1);

where the question separator $S_{qi}$ and the reply separator $S_{ri}$ are used to separate the question and reply information of each turn of dialogue in the training sample. An illustrative construction sketch follows.
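Below is a minimal sketch of this sample construction per formula (1-1), assuming a tokenized representation; the separator token strings are illustrative placeholders, not the patent's actual vocabulary.

```python
# Minimal sketch of training-sample construction. Separator token names
# ("[S_q1]", "[S_r1]", ...) are illustrative assumptions.
def build_training_sample(history, current_question):
    """history: list of (question_tokens, reply_tokens) pairs for past turns;
    current_question: token list for the current turn's question."""
    x = []
    for i, (q, r) in enumerate(history, start=1):
        x += q + [f"[S_q{i}]"]  # question tokens followed by separator S_qi
        x += r + [f"[S_r{i}]"]  # reply tokens followed by separator S_ri
    x += current_question       # current turn has a question but no reply yet
    return x

# For example, with 4 past turns the target y is the 5th-turn reply tokens.
```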
Determining the importance score corresponding to the historical round of question information and the current question information may employ a variety of methods. FIG. 2 is a schematic flow chart of determining importance scores in an embodiment of a dialogue generating model training method according to the present disclosure, as shown in FIG. 2:
Step 201, isolating the question information and the reply information in the training sample by using a question separator and a reply separator.
Step 202, encoding the historical round question information by using an encoder of the dialogue generation model to obtain the encoded hidden layer state information corresponding to the historical round question information.
Step 203, obtaining the importance score between the current question information and the historical round question information according to the encoded hidden layer state information.
Fig. 3 is a schematic flowchart of determining an importance score in another embodiment of the dialogue generation model training method of the present disclosure, where the encoded hidden layer state information includes question encoded hidden layer state information and reply encoded hidden layer state information, as shown in fig. 3:
Step 301, obtaining a first excitation function value corresponding to the question encoded hidden layer state information by using a first excitation function of the dialogue generation model.
Step 302, obtaining a second excitation function value corresponding to the reply encoded hidden layer state information by using the first excitation function.
Step 303, obtaining the importance score based on the first excitation function value, the second excitation function value, and the second activation function.
In one embodiment, the encoder $f_{enc}$ of the dialogue generation model (a Transformer model) encodes the historical round question information x to generate the encoded hidden layer state h:

$h_i = f_{enc}(x_i)$ (1-2);

where $x_i$ is the historical round question information of the i-th turn, and $h_i$ is the encoded hidden layer state information corresponding to the historical round question information of the i-th turn.
The importance between the question information of each historical turn and the current question information is then computed. Here $h_{qk}$ is the encoded hidden layer vector corresponding to the question separator $S_{qk}$ of the k-th turn of dialogue, and $h_{rk}$ is the encoded hidden layer vector corresponding to the reply separator $S_{rk}$ of the k-th turn. $h_{qk}$ and $h_{rk}$ are $1 \times p$ vectors, $V_q$ is a $p \times p$ matrix, and $s_{t-1}$ is a $p \times 1$ vector, namely the hidden layer vector of the decoder at time t-1; the product $h_{qk} V_q s_{t-1}$ therefore yields a real number. $V_q$ and the decoder's initial state are randomly initialized by the model.
The sigmoid function serves as the first excitation function and the softmax function as the second activation function: applying sigmoid to the question-side and reply-side scores and normalizing over the turns with softmax yields $\beta_k^t$, the importance score of the k-th turn of dialogue (with respect to the current question information) at time t. A sketch of this computation follows.
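The following PyTorch sketch illustrates the scoring. Because the original formula is not reproduced above, the combination of the two sigmoid terms by summation and the separate reply-side matrix $V_r$ are assumptions consistent with the surrounding description.

```python
import torch

# Sketch of the turn importance score beta_k^t. The sum of the two sigmoid
# terms and the reply-side matrix V_r are assumptions; only h_qk V_q s_{t-1}
# is given explicitly in the description.
def turn_importance(h_q, h_r, V_q, V_r, s_prev):
    """h_q, h_r: (K, p) hidden states at the question/reply separators;
    V_q, V_r: (p, p) parameter matrices; s_prev: (p,) decoder state at t-1."""
    score_q = torch.sigmoid(h_q @ V_q @ s_prev)     # first excitation function value
    score_r = torch.sigmoid(h_r @ V_r @ s_prev)     # second excitation function value
    beta = torch.softmax(score_q + score_r, dim=0)  # second activation (softmax)
    return beta  # (K,) importance score of each historical turn at time t
```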
Determining the attention weight may take a variety of approaches. FIG. 4 is a schematic flow chart of determining attention weights in an embodiment of a dialogue generating model training method according to the present disclosure, as shown in FIG. 4:
Step 401, determining an initial weight of the historical round question information.
Step 402, taking the product of the initial weight and the corresponding importance score as a new weight.
Step 403, summing the new weights of all historical round question information to obtain the attention weight.
Determining the initial weight of the historical round questioning information may employ a variety of methods. For example, a third excitation function value corresponding to the history turn question information is obtained using a third activation function of the dialogue generating model, and the initial weight is obtained based on the third excitation function value and the second activation function.
Various methods can be adopted for determining the generation probability of the target words corresponding to the current round of question-asking information. For example, a decoder using a dialog generation model performs decoding processing according to the attention weight to obtain decoded hidden layer state information corresponding to the current round question information, and a target word generation probability is obtained based on the decoded hidden layer state information and the second activation function.
In one embodiment, the word-level attention weight is solved as follows: an initial weight for each word is computed using tanh, the third activation function of the dialogue generation model (the Transformer model), and normalized with the softmax function; the product of this initial weight and the turn importance score gives $\alpha_{ki}^t$, the attention weight of the i-th word in the historical round question information of the k-th turn. Decoding then proceeds as:
$s_t = f_{dec}(s_{t-1}, y_{t-1}, c_t)$ (1-9);

where $c_t$ is the context vector produced by the attention mechanism, $\alpha_i^t$ is the attention weight assigned to input $x_i$ (or $h_i$) at time t, representing the importance of $x_i$ at time t, and $y_{t-1}$ is the gold-standard output at time t-1.
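The sketch below illustrates this hierarchical attention and the context vector $c_t$. The additive scoring form $v^\top\tanh(W_h h_i + W_s s_{t-1})$ and the renormalization after multiplying by the importance score are assumptions, since formulas (1-3) through (1-8) are not reproduced above.

```python
import torch

# Sketch of hierarchical attention: word-level weights (third activation tanh,
# then softmax) regularized by the turn importance scores beta, then a context
# vector as the weighted sum of encoder states. The additive scoring form and
# the final renormalization are assumptions.
def hierarchical_attention(h, turn_of_word, beta, W_h, W_s, v, s_prev):
    """h: (T, p) encoder hidden states; turn_of_word: (T,) long tensor giving
    each token's turn index; beta: (K,) turn importance; s_prev: (p,)."""
    e = torch.tanh(h @ W_h + s_prev @ W_s) @ v  # third activation function
    alpha0 = torch.softmax(e, dim=0)            # initial word-level weight
    alpha = alpha0 * beta[turn_of_word]         # new weight = initial * importance
    alpha = alpha / alpha.sum()                 # renormalize (assumption)
    c = alpha @ h                               # context vector c_t, shape (p,)
    return alpha, c
```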
The decoder computes the generation probability of the target word w from the decoding hidden layer state $s_t$:

$P_{vocab}(w) = \operatorname{softmax}(W_b s_t)$ (1-10);

where $W_b$ is a weight matrix.
The candidate word with the highest target word generation probability is selected as the target word to generate the reply prediction information. A cross-entropy loss function is then determined according to the reply prediction information, and the dialogue generation model is adjusted based on the cross-entropy loss function.
For example, the loss function is the cross-entropy loss: $L = -\sum_i \log(P_{vocab}(w_i))$.
The dialogue generation model is adjusted based on the standard cross-entropy loss function. Cross entropy measures the difference between two probability distributions; in machine learning it serves as a loss function measuring the similarity between the predicted and true distributions, and, when used together with the sigmoid function, it avoids the slowdown in learning that the mean-squared-error loss suffers during gradient descent. A sketch of the generation-and-loss step follows.
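Below is a sketch of formula (1-10) together with the greedy target-word selection and the per-step cross-entropy term; the orientation of $W_b$ and the variable shapes are assumptions.

```python
import torch

# Sketch of P_vocab(w) = softmax(W_b s_t), greedy selection of the target
# word, and the cross-entropy term -log P_vocab(gold word).
def generation_step(s_t, W_b, gold_id):
    """s_t: (p,) decoding hidden state; W_b: (vocab_size, p) weight matrix."""
    p_vocab = torch.softmax(W_b @ s_t, dim=0)  # target word generation probability
    pred_id = int(torch.argmax(p_vocab))       # candidate word with highest prob.
    step_loss = -torch.log(p_vocab[gold_id])   # one term of L = -sum log P_vocab
    return pred_id, step_loss
```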
Fig. 5 is a flowchart illustrating a dialogue reply generation method according to an embodiment of the present disclosure, as shown in fig. 5:
Step 501, generating historical round question information corresponding to the historical dialogue information according to the question information and the reply information in the historical dialogue information.
Step 502, generating dialogue prediction information based on the historical round question information and the current question information.
Step 503, acquiring a trained dialogue generation model, processing the dialogue prediction information by using the dialogue generation model, and generating reply prediction information corresponding to the current question information.
In one embodiment, as shown in fig. 6, the present disclosure provides a dialog generation model training apparatus 60, which includes a questioning turn generation module 61, a training sample generation module 62, an importance scoring module 63, an attention weight module 64, a generation probability determination module 65, a prediction information generation module 66, and a model adjustment processing module 67.
The question round generation module 61 generates history round question information corresponding to the history dialogue information according to the question information and the reply information in the history dialogue information. Training sample generation module 62 generates training samples based on the historical round question information and the current question information. The importance scoring module 63 processes the training samples using the dialogue generation model to determine an importance score corresponding to the historical round question information and the current question information.
The attention weight module 64 determines an attention weight based on the importance score. The generation probability determination module 65 determines the target word generation probability corresponding to the current round of question information based on the attention weight using the dialogue generation model. The prediction information generation module 66 generates reply prediction information corresponding to the current question information based on the target word generation probability. The model adjustment processing module 67 determines a loss function according to the reply prediction information, and performs adjustment processing on the dialogue generating model based on the loss function.
In one embodiment, the importance scoring module 63 uses the question separator and the reply separator to isolate the question information and the reply information in the training sample. The importance scoring module 63 performs encoding processing on the historical round question information by using an encoder of the dialogue generation model, and obtains encoded hidden layer state information corresponding to the historical round question information. The importance scoring module 63 obtains importance scores between the current question information and the historical turn question information according to the encoded hidden layer state information.
The coded hidden layer state information comprises question coded hidden layer state information and reply coded hidden layer state information. The importance scoring module 63 obtains a first excitation function value corresponding to the question-coding hidden-layer state information using a first excitation function of the dialog generation model. The importance scoring module 63 obtains a second excitation function value corresponding to the reply encoding hidden layer state information using the first excitation function. The importance score module 63 obtains an importance score based on the first excitation function value, the second excitation function value, and the second activation function.
In one embodiment, attention weight module 64 determines an initial weight for the historical round of questioning information, taking the product of the initial weight and the corresponding importance score as the new weight. The attention weight module 64 sums the new weights of all the historical round questioning information to obtain the attention weight.
The attention weighting module 64 obtains a third excitation function value corresponding to the historical round question information using a third activation function of the dialogue generating model. The attention weight module 64 obtains an initial weight based on the third excitation function value and the second activation function.
In one embodiment, the generation probability determination module 65 uses a decoder of the dialog generation model to perform decoding processing according to the attention weight, and obtains decoded hidden layer state information corresponding to the current round question information. The generation probability determination module 65 obtains the target word generation probability based on the decoded hidden layer state information and the second activation function.
The prediction information generation module 66 selects the candidate word with the highest target word generation probability as the target word, and generates the reply prediction information. The model adjustment processing module 67 determines a cross entropy loss function according to the reply prediction information, and performs adjustment processing on the dialogue generating model based on the cross entropy loss function.
In one embodiment, as shown in fig. 7, the present disclosure provides a dialog reply generation apparatus 70 including a history information generation module 71, a dialog information generation module 72, and a prediction information processing module 73. The historical information generating module 71 generates historical round question information corresponding to the historical dialogue information according to the question information and the reply information in the historical dialogue information. The dialogue information generation module 72 generates dialogue prediction information based on the historical round question information and the current question information. The prediction information processing module 73 processes the dialogue prediction information using the trained dialogue generating model, and generates reply prediction information corresponding to the current question information.
FIG. 8 is a block diagram of another embodiment of a dialog generating model training apparatus according to the present disclosure. As shown in fig. 8, the apparatus may include a memory 81, a processor 82, a communication interface 83, and a bus 84. The memory 81 is used for storing instructions, the processor 82 is coupled to the memory 81, and the processor 82 is configured to execute a training method for implementing the above-described dialog generation model based on the instructions stored in the memory 81.
The memory 81 may be a high-speed RAM, a non-volatile memory, or the like, and may also be a memory array. The memory 81 may further be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules. The processor 82 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the dialogue generation model training method of the present disclosure.
Fig. 9 is a block diagram of another embodiment of a dialog reply generation apparatus according to the present disclosure. As shown in fig. 9, the apparatus may include a memory 91, a processor 92, a communication interface 93, and a bus 94. The memory 91 is used for storing instructions, the processor 92 is coupled to the memory 91, and the processor 92 is configured to execute the dialog reply generation method described above based on the instructions stored in the memory 91.
The memory 91 may be a high-speed RAM, a non-volatile memory, or the like, and may also be a memory array. The memory 91 may further be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules. The processor 92 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the dialogue reply generation method of the present disclosure.
In one embodiment, the present disclosure provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a dialog generation model training method as in any of the above embodiments.
In the dialogue generation model training and dialogue reply generation methods, devices, and storage medium of the above embodiments, the historical round dialogue information is weighted with a hierarchical attention mechanism: the weight of historical round dialogue information relevant to the question information of the current round is raised, and the weight of historical round dialogue information irrelevant to it is lowered, achieving an information-filtering effect, improving the generation quality of the current round reply and the accuracy of the reply information, and improving the user experience.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (21)
1. A dialog generation model training method, comprising:
generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information;
generating a training sample based on the historical round question information and the current question information;
processing the training sample by using a dialogue generating model, and determining importance scores corresponding to the historical round question information and the current question information;
determining attention weight according to the importance score, and determining target word generation probability corresponding to the current round of question-asking information by using the dialogue generation model and according to the attention weight;
generating reply prediction information corresponding to the current question information based on the target word generation probability;
and determining a loss function according to the reply prediction information, and adjusting the dialogue generating model based on the loss function.
2. The method of claim 1, the processing the training sample using a dialog generation model, the determining an importance score for the historical round of question information corresponding to the current question information comprising:
using a question separator and a reply separator to perform isolation processing on the question information and the reply information in the training sample;
encoding the historical round question information by using an encoder of the dialogue generating model to obtain encoded hidden layer state information corresponding to the historical round question information;
and obtaining the importance score between the current question information and the historical round question information according to the coded hidden layer state information.
3. The method of claim 2, the encoding the hidden layer state information comprising: questioning the coded hidden layer state information and replying the coded hidden layer state information; the obtaining of the importance score between the current question information and the historical turn question information according to the encoded hidden layer state information includes:
obtaining a first excitation function value corresponding to the question coding hidden layer state information by using a first excitation function of the dialogue generating model;
obtaining a second excitation function value corresponding to the reply encoding hidden layer state information by using the first excitation function;
obtaining the importance score based on the first excitation function value, the second excitation function value, and the second activation function.
4. The method of claim 3, the determining an attention weight from the importance score comprising:
determining an initial weight of the historical round question information;
taking the product of the initial weight and the corresponding importance score as a new weight;
and summing new weights of all historical round questioning information to obtain the attention weight.
5. The method of claim 4, the determining an initial weight for the historical round of questioning information comprising:
obtaining a third excitation function value corresponding to the historical round question information by using a third activation function of the dialogue generating model;
obtaining the initial weight based on the third excitation function value and the second activation function.
6. The method of claim 4, wherein determining a target word generation probability corresponding to the current round of question information using the dialog generation model and according to the attention weight comprises:
decoding by using a decoder of the dialogue generating model according to the attention weight to obtain decoding hidden layer state information corresponding to the current round question information;
and obtaining the target word generation probability based on the decoding hidden layer state information and the second activation function.
7. The method of claim 1, the generating reply prediction information corresponding to the current question information based on the target word generation probability comprising:
and selecting the candidate word with the highest target word generation probability as the target word to generate the reply prediction information.
8. The method of any of claims 1 to 7, wherein determining a loss function from the reply prediction information, and adjusting the dialog generation model based on the loss function comprises:
and determining a cross entropy loss function according to the reply prediction information, and adjusting the dialogue generating model based on the cross entropy loss function.
9. A dialog reply generation method, comprising:
generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information;
generating dialogue prediction information based on the historical round question information and the current question information;
acquiring a trained dialog generation model, processing the dialog prediction information by using the dialog generation model, and generating reply prediction information corresponding to the current question information;
wherein the dialogue generating model is obtained by training through the training method of any one of claims 1 to 8.
10. A dialog generation model training apparatus comprising:
the query turn generation module is used for generating historical turn query information corresponding to the historical dialogue information according to the query information and the reply information in the historical dialogue information;
the training sample generating module is used for generating a training sample based on the historical round question information and the current question information;
the importance scoring module is used for processing the training samples by using a conversation generation model and determining importance scores corresponding to the historical round question information and the current question information;
an attention weight module for determining an attention weight according to the importance score;
a generation probability determination module, configured to determine, by using the dialog generation model and according to the attention weight, a generation probability of a target word corresponding to the current round of question asking information;
the prediction information generation module is used for generating reply prediction information corresponding to the current question information based on the target word generation probability;
and the model adjusting and processing module is used for determining a loss function according to the reply prediction information and adjusting the conversation generation model based on the loss function.
11. The apparatus of claim 10, wherein,
the importance scoring module is used for isolating the question information and the reply information in the training sample by using a question separator and a reply isolator; encoding the historical round question information by using an encoder of the dialogue generating model to obtain encoded hidden layer state information corresponding to the historical round question information; and obtaining the importance score between the current question information and the historical round question information according to the coded hidden layer state information.
12. The apparatus of claim 11, the encoding the hidden layer state information comprising: questioning the coded hidden layer state information and replying the coded hidden layer state information;
the importance scoring module is further configured to obtain a first excitation function value corresponding to the question coding hidden layer state information by using a first excitation function of the dialog generation model; obtaining a second excitation function value corresponding to the reply encoding hidden layer state information by using the first excitation function; obtaining the importance score based on the first excitation function value, the second excitation function value, and the second activation function.
13. The apparatus of claim 12, wherein,
the attention weight module is used for determining the initial weight of the historical round question information; taking the product of the initial weight and the corresponding importance score as a new weight; and summing new weights of all historical round questioning information to obtain the attention weight.
14. The apparatus of claim 13, wherein,
the attention weight module is further used for obtaining a third excitation function value corresponding to the historical round question information by using a third activation function of the dialogue generating model; obtaining the initial weight based on the third excitation function value and the second activation function.
15. The apparatus of claim 13, wherein,
the generation probability determination module is used for decoding by using a decoder of the dialogue generation model according to the attention weight to obtain decoding hidden layer state information corresponding to the current round question information; and obtaining the target word generation probability based on the decoding hidden layer state information and the second activation function.
16. The apparatus of claim 10, wherein,
the prediction information generation module is specifically configured to select a candidate word with the highest target word generation probability as the target word, and generate the reply prediction information.
17. The apparatus of any one of claims 10 to 16,
the model adjustment processing module is specifically configured to determine a cross entropy loss function according to the reply prediction information, and adjust the dialog generation model based on the cross entropy loss function.
18. A dialog reply generation apparatus comprising:
the historical information generating module is used for generating historical round question information corresponding to the historical dialogue information according to question information and reply information in the historical dialogue information;
the dialogue information generation module is used for generating dialogue prediction information based on the historical round question information and the current question information;
the prediction information processing module is used for acquiring a trained dialogue generating model, processing the dialogue prediction information by using the dialogue generating model and generating reply prediction information corresponding to the current question information;
wherein the dialogue generating model is obtained by training through the training method of any one of claims 1 to 8.
19. A dialog generation model training apparatus comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-8 based on instructions stored in the memory.
20. A dialog reply generation apparatus comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of any of claims 1-8 based on instructions stored in the memory.
21. A computer-readable storage medium having stored thereon computer instructions for execution by a processor of the method of any one of claims 1 to 8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210059369.4A | 2022-01-19 | 2022-01-19 | Dialogue generation model training method, dialogue reply generation method, dialogue generation device, dialogue reply generation medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210059369.4A | 2022-01-19 | 2022-01-19 | Dialogue generation model training method, dialogue reply generation method, dialogue generation device, dialogue reply generation medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114416949A (en) | 2022-04-29 |
Family
ID=81273827

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210059369.4A | Dialogue generation model training method, dialogue reply generation method, dialogue generation device, dialogue reply generation medium | 2022-01-19 | 2022-01-19 |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN114416949A (en) |
- 2022-01-19: CN application CN202210059369.4A filed; published as CN114416949A (en); status: active, Pending
Cited By (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116863935A * | 2023-09-04 | 2023-10-10 | | Speech recognition method, device, electronic equipment and computer readable medium |
| CN116863935B * | 2023-09-04 | 2023-11-24 | | Speech recognition method, device, electronic equipment and computer readable medium |
| CN117556832A * | 2023-11-23 | 2024-02-13 | | Semantic constraint-based emotion support dialogue bidirectional generation method |
| CN117556832B * | 2023-11-23 | 2024-04-09 | | Semantic constraint-based emotion support dialogue bidirectional generation method |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |