CN113495943B - Man-machine dialogue method based on knowledge tracking and transferring - Google Patents


Info

Publication number
CN113495943B
CN113495943B CN202010253520.9A CN202010253520A CN113495943B CN 113495943 B CN113495943 B CN 113495943B CN 202010253520 A CN202010253520 A CN 202010253520A CN 113495943 B CN113495943 B CN 113495943B
Authority
CN
China
Prior art keywords
knowledge
tracker
shi
posterior
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010253520.9A
Other languages
Chinese (zh)
Other versions
CN113495943A (en)
Inventor
陈竹敏
孟川
任鹏杰
孙维纬
任昭春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010253520.9A priority Critical patent/CN113495943B/en
Publication of CN113495943A publication Critical patent/CN113495943A/en
Application granted granted Critical
Publication of CN113495943B publication Critical patent/CN113495943B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0463 Neocognitrons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a man-machine dialogue method based on knowledge tracking and transferring, which comprises the following steps: step one, constructing a model with knowledge tracking and transferring functions, where the model adopts a deep-learning-based encoding-decoding framework comprising a coding layer, a knowledge tracking and transfer layer, and a decoding layer; step two, training the model parameters with a prior-posterior dual learning mechanism on the constructed model; and step three, after training, fixing all model parameters and applying the model in actual dialogue. The disclosed method further improves the suitability of knowledge selection and helps the model generate replies that give a better user experience; moreover, the unsupervised prior-posterior dual learning mechanism not only strengthens the interaction between knowledge tracking and transfer so that the prediction accuracy of both improves simultaneously, but also markedly reduces the model's dependence on manually annotated data.

Description

Man-machine dialogue method based on knowledge tracking and transferring
Technical Field
The invention belongs to the field of intelligent man-machine conversation, and particularly relates to a man-machine conversation method based on knowledge tracking and transferring.
Background
Man-machine conversation means that humans can interact naturally with machines in the form of natural language (i.e., human language). The intelligence of a man-machine dialogue system is often used to measure the state of today's artificial intelligence technology, so building a sufficiently intelligent man-machine dialogue model is a long-term goal of the artificial intelligence era. At present, man-machine conversation products are gradually entering daily life and bring great convenience to it.
Among the many challenges a dialogue model must solve is guaranteeing the informativeness of its generated replies. Currently, many research methods improve the "informativeness" of replies by introducing an external knowledge base containing a large number of knowledge text segments (which can be crawled from online resources such as Baidu Encyclopedia); such methods are knowledge-based dialogue methods. Specifically, a knowledge-based dialogue method solves two tasks: 1. knowledge selection, i.e., selecting a piece of knowledge to chat about from the knowledge base according to the dialogue context; 2. reply generation, i.e., generating the final reply from the selected knowledge segment. Since the selected knowledge directly determines the topic of the reply, inappropriate knowledge leads directly to inappropriate replies, so improving the suitability of knowledge selection is crucial for knowledge-based dialogue methods.
However, current mainstream methods still have two significant shortcomings in knowledge selection. First, in terms of model construction, almost all methods use only the dialogue context (the user's current input plus the previous dialogue history) to match knowledge in the knowledge base, without explicitly modeling knowledge tracking and transfer. Knowledge tracking and transfer means first locating the already-chatted knowledge in the dialogue history (knowledge tracking) and then inferring the knowledge to chat about next from the chatted knowledge together with the dialogue context (knowledge transfer). Compared with using the context alone, knowledge tracking and transfer additionally captures the interaction and inference relationships between already-chatted and about-to-be-chatted knowledge, which can further improve the suitability of knowledge selection.
Second, in terms of model training, current mainstream methods are data-driven: both knowledge selection and reply generation rely heavily on large-scale manually annotated data for supervised learning, and such data is very expensive to obtain. Yet few studies have explored unsupervised learning methods to improve knowledge selection and thereby reduce the reliance on annotated data.
Disclosure of Invention
To solve the above technical problems, the invention provides a man-machine conversation method based on knowledge tracking and transferring, which further improves the suitability of knowledge selection and helps the model generate replies that give a better user experience; moreover, the unsupervised prior-posterior dual learning mechanism not only strengthens the interaction between knowledge tracking and transfer so that the prediction accuracy of both improves simultaneously, but also markedly reduces the model's dependence on manually annotated data.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a man-machine conversation method based on knowledge tracking and transferring comprises the following steps:
step one, constructing a model with knowledge tracking and transferring functions;
the model adopts a coding-decoding framework based on deep learning, and comprises a coding layer, a knowledge tracking transfer layer and a decoding layer;
training model parameters by using a priori-posterior dual learning mechanism according to the constructed model;
and thirdly, after training, fixing all model parameters, and then carrying out actual dialogue application.
In the above scheme, the coding layer comprises a BERT encoder that encodes the knowledge base and the dialogue context into hidden state representations respectively;
the knowledge tracking and transfer layer comprises a prior knowledge tracker pri, a knowledge transferor shi and a posterior knowledge tracker pos. The prior knowledge tracker pri takes the hidden state representation of the dialogue context as input and predicts a prior knowledge tracking distribution over all text segments in the knowledge base, from which a piece of tracked knowledge (i.e., "chatted knowledge") and its hidden state representation can be sampled; the knowledge transferor shi takes the tracked knowledge together with the hidden state representation of the dialogue context as input and predicts a knowledge transfer distribution over all text segments in the knowledge base, from which a piece of transferred knowledge (i.e., "knowledge to chat about") and its hidden state representation can be sampled; the posterior knowledge tracker pos additionally takes the hidden state representation of the transferred knowledge as input and predicts a posterior knowledge tracking distribution over all text segments in the knowledge base, from which a piece of tracked knowledge and its hidden state representation can be sampled;
the decoding layer comprises a Transformer decoder that takes the transferred knowledge together with the hidden state representation of the dialogue context as input and generates the final reply word by word.
In the above scheme, a dual closed loop is formed between the posterior knowledge tracker pos and the knowledge transferor shi; the posterior knowledge tracker pos is executed only during model training, not during model application.
In the above scheme, during the training in step two, the posterior knowledge tracker pos and the knowledge transferor shi are treated as dual tasks of each other, so that they guide and improve each other in an unsupervised manner, iterating several times until convergence; meanwhile, the prior knowledge tracker pri benefits from the dual interaction between pos and shi. The specific training process is as follows:
Step1: warm-up training, i.e., maximizing the probability of the annotated data in the training set with maximum likelihood estimation; warm-up training ends once the parameters converge;
Step2: a single-round iteration starts; first, the posterior knowledge tracker pos guides and improves the knowledge transferor shi;
Step3: the knowledge transferor shi guides and improves the posterior knowledge tracker pos;
Step4: using a KL-divergence loss, the prior knowledge tracking distribution is made to imitate and approximate the posterior knowledge tracking distribution, ensuring that the prior knowledge tracker pri gains from dual learning even though it is not inside the dual-learning closed loop; the single-round iteration then ends;
Step5: Step2-Step4 are repeated for several iterations until the model parameters converge further.
In the above scheme, in step three, in actual dialogue application, given the knowledge base and the dialogue context containing the user input, the BERT encoder is first executed to obtain the hidden state representations of the knowledge base and the dialogue context; then the prior knowledge tracker pri and the knowledge transferor shi are executed in turn to complete knowledge tracking and transfer; finally, the transferred knowledge representation inferred by the knowledge transferor shi is fed to the Transformer decoder to generate the final reply.
Through the above technical scheme, the man-machine dialogue method based on knowledge tracking and transferring provided by the invention explicitly models knowledge tracking and knowledge transfer in model construction; in model training, an unsupervised prior-posterior dual learning mechanism trains the model parameters. This learning mechanism treats knowledge tracking and transfer as dual tasks that automatically guide each other in an unsupervised manner and improve together, iterating several times until convergence. Compared with the prior art, the invention has the following beneficial effects:
1. Explicitly modeling knowledge tracking and knowledge transfer additionally captures the interaction and inference relationships between already-chatted and about-to-be-chatted knowledge; compared with using the context alone, these additional cues further improve the suitability of knowledge selection and thereby help the model generate replies that give a better user experience.
2. The prior-posterior dual learning mechanism lets knowledge tracking and transfer guide each other automatically in an unsupervised manner and improve together. It not only strengthens the interaction between knowledge tracking and transfer so that the prediction accuracy of both improves simultaneously, but also markedly reduces the model's dependence on manually annotated data.
3. The invention further distinguishes knowledge tracking into prior knowledge tracking and posterior knowledge tracking and optimizes them with the prior-posterior dual learning mechanism, which solves the dual-incompatibility problem between model training and application (at application time, posterior knowledge tracking cannot take transferred knowledge as input, so only prior knowledge tracking can be executed) and ensures that prior knowledge tracking gains from dual learning even though it is not inside the dual-learning closed loop.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic flow diagram of a human-machine conversation method based on knowledge tracking and transfer according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the method of the present invention during model training;
FIG. 3 is a schematic diagram of optimizing the knowledge transferor shi in a single iteration during model training in the method of the present invention;
FIG. 4 is a schematic diagram of optimizing the posterior knowledge tracker pos in a single iteration during model training in the method of the present invention;
FIG. 5 is a schematic diagram of the prior knowledge tracking distribution approaching the posterior knowledge tracking distribution during model training in the method of the present invention;
FIG. 6 is a schematic representation of the method of the present invention when the model is applied.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a man-machine dialogue method based on knowledge tracking and transferring, which specifically comprises the following steps:
step1: and constructing a model with knowledge tracking and transferring functions.
In the τ-th round of the dialogue, a knowledge base K = {k_1, k_2, …, k_{|K|}} composed of |K| text segments is given (text segment k_i consists of |k_i| words), and a dialogue context C_τ = (X_{τ-1}, Y_{τ-1}, X_τ) is given (X is the user's input and Y is the model's reply; the interaction record of round τ-1 together with the user's input in round τ is defined as the context). The task of the model is first to select an appropriate text segment from the knowledge base K, and then to generate from that segment a reply Y_τ = (y_{τ,1}, y_{τ,2}, …, y_{τ,|Y_τ|}) composed of |Y_τ| words.
At the coding layer, for the dialogue context C_τ, a BERT_base encoder followed by an average pooling operation is used to obtain its hidden state representation:

h^{C_τ} = p(BERT(C_τ)) ∈ R^d,

where d denotes the hidden-state dimension (768) and p denotes the average pooling operation. In the same way, the knowledge base K = {k_1, …, k_{|K|}} is encoded into the hidden state representations {h^{k_1}, …, h^{k_{|K|}}}.
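As a concrete illustration of the pooling step, the following minimal sketch averages per-token hidden states into a single vector. The toy dimensions and values are illustrative only; the actual method uses 768-dimensional BERT token states.

```python
def mean_pool(token_states):
    """Average a list of per-token hidden-state vectors into one vector.

    Stands in for the average pooling operation p(.) applied to the
    encoder's token outputs; real states would be 768-dimensional.
    """
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(vec[i] for vec in token_states) / n for i in range(dim)]

# Toy token states for a 3-token context, dimension 4 (illustrative).
states = [[1.0, 0.0, 2.0, 4.0],
          [3.0, 2.0, 0.0, 0.0],
          [2.0, 4.0, 1.0, 2.0]]
pooled = mean_pool(states)  # -> [2.0, 2.0, 1.0, 2.0]
```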
At the knowledge tracking and transfer layer, the prior knowledge tracker pri takes the hidden state representation h^{C_{τ-1}} of the round-(τ-1) dialogue context as input and predicts a prior knowledge tracking distribution P(K|pri) over the knowledge base K: each text segment k_i is scored by matching Mlp(h^{C_{τ-1}}) against its representation h^{k_i}, and the scores are normalized with softmax. Here Mlp(·) = ·W + b denotes the multilayer perceptron (Multilayer Perceptron), W and b are trainable parameters, [·; ·] denotes the vector concatenation operation, and T denotes the matrix transposition operation.
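The prior tracking step can be sketched as follows: project the context representation with Mlp, score it against every knowledge-segment representation by dot product, and normalize with softmax. All dimensions and values are toy stand-ins (the real representations are 768-dimensional), so this is a hedged sketch of the scoring pattern rather than the patent's exact formula.

```python
import math

def mlp(x, W, b):
    # Single linear layer Mlp(x) = x.W + b, matching the definition in the text.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def prior_tracking_distribution(h_ctx, knowledge_reps, W, b):
    """P(K|pri): project the context representation, score it against each
    knowledge-segment representation by dot product, softmax-normalize."""
    q = mlp(h_ctx, W, b)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in knowledge_reps]
    return softmax(scores)

# Toy 2-dimensional setup (illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]          # identity projection
b = [0.0, 0.0]
h_ctx = [1.0, 0.0]
knowledge_reps = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
dist = prior_tracking_distribution(h_ctx, knowledge_reps, W, b)
# dist sums to 1 and puts the most mass on the first segment
```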
The knowledge transferor shi takes the hidden state representation h^{C_{τ-1}} of the dialogue context and the hidden state representation h^{K_tra} of the tracked knowledge as input, and predicts a knowledge transfer distribution P(K|K_tra, shi) over the knowledge base K: each text segment k_i is scored by matching Mlp([h^{C_{τ-1}}; h^{K_tra}]) against h^{k_i}, and the scores are normalized with softmax.
The posterior knowledge tracker pos takes the hidden state representation h^{C_{τ-1}} of the round-(τ-1) dialogue context and the hidden state representation h^{K_shi} of the transferred knowledge as input, and predicts a posterior knowledge tracking distribution P(K|K_shi, pos) over the knowledge base K: each text segment k_i is scored by matching Mlp([h^{C_{τ-1}}; h^{K_shi}]) against h^{k_i}, and the scores are normalized with softmax.
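The transferor (and, analogously, the posterior tracker) differs from the prior tracker only in conditioning on a concatenated input. A hedged toy sketch of that pattern, with illustrative 1-dimensional context and knowledge vectors:

```python
import math

def mlp(x, W, b):
    # Single linear layer Mlp(x) = x.W + b, as defined in the text.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def transfer_distribution(h_ctx, h_tracked, knowledge_reps, W, b):
    """P(K|K_tra, shi): score each knowledge segment against
    Mlp([h_ctx; h_tracked]); '+' on Python lists performs the
    concatenation written [.;.] in the text."""
    q = mlp(h_ctx + h_tracked, W, b)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in knowledge_reps]
    return softmax(scores)

# Toy 1-d context and tracked-knowledge vectors -> concatenated 2-d input.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
dist = transfer_distribution([1.0], [0.0], [[0.0, 1.0], [2.0, 0.0]], W, b)
# dist sums to 1 and puts the most mass on the second segment
```

The posterior tracker pos follows the same pattern with [h_ctx; h_transferred] as its concatenated input.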
At the decoding layer, a Transformer is used as the decoder to generate Y_τ word by word. The decoder takes as input the unpooled hidden state representations corresponding to the dialogue context X_τ and the transferred knowledge K_shi. During model training, K_shi is the transferred knowledge annotated in the training set; at model application time, K_shi is the knowledge text segment with the highest probability in the knowledge transfer distribution P(K|K_tra, shi) predicted by the knowledge transferor shi.
Since decoding is a cyclic process over multiple time steps, the generation of the word y_{τ,t} at the t-th decoding time step is described in detail below. Given the hidden state vector h_{τ,t} of the t-th decoding time step, where emb(·) denotes the word-embedding representation of its argument, h_{τ,t} is mapped onto a predefined vocabulary V = {v_1, v_2, …, v_{|V|}} to obtain a probability distribution P(y_{τ,t}):

P(y_{τ,t}) = softmax(Mlp(h_{τ,t})) ∈ R^{|V|}, (13)

At model application time, the word v with the highest probability in the distribution P(y_{τ,t}) is taken as the generated word y_{τ,t} of the t-th time step. The computation of the t-th time step then ends, and the decoder is updated (equation (14)) to obtain the decoder hidden state h_{τ,t+1} of the (t+1)-th time step, starting a new cycle.

When the model is applied, after the decoding cycle ends, the words output at each step form, in order, the complete final reply Y_τ.
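The greedy decoding loop described above can be sketched as follows. The per-step distribution function here is a hypothetical stand-in for the Transformer decoder step; token ids and the EOS convention are illustrative assumptions.

```python
def greedy_decode(step_fn, max_steps, eos_id):
    """Greedy decoding loop: at each time step take the argmax of the
    per-step distribution P(y_t), feed the result back, stop on EOS."""
    ys, state = [], None
    for _ in range(max_steps):
        probs, state = step_fn(state, ys)
        y = max(range(len(probs)), key=lambda v: probs[v])
        if y == eos_id:
            break
        ys.append(y)
    return ys

def toy_step(state, ys):
    # Hypothetical stand-in for one decoder step; it puts all probability
    # mass on tokens 1, 2, 3 in turn and then on EOS (id 0).
    vocab = 5
    nxt = len(ys) + 1 if len(ys) < 3 else 0
    return [1.0 if v == nxt else 0.0 for v in range(vocab)], state

generated = greedy_decode(toy_step, 10, eos_id=0)  # -> [1, 2, 3]
```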
Step2: model parameters are trained using a priori-posterior dual learning mechanism from the model obtained in step1 using knowledge-based dialogue datasets currently disclosed in industry and academia. Specifically, the training mechanism makes the posterior knowledge tracker pos and the knowledge translator shi be dual tasks, makes the posterior knowledge tracker pos and the knowledge translator shi guide and promote each other in an unsupervised manner, makes the posterior knowledge tracker pos and the knowledge translator shi iterate until convergence (in each iteration, the posterior knowledge tracker pos and the knowledge translator shi are alternately optimized), and ensures that the prior knowledge tracker pri can benefit from dual interactions. Figure 2 shows a schematic representation of the method of the invention during model training.
This step is achieved by the following procedure.
Step1: the goal of the warm-up training is to maximize the probability of labeling data in the training set using maximum likelihood estimation (MLE for short), where the following loss functions are defined:
L pri (θ)=-logP(K tra_label |pri), (15)
L pos (θ)=-logP(K tra_label |K shi_label ,pos), (16)
L shi (θ)=-logP(K shi_label |K tra_label ,shi), (17)
Figure BDA0002436362010000062
wherein θ is all trainable parameters in the model, and tra_label and shi_label correspond to the tracked knowledge and the transferred knowledge marked in the training set respectively. L (L) pri (θ) is a priori knowledge tracking loss, L pos (θ) is posterior knowledge tracking loss, L shi (θ) is knowledge transfer loss, L g And (θ) is the recovery generation loss.
Obtaining a final loss function L (theta):
L(θ)=L pri (θ)+L pos (θ)+L shi (θ)+L g (θ). (19)
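Computing the warm-up losses amounts to summing negative log-likelihoods of the annotated choices. The sketch below shows this with toy distributions (the generation term L_g, which sums the same kind of per-word negative log-likelihoods over the reply, is omitted here for brevity).

```python
import math

def nll(dist, label_idx):
    # Negative log-likelihood of one labelled choice, e.g.
    # L_pri(theta) = -log P(K_tra_label | pri).
    return -math.log(dist[label_idx])

# Toy distributions over a 3-segment knowledge base (illustrative values).
p_pri = [0.5, 0.3, 0.2]   # prior tracking distribution
p_pos = [0.6, 0.2, 0.2]   # posterior tracking distribution
p_shi = [0.1, 0.8, 0.1]   # knowledge transfer distribution
tra_label, shi_label = 0, 1

# Warm-up loss without the generation term L_g.
L = nll(p_pri, tra_label) + nll(p_pos, tra_label) + nll(p_shi, shi_label)
```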
All parameters of the model, together with the word embedding matrix, are then updated with the back-propagation (BP) algorithm commonly used in deep learning to reduce the loss. Once training converges, the warm-up phase ends.
Step2: a single iteration starts. A training sample is sampled from the training set, and then a posterior knowledge tracker pos is used for guiding and lifting the knowledge transferor shi. As shown in FIG. 3, the tracked knowledge of the annotations is represented
Figure BDA0002436362010000071
As input to the knowledge transferor shi, the knowledge transferor shi generates a knowledge transfer profile P (k|k tra_label Shi). Sampling a knowledge from the distribution as transferred knowledge K shi_sample And express it +.>
Figure BDA0002436362010000072
Feeding a posterior knowledge tracker pos to obtain annotated tracked knowledge K tra_label Is the inverse probability P (K) tra_label |K shi_sample Pos), consider the thrust back probability as a "reward":
E(R)=E[RlogP(K shi_sample |K tra_label ,shi)], (20)
R=log[P(K tra_label |K shi_sample ,pos)], (21)
wherein R is the output distribution of the secondary shiP(K|K tra_label Knowledge K of the transfer of samples in shi) shi_sample Is a "reward" of (a). E [. Cndot.]To achieve the desired result. Subsequently, the strategy gradient (policy gradient) method is used to maximize the desired E (R) of the reward and calculate the parameter θ 1 =[θ embeddingencodershi ](θ embedding For word embedding matrix, θ encoder θ, the parameter of the encoder shi Parameters for knowledge transferor shi):
Figure BDA0002436362010000073
then according to the gradient
Figure BDA0002436362010000074
Updating the parameter θ 1
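The sample-and-reward scheme above can be illustrated with a Monte-Carlo estimate of the expected reward. Everything here is a toy stand-in: the distributions are fixed lists rather than network outputs, and only the reward expectation is simulated (the actual method additionally weights grad log P by R to update θ_1).

```python
import math
import random

def sample(dist, rng):
    # Draw an index from a categorical distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def expected_reward(p_shi, p_pos_given, tra_label, n, seed=0):
    """Monte-Carlo estimate of E[R]: a transferred knowledge K_shi_sample is
    drawn from shi's distribution and rewarded with
    R = log P(K_tra_label | K_shi_sample, pos). p_pos_given[k] is the
    posterior tracking distribution when pos is fed sampled knowledge k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        k = sample(p_shi, rng)
        total += math.log(p_pos_given[k][tra_label])
    return total / n

p_shi = [0.7, 0.3]                      # transferor's distribution (toy)
p_pos_given = [[0.9, 0.1], [0.4, 0.6]]  # pos's distribution per sampled k
r = expected_reward(p_shi, p_pos_given, tra_label=0, n=2000)
# r lies between log(0.4) and log(0.9)
```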
Step3: similar to Step2, the posterior knowledge tracker pos will be lifted in a similar manner by the knowledge transferor shi, as shown in FIG. 4, to finally optimize the parameter θ 2 =[θ embeddingencoderpos ](θ pos Parameters for a posterior knowledge tracker pos).
Step4: after the above-mentioned dual optimization of the knowledge transferer shi and the a priori knowledge tracker pos, the a priori knowledge tracker pos needs to be transferred from the revenue in the process to the a priori knowledge tracker pri. As shown in FIG. 5, the knowledge K to which the annotations in the training set are transferred shi_label Latent state representation of (1)
Figure BDA0002436362010000075
As input to the already optimized posterior tracker pos, a posterior tracking profile P (k|k) shi_label Pos). Causing the distribution P (K|pri) to approach P (K|K) shi_label Pos), loss of divergence by KL: />
Figure BDA0002436362010000076
Figure BDA0002436362010000077
Because training proceeds in an unsupervised manner, inaccurate "rewards" are hard to avoid. To mitigate their adverse effect, the KL-divergence loss L_kl(θ) and the MLE losses [L_pos(θ), L_shi(θ), L_g(θ)] are linearly summed for joint training:

L(θ) = L_kl(θ) + γ[L_pos(θ) + L_shi(θ) + L_g(θ)], (24)

where γ is a hyper-parameter controlling the proportion of the MLE losses; in this method γ is set to 0.5. The single iteration then ends.
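The joint objective of equation (24) can be sketched numerically as follows. The KL direction here (posterior ‖ prior, pulling the prior toward the posterior) is the usual convention for this kind of distillation and is an assumption on our part; the distributions and MLE loss values are toy numbers.

```python
import math

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 vanish.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(p_posterior, p_prior, mle_losses, gamma=0.5):
    # L(theta) = L_kl(theta) + gamma * (L_pos + L_shi + L_g),
    # with gamma = 0.5 as in the text.
    return kl(p_posterior, p_prior) + gamma * sum(mle_losses)

p_posterior = [0.7, 0.2, 0.1]   # P(K | K_shi_label, pos), toy values
p_prior = [0.5, 0.3, 0.2]       # P(K | pri), toy values
loss = joint_loss(p_posterior, p_prior, mle_losses=[0.4, 0.6, 1.0])
```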
Step5: step2-Step4 are repeatedly executed to form a plurality of iterations until the model parameters further converge. Convergence is the end of the whole training process.
Step3: actual dialog applications.
After model training is completed, all parameters of the model are fixed. The model can then be applied to actual dialogue scenarios.
As shown in FIG. 6, given a knowledge base K and a dialogue context C_τ containing the user input, the encoder is executed to obtain their hidden state representations; the prior knowledge tracker pri and the knowledge transferor shi are then executed in turn to complete knowledge tracking and transfer; finally, the transferred knowledge representation inferred by the knowledge transferor shi is fed to the decoder to generate the final reply Y_τ.
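The application-time order of operations can be sketched as glue code. Every component below is a hypothetical stand-in (the real model uses BERT, the pri/shi scorers, and a Transformer decoder); only the pipeline order reflects the method.

```python
def infer(encode, pri_score, shi_score, decode, knowledge_base, context):
    """Application-time pipeline: encode -> prior tracking -> transfer -> decode."""
    h_ctx, h_K = encode(context, knowledge_base)
    # Prior knowledge tracker: pick the most probable tracked segment.
    k_tra = max(range(len(h_K)), key=lambda i: pri_score(h_ctx, h_K[i]))
    # Knowledge transferor: pick the most probable transferred segment,
    # conditioned on the tracked knowledge.
    k_shi = max(range(len(h_K)),
                key=lambda i: shi_score(h_ctx, h_K[k_tra], h_K[i]))
    # Decoder: generate the reply from context + transferred knowledge.
    return decode(h_ctx, h_K[k_shi])

# Toy stand-ins (illustrative only).
def encode(context, kb):
    return [1.0], [[0.9], [0.1], [0.5]]

def pri_score(h_ctx, h_k):
    return h_k[0]                      # favours segment 0 as tracked

def shi_score(h_ctx, h_tra, h_k):
    return -abs(h_k[0] - 0.5)          # favours segment 2 as transferred

def decode(h_ctx, h_k):
    return f"reply conditioned on knowledge repr {h_k}"

reply = infer(encode, pri_score, shi_score, decode, ["k1", "k2", "k3"], "ctx")
```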
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A man-machine conversation method based on knowledge tracking and transferring is characterized by comprising the following steps:
step one, constructing a model with knowledge tracking and transferring functions;
the model adopts a coding-decoding framework based on deep learning, and comprises a coding layer, a knowledge tracking transfer layer and a decoding layer;
training model parameters by using a prior-posterior dual learning mechanism according to the constructed model;
step three, after training is completed, all model parameters are fixed, and then actual dialogue application is carried out;
the coding layer comprises a BERT coder which respectively codes the knowledge base and the dialogue context into a hidden state representation;
the knowledge tracking and transfer layer comprises a prior knowledge tracker pri, a knowledge transferor shi and a posterior knowledge tracker pos. The prior knowledge tracker pri takes the hidden state representation of the dialogue context as input and predicts a prior knowledge tracking distribution over all text segments in the knowledge base, from which a piece of tracked knowledge and its hidden state representation can be sampled; the knowledge transferor shi takes the tracked knowledge together with the hidden state representation of the dialogue context as input and predicts a knowledge transfer distribution over all text segments in the knowledge base, from which a piece of transferred knowledge and its hidden state representation can be sampled; the posterior knowledge tracker pos additionally takes the hidden state representation of the transferred knowledge as input and predicts a posterior knowledge tracking distribution over all text segments in the knowledge base, from which a piece of tracked knowledge and its hidden state representation can be sampled;
the decoding layer comprises a Transformer decoder that takes the transferred knowledge together with the hidden state representation of the dialogue context as input and generates the final reply word by word;
in the training process of step two, the posterior knowledge tracker pos and the knowledge transferor shi are treated as dual tasks of each other, so that they guide and improve each other in an unsupervised manner, iterating several times until convergence; meanwhile, the prior knowledge tracker pri benefits from the dual interaction between pos and shi; the specific training process is as follows:
Step1: warm-up training, i.e., maximizing the probability of the annotated data in the training set with maximum likelihood estimation; warm-up training ends once the parameters converge;
Step2: a single-round iteration starts; first, the posterior knowledge tracker pos guides and improves the knowledge transferor shi;
Step3: the knowledge transferor shi guides and improves the posterior knowledge tracker pos;
Step4: joint training is performed on the linear sum of the KL-divergence loss and the MLE losses, so that the prior knowledge tracking distribution imitates and approximates the posterior knowledge tracking distribution; the single-round iteration then ends;
Step5: Step2-Step4 are repeated for several iterations until the model parameters converge further.
2. The man-machine conversation method based on knowledge tracking and transferring according to claim 1, wherein a dual closed loop is formed between the posterior knowledge tracker pos and the knowledge transferor shi, and the posterior knowledge tracker pos is executed only during model training, not during model application.
3. The method according to claim 1, wherein, in actual dialogue application, given the knowledge base and the dialogue context containing the user input, the BERT encoder is first executed to obtain the hidden state representations of the knowledge base and the dialogue context; then the prior knowledge tracker pri and the knowledge transferor shi are executed in turn to complete knowledge tracking and transfer; finally, the transferred knowledge inferred by the knowledge transferor shi is fed to the Transformer decoder to generate the final reply.
CN202010253520.9A 2020-04-02 2020-04-02 Man-machine dialogue method based on knowledge tracking and transferring Active CN113495943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010253520.9A CN113495943B (en) 2020-04-02 2020-04-02 Man-machine dialogue method based on knowledge tracking and transferring

Publications (2)

Publication Number Publication Date
CN113495943A CN113495943A (en) 2021-10-12
CN113495943B true CN113495943B (en) 2023-07-14

Family

ID=77994291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010253520.9A Active CN113495943B (en) 2020-04-02 2020-04-02 Man-machine dialogue method based on knowledge tracking and transferring

Country Status (1)

Country Link
CN (1) CN113495943B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599196A (en) * 2016-12-14 2017-04-26 Zhujian Intelligent Technology (Shanghai) Co., Ltd. Artificial intelligence conversation method and system
WO2018133761A1 (en) * 2017-01-17 2018-07-26 Huawei Technologies Co., Ltd. Method and device for man-machine dialogue
WO2018157700A1 (en) * 2017-03-02 2018-09-07 Tencent Technology (Shenzhen) Co., Ltd. Method and device for generating dialogue, and storage medium
CN108763504A (en) * 2018-05-30 2018-11-06 Zhejiang University Dialogue reply generation method and system based on reinforced dual-channel sequence learning
CN109657041A (en) * 2018-12-04 2019-04-19 Nanjing University of Science and Technology Automatic question generation method based on deep learning
CN109887484A (en) * 2019-02-22 2019-06-14 Ping An Technology (Shenzhen) Co., Ltd. Speech recognition and speech synthesis method and device based on dual learning
CN110188182A (en) * 2019-05-31 2019-08-30 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Model training method, dialogue generation method, device, equipment and medium
CN110297895A (en) * 2019-05-24 2019-10-01 Shandong University Dialogue method and system based on free-text knowledge
CN110297817A (en) * 2019-06-25 2019-10-01 Harbin Institute of Technology Method for constructing a knowledge structure based on a personalized Bayesian knowledge tracing model
CN110309170A (en) * 2019-07-02 2019-10-08 Peking University Complex intent recognition method for task-oriented multi-turn dialogue
CN110807509A (en) * 2018-08-06 2020-02-18 Beijing Bozhi Tianxia Information Technology Co., Ltd. Deep knowledge tracing method based on Bayesian neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885756B (en) * 2016-09-30 2020-05-08 Huawei Technologies Co., Ltd. Deep learning-based dialogue method, device and equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Personalized dialogue content generation method based on deep learning; Wang Hao; Guo Bin; Hao Shaoyang; Zhang Qiuyun; Yu Zhiwen; Journal of Graphics (No. 02); full text *
Design and implementation of a natural language dialogue system based on statistical learning; Li Weitong; Pi Dechang; Microcomputer Applications (No. 07); full text *
Research on multi-knowledge-point knowledge tracing models and visualization; Xu Moke; Wu Wenjun; Zhou Xuan; Pu Yanjun; e-Education Research (No. 10); full text *

Also Published As

Publication number Publication date
CN113495943A (en) 2021-10-12

Similar Documents

Publication Publication Date Title
CN110188182B (en) Model training method, dialogue generating method, device, equipment and medium
Liu et al. Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems
CN110287481B (en) Named entity corpus labeling training system
CN111666427B (en) Entity relationship joint extraction method, device, equipment and medium
CN111401084B (en) Method and device for machine translation and computer readable storage medium
CN108717409A (en) Sequence labeling method and device
CN112541060B (en) End-to-end task-oriented dialogue learning framework and method based on adversarial training
CN110929114A (en) Tracking digital dialog states and generating responses using dynamic memory networks
CN112380863A (en) Sequence labeling method based on multi-head self-attention mechanism
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
CN110162766B (en) Word vector updating method and device
CN110825829A (en) Method for realizing autonomous navigation of robot based on natural language and semantic map
CN111737432A (en) Automatic dialogue method and system based on joint training model
CN114443828B (en) Training method and device for universal dialogue model, electronic equipment and medium
CN113971837A (en) Knowledge-based multi-modal feature fusion dynamic graph neural sign language translation method
CN114168707A (en) Recommendation-oriented emotion type conversation method
CN114528387A (en) Deep learning conversation strategy model construction method and system based on conversation flow bootstrap
Hou et al. Inverse is better! fast and accurate prompt for few-shot slot tagging
Rohmatillah et al. Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy.
CN113495943B (en) Man-machine dialogue method based on knowledge tracking and transferring
CN112183062B (en) Spoken language understanding method based on alternate decoding, electronic equipment and storage medium
CN113297374A (en) Text classification method based on BERT and word feature fusion
CN114880527B (en) Multi-modal knowledge graph representation method based on multi-prediction task
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
CN114757188A (en) Standard medical text rewriting method based on generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant