CN111046134A - Dialog generation method based on replying person personal feature enhancement - Google Patents

Dialog generation method based on replying person personal feature enhancement

Info

Publication number: CN111046134A (granted as CN111046134B)
Application number: CN201911062516.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 贺瑞芳, 王瑞芳, 常金鑫, 王龙标, 党建武
Applicant and assignee: Tianjin University
Legal status: Active (granted)

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/048 Activation functions (neural networks)


Abstract

The invention discloses a dialog generation method based on replier personal feature enhancement, comprising the following steps: 1) construct 2 encoder-decoder basic frameworks; 2) using the vMF distribution as a personal feature extractor, construct a vMF-based VAE model on one encoder-decoder model to obtain a context-based latent variable of the replier's personal features; 3) using the personal-feature latent variable and the vMF distribution as an information enhancement generator, construct a vMF-based CVAE generation model on the other encoder-decoder model to obtain a response that fuses the replier's personal features with the context. By modeling the replier's personal features and the context, this dialog generation method effectively produces responses that reflect the replier's personal characteristics and achieves better results on the relevant evaluation metrics.

Description

Dialog generation method based on replying person personal feature enhancement
Technical Field
The invention relates to the technical field of natural language processing and dialog systems, in particular to a dialog generation method based on replying person personal feature enhancement.
Background
With the continued rise of artificial intelligence in recent years, more and more AI products are appearing in industrial services across many fields, and dialog systems, as a new field, are attracting increasing attention. The open-domain dialog system [1] is an important direction in human-machine conversation; it aims to make generated responses as natural, fluent, and diverse as possible.
In recent years, the continuous progress of deep learning has greatly advanced research on dialog generation, so that dialog generation no longer relies on template matching, retrieval, and similar approaches. Current dialog system methods mainly comprise: (1) generation-based methods, chiefly the Seq2Seq model with an encoder-decoder framework [2] and generative models based on neural variational encoders [3]; (2) retrieval-based methods, which mainly select responses from candidate responses; the key here is message-response matching, and the matching algorithm must overcome the semantic gap between a message and a response; (3) hybrid methods, which combine neural generative models with retrieval-based models, enjoying the advantages of both and showing attractive performance.
The above methods mainly consider the diversity of responses and rarely consider the replier's consistency in the generated responses; moreover, neural variational encoder models suffer from the KL-divergence vanishing problem, so the latent space cannot be effectively utilized [4], even though that space contains much of the replier's personal information.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a dialog generation method based on replier personal feature enhancement. The method introduces the vMF distribution into an encoder-decoder framework to construct a VAE and a CVAE, fusing the replier's personal features in the dialog context with the context information; compared with existing models, the final dialog generation results are the best on the 5 metrics Average, Greedy, Extreme, Distinct-1, and Distinct-2.
The purpose of the invention is realized by the following technical scheme:
a dialog generation method based on replying person personal feature enhancement comprises the following steps:
(1) 2 encoder-decoder basic frameworks are constructed:
The 2 encoder-decoder basic frameworks are used, respectively, to reconstruct the replier-related sentences in each dialog context of the training corpus and the response in each dialog;
(2) constructing a personal feature extractor:
The personal feature extractor is used to extract a latent variable z_r of the replier's personal features; the vMF distribution is introduced into the encoder-decoder basic framework that reconstructs the replier-related sentences of each dialog context in the training corpus, so as to construct a vMF-based VAE generation model and obtain the personal-feature latent variable z_r;
(3) Constructing an information enhancement generator:
The information enhancement generator is used to obtain a response y that fuses the replier personal-feature latent variable z_r with the context x; it introduces the vMF distribution and the replier personal-feature latent variable z_r into the encoder-decoder basic framework that reconstructs the response of each dialog in the corpus, so as to construct a vMF-based CVAE generation model and obtain the response y.
Further, in step (1), each dialog in the training corpus consists of context sentences and a response. The context in each dialog is represented as x = (x_1, x_2, …, x_n), where x_i denotes the i-th sentence of the context and n denotes the number of sentences the dialog context contains; concretely, x_i = (w_{i,1}, w_{i,2}, …, w_{i,j}, …, w_{i,N_i}), where w_{i,j} denotes the j-th word of the i-th sentence and N_i denotes the number of words in sentence x_i. The response in each dialog is denoted y = (w_{y,1}, w_{y,2}, …, w_{y,j}, …, w_{y,N_y}), where w_{y,j} denotes the j-th word of the response and N_y denotes the number of words contained in the response y. The replier-related sentences extracted from each dialog context are represented as x_r = (x_r^1, …, x_r^l), where l denotes the number of replier-related sentences in the dialog context.
Further, the following processing is required to obtain the corpus:
(101) Delete dialogs in the original corpus whose length is less than 3 or greater than 10 sentences, to normalize dialog length;
(102) The last sentence of each dialog in the corpus is taken as the response, and the remaining sentences are taken as the context.
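As an illustration, the preprocessing in steps (101) and (102) can be sketched as follows; the function and variable names are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of corpus preprocessing: filter dialogs by length,
# then split each dialog into (context, response).

def preprocess(dialogs):
    """dialogs: list of dialogs, each a list of utterance strings."""
    samples = []
    for dialog in dialogs:
        # (101) keep only dialogs with 3..10 sentences to normalize length
        if not (3 <= len(dialog) <= 10):
            continue
        # (102) last sentence is the response, the rest form the context
        context, response = dialog[:-1], dialog[-1]
        samples.append((context, response))
    return samples

dialogs = [
    ["hi", "hello"],                          # too short, dropped
    ["hi", "hello", "how are you", "fine"],   # kept
]
print(preprocess(dialogs))
```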
Further, in steps (2) and (3), the vMF distribution, i.e., the von Mises-Fisher distribution, is used to represent a probability distribution on the unit sphere; its probability density function is as follows:

f_d(x; μ, κ) = C_d(κ) · exp(κ μᵀx)

C_d(κ) = κ^(d/2−1) / ( (2π)^(d/2) · I_(d/2−1)(κ) )

where d denotes the dimension of the space ℝ^d; x denotes a d-dimensional unit random vector; μ denotes a direction vector on the unit sphere, with ||μ|| = 1; κ ≥ 0 denotes a concentration parameter; and I_ρ denotes the modified Bessel function of order ρ, where ρ = d/2 − 1. The distribution describes how unit vectors are distributed over the sphere.
further, in the step (2), the specific steps are as follows:
the personal feature extractor consists of a sentence encoder, a local context encoder, an vMF distribution and reply decoder.
First, a sentence encoder encodes a sentence x about reverter information using a bi-directional RNN layered encoderrIt will xrEach sentence in (1)
Figure BDA0002258406250000027
Coded as a vector
Figure BDA0002258406250000028
Then x is putrAll of
Figure BDA0002258406250000029
The coded vector is used as the input of a local context coder, and finally, a sentence x related to the reverter information is obtainedrPotential vector of
Figure BDA0002258406250000031
Next, the vMF distribution learns a distributional representation of the replier's personal features from the hidden state h^{x_r} of the replier-related sentences x_r, and the latent variable z_r is obtained from this distribution by rejection sampling; the sampling formula is:

z_r = ω μ + v √(1 − ω²)

where ω ∈ [−1, 1] is drawn by rejection sampling and v is a unit vector sampled uniformly from the subspace orthogonal to μ. z_r is then used as the input of the reply decoder to reconstruct the replier-related sentences x_r, with the computation:

p(x_r | z_r) = ∏_{i=1}^{l} ∏_{j=1}^{N_i} p(w_{i,j} | z_r, w_{i,<j})

where l denotes the number of replier-related sentences x_r in the context; N_i is the length of the i-th sentence of x_r; and w_{i,j} is the j-th word of the i-th sentence x_r^i of x_r;
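The rejection sampling of z_r described above can be sketched with Wood's (1994) algorithm, the standard sampler for vMF latent spaces; this numpy sketch and its names are our own illustration, not the patent's implementation.

```python
# Sketch of vMF rejection sampling: draw omega by rejection, draw v uniformly
# from the subspace orthogonal to mu, and combine as z = omega*mu + sqrt(1-omega^2)*v.
import numpy as np

def sample_vmf(mu, kappa, rng):
    """Draw one sample from vMF(mu, kappa) on the unit sphere in R^d."""
    d = len(mu)
    # rejection-sample the scalar omega in [-1, 1] (Wood, 1994)
    b = (-2 * kappa + np.sqrt(4 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (d - 1) * np.log(1 - x0**2)
    while True:
        z = rng.beta((d - 1) / 2, (d - 1) / 2)
        omega = (1 - (1 + b) * z) / (1 - (1 - b) * z)
        if kappa * omega + (d - 1) * np.log(1 - x0 * omega) - c >= np.log(rng.uniform()):
            break
    # sample v uniformly from the subspace orthogonal to mu
    v = rng.normal(size=d)
    v -= v.dot(mu) * mu
    v /= np.linalg.norm(v)
    return omega * mu + np.sqrt(1 - omega**2) * v

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])
z_r = sample_vmf(mu, kappa=20.0, rng=rng)
print(np.linalg.norm(z_r))  # samples lie on the unit sphere
```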
Finally, the model is optimized with the ELBO:

L(θ, φ; x_r) = E_{q_φ(z_r | x_r)} [ log p_θ(x_r | z_r) ] − KL( q_φ(z_r | x_r) || p(z_r) )

where E_{q_φ(z_r|x_r)}[log p_θ(x_r|z_r)] denotes the reconstruction error and KL(·||·) computes the KL divergence between the posterior distribution and the prior distribution; the prior p(z_r) obeys the uniform distribution vMF(·, 0), and the posterior q_φ(z_r | x_r) obeys vMF(μ_r, κ), where μ_r is a parameter of the posterior distribution and κ is set to a constant. μ_r is computed as follows:

μ̃_r = g(h^{x_r})

μ_r = μ̃_r / ||μ̃_r||

where g(·) is a linear function and ||·|| is used to ensure normalization;

The KL divergence is computed as follows:

KL( vMF(μ_r, κ) || vMF(·, 0) ) = κ · I_{d/2}(κ) / I_{d/2−1}(κ) + (d/2 − 1) · log κ − (d/2) · log(2π) − log I_{d/2−1}(κ) + (d/2) · log π + log 2 − log Γ(d/2)

where Γ(·) denotes the Gamma function.
Further, in step (3), the information enhancement generator combines the personal-feature latent variable z_r with the dialog context x to finally generate the response y; the specific steps are as follows:

The information enhancement generator includes a sentence encoder, a global context encoder, a vMF distribution, and a response decoder.
First, the sentence encoder encodes all context sentences x_1, x_2, …, x_n as h_{x_1}, h_{x_2}, …, h_{x_n} and encodes the response y as the vector h^y. The vectors h_{x_1}, …, h_{x_n} are then used as the input of the global context encoder to obtain the context latent vector h^x.

Second, the context latent vector h^x and the response vector h^y are used as the inputs of the vMF distribution to obtain a distributional representation, from which the context latent variable z is sampled; the process is as follows:

z = ω μ + v √(1 − ω²)

where ω ∈ [−1, 1];

Finally, the context x, the context latent variable z, and the replier personal-feature latent variable z_r are used as the inputs of the response decoder to generate the response y;
The generation process is represented as follows:

h_t = f( h_{t−1}, [ e_{y,t−1}; z; z_r ] )

p_vocab = σ( V h_t + b )

p(y | x, z, z_r) = ∏_{i=1}^{N_y} p_vocab(w_{y,i})    (11)

where f denotes the decoder RNN unit; σ denotes a sigmoid function; e_{y,t} is the word-embedding representation of the t-th word in the response y; h_t denotes the hidden state at step t; V and b are parameters to be learned; p_vocab denotes the generation probability over the vocabulary; p_vocab(w_{y,i}) denotes the probability of generating the word w_{y,i}; and N_y denotes the length of the response y. Equation (11) represents the generation probability of the response y.
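A minimal numpy sketch of the vocabulary-distribution and response-probability computations above. We use a softmax projection so that p_vocab is a normalized distribution (our assumption: the text glosses σ as a sigmoid, but a normalized output is needed for equation (11)); all shapes and names are illustrative.

```python
# Sketch: project a decoder hidden state to vocabulary probabilities, then
# score a response as the product of its per-word probabilities (eq. (11)).
import numpy as np

rng = np.random.default_rng(1)
vocab_size, hidden_size = 8, 4
V = rng.normal(size=(vocab_size, hidden_size))  # learned projection (here random)
b = np.zeros(vocab_size)                        # learned bias

def vocab_probs(h_t):
    """p_vocab = softmax(V h_t + b) over the vocabulary."""
    logits = V @ h_t + b
    e = np.exp(logits - logits.max())  # numerically stabilized softmax
    return e / e.sum()

def response_log_prob(hidden_states, word_ids):
    """log p(y | x, z, z_r) as a sum of per-step log p_vocab(w_{y,i})."""
    return sum(np.log(vocab_probs(h)[w]) for h, w in zip(hidden_states, word_ids))

states = rng.normal(size=(3, hidden_size))  # decoder states for a 3-word response
print(abs(vocab_probs(states[0]).sum() - 1.0) < 1e-9)  # valid distribution
print(response_log_prob(states, [2, 5, 1]) < 0)        # a log-probability
```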
The optimization process using the vMF-based CVAE is expressed as follows:

L(θ, φ; x, y, z_r) = E_{q_φ(z | x, y)} [ log p_θ(y | x, z, z_r) ] − KL( q_φ(z | x, y) || p_θ(z | x) )

where p_θ(y | x, z, z_r) denotes the generation process, the expectation term denotes the reconstruction error, and the KL term denotes the KL divergence between the posterior distribution q_φ(z | x, y) and the prior distribution p_θ(z | x). With the vMF distribution parameter κ in the above equation set to a constant, the posterior parameter μ_post and the prior parameter μ_prior are computed as follows:

μ̃_post = g( [ h^x; h^y ] )

μ_post = μ̃_post / ||μ̃_post||

μ̃_prior = g′( h^x )

μ_prior = μ̃_prior / ||μ̃_prior||

where g(·) and g′(·) are linear functions and [·; ·] denotes concatenation. The posterior distribution of the CVAE obeys vMF(μ_post, κ), and the prior distribution obeys vMF(μ_prior, κ), which is conditioned on x; from the prior and the posterior, the following KL divergence is obtained:

KL( vMF(μ_post, κ) || vMF(μ_prior, κ) ) = κ · ( I_{d/2}(κ) / I_{d/2−1}(κ) ) · ( 1 − μ_postᵀ μ_prior )
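The KL divergence between two vMF distributions sharing the same constant κ reduces to a closed form that depends only on the angle between the two mean directions; this stdlib sketch (our names, series-based Bessel function) illustrates it.

```python
# Sketch: KL( vMF(mu_post, kappa) || vMF(mu_prior, kappa) ) with shared kappa.
import math

def bessel_i(nu, x, terms=50):
    return sum((x / 2.0) ** (2 * m + nu) / (math.factorial(m) * math.gamma(m + nu + 1))
               for m in range(terms))

def kl_vmf_vmf(mu_post, mu_prior, kappa):
    """Closed-form KL between two vMF distributions with equal kappa."""
    d = len(mu_post)
    # mean resultant length A_d(kappa) = I_{d/2}(kappa) / I_{d/2-1}(kappa)
    a_d = bessel_i(d / 2, kappa) / bessel_i(d / 2 - 1, kappa)
    dot = sum(p * q for p, q in zip(mu_post, mu_prior))
    return kappa * a_d * (1.0 - dot)

mu1, mu2 = (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)
print(kl_vmf_vmf(mu1, mu1, 5.0))      # 0.0: identical distributions
print(kl_vmf_vmf(mu1, mu2, 5.0) > 0)  # True: divergence grows with the angle
```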
compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. To solve the problem that the latent space cannot be effectively utilized because the KL divergence vanishes in the VAE and CVAE, in steps (2) and (3) the invention uses the vMF distribution in place of the Gaussian distribution in the encoder-decoder-based VAE and CVAE models [5]. In models using the Gaussian distribution, the KL divergence is computed from the mean and variance of the Gaussian, and their continual change during training causes the KL divergence to vanish; with the vMF distribution instead, the KL divergence in the model is determined by the parameter κ, which is a constant and does not change during training, so the KL divergence cannot vanish and the latent space can be fully used. Experiments show that introducing the vMF distribution solves the KL-divergence vanishing problem.
2. To improve replier consistency in responses, in step (2) the method uses the vMF-based VAE model to represent the replier information in the context with a vMF distribution and obtains the replier personal-feature latent variable z_r by sampling; applying z_r to the final response generation makes the final response contain the replier-related information in the context. Experiments show that extracting the replier's personal features from the context significantly improves replier consistency in responses.
3. To enhance the amount of information in responses, in step (3) the invention extracts global context information with the vMF-based CVAE model and combines it with the replier information from the context to act on the generation process; feeding the global context information into the generation process effectively enriches the information contained in responses. Experiments show that this component notably improves the Distinct-1 and Distinct-2 metrics, confirming that it helps enrich the information in responses.
Drawings
FIG. 1 is a frame diagram of a dialog generation method based on replying person personal feature enhancement provided by the present invention;
FIG. 2 shows the KL divergence of the SSVN_Gau, SSVN_Gau-E, and SSVN_Gau-G models computed during training;

FIG. 3 shows the corresponding performance of the invention under different λ values.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The implementation of the invention is illustrated on two dialog datasets, the Cornell Movie Dialogs Corpus and the Ubuntu Dialogue Corpus [6]. The overall framework of the method is shown in FIG. 1. The overall algorithm flow comprises 3 steps: constructing the encoder-decoder models, extracting personal features with the VAE, and generating responses with the CVAE.
The method comprises the following specific steps:
(1) constructing the input of an encoder-decoder model:
The Cornell Movie Dialogs Corpus contains over 80,000 conversations drawn from movies; the Ubuntu Dialogue Corpus contains approximately 500,000 multi-turn conversations collected from Ubuntu Internet Relay Chat, each starting with an unresolved technical problem followed by answers toward a solution. The invention takes these two dialog datasets as the original corpus for the encoder-decoder models and processes them as follows: (1) delete dialogs in the dataset with fewer than 3 or more than 10 turns; (2) take the last sentence of each dialog as the response and the preceding sentences as the dialog context. Table 1 gives detailed statistics of the two datasets. The Cornell Movie Dialogs Corpus provides 91,271 dialogs for training, 871 for validation, and 702 for testing, with an average of 5.04 sentences and 16.91 words per dialog and a vocabulary size of 10,000; the Ubuntu Dialogue Corpus provides 448,833 dialogs for training, 19,584 for validation, and 18,920 for testing, with an average of 4.94 sentences and 23.67 words per dialog and a vocabulary size of 20,000.
TABLE 1 dialog data set statistics
(2) Personal feature extraction using VAE
To obtain the personal-feature latent variable z_r of the replier in each dialog, the vMF distribution is added to the encoder-decoder model to construct the VAE model, which is trained according to the following objective function:

L(θ, φ; x_r) = E_{q_φ(z_r | x_r)} [ log p_θ(x_r | z_r) ] − KL( q_φ(z_r | x_r) || p(z_r) )

The symbols have the meanings given above. The prior distribution is p(z_r) = vMF(·, 0), and the posterior distribution is q_φ(z_r | x_r) = vMF(μ_r, κ). Finally, the replier personal features z_r are obtained.
(3) Generating responses using CVAE
To obtain the final output, the vMF-based CVAE model is used, with the context x as the input and the replier personal-feature latent variable z_r as the condition variable; the generation process is trained with the following objective functions:

L(θ, φ; x, y, z_r) = E_{q_φ(z | x, y)} [ log p_θ(y | x, z, z_r) ] − KL( q_φ(z | x, y) || p_θ(z | x) )

L_total = L_VAE + λ · L_CVAE

The second equation represents the training objective of the entire model, where λ weights the two objectives. The symbols have the meanings given above.
In the specific implementation, taking the Cornell Movie Dialogs Corpus dataset as an example, the parameters are set in advance as follows: word vectors have dimension 200 and are initialized randomly; the sentence encoder adopts a 2-layer bidirectional GRU with 600 hidden neurons per layer; the dimensions of z and z_r are set to 50; parameters are updated with the Adam algorithm at an initial learning rate of 0.001; and during training we use an early-stopping strategy, selecting the best model according to the variational lower bound on the validation set.
Table 2 shows the results of the present model (SSVN), its simplified variants (SSVN_Gau, SSVN_Gau-E, SSVN_Gau-G), and other models (S2SA, HRED, VHRED, HVMN) on the two datasets under five evaluation metrics (Average, Greedy, Extreme, Distinct-1, Distinct-2).
TABLE 2-1 Automatic evaluation results on the Cornell Movie Dialogs Corpus dialog dataset
TABLE 2-2 Automatic evaluation results on the Ubuntu Dialogue Corpus dialog dataset
TABLE 2-3 Model ablation performance on the Cornell Movie Dialogs Corpus dialog dataset
The comparative experimental algorithms in the table are described below:
S2SA: a standard Seq2Seq model with an attention mechanism;

HRED: a hierarchical encoder-decoder framework for multi-turn dialog models;

VHRED: a hierarchical encoder-decoder with latent random variables;

HVMN: an encoder-decoder network with a hierarchical variational memory;

SSVN_Gau, SSVN_Gau-E, SSVN_Gau-G: the 3 degraded variants we propose;

Note: SSVN is the method provided by the invention; Gau denotes the Gaussian distribution and vMF denotes the vMF distribution, i.e., the distributional representations used in the latent space; substituting them yields the series of degraded SSVN models.
FIG. 2 shows the results of SSVN_Gau, SSVN_Gau-E, and SSVN_Gau-G in resolving KL-divergence vanishing.
FIG. 3 shows the corresponding performance of the invention under different λ values.
Table 3 shows generation examples of the above method:

TABLE 3 Examples generated on the Cornell Movie Dialogs Corpus dialog dataset
As can be seen from the experimental results in Table 2, extracting the replier's personal features and fusing them with the context text greatly improves the automatic evaluation metrics of the dialog generation method. As can be seen from the results of Table 3 in the specific examples, the responses generated by the invention are closer to the replier's personal characteristics, and are more diverse and natural than those of previously proposed dialog generation methods.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.
References:

[1] Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 1577–1586.

[2] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (NIPS), pages 3104–3112.

[3] D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling. 2014. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems 27 (NIPS), pages 3581–3589.

[4] Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), pages 654–664.

[5] Jiacheng Xu and Greg Durrett. 2018. Spherical latent spaces for stable variational autoencoders. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4503–4513.

[6] Hongshen Chen, Zhaochun Ren, Jiliang Tang, Yihong Eric Zhao, and Dawei Yin. 2018. Hierarchical variational memory network for dialogue generation. In Proceedings of the 2018 World Wide Web Conference (WWW '18), pages 1653–1662.

Claims (6)

1. A dialog generation method based on replying person personal feature enhancement, characterized by comprising the following steps:
(1) 2 encoder-decoder basic frameworks are constructed:
The 2 encoder-decoder basic frameworks are used, respectively, to reconstruct the replier-related sentences in each dialog context of the training corpus and the response in each dialog;
(2) constructing a personal feature extractor:
The personal feature extractor is used to extract a latent variable z_r of the replier's personal features; the vMF distribution is introduced into the encoder-decoder basic framework that reconstructs the replier-related sentences of each dialog context in the training corpus, so as to construct a vMF-based VAE generation model and obtain the personal-feature latent variable z_r;
(3) Constructing an information enhancement generator:
The information enhancement generator is used to obtain a response y that fuses the replier personal-feature latent variable z_r with the context x; it introduces the vMF distribution and the replier personal-feature latent variable z_r into the encoder-decoder basic framework that reconstructs the response of each dialog in the corpus, so as to construct a vMF-based CVAE generation model and obtain the response y.
2. The dialog generation method based on replying person personal feature enhancement of claim 1, characterized in that in step (1), each dialog in the training corpus consists of context sentences and a response; the context in each dialog is represented as x = (x_1, x_2, …, x_n), where x_i denotes the i-th sentence of the context and n denotes the number of sentences the dialog context contains; concretely, x_i = (w_{i,1}, w_{i,2}, …, w_{i,j}, …, w_{i,N_i}), where w_{i,j} denotes the j-th word of the i-th sentence and N_i denotes the number of words in sentence x_i; the response in each dialog is represented as y = (w_{y,1}, w_{y,2}, …, w_{y,j}, …, w_{y,N_y}), where w_{y,j} denotes the j-th word of the response and N_y denotes the number of words contained in the response y; the replier-related sentences extracted from each dialog context are represented as x_r = (x_r^1, …, x_r^l), where l denotes the number of replier-related sentences in the dialog context.
3. The dialog generation method based on replying person personal feature enhancement of claim 1 or 2, characterized in that the training corpus is obtained with the following processing:
(101) Delete dialogs in the original corpus whose length is less than 3 or greater than 10 sentences, to normalize dialog length;
(102) The last sentence of each dialog in the corpus is taken as the response, and the remaining sentences are taken as the context.
4. The dialog generation method based on replying person personal feature enhancement of claim 1, characterized in that in steps (2) and (3), the vMF distribution, i.e., the von Mises-Fisher distribution, is used to represent a probability distribution on the unit sphere; its probability density function is as follows:

f_d(x; μ, κ) = C_d(κ) · exp(κ μᵀx)

C_d(κ) = κ^(d/2−1) / ( (2π)^(d/2) · I_(d/2−1)(κ) )

where d denotes the dimension of the space ℝ^d; x denotes a d-dimensional unit random vector; μ denotes a direction vector on the unit sphere, with ||μ|| = 1; κ ≥ 0 denotes a concentration parameter; and I_ρ denotes the modified Bessel function of order ρ, where ρ = d/2 − 1; the distribution describes how unit vectors are distributed over the sphere.
5. The dialog generation method based on replier personal feature enhancement as claimed in claim 1, wherein the specific steps of step (2) are as follows:

the personal feature extractor consists of a sentence encoder, a local context encoder, a vMF distribution, and a response decoder;

first, the sentence encoder, a hierarchical bidirectional RNN encoder, encodes the sentences x_r carrying replier information: each sentence x_r^(i) in x_r is encoded as a vector h_i^(x_r); the encoded vectors of all sentences in x_r are then used as the input of the local context encoder, which finally outputs the latent vector h^(x_r) of the replier-information sentences x_r;

next, a vMF distribution is learned over the hidden state h^(x_r) to obtain the distribution representing the replier's personal features, and the latent variable z_r is obtained from this distribution by rejection sampling; the sampling formula is as follows:

z_r = ω μ + v √(1 - ω²)

wherein ω ∈ [-1, 1] is the scalar accepted by rejection sampling and v is a unit vector tangent to the sphere at μ; z_r is then used as the input of the response decoder to reconstruct the replier-information sentences x_r; the calculation formula is as follows:

p(x_r | z_r) = ∏_{i=1}^{l} ∏_{j=1}^{N_i} p(w_{i,j} | z_r, w_{i,<j})

wherein l denotes the number of replier-information sentences x_r in the context; N_i is the length of the i-th sentence of x_r; w_{i,j} is a representation of the j-th word of the i-th sentence x_r^(i);

finally, the model is optimized with the ELBO formula:

L(θ, φ; x_r) = E_{q_φ(z_r|x_r)}[log p_θ(x_r | z_r)] - KL(q_φ(z_r | x_r) || p(z_r))

wherein E_{q_φ(z_r|x_r)}[log p_θ(x_r | z_r)] denotes the reconstruction error and KL(q_φ(z_r|x_r) || p(z_r)) is the KL divergence between the posterior distribution and the prior distribution; the prior distribution p(z_r) follows vMF(·, 0), i.e. the uniform distribution on the sphere, and the posterior distribution q_φ(z_r|x_r) follows vMF(μ, κ), wherein μ is a parameter of the posterior distribution and κ is set to a constant; μ is calculated as follows:

μ̃ = g(h^(x_r)),  μ = μ̃ / ||μ̃||

wherein g(·) is a linear function and the norm ||·|| ensures that μ lies on the unit sphere;

the KL divergence is calculated as follows:

KL(vMF(μ, κ) || vMF(·, 0)) = κ I_{d/2}(κ) / I_{d/2-1}(κ) + (d/2 - 1) log κ - (d/2) log 2π - log I_{d/2-1}(κ) + (d/2) log π + log 2 - log Γ(d/2)

wherein Γ(·) denotes the Gamma function.
6. The dialog generation method based on replier personal feature enhancement as claimed in claim 1, wherein in step (3) the information enhancement generator generates the final response y from the combination of the personal-feature latent variable z_r and the dialog context x; the specific steps are as follows:

the information enhancement generator comprises a sentence encoder, a global context encoder, a vMF distribution, and a response decoder;

first, the sentence encoder encodes all context sentences x_1, x_2, ..., x_n as vectors h_1, h_2, ..., h_n and encodes the response y as a vector h_y; h_1, ..., h_n are used as the input of the global context encoder to obtain the context latent vector h^x;

second, the context latent vector h^x and the response vector h_y are combined as the input of the vMF distribution to obtain a distribution representation, and the output context latent variable z is sampled as follows:

z = ω μ + v √(1 - ω²)

wherein ω ∈ [-1, 1];

finally, the context x, the context latent variable z, and the replier-feature latent variable z_r are used as the input of the response decoder to generate the response y;

the generation process is represented as follows:

s_t = GRU(s_{t-1}, [e_{t-1}; z; z_r])

p_vocab = softmax(V s_t + b)

p(y | x, z, z_r) = ∏_{i=1}^{N_y} p_vocab(w_{y,i})    (11)

wherein σ denotes the sigmoid function used in the gates of the GRU; e_i is the word-embedding representation of the i-th word in the response y; s_t denotes the hidden state at step t; V and b are parameters to be learned; p_vocab denotes the generation probability over the vocabulary; p_vocab(w_{y,i}) denotes the probability of generating the word w_{y,i}; N_y denotes the length of the response y; equation (11) represents the generation probability of the response y.
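Not part of the claims: a minimal numpy sketch of one decoding step in the spirit of the generation process above. A simple tanh RNN cell stands in for the recurrent unit, the vocabulary distribution is an explicit softmax, and every name and shape is illustrative.

```python
import numpy as np


def decoder_step(s_prev, e_prev, z, z_r, params):
    """One decoding step: update the hidden state, then score the vocabulary.

    The hidden state depends on the previous state, the previous word embedding,
    and the two latent variables z (context) and z_r (replier features); the
    vocabulary distribution is softmax(V s_t + b).
    """
    W, U, V, b = params["W"], params["U"], params["V"], params["b"]
    inp = np.concatenate([e_prev, z, z_r])
    s_t = np.tanh(W @ inp + U @ s_prev)  # simple RNN cell stand-in for the GRU
    logits = V @ s_t + b
    p_vocab = np.exp(logits - logits.max())
    p_vocab /= p_vocab.sum()  # softmax over the vocabulary
    return s_t, p_vocab
```

Repeating this step and multiplying the probabilities of the emitted words yields the product form of equation (11).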
The optimization process of the vMF-based CVAE is expressed as follows:

L(θ, φ; x, y, z_r) = E_{q_φ(z|x,y)}[log p_θ(y | x, z, z_r)] - KL(q_φ(z | x, y) || p_θ(z | x))

wherein p_θ(y | x, z, z_r) denotes the generation process, the expectation term denotes the reconstruction error, and KL(q_φ(z|x,y) || p_θ(z|x)) denotes the KL divergence between the posterior distribution q_φ(z|x,y) and the prior distribution p_θ(z|x); the concentration parameter κ of the vMF distributions in the above equation is set to a constant, and the posterior parameter μ_post and the prior parameter μ_pri are calculated as follows:

μ̃_post = g_post([h^x; h_y]),  μ_post = μ̃_post / ||μ̃_post||

μ̃_pri = g_pri(h^x),  μ_pri = μ̃_pri / ||μ̃_pri||

wherein g_post(·) and g_pri(·) are linear functions; the posterior distribution of the CVAE follows vMF(μ_post, κ) and the prior distribution follows vMF(μ_pri, κ), which is conditioned on x; from the prior and the posterior, the KL divergence is obtained as follows:

KL(q_φ(z|x,y) || p_θ(z|x)) = κ (I_{d/2}(κ) / I_{d/2-1}(κ)) (1 - μ_pri^T μ_post)
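As a non-claim sanity check of the final KL term: when the posterior and prior share the same concentration κ, the normalizers cancel and the divergence reduces to the closed form above. An illustrative implementation (names assumed):

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind


def kl_vmf_shared_kappa(mu_post, mu_pri, kappa):
    """KL(vMF(mu_post, k) || vMF(mu_pri, k)) for a shared concentration k.

    Using E[z] = (I_{d/2}(k) / I_{d/2-1}(k)) * mu_post, the divergence is
    k * (I_{d/2}(k) / I_{d/2-1}(k)) * (1 - mu_pri . mu_post).
    """
    d = len(mu_post)
    ratio = iv(d / 2.0, kappa) / iv(d / 2.0 - 1.0, kappa)
    return kappa * ratio * (1.0 - np.dot(mu_pri, mu_post))
```

The divergence is zero when the two mean directions coincide and grows as they move apart, which matches the regularizing role the KL term plays in the objective.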
CN201911062516.8A 2019-11-03 2019-11-03 Dialog generation method based on replier personal characteristic enhancement Active CN111046134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911062516.8A CN111046134B (en) 2019-11-03 2019-11-03 Dialog generation method based on replier personal characteristic enhancement

Publications (2)

Publication Number Publication Date
CN111046134A true CN111046134A (en) 2020-04-21
CN111046134B CN111046134B (en) 2023-06-30

Family

ID=70232833

Country Status (1)

Country Link
CN (1) CN111046134B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254597A (en) * 2021-06-23 2021-08-13 腾讯科技(深圳)有限公司 Model training method, query processing method and related equipment
CN114398904A (en) * 2021-11-22 2022-04-26 重庆邮电大学 Open field conversation generation method based on multi-granularity feature decoupling

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609926A (en) * 2011-07-05 2012-07-25 天津大学 ROAD-detection-based index similarity sorting filtering method
US20170083238A1 (en) * 2015-09-23 2017-03-23 Hanan Potash Processor that uses plural form information
CN106841085A (en) * 2016-06-05 2017-06-13 乌鲁木齐职业大学 Gas measuring method based on KPCA
WO2018016581A1 (en) * 2016-07-22 2018-01-25 ヤマハ株式会社 Music piece data processing method and program
CN109033069A (en) * 2018-06-16 2018-12-18 天津大学 A kind of microblogging Topics Crawling method based on Social Media user's dynamic behaviour
CN110032642A (en) * 2019-03-26 2019-07-19 广东工业大学 The modeling method of the manifold topic model of word-based insertion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BO XU; DAOZHI LIN; LONGBIAO WANG; HONGYANG CHAO; WEIFENG LI; QINMIN LIAO: "Performance comparison of local directional pattern to local bin", 《2014 7TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING》 *
HUANG JIAJIA: "Research on topic models based on deep learning", 《Chinese Journal of Computers》 *

Also Published As

Publication number Publication date
CN111046134B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
Chen et al. Structure-aware abstractive conversation summarization via discourse and action graphs
CN111143509B (en) Dialogue generation method based on static-dynamic attention variation network
CN111159368B (en) Reply generation method of personalized dialogue
CN112115687B (en) Method for generating problem by combining triplet and entity type in knowledge base
CN113158665A (en) Method for generating text abstract and generating bidirectional corpus-based improved dialog text
Xie et al. Attention-based dense LSTM for speech emotion recognition
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN109033069B (en) Microblog theme mining method based on social media user dynamic behaviors
CN112364161B (en) Microblog theme mining method based on dynamic behaviors of heterogeneous social media users
CN110991290A (en) Video description method based on semantic guidance and memory mechanism
CN113407663B (en) Image-text content quality identification method and device based on artificial intelligence
CN112597769B (en) Short text topic identification method based on Dirichlet variational self-encoder
CN110069611B (en) Topic-enhanced chat robot reply generation method and device
CN110991190A (en) Document theme enhanced self-attention network, text emotion prediction system and method
CN111914553B (en) Financial information negative main body judging method based on machine learning
CN111046134A (en) Dialog generation method based on replying person personal feature enhancement
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
Mathur et al. A scaled‐down neural conversational model for chatbots
Ashfaque et al. Design and Implementation: Deep Learning-based Intelligent Chatbot
CN113947074A (en) Deep collaborative interaction emotion reason joint extraction method
Chang et al. A semi-supervised stable variational network for promoting replier-consistency in dialogue generation
CN117150320A (en) Dialog digital human emotion style similarity evaluation method and system
Sun et al. Convntm: conversational neural topic model
CN116629272A (en) Text generation method and system controlled by natural language
CN116628203A (en) Dialogue emotion recognition method and system based on dynamic complementary graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant