CN115495566A - Dialog generation method and system for enhancing text features


Info

Publication number
CN115495566A
CN115495566A
Authority
CN
China
Prior art keywords
keyword
vector
text
semantic
encoder
Prior art date
Legal status
Pending
Application number
CN202211238085.8A
Other languages
Chinese (zh)
Inventor
王烨
廖靖波
于洪
雷大江
黄昌豪
杨峻杰
卞政轩
Current Assignee
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202211238085.8A
Publication of CN115495566A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of human-machine dialog, and in particular to a dialog generation method and system for enhancing text features. The method comprises: obtaining a question text and a reply text, and extracting the keywords of the question text through the TextRank algorithm to obtain a keyword sequence; introducing a keyword encoder, which encodes each keyword through an attention mechanism to obtain the corresponding keyword vector; concatenating the keyword vector with the semantic vector and feeding the result into a first multilayer perceptron to obtain a keyword semantic vector containing rich semantics; concatenating the keyword semantic vector with the question text vector and passing the result through a second multilayer perceptron to obtain an input vector; and training a dialog generation model according to the input vector, calculating a loss value with a loss function, back-propagating, and adjusting the parameters of the dialog generation model. The invention strengthens the weight of keywords and enhances the feature representation of the text, so as to generate higher-quality dialog text.

Description

Dialog generation method and system for enhancing text features
Technical Field
The invention relates to the field of human-machine dialog, in particular to an open-domain generative model that enhances the feature representation of dialog, and specifically to a dialog generation method and system for enhancing text features.
Background
Human-machine dialog systems are mainly divided into task-oriented and non-task-oriented (open-domain) applications. Compared with a task-oriented dialog system, an open-domain dialog system does not need to execute a specific task, and its replies are more open-ended. Chatbots can currently be classified into three types: retrieval-based, generative, and knowledge-graph-based. A retrieval-based chatbot extracts the most suitable reply from an existing dialog corpus using ranking and matching techniques, but it can only return text already in the corpus and therefore cannot produce diverse dialog; moreover, as the corpus grows, reply retrieval slows down and degrades the chat experience.
With the development of end-to-end deep learning models, open-domain dialog models have solved part of these problems, making generated replies richer. The end-to-end encoder-decoder model used by generative chatbots encodes the dialog into a feature vector, and the decoder samples each word of the generated reply from the vocabulary. Such a model can therefore produce dialog that does not appear in the training corpus, overcoming the limitation that retrieval-based dialog can only follow corpus templates, and yielding richer replies. However, because the generative model samples words from the vocabulary and assembles them into a reply in sampling order, its representation of dialog features is incomplete, so it easily produces low-quality or semantically irrelevant replies.
This problem can be illustrated with the Seq2Seq model, the earliest end-to-end generation model, which made a major contribution to the field of text generation; subsequent chatbots are essentially based on the Seq2Seq paradigm. It contains two recurrent neural networks (RNNs): an encoder and a decoder. The encoder encodes the input sequence into a semantic vector, and the decoder decodes the semantic vector into an output sequence. However, RNN encoding and decoding must proceed autoregressively from left to right, which makes parallel computation difficult and leads to high time complexity on long sequences. At the same time, an RNN struggles to model long-range context dependencies, so the extracted features lack important information.
Existing deep models attend equally to words at different positions in a text sequence. Although attention mechanisms address word weighting within a sequence, current dialog models still optimize only the maximum-likelihood objective between the generated text and the reference text, with no dedicated objective for learning attention weights, so the importance weights the model learns for different words are inaccurate. Without an attention mechanism, all words carry equal weight in the text feature, which does not match the open-domain conversation setting.
Disclosure of Invention
The invention provides a dialog generation method and system for enhancing text features, aiming to solve the problems that existing generative models cannot fully characterize the features of dialog text, that recurrent neural networks lose features, and that all words in a text sequence receive the same weight in the text feature, which easily leads to generic or inaccurate dialog replies.
In a first aspect, the present invention provides a dialog generation method for enhancing text features, comprising the steps of:
S1, obtaining a question text and a reply text, and extracting the keywords of the question text through the TextRank algorithm to obtain a keyword sequence (an illustrative sketch of this extraction step follows the list below); acquiring a question text vector of the question text through an input encoder;
S2, introducing a keyword encoder, the keyword encoder encoding the keyword sequence through an attention mechanism to obtain a keyword vector;
S3, concatenating the keyword vector and the semantic vector and feeding the result into a first multilayer perceptron to obtain a keyword semantic vector containing rich semantics;
S4, concatenating the keyword semantic vector and the question text vector and passing the result through a second multilayer perceptron to obtain an input vector;
S5, training a dialog generation model according to the input vector and the reply text, calculating a loss value with a loss function, back-propagating, and adjusting the parameters of the dialog generation model;
and S6, inputting the text to be replied to into the trained dialog generation model to generate the dialog.
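A minimal Python sketch of the TextRank extraction in step S1 follows; the tokenization, window size, and top-k cutoff are illustrative assumptions, since the disclosure does not fix them. TextRank amounts to running PageRank over a word co-occurrence graph:

```python
# Minimal TextRank keyword extraction (illustrative sketch; window size,
# tokenization, and top-k are assumptions, not fixed by this disclosure).
import networkx as nx

def textrank_keywords(words, window=4, top_k=5):
    # Build an undirected co-occurrence graph over the words.
    graph = nx.Graph()
    for i, w in enumerate(words):
        for u in words[i + 1 : i + window]:  # neighbors inside a sliding window
            if u != w:
                graph.add_edge(w, u)
    scores = nx.pagerank(graph)              # TextRank = PageRank on this graph
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

question = "how do neural dialogue models represent the meaning of a question".split()
print(textrank_keywords(question))           # prints the top-ranked keywords
```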
Further, step S4 further comprises:
acquiring a reply text vector of the reply text with an output encoder;
concatenating the keyword semantic vector and the question text vector, passing the result through the second multilayer perceptron to obtain a first fused feature, and inputting the first fused feature into a prior network to obtain prior distribution parameters;
concatenating the keyword semantic vector, the question text vector, and the reply text vector, passing the result through a third multilayer perceptron to obtain a second fused feature, and inputting the second fused feature into a recognition network to obtain approximate posterior distribution parameters;
and reparameterizing the approximate posterior distribution parameters to obtain a latent variable, and initializing the latent variable through a linear transformation to obtain the input vector (a sketch of this reparameterization follows below).
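A hedged sketch of this recognition branch, assuming diagonal Gaussian distributions and illustrative layer sizes (the disclosure fixes neither):

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    """Maps the second fused feature to approximate posterior parameters,
    reparameterizes a latent z, and linearly transforms z into the decoder's
    initial input vector. All sizes are illustrative assumptions."""
    def __init__(self, feat_dim=512, latent_dim=64, dec_dim=512):
        super().__init__()
        self.to_params = nn.Linear(feat_dim, 2 * latent_dim)  # -> (mu, log_var)
        self.init_proj = nn.Linear(latent_dim, dec_dim)       # z -> input vector

    def forward(self, fused_feature):
        mu, log_var = self.to_params(fused_feature).chunk(2, dim=-1)
        eps = torch.randn_like(mu)               # noise sampled once per forward pass
        z = mu + torch.exp(0.5 * log_var) * eps  # reparameterization: z = mu + sigma * eps
        return self.init_proj(z), mu, log_var
```

The reparameterization keeps sampling differentiable, so the loss gradient can flow back through mu and log_var during training.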
Further, the keyword encoder encodes the keyword sequence through an attention mechanism to obtain the keyword vector, as follows:
h_t = Enc_key(e(K))
Enc_key(e(k_i)) = LSTM(input_i)
[Two equation images in the original define the attention weight α(e(k_i), e(k_j)) and the weighted input input_i as an attention-weighted combination of the embeddings of the preceding keywords.]
wherein Enc_key() denotes the keyword encoder, K = k_1, k_2, ..., k_t denotes the keyword sequence, k_i denotes the i-th keyword, h_t denotes the keyword vector, e() denotes the word-embedding function, input_i denotes the weighted input vector of keyword k_i, LSTM() denotes the long short-term memory network LSTM, and α(e(k_i), e(k_j)) denotes the attention between keyword k_i and keyword k_j, 1 ≤ j < i.
Further, obtaining the semantic vector with a semantic encoder comprises:
h_s = Enc_sem(e(S), (h_0, c_0))
wherein Enc_sem() denotes the semantic encoder, h_s denotes the semantic vector, h_0 denotes the initial hidden state of the semantic encoder, c_0 denotes the initial cell state of the semantic encoder, and S denotes the semantic text corresponding to the question text.
In a second aspect, based on the method proposed in the first aspect, the invention provides a dialog generation system for enhancing text features, comprising a sample module, a keyword extraction module, an encoding module, a fusion module, and a training module, wherein:
the sample module is configured to acquire multiple groups of dialog samples, each dialog sample comprising a question text and a reply text;
the keyword extraction module is configured to extract keywords from the question text to form a keyword sequence;
the encoding module comprises an input encoder, an output encoder, a keyword encoder, and a semantic encoder, and is configured to encode the data of the sample module and the keyword extraction module;
the fusion module comprises a first fusion module and a second fusion module:
the first fusion module is configured to fuse the keyword vector and the semantic vector to obtain a keyword semantic vector containing rich semantics;
the second fusion module is configured to fuse the keyword semantic vector and the question text vector;
and the training module is configured to train the dialog generation model, calculate the loss with a loss function, back-propagate, and adjust the parameters of the dialog generation model.
Further, the system also comprises a third fusion module, a prior network, and a recognition network:
the third fusion module is configured to fuse the keyword semantic vector, the question text vector, and the reply text vector;
the prior network is configured to receive the output of the second fusion module and obtain the prior distribution parameters;
and the recognition network is configured to receive the output of the third fusion module and obtain the approximate posterior distribution parameters.
The invention has the beneficial effects that:
the invention provides an open type dialog generating method for enhancing dialog text feature expression. When the ordinary RNN is used for coding the text, the front part of the text is easy to ignore, and the weight of the content of the text which is more backward is larger, so the ordinary RNN cannot code the long text. The long text dependency problem can be solved by using LSTM, but with the same weight between all words of the text; if only the attention mechanism is used to solve the problem, since the optimization function for specifically optimizing the attention is not used, but the maximum likelihood optimization function generated by the dialogue is used, it is very difficult to learn the correct weight, which may result in the low weight of the key text and the high weight of the non-key text, thereby misleading the model optimization. The present invention chooses to explicitly extract keywords from text using a keyword extraction method and encode the keywords by an attention mechanism rather than the entire text. Specifically, attention calculation is carried out on the keyword features and the semantic category features which are coded by using an attention mechanism, and then the result is used as one part of text features and is fused into the text features. Therefore, the weight of the words which are more key in the text sequence in the text features is increased. The method not only can correctly enhance the weight of the key text, but also can avoid the situation that the model learns the wrong weight to cause the situation that the non-key vocabulary occupies higher weight.
Drawings
FIG. 1 is a flow diagram of the dialog generation method for enhancing text features according to the present invention;
FIG. 2 is a schematic diagram of the Seq2Seq model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the CVAE model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a dialog generation method and system for enhancing text features, which fuse keywords and category semantics to extract feature representations with richer semantics and more accurate weights, fuse these with the original text features to obtain higher-quality input features, and train a dialog generation model to obtain higher-quality dialog.
In one embodiment, the Seq2Seq model shown in FIG. 2 is used to describe the training method of the dialog generation model for enhancing text features, as shown in FIG. 1, comprising:
S11, acquiring a question text C = c_1, c_2, ..., c_m and a reply text X = x_1, x_2, ..., x_n, where c_u denotes the u-th (u ∈ {1, 2, ..., m}) word of the question text, x_v denotes the v-th (v ∈ {1, 2, ..., n}) word of the reply text, m denotes the number of words in the question text, and n denotes the number of words in the reply text; extracting the keywords of the question text through the TextRank algorithm to obtain a keyword sequence K = k_1, k_2, ..., k_t, where k_i denotes the i-th (i ∈ {1, 2, ..., t}) keyword;
S12, acquiring the question text vector of the question text through the input encoder and the semantic vector through the semantic encoder, expressed as:
h_{c,i}, c_{c,i} = Enc_in(e(c_i), (h_{c,i-1}, c_{c,i-1}))
h_s = Enc_sem(e(S), (h_0, c_0))
where h_{c,i} and c_{c,i} denote the hidden state and cell state of the input encoder at step i respectively, Enc_in() denotes the input encoder, Enc_sem() denotes the semantic encoder, h_s denotes the semantic vector, h_0 denotes the initial hidden state of the semantic encoder, c_0 denotes the initial cell state of the semantic encoder, and S denotes the semantic category of the question text (an illustrative encoder sketch follows);
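Both encoders are LSTMs in this formulation. As an illustrative sketch (vocabulary and layer sizes are assumptions), the input encoder Enc_in and the semantic encoder Enc_sem might be instantiated as:

```python
import torch
import torch.nn as nn

class LSTMEncoder(nn.Module):
    """LSTM text encoder, used here to illustrate both the input (question)
    encoder Enc_in and the semantic encoder Enc_sem; sizes are assumptions."""
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # e(): word embeddings
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, token_ids, state=None):
        emb = self.embed(token_ids)              # (batch, seq_len, emb_dim)
        outputs, (h, c) = self.lstm(emb, state)  # h, c: final hidden/cell states
        return outputs, (h, c)

enc_in, enc_sem = LSTMEncoder(), LSTMEncoder()
question = torch.randint(0, 10000, (1, 12))   # toy question token ids
_, (h_m, _) = enc_in(question)                # question text vector h_m
category = torch.randint(0, 10000, (1, 3))    # toy semantic-category token ids
_, (h_s, _) = enc_sem(category)               # semantic vector h_s
```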
S13, introducing a keyword encoder, the keyword encoder encoding each keyword through an attention mechanism to obtain the corresponding keyword vector;
S14, concatenating the keyword vector and the semantic vector and feeding the result into the first multilayer perceptron to obtain the keyword semantic vector containing rich semantics, one question text corresponding to one semantic vector, expressed as:
h' = MLP([h_t : h_s])
where h' denotes the keyword semantic vector, h_t denotes the keyword vector, h_s denotes the semantic vector, and MLP() denotes the first multilayer perceptron;
S15, concatenating the keyword semantic vector and the question text vector and passing the result through the second multilayer perceptron to obtain the input vector, expressed as:
s_0 = MLP'([h' : h_m])
where s_0 denotes the input vector of the decoder, i.e., the initial hidden state of the decoder, h_m denotes the question text vector, and MLP'() denotes the second multilayer perceptron (an illustrative fusion sketch follows);
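A minimal sketch of the two fusion steps S14 and S15; the single-layer Tanh MLPs and the 512-dimensional size are assumptions, since the disclosure only specifies concatenation followed by a multilayer perceptron:

```python
import torch
import torch.nn as nn

hid = 512  # illustrative dimensionality
mlp1 = nn.Sequential(nn.Linear(2 * hid, hid), nn.Tanh())  # first multilayer perceptron
mlp2 = nn.Sequential(nn.Linear(2 * hid, hid), nn.Tanh())  # second multilayer perceptron

h_t = torch.randn(1, hid)  # keyword vector (from the keyword encoder)
h_s = torch.randn(1, hid)  # semantic vector (from the semantic encoder)
h_m = torch.randn(1, hid)  # question text vector (from the input encoder)

h_prime = mlp1(torch.cat([h_t, h_s], dim=-1))  # keyword semantic vector h'
s_0 = mlp2(torch.cat([h_prime, h_m], dim=-1))  # decoder input vector / initial state
```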
and S16, training the dialog generation model according to the input vector and the reply text, calculating the loss value with the loss function, back-propagating, and adjusting the parameters of the dialog generation model.
Specifically, as shown in FIG. 2, the first input to the decoder is the <SOS> (start of sentence) tag together with the initial hidden state, i.e., the input vector s_0. The input at each subsequent step is the word x_t of the reply sequence and the hidden state s_{t-1} output at the previous step. Finally, SoftMax maps the output of the decoder into the vocabulary space, and the word with the highest probability in the vocabulary is taken as the generated result, expressed as:
s_t = Dec(s_{t-1}, e(x_t))
x̂_t = argmax(SoftMax(MLP(s_t)))
where s_t denotes the hidden state of the decoder at step t, Dec() denotes the decoder, x̂_t denotes the word generated by the decoder at step t, and MLP() denotes the multilayer perceptron that maps the output into the vocabulary space;
specifically, the Seq2Seq-based dialog generation model uses cross entropy as the loss function, expressed as:
L = -Σ_{t=1}^{n} x_t log x̂_t
where X denotes the reference reply text and X̂ denotes the text generated by the decoder, with x_t and x̂_t their respective words at step t.
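An illustrative PyTorch sketch of this training loop, assuming teacher forcing, an LSTMCell decoder, toy token ids, and an assumed <SOS> id of 1 (none of which the disclosure fixes):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """LSTM decoder step: consumes the previous word and the previous hidden
    state; a linear layer maps s_t into the vocabulary space before SoftMax.
    Sizes are illustrative assumptions."""
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)   # maps s_t to vocab space

    def forward(self, prev_word, s_prev, c_prev):
        s_t, c_t = self.cell(self.embed(prev_word), (s_prev, c_prev))
        return self.out(s_t), s_t, c_t              # logits before SoftMax

dec = Decoder()
loss_fn = nn.CrossEntropyLoss()                     # cross entropy over the vocab
s, c = torch.zeros(1, 512), torch.zeros(1, 512)     # in the method, s would be s_0
reply = torch.tensor([[2, 7, 9]])                   # toy reference reply X
word = torch.tensor([1])                            # <SOS> token id (assumed)
loss = 0.0
for t in range(reply.size(1)):                      # teacher forcing during training
    logits, s, c = dec(word, s, c)
    loss = loss + loss_fn(logits, reply[:, t])
    word = reply[:, t]                              # feed the reference word back in
```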
In one embodiment, the CVAE model shown in FIG. 3 is used to describe the training method of the dialog generation model for enhancing text features, comprising:
S21, acquiring a question text C = c_1, c_2, ..., c_m and a reply text X = x_1, x_2, ..., x_n, where c_u denotes the u-th (u ∈ {1, 2, ..., m}) word of the question text, x_v denotes the v-th (v ∈ {1, 2, ..., n}) word of the reply text, m denotes the number of words in the question text, and n denotes the number of words in the reply text; extracting the keywords of the question text through the TextRank algorithm to obtain a keyword sequence K = k_1, k_2, ..., k_t, where k_i denotes the i-th (i ∈ {1, 2, ..., t}) keyword;
S22, acquiring the question text vector of the question text through the input encoder, the semantic vector through the semantic encoder, and the reply text vector of the reply text through the output encoder, expressed as:
h_{c,i}, c_{c,i} = Enc_in(e(c_i), (h_{c,i-1}, c_{c,i-1}))
h_{x,i}, c_{x,i} = Enc_out(e(x_i), (h_{x,i-1}, c_{x,i-1}))
h_s = Enc_sem(e(S), (h_0, c_0))
where h_{c,i} and c_{c,i} denote the hidden state and cell state of the input encoder at step i, h_{x,i} and c_{x,i} denote the hidden state and cell state of the output encoder at step i, Enc_out() denotes the output encoder, Enc_in() denotes the input encoder, Enc_sem() denotes the semantic encoder, h_s denotes the semantic vector, h_0 denotes the initial hidden state of the semantic encoder, c_0 denotes the initial cell state of the semantic encoder, and S denotes the semantic category of the question text;
S23, introducing a keyword encoder, the keyword encoder encoding each keyword through an attention mechanism to obtain the corresponding keyword vector;
S24, concatenating the keyword vector and the semantic vector and feeding the result into the first multilayer perceptron to obtain the keyword semantic vector containing rich semantics, expressed as:
h' = MLP([h_t : h_s])
where h' denotes the keyword semantic vector, h_t denotes the keyword vector, h_s denotes the semantic vector, and MLP() denotes the first multilayer perceptron;
S25, concatenating the keyword semantic vector and the question text vector, passing the result through the second multilayer perceptron to obtain the first fused feature, and inputting the first fused feature into the prior network to obtain the prior distribution parameters;
S26, concatenating the keyword semantic vector, the question text vector, and the reply text vector, passing the result through the third multilayer perceptron to obtain the second fused feature, and inputting the second fused feature into the recognition network to obtain the approximate posterior distribution parameters;
S27, reparameterizing the approximate posterior distribution parameters to obtain the latent variable, and initializing the latent variable through a linear transformation to obtain the input vector;
and S28, training the dialog generation model according to the input vector and the reply text, calculating the loss value with the loss function, back-propagating, and adjusting the parameters of the dialog generation model.
Specifically, the CVAE-based dialog generation model uses the reconstruction loss and the KL divergence as the loss function, expressed as:
L(θ, φ) = E_{q_φ(z|X,C,K,S)}[log p(X|z,C,K,S)] - KL(q_φ(z|X,C,K,S) || p_θ(z|C,K,S))
where q_φ(z|X,C,K,S) denotes the approximate posterior distribution, p_θ(z|C,K,S) denotes the prior distribution, the expectation term denotes the expectation of reconstructing the reply text X under the approximate posterior distribution, KL denotes the KL divergence between the two distributions, X denotes the reply text, C denotes the question text, K denotes the keyword text, S denotes the semantic text, and z denotes the latent variable.
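A hedged sketch of this objective, assuming both the prior and the approximate posterior are diagonal Gaussians (an assumption; the disclosure does not spell out the parameterization):

```python
import torch
import torch.nn.functional as F

def cvae_loss(recon_logits, target_ids, mu_q, logvar_q, mu_p, logvar_p):
    """Reconstruction NLL plus KL(q_phi(z|X,C,K,S) || p_theta(z|C,K,S)) for
    diagonal Gaussians; the Gaussian form is an illustrative assumption."""
    # Token-level reconstruction loss over the reply text X.
    recon = F.cross_entropy(
        recon_logits.flatten(0, 1), target_ids.flatten(), reduction="sum")
    # Closed-form KL divergence between two diagonal Gaussians.
    kl = 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0)
    return recon + kl
```

Here mu_q and logvar_q come from the recognition network (step S26) and mu_p and logvar_p from the prior network (step S25); recon_logits has shape (batch, seq_len, vocab_size).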
In one embodiment, the keyword encoder encodes each keyword through an attention mechanism to obtain the corresponding keyword vector, as follows:
h_t = Enc_key(e(K))
Enc_key(e(k_i)) = LSTM(input_i)
[Two equation images in the original define the attention weight α(e(k_i), e(k_j)) and the weighted input input_i as an attention-weighted combination of the embeddings of the preceding keywords.]
wherein Enc_key() denotes the keyword encoder, K = k_1, k_2, ..., k_t denotes the keyword sequence, k_i denotes the i-th keyword, h_t denotes the hidden state at step t of the keyword encoder, which in this embodiment is also the keyword vector, e() denotes the word-embedding function, input_i denotes the weighted input vector of keyword k_i, LSTM() denotes the long short-term memory network LSTM, and α(e(k_i), e(k_j)) denotes the attention between keyword k_i and keyword k_j, 1 ≤ j < i.
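An illustrative sketch of such a keyword encoder; the dot-product attention scoring and the residual mixing of e(k_i) with the attended context are assumptions, since the exact forms of α and input_i are given only in the original equation images:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeywordEncoder(nn.Module):
    """Encodes a keyword sequence: each step's LSTM input mixes the current
    keyword embedding with the embeddings of earlier keywords via attention.
    Scoring and mixing choices here are illustrative assumptions."""
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim, hid_dim)

    def forward(self, keyword_ids):                       # (1, t) keyword ids
        emb = self.embed(keyword_ids)[0]                  # (t, emb_dim)
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        for i in range(emb.size(0)):
            if i == 0:
                input_i = emb[i]                          # no earlier keywords yet
            else:
                scores = emb[:i] @ emb[i]                 # alpha(e(k_i), e(k_j)), j < i
                alpha = F.softmax(scores, dim=0)
                input_i = emb[i] + alpha @ emb[:i]        # attention-weighted input
            h, c = self.cell(input_i.unsqueeze(0), (h, c))
        return h                                          # keyword vector h_t
```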
In an embodiment, the present invention provides a dialog generation system for enhancing text features, comprising a sample module, a keyword extraction module, an encoding module, a fusion module, and a training module, wherein:
the sample module is configured to acquire multiple groups of dialog samples, each dialog sample comprising a question text and a reply text;
the keyword extraction module is configured to extract keywords from the question text to form a keyword sequence;
the encoding module comprises an input encoder, an output encoder, a keyword encoder, and a semantic encoder, and is configured to encode the data of the sample module and the keyword extraction module;
the fusion module comprises a first fusion module and a second fusion module:
the first fusion module is configured to fuse the keyword vector and the semantic vector to obtain a keyword semantic vector containing rich semantics;
the second fusion module is configured to fuse the keyword semantic vector and the question text vector;
and the training module is configured to train the dialog generation model, calculate the loss with a loss function, back-propagate, and adjust the parameters of the dialog generation model.
Specifically, the system further comprises a third fusion module, a prior network, and a recognition network:
the third fusion module is configured to fuse the keyword semantic vector, the question text vector, and the reply text vector;
the prior network is configured to receive the output of the second fusion module and obtain the prior distribution parameters;
and the recognition network is configured to receive the output of the third fusion module and obtain the approximate posterior distribution parameters.
Table 1: DailyDialog dataset evaluation results with act labels
[Table rendered as an image in the original.]
Table 2: DailyDialog dataset evaluation results with emotion labels
[Table rendered as an image in the original.]
Table 3: EmpatheticDialogues dataset evaluation results with emotion labels
[Table rendered as an image in the original.]
Tables 1, 2, and 3 show the automatic evaluation results of our model and the comparison models. Table 1 shows the results of using act semantics on the DailyDialog dataset. Table 2 shows the results of using emotion semantics on the DailyDialog dataset. Table 3 shows the results of using emotion semantics on the EmpatheticDialogues dataset.
As can be seen, our model achieves good results on the 3 different datasets. Our model is superior to the other comparison models on both BLEU and METEOR, and is lower than the Transformer only on ROUGE. This shows that our model can generate higher-quality dialog text across different datasets, so our method has good generalization ability. However, after adding our method, the error bars of the model become larger, indicating that although the method improves the generation quality of the dialog model, the model becomes more complex and its fluctuation increases.
In the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "connected," "fixed," "rotated," and the like are to be construed broadly, e.g., as being fixedly connected, detachably connected, or integrated; can be mechanically or electrically connected; the terms may be directly connected or indirectly connected through an intermediate agent, and may be used for communicating the inside of two elements or interacting relation of two elements, unless otherwise specifically defined, and the specific meaning of the terms in the present invention can be understood by those skilled in the art according to specific situations.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A dialog generation method for enhancing text features, comprising the steps of:
S1, obtaining a question text and a reply text, and extracting the keywords of the question text through the TextRank algorithm to obtain a keyword sequence; obtaining a question text vector of the question text through an input encoder;
S2, introducing a keyword encoder, the keyword encoder encoding the keyword sequence through an attention mechanism to obtain a keyword vector;
S3, concatenating the keyword vector and the semantic vector and feeding the result into a first multilayer perceptron to obtain a keyword semantic vector containing rich semantics;
S4, concatenating the keyword semantic vector and the question text vector and passing the result through a second multilayer perceptron to obtain an input vector;
S5, training a dialog generation model according to the input vector and the reply text, calculating a loss value with a loss function, back-propagating, and adjusting the parameters of the dialog generation model;
and S6, inputting the text to be replied to into the trained dialog generation model to generate the dialog.
2. The method of claim 1, wherein step S4 further comprises:
acquiring a reply text vector of the reply text with an output encoder;
concatenating the keyword semantic vector and the question text vector, passing the result through the second multilayer perceptron to obtain a first fused feature, and inputting the first fused feature into a prior network to obtain prior distribution parameters;
concatenating the keyword semantic vector, the question text vector, and the reply text vector, passing the result through a third multilayer perceptron to obtain a second fused feature, and inputting the second fused feature into a recognition network to obtain approximate posterior distribution parameters;
and reparameterizing the approximate posterior distribution parameters to obtain a latent variable, and initializing the latent variable through a linear transformation to obtain the input vector.
3. The method of claim 1, wherein the keyword encoder encodes the keyword sequence through an attention mechanism to obtain the keyword vector, comprising:
h_t = Enc_key(e(K))
Enc_key(e(k_i)) = LSTM(input_i)
[Two equation images in the original define the attention weight α(e(k_i), e(k_j)) and the weighted input input_i as an attention-weighted combination of the embeddings of the preceding keywords.]
wherein Enc_key() denotes the keyword encoder, K = k_1, k_2, ..., k_t denotes the keyword sequence, k_i denotes the i-th keyword, h_t denotes the keyword vector, e() denotes the word-embedding function, input_i denotes the weighted input vector of keyword k_i, LSTM() denotes the long short-term memory network LSTM, and α(e(k_i), e(k_j)) denotes the attention between keyword k_i and keyword k_j, 1 ≤ j < i.
4. The dialog generation method for enhancing text features according to claim 1, wherein obtaining the semantic vector with a semantic encoder comprises:
h_s = Enc_sem(e(S), (h_0, c_0))
wherein Enc_sem() denotes the semantic encoder, h_s denotes the semantic vector, h_0 denotes the initial hidden state of the semantic encoder, c_0 denotes the initial cell state of the semantic encoder, and S denotes the semantic text corresponding to the question text.
5. A dialog generation system for enhancing text features, characterized by comprising a sample module, a keyword extraction module, an encoding module, a fusion module, and a training module, wherein:
the sample module is configured to acquire multiple groups of dialog samples, each dialog sample comprising a question text and a reply text;
the keyword extraction module is configured to extract keywords from the question text to form a keyword sequence;
the encoding module comprises an input encoder, an output encoder, a keyword encoder, and a semantic encoder, and is configured to encode the data of the sample module and the keyword extraction module;
the fusion module comprises a first fusion module and a second fusion module:
the first fusion module is configured to fuse the keyword vector and the semantic vector to obtain a keyword semantic vector containing rich semantics;
the second fusion module is configured to fuse the keyword semantic vector and the question text vector;
and the training module is configured to train the dialog generation model, calculate the loss with a loss function, back-propagate, and adjust the parameters of the dialog generation model.
6. The dialog generation system according to claim 5, further comprising a third fusion module, a prior network, and a recognition network:
the third fusion module is configured to fuse the keyword semantic vector, the question text vector, and the reply text vector;
the prior network is configured to receive the output of the second fusion module and obtain the prior distribution parameters;
and the recognition network is configured to receive the output of the third fusion module and obtain the approximate posterior distribution parameters.
CN202211238085.8A 2022-10-11 2022-10-11 Dialog generation method and system for enhancing text features Pending CN115495566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211238085.8A CN115495566A (en) 2022-10-11 2022-10-11 Dialog generation method and system for enhancing text features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211238085.8A CN115495566A (en) 2022-10-11 2022-10-11 Dialog generation method and system for enhancing text features

Publications (1)

Publication Number Publication Date
CN115495566A true CN115495566A (en) 2022-12-20

Family

ID=84473646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211238085.8A Pending CN115495566A (en) 2022-10-11 2022-10-11 Dialog generation method and system for enhancing text features

Country Status (1)

Country Link
CN (1) CN115495566A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932726A (en) * 2023-08-04 2023-10-24 重庆邮电大学 Open domain dialogue generation method based on controllable multi-space feature decoupling
CN116932726B (en) * 2023-08-04 2024-05-10 重庆邮电大学 Open domain dialogue generation method based on controllable multi-space feature decoupling


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination