CN113626560B - Diversity dialogue data enhancement method based on reinforcement learning - Google Patents
Diversity dialogue data enhancement method based on reinforcement learning
- Publication number
- CN113626560B CN113626560B CN202110885428.9A CN202110885428A CN113626560B CN 113626560 B CN113626560 B CN 113626560B CN 202110885428 A CN202110885428 A CN 202110885428A CN 113626560 B CN113626560 B CN 113626560B
- Authority
- CN
- China
- Prior art keywords
- dialogue
- diversity
- semantic
- expression
- replies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3344 — Query execution using natural language analysis (under G06F16/00, Information retrieval; G06F16/33, Querying)
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting (under G06F18/00, Pattern recognition)
- G06F40/279 — Recognition of textual entities (under G06F40/00, Handling natural language data)
- G06F40/35 — Discourse or dialogue representation (under G06F40/30, Semantic analysis)
Abstract
A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps: 1) given an input dialogue history, collect two sets of replies under that history, one containing replies with different semantics and one with different expressions; use the two sets to learn semantic and expression latent spaces for dialogue replies, sample each latent space, and generate the final reply by combining the samples with the encoded dialogue history; 2) use two dedicated discriminators to score the semantic and expression diversity of the generated sentences, so that the generated replies are diverse in both respects; 3) train the model in a continuous loop to finally obtain high-quality, diverse dialogue samples, thereby achieving data enhancement. The invention provides a diversity dialogue data enhancement method based on reinforcement learning that keeps replies consistent with the dialogue history while enhancing the diversity of dialogue samples in both expression and semantics.
Description
Technical Field
The invention belongs to the field of text generation in natural language processing, and particularly relates to a diversity dialogue data enhancement method based on reinforcement learning.
Background
With the development of deep learning, the construction of human-machine dialogue robots has made great progress. However, building a dialogue robot currently requires a large amount of high-quality data covering diverse dialogue scenarios, and such data is scarce. Recent work on modeling high-quality dialogue robots falls into two lines: few-shot learning and data augmentation. Few-shot learning trains a model quickly from a small number of high-quality samples, which mirrors how humans learn a new task from only a few examples. But in a real scenario, both humans and machines produce rich and varied responses to the same dialogue state; a model that learns from only a few samples can be trained quickly but is not robust.
Compared with few-shot learning, data augmentation not only enlarges the training set but also balances the data distribution of the training corpus and reduces its bias. Existing dialogue data augmentation mainly uses paraphrasing to diversify the expression of semantically similar sentences. However, current approaches do not simultaneously enforce consistency with the dialogue history, nor do they adequately address the diversity of generated replies in both expression and semantics, so the generated samples fail to fit real dialogue scenarios and bias the training of the model.
Disclosure of Invention
The invention aims to provide a diversity dialogue data enhancement method based on reinforcement learning that both keeps generated replies consistent with the dialogue history and enhances the diversity of dialogue samples in expression and semantics.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps:
1) Given an input dialogue history, collect two sets of replies under the same history, one containing replies with different semantics and one with different expressions; use the two sets to learn the semantic and expression latent spaces of dialogue replies, sample each latent space, and generate the final reply by combining the samples with the encoded dialogue history;
2) Use two dedicated discriminators to score the semantic and expression diversity of the generated sentences;
3) Train the model in a continuous loop to finally obtain high-quality, diverse dialogue samples.
The specific method of the step 1) is as follows:
Use H to denote the dialogue history and R the corresponding ground-truth reply. First, three different encoders encode the dialogue history, semantics, and expression; each encoder is a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
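Read literally, the brackets of formula (1) are unbalanced in the translation; below is a minimal numpy sketch of one plausible reading, assuming G is a single linear feed-forward layer and using single-head attention in place of MultiHead (all shapes and parameters are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product attention standing in for
    # MultiHead(E(H), E(H), E(H)); projection matrices omitted for brevity.
    d = x.shape[-1]
    return softmax(x @ x.T / np.sqrt(d)) @ x

def encode(E_H, W_ff, b_ff):
    # One reading of Eq. (1):
    # H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H))),
    # with G assumed to be a single linear feed-forward layer.
    attended = E_H + self_attention(E_H)
    return E_H + attended @ W_ff + b_ff

rng = np.random.default_rng(0)
E_H = rng.normal(size=(5, 16))           # 5 tokens, 16-dim word embeddings
W_ff = rng.normal(size=(16, 16)) * 0.1   # toy parameters for G
b_ff = np.zeros(16)
H_c = encode(E_H, W_ff, b_ff)
print(H_c.shape)                         # (5, 16)
```

The residual connection keeps Hc the same shape as the token embeddings, so the three encoders can share the downstream decoder.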
Given the learned semantic and expression vector representations of the dialogue, the latent semantic and expression distributions z1 and z2 are obtained in the manner of a variational auto-encoder. Specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where Hx denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean, and σ the standard deviation.
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I).
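Formulas (3) to (5) are the standard VAE reparameterization trick; a small numpy sketch under that assumption (all shapes illustrative):

```python
import numpy as np

def reparameterize(H_x, W_mu, b_mu, W_sigma, b_sigma, rng):
    # Eq. (3): mu = H_x . W_mu + b_mu
    mu = H_x @ W_mu + b_mu
    # Eq. (4): log sigma^2 = H_x . W_sigma + b_sigma
    log_var = H_x @ W_sigma + b_sigma
    sigma = np.exp(0.5 * log_var)
    # Eq. (5): z = mu + sigma ⊙ eps, with eps ~ N(0, I),
    # so sampling stays differentiable w.r.t. W and b.
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
H_x = rng.normal(size=(8,))              # pooled semantic (or expression) vector
W_mu, W_sigma = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
b_mu, b_sigma = np.zeros(4), np.zeros(4)
z1 = reparameterize(H_x, W_mu, b_mu, W_sigma, b_sigma, rng)
print(z1.shape)                          # (4,)
```

The same function is applied twice, once to the semantic representation and once to the expression representation, to obtain z1 and z2.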
Given z1, z2 and Hc, the final reply is generated:

The semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations; the final objective function is:

This step also produces a dialogue reply at this stage.
The specific method of the step 2) is as follows:
Score the replies generated in step 1) with a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and their semantic tags in the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed:

If the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0. If the tags differ and the cosine similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; if the tags differ and the similarity is lower than that of the ground-truth reply, the score is -1.
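The three-way scoring rule above can be sketched as follows; the sentence vectors and tag values are illustrative, and the pre-trained classifier is abstracted away as its output labels:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_diversity_score(history_vec, gold_vec, gen_vec,
                             gold_label, gen_label):
    # Labels come from a classifier pre-trained on (sentence, semantic tag) pairs.
    if gen_label == gold_label:
        return 0   # same semantics as the ground-truth reply: no diversity gain
    if cosine(history_vec, gen_vec) > cosine(history_vec, gold_vec):
        return 1   # new semantics, and still close to the dialogue history
    return -1      # new semantics, but drifts away from the history

# Toy sentence vectors (illustrative only).
h = np.array([1.0, 0.0])
gold = np.array([0.6, 0.8])
gen = np.array([0.9, 0.1])
print(semantic_diversity_score(h, gold, gen, gold_label=3, gen_label=5))  # 1
```

The middle case rewards replies that add new semantics without losing consistency with the history, which is exactly the trade-off the discriminator is meant to enforce.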
(2) Expression diversity scoring function:
The distinct reply-diversity metric is used for scoring: the degree of expression diversity is computed from the number of distinct words in the generated reply, and the count is regularized by sentence length to penalize overly long generations; this serves as the final expression diversity scoring function.
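A minimal sketch of such a score, assuming distinct-1 (the unique-token count divided by sentence length, so the division doubles as the length regularization):

```python
def expression_diversity_score(tokens):
    # distinct-1: ratio of unique tokens to total tokens.
    # Repetitive or padded-out long replies score lower, because length
    # appears in the denominator.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(expression_diversity_score("how are you doing".split()))  # 1.0
print(expression_diversity_score("no no no no".split()))        # 0.25
```

Distinct-2 (over bigrams) is a common variant; the patent text does not say which n is used, so n = 1 is assumed here.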
The scores for semantic diversity and expression diversity are continuously fed back to the model of the previous step to update its parameters θ.
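The feedback step can be sketched as a policy-gradient update; the text does not specify the estimator or how the two scores are combined, so vanilla REINFORCE with an illustrative weighted-sum reward is assumed:

```python
import numpy as np

def reinforce_update(theta, grad_log_prob, semantic_score, expression_score,
                     lr=0.01, w_sem=0.5, w_expr=0.5):
    # Policy-gradient step: theta <- theta + lr * r * grad log p_theta(reply),
    # with reward r taken as a weighted sum of the two diversity scores.
    # The weights w_sem / w_expr are assumptions, not from the patent.
    r = w_sem * semantic_score + w_expr * expression_score
    return theta + lr * r * grad_log_prob

theta = np.zeros(4)
grad = np.array([1.0, -2.0, 0.5, 0.0])   # toy gradient of the log-probability
theta = reinforce_update(theta, grad, semantic_score=1, expression_score=0.8)
print(theta)   # theta moved along grad, scaled by lr * reward
```

A reply scored 0 leaves the parameters unchanged, and a reply scored -1 pushes the generator away from that output, matching the reward design of the semantic discriminator.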
The invention has the beneficial effects that:
The model learns latent spaces of dialogue replies over both semantics and expression, and realizes diversity along both dimensions through sampling. Compared with learning a single implicit vector representation, this makes it easier to generate high-quality dialogue samples with broad coverage. The invention further designs a dedicated scoring model for each dimension, which accounts not only for the realism of a generated sample but also for its consistency with the dialogue history relative to the real sample, yielding high-quality dialogue samples.
Drawings
FIG. 1 is a framework diagram of the reinforcement-learning-based diversity dialogue data enhancement method.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention.
Example 1:
To keep generated replies consistent with the dialogue history while enhancing the diversity of dialogue samples in expression and semantics, the present disclosure provides a diversity dialogue data enhancement technique based on reinforcement learning; its framework is shown in Fig. 1 and the specific scheme is as follows:
Given an input dialogue history, sets of replies with different semantics or different expressions under the same history are collected; the two sets are used to learn the semantic and expression latent spaces of dialogue replies, each latent space is sampled, and the final reply is generated by combining the samples with the encoded dialogue history. To guarantee the diversity of the generated replies in semantics and expression, two dedicated discriminators are designed to score the semantic and expression diversity of the generated sentences.
As shown in Fig. 1, H and R denote the dialogue history and the corresponding ground-truth response, respectively. The dialogue history, semantics, and expression are first encoded with three different encoders, each encoder being a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
E(·) denotes the word-embedding vector representation corresponding to the dialogue history. Specifically, the semantic, expression, and overall vector representations of the dialogue history are each obtained through the Transformer operation above.
Given the learned semantic and expression vector representations of the dialogue, the latent semantic and expression distributions z1 and z2 are obtained in the manner of a variational auto-encoder. Specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where μ denotes the mean and σ the standard deviation.
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I).
Given z1, z2 and Hc, the final reply is generated:

The semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations; the final objective function is:

This step also produces a dialogue reply at this stage.
The specific method of the step 2) is as follows:
Score the replies generated in the previous step with the semantic diversity scoring function and the expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and their semantic tags in the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed. If the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0. If the tags differ and the similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; otherwise, if the tags differ and the similarity is lower, the score is -1.
(2) Expression diversity scoring function:
The distinct reply-diversity metric is used for scoring, computing the degree of expression diversity from the number of distinct words in the generated reply. To penalize overly long generations, the count is regularized by sentence length, which serves as the final expression diversity scoring function.
The scores for semantic diversity and expression diversity are continuously fed back to the model of the previous step to update its parameters.
Claims (1)
1. A diversity dialogue data enhancement method based on reinforcement learning, characterized by comprising the following steps:
1) given an input dialogue history, collecting two sets of replies under the same history, one containing replies with different semantics and one with different expressions, using the two sets to learn the semantic and expression latent spaces of dialogue replies, sampling each latent space, and generating the final reply by combining the samples with the encoded dialogue history;
using H to denote the dialogue history and R the corresponding ground-truth reply, first encoding the dialogue history, semantics, and expression with three different encoders, each encoder being a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
given the learned semantic and expression vector representations of the dialogue, obtaining the latent semantic and expression distributions z1 and z2 in the manner of a variational auto-encoder; specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where Hx denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean, and σ the standard deviation;
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I);
given z1, z2 and Hc, generating the final reply:

the semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations, and the final objective function is:

this step also produces a dialogue reply at this stage;
2) using two dedicated discriminators to score the semantic and expression diversity of the generated sentences;
scoring the replies generated in step 1) with a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
firstly, a classifier is pre-trained on the sentences and their semantic tags in the training set; in the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed:

if the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0; if the tags differ and the cosine similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; if the tags differ and the similarity is lower than that of the ground-truth reply, the score is -1;
(2) Expression diversity scoring function:
scoring with the distinct reply-diversity metric, computing the degree of expression diversity from the number of distinct words in the generated reply, and regularizing by sentence length to penalize overly long generations, which serves as the final expression diversity scoring function;
continuously feeding the scores for semantic diversity and expression diversity back to the model of the previous step and updating the parameters θ;
3) training the model in a continuous loop to finally obtain high-quality, diverse dialogue samples.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110885428.9A | 2021-08-03 | 2021-08-03 | Diversity dialogue data enhancement method based on reinforcement learning |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110885428.9A | 2021-08-03 | 2021-08-03 | Diversity dialogue data enhancement method based on reinforcement learning |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113626560A | 2021-11-09 |
| CN113626560B | 2024-05-07 |
Family

ID=78382406

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110885428.9A (Active) | Diversity dialogue data enhancement method based on reinforcement learning | 2021-08-03 | 2021-08-03 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113626560B (en) |
Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109992657A * | 2019-04-03 | 2019-07-09 | Zhejiang University | Interactive question generation method based on reinforced dynamic inference |
| CN110737764A * | 2019-10-24 | 2020-01-31 | Northwestern Polytechnical University | Personalized dialogue content generation method |
| CN112199481A * | 2020-09-30 | 2021-01-08 | Renmin University of China | Single-user personalized dialogue method and system using a PCC dialogue model |
| CN113158665A * | 2021-04-02 | 2021-07-23 | Xi'an Jiaotong University | Improved dialogue text generation and text summarization method based on bidirectional corpora |
Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107885756B * | 2016-09-30 | 2020-05-08 | Huawei Technologies Co., Ltd. | Deep learning-based dialogue method, device, and equipment |

2021-08-03: application CN202110885428.9A filed in China; granted as CN113626560B, status Active.
Non-Patent Citations (1)

| Title |
|---|
| "An intent recognition method for multi-turn dialogue based on memory networks"; Yang Chengbiao, Lü Rongrong, Wu Gang; Electronic Technology & Software Engineering, no. 10 (full text) * |
Also Published As

| Publication number | Publication date |
|---|---|
| CN113626560A | 2021-11-09 |
Similar Documents

| Publication | Title |
|---|---|
| CN111859978B | Deep learning-based emotion text generation method |
| CN110609891B | Visual dialog generation method based on context-aware graph neural network |
| CN110321418B | Deep learning-based domain and intent recognition and slot filling method |
| CN111783459A | Lao named entity recognition method based on improved Transformer + CRF |
| CN111897933A | Emotional dialogue generation method and device and emotional dialogue model training method and device |
| CN112417894B | Conversation intention identification method and system based on multi-task learning |
| CN111966800A | Emotional dialogue generation method and device and emotional dialogue model training method and device |
| CN113065331A | Entity emotion recognition method and system based on entity context discrimination |
| CN113032601A | Zero-shot sketch retrieval method based on discriminant improvement |
| CN115563314A | Knowledge graph representation learning method with multi-source information fusion enhancement |
| CN114444519A | Emotional dialogue generation method based on a Seq2Seq model |
| CN112651225B | Multiple-choice machine reading comprehension method based on multi-stage maximum attention |
| CN116561325B | Multilingual fused media text emotion analysis method |
| CN113626560B | Diversity dialogue data enhancement method based on reinforcement learning |
| CN113222002A | Zero-shot classification method based on generative-discriminative contrast optimization |
| CN116958700A | Image classification method based on prompt engineering and contrastive learning |
| CN113946670B | Contrastive context understanding enhancement method for dialogue emotion recognition |
| CN113901172B | Case-related microblog evaluation object extraction method based on keyword structural coding |
| CN116028606A | Human-machine multi-turn dialogue rewriting method based on Transformer pointer extraction |
| CN115422329A | Knowledge-driven multi-channel screening fusion dialogue generation method |
| CN115510230A | Mongolian emotion analysis method based on multi-dimensional feature fusion and a contrastive reinforcement learning mechanism |
| CN114548117A | Causal relation extraction method based on BERT semantic enhancement |
| CN114239575A | Statement analysis model construction method, statement analysis method, device, medium, and computing equipment |
| CN110619118B | Automatic text generation method |
| CN115587909A | Judicial text data amplification method based on a generative adversarial network |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |