CN113626560B - Diversity dialogue data enhancement method based on reinforcement learning - Google Patents
Diversity dialogue data enhancement method based on reinforcement learning
- Publication number
- CN113626560B CN113626560B CN202110885428.9A CN202110885428A CN113626560B CN 113626560 B CN113626560 B CN 113626560B CN 202110885428 A CN202110885428 A CN 202110885428A CN 113626560 B CN113626560 B CN 113626560B
- Authority
- CN
- China
- Prior art keywords
- dialogue
- diversity
- semantic
- expression
- replies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3344 — Query execution using natural language analysis (under G06F16/00, Information retrieval; G06F16/33, Querying)
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting (under G06F18/00, Pattern recognition)
- G06F40/279 — Recognition of textual entities (under G06F40/00, Handling natural language data)
- G06F40/35 — Discourse or dialogue representation (under G06F40/30, Semantic analysis)
Abstract
A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps: 1) given an input dialogue history, collect two sets of replies under that history, one containing replies with different semantics and one with different expressions; use the two sets to learn semantic and expression latent spaces for dialogue replies, sample each latent space, and generate the final reply by combining the samples with the encoded dialogue history; 2) use two dedicated discriminators to score the semantic and expression diversity of the generated sentences, so that the generated replies are diverse in both respects; 3) train the model in a continuous loop to finally obtain high-quality, diverse dialogue samples, thereby achieving data enhancement. The invention provides a diversity dialogue data enhancement method based on reinforcement learning that keeps replies consistent with the dialogue history while enhancing the diversity of dialogue samples in both expression and semantics.
Description
Technical Field
The invention belongs to the field of text generation in natural language processing, and particularly relates to a diversity dialogue data enhancement method based on reinforcement learning.
Background
With the development of deep learning, the construction of human-machine dialogue robots has made great progress. However, building a dialogue robot currently requires a large amount of high-quality data covering diverse dialogue scenarios, and such data is scarce. Recent work on modeling high-quality dialogue robots falls into two lines: few-shot learning and data augmentation. Few-shot learning trains a model quickly from a small number of high-quality samples, which mirrors how humans learn a new task from only a few examples. But in a real scenario, both humans and machines produce rich and varied responses to the same dialogue state; a model that learns from only a few samples can be trained quickly but is not robust.
Compared with few-shot learning, data augmentation not only enlarges the training set but also balances the data distribution of the training corpus and reduces its bias. Existing dialogue data augmentation mainly uses paraphrasing to diversify the expression of semantically similar sentences. However, current approaches do not simultaneously enforce consistency with the dialogue history, nor do they adequately address the diversity of generated replies in both expression and semantics, so the generated samples fail to fit real dialogue scenarios and bias the training of the model.
Disclosure of Invention
The invention aims to provide a diversity dialogue data enhancement method based on reinforcement learning that both keeps generated replies consistent with the dialogue history and enhances the diversity of dialogue samples in expression and semantics.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps:
1) Given an input dialogue history, collect two sets of replies under the same history, one containing replies with different semantics and one with different expressions; use the two sets to learn the semantic and expression latent spaces of dialogue replies, sample each latent space, and generate the final reply by combining the samples with the encoded dialogue history;
2) Use two dedicated discriminators to score the semantic and expression diversity of the generated sentences;
3) Train the model in a continuous loop to finally obtain high-quality, diverse dialogue samples.
The specific method of the step 1) is as follows:
Use H to denote the dialogue history and R the corresponding ground-truth reply. First, three different encoders encode the dialogue history, semantics, and expression; each encoder is a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
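Read literally, the brackets of formula (1) are unbalanced in the translation; below is a minimal numpy sketch of one plausible reading, assuming G is a single linear feed-forward layer and using single-head attention in place of MultiHead (all shapes and parameters are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product attention standing in for
    # MultiHead(E(H), E(H), E(H)); projection matrices omitted for brevity.
    d = x.shape[-1]
    return softmax(x @ x.T / np.sqrt(d)) @ x

def encode(E_H, W_ff, b_ff):
    # One reading of Eq. (1):
    # H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H))),
    # with G assumed to be a single linear feed-forward layer.
    attended = E_H + self_attention(E_H)
    return E_H + attended @ W_ff + b_ff

rng = np.random.default_rng(0)
E_H = rng.normal(size=(5, 16))           # 5 tokens, 16-dim word embeddings
W_ff = rng.normal(size=(16, 16)) * 0.1   # toy parameters for G
b_ff = np.zeros(16)
H_c = encode(E_H, W_ff, b_ff)
print(H_c.shape)                         # (5, 16)
```

The residual connection keeps Hc the same shape as the token embeddings, so the three encoders can share the downstream decoder.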
Given the learned semantic and expression vector representations of the dialogue, the latent semantic and expression distributions z1 and z2 are obtained in the manner of a variational auto-encoder. Specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where Hx denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean, and σ the standard deviation.
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I).
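Formulas (3) to (5) are the standard VAE reparameterization trick; a small numpy sketch under that assumption (all shapes illustrative):

```python
import numpy as np

def reparameterize(H_x, W_mu, b_mu, W_sigma, b_sigma, rng):
    # Eq. (3): mu = H_x . W_mu + b_mu
    mu = H_x @ W_mu + b_mu
    # Eq. (4): log sigma^2 = H_x . W_sigma + b_sigma
    log_var = H_x @ W_sigma + b_sigma
    sigma = np.exp(0.5 * log_var)
    # Eq. (5): z = mu + sigma ⊙ eps, with eps ~ N(0, I),
    # so sampling stays differentiable w.r.t. W and b.
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
H_x = rng.normal(size=(8,))              # pooled semantic (or expression) vector
W_mu, W_sigma = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
b_mu, b_sigma = np.zeros(4), np.zeros(4)
z1 = reparameterize(H_x, W_mu, b_mu, W_sigma, b_sigma, rng)
print(z1.shape)                          # (4,)
```

The same function is applied twice, once to the semantic representation and once to the expression representation, to obtain z1 and z2.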
Given z1, z2 and Hc, the final reply is generated:

The semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations; the final objective function is:

This step also produces a dialogue reply at this stage.
The specific method of the step 2) is as follows:
Score the replies generated in step 1) with a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and their semantic tags in the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed:

If the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0. If the tags differ and the cosine similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; if the tags differ and the similarity is lower than that of the ground-truth reply, the score is -1.
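The three-way scoring rule above can be sketched as follows; the sentence vectors and tag values are illustrative, and the pre-trained classifier is abstracted away as its output labels:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_diversity_score(history_vec, gold_vec, gen_vec,
                             gold_label, gen_label):
    # Labels come from a classifier pre-trained on (sentence, semantic tag) pairs.
    if gen_label == gold_label:
        return 0   # same semantics as the ground-truth reply: no diversity gain
    if cosine(history_vec, gen_vec) > cosine(history_vec, gold_vec):
        return 1   # new semantics, and still close to the dialogue history
    return -1      # new semantics, but drifts away from the history

# Toy sentence vectors (illustrative only).
h = np.array([1.0, 0.0])
gold = np.array([0.6, 0.8])
gen = np.array([0.9, 0.1])
print(semantic_diversity_score(h, gold, gen, gold_label=3, gen_label=5))  # 1
```

The middle case rewards replies that add new semantics without losing consistency with the history, which is exactly the trade-off the discriminator is meant to enforce.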
(2) Expression diversity scoring function:
The distinct reply-diversity metric is used for scoring: the degree of expression diversity is computed from the number of distinct words in the generated reply, and the count is regularized by sentence length to penalize overly long generations; this serves as the final expression diversity scoring function.
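A minimal sketch of such a score, assuming distinct-1 (the unique-token count divided by sentence length, so the division doubles as the length regularization):

```python
def expression_diversity_score(tokens):
    # distinct-1: ratio of unique tokens to total tokens.
    # Repetitive or padded-out long replies score lower, because length
    # appears in the denominator.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(expression_diversity_score("how are you doing".split()))  # 1.0
print(expression_diversity_score("no no no no".split()))        # 0.25
```

Distinct-2 (over bigrams) is a common variant; the patent text does not say which n is used, so n = 1 is assumed here.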
The scores for semantic diversity and expression diversity are continuously fed back to the model of the previous step to update its parameters θ.
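The feedback step can be sketched as a policy-gradient update; the text does not specify the estimator or how the two scores are combined, so vanilla REINFORCE with an illustrative weighted-sum reward is assumed:

```python
import numpy as np

def reinforce_update(theta, grad_log_prob, semantic_score, expression_score,
                     lr=0.01, w_sem=0.5, w_expr=0.5):
    # Policy-gradient step: theta <- theta + lr * r * grad log p_theta(reply),
    # with reward r taken as a weighted sum of the two diversity scores.
    # The weights w_sem / w_expr are assumptions, not from the patent.
    r = w_sem * semantic_score + w_expr * expression_score
    return theta + lr * r * grad_log_prob

theta = np.zeros(4)
grad = np.array([1.0, -2.0, 0.5, 0.0])   # toy gradient of the log-probability
theta = reinforce_update(theta, grad, semantic_score=1, expression_score=0.8)
print(theta)   # theta moved along grad, scaled by lr * reward
```

A reply scored 0 leaves the parameters unchanged, and a reply scored -1 pushes the generator away from that output, matching the reward design of the semantic discriminator.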
The invention has the beneficial effects that:
The model learns latent spaces of dialogue replies over both semantics and expression, and realizes diversity along both dimensions through sampling. Compared with learning a single implicit vector representation, this makes it easier to generate high-quality dialogue samples with broad coverage. The invention further designs a dedicated scoring model for each dimension, which accounts not only for the realism of a generated sample but also for its consistency with the dialogue history relative to the real sample, yielding high-quality dialogue samples.
Drawings
FIG. 1 is a framework diagram of the reinforcement-learning-based diversity dialogue data enhancement method.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the invention.
Example 1:
To keep generated replies consistent with the dialogue history while enhancing the diversity of dialogue samples in expression and semantics, the present disclosure provides a diversity dialogue data enhancement technique based on reinforcement learning; its framework is shown in Fig. 1 and the specific scheme is as follows:
Given an input dialogue history, sets of replies with different semantics or different expressions under the same history are collected; the two sets are used to learn the semantic and expression latent spaces of dialogue replies, each latent space is sampled, and the final reply is generated by combining the samples with the encoded dialogue history. To guarantee the diversity of the generated replies in semantics and expression, two dedicated discriminators are designed to score the semantic and expression diversity of the generated sentences.
As shown in Fig. 1, H and R denote the dialogue history and the corresponding ground-truth response, respectively. The dialogue history, semantics, and expression are first encoded with three different encoders, each encoder being a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
E(·) denotes the word-embedding vector representation corresponding to the dialogue history. Specifically, the semantic, expression, and overall vector representations of the dialogue history are each obtained through the Transformer operation above.
Given the learned semantic and expression vector representations of the dialogue, the latent semantic and expression distributions z1 and z2 are obtained in the manner of a variational auto-encoder. Specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where μ denotes the mean and σ the standard deviation.
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I).
Given z1, z2 and Hc, the final reply is generated:

The semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations; the final objective function is:

This step also produces a dialogue reply at this stage.
The specific method of the step 2) is as follows:
Score the replies generated in the previous step with the semantic diversity scoring function and the expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and their semantic tags in the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed. If the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0. If the tags differ and the similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; otherwise, if the tags differ and the similarity is lower, the score is -1.
(2) Expression diversity scoring function:
The distinct reply-diversity metric is used for scoring, computing the degree of expression diversity from the number of distinct words in the generated reply. To penalize overly long generations, the count is regularized by sentence length, which serves as the final expression diversity scoring function.
The scores for semantic diversity and expression diversity are continuously fed back to the model of the previous step to update its parameters.
Claims (1)
1. A diversity dialogue data enhancement method based on reinforcement learning, characterized by comprising the following steps:
1) given an input dialogue history, collecting two sets of replies under the same history, one containing replies with different semantics and one with different expressions, using the two sets to learn the semantic and expression latent spaces of dialogue replies, sampling each latent space, and generating the final reply by combining the samples with the encoded dialogue history;
using H to denote the dialogue history and R the corresponding ground-truth reply, first encoding the dialogue history, semantics, and expression with three different encoders, each encoder being a Transformer:

Hc = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
given the learned semantic and expression vector representations of the dialogue, obtaining the latent semantic and expression distributions z1 and z2 in the manner of a variational auto-encoder; specifically, z1 and z2 are assumed to have the form

z ~ N(μ, σ²I)    (2)

where Hx denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean, and σ the standard deviation;
μ and log σ² are computed as follows, where W denotes a weight and b a bias:

μ = Hx · Wμ + bμ    (3)

log σ² = Hx · Wσ + bσ    (4)

z = μ + σ ⊙ ε    (5)

where ε is noise sampled from N(0, I);
given z1, z2 and Hc, generating the final reply:

the semantic and expression latent-space representations are pushed as far apart as possible so that the model learns decoupled representations, and the final objective function is:

this step also produces a dialogue reply at this stage;
2) using two dedicated discriminators to score the semantic and expression diversity of the generated sentences;
scoring the replies generated in step 1) with a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
firstly, a classifier is pre-trained on the sentences and their semantic tags in the training set; in the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed:

if the semantic tag of the generated reply matches that of the ground-truth reply, the score is 0; if the tags differ and the cosine similarity between the dialogue history and the generated reply is higher than that of the ground-truth reply, the score is 1; if the tags differ and the similarity is lower than that of the ground-truth reply, the score is -1;
(2) Expression diversity scoring function:
scoring with the distinct reply-diversity metric, computing the degree of expression diversity from the number of distinct words in the generated reply, and regularizing by sentence length to penalize overly long generations, which serves as the final expression diversity scoring function;
continuously feeding the scores for semantic diversity and expression diversity back to the model of the previous step and updating the parameters θ;
3) training the model in a continuous loop to finally obtain high-quality, diverse dialogue samples.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110885428.9A | 2021-08-03 | 2021-08-03 | Diversity dialogue data enhancement method based on reinforcement learning |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110885428.9A | 2021-08-03 | 2021-08-03 | Diversity dialogue data enhancement method based on reinforcement learning |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113626560A | 2021-11-09 |
| CN113626560B | 2024-05-07 |
Family

ID=78382406

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110885428.9A (Active) | Diversity dialogue data enhancement method based on reinforcement learning | 2021-08-03 | 2021-08-03 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113626560B (en) |
Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109992657A * | 2019-04-03 | 2019-07-09 | Zhejiang University | Interactive question generation method based on reinforced dynamic inference |
| CN110737764A * | 2019-10-24 | 2020-01-31 | Northwestern Polytechnical University | Personalized dialogue content generation method |
| CN112199481A * | 2020-09-30 | 2021-01-08 | Renmin University of China | Single-user personalized dialogue method and system using a PCC dialogue model |
| CN113158665A * | 2021-04-02 | 2021-07-23 | Xi'an Jiaotong University | Improved dialogue text generation and text summarization method based on bidirectional corpora |
Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107885756B * | 2016-09-30 | 2020-05-08 | Huawei Technologies Co., Ltd. | Deep learning-based dialogue method, device, and equipment |

2021-08-03: application CN202110885428.9A filed in China; granted as CN113626560B, status Active.
Non-Patent Citations (1)

| Title |
|---|
| "An intent recognition method for multi-turn dialogue based on memory networks"; Yang Chengbiao, Lü Rongrong, Wu Gang; Electronic Technology & Software Engineering, no. 10 (full text) * |
Also Published As

| Publication number | Publication date |
|---|---|
| CN113626560A | 2021-11-09 |
Similar Documents

| Publication | Title |
|---|---|
| CN111859978B | Deep learning-based emotion text generation method |
| CN110609891B | Visual dialog generation method based on context-aware graph neural network |
| CN110321418B | Deep learning-based domain and intent recognition and slot filling method |
| CN111783459A | Lao named entity recognition method based on improved Transformer + CRF |
| CN111897933A | Emotional dialogue generation method and device and emotional dialogue model training method and device |
| CN112417894B | Conversation intention identification method and system based on multi-task learning |
| CN111966800A | Emotional dialogue generation method and device and emotional dialogue model training method and device |
| CN113065331A | Entity emotion recognition method and system based on entity context discrimination |
| CN113032601A | Zero-shot sketch retrieval method based on discriminant improvement |
| CN115563314A | Knowledge graph representation learning method with multi-source information fusion enhancement |
| CN114444519A | Emotional dialogue generation method based on a Seq2Seq model |
| CN112651225B | Multiple-choice machine reading comprehension method based on multi-stage maximum attention |
| CN116561325B | Multilingual fused media text emotion analysis method |
| CN113626560B | Diversity dialogue data enhancement method based on reinforcement learning |
| CN113222002A | Zero-shot classification method based on generative-discriminative contrast optimization |
| CN116958700A | Image classification method based on prompt engineering and contrastive learning |
| CN113946670B | Contrastive context understanding enhancement method for dialogue emotion recognition |
| CN113901172B | Case-related microblog evaluation object extraction method based on keyword structural coding |
| CN116028606A | Human-machine multi-turn dialogue rewriting method based on Transformer pointer extraction |
| CN115422329A | Knowledge-driven multi-channel screening fusion dialogue generation method |
| CN115510230A | Mongolian emotion analysis method based on multi-dimensional feature fusion and a contrastive reinforcement learning mechanism |
| CN114548117A | Causal relation extraction method based on BERT semantic enhancement |
| CN114239575A | Statement analysis model construction method, statement analysis method, device, medium, and computing equipment |
| CN110619118B | Automatic text generation method |
| CN115587909A | Judicial text data amplification method based on a generative adversarial network |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |