CN113626560B - Diversity dialogue data enhancement method based on reinforcement learning - Google Patents

Info

Publication number
CN113626560B
Authority
CN
China
Prior art keywords
dialogue
diversity
semantic
expression
replies
Prior art date
Legal status
Active
Application number
CN202110885428.9A
Other languages
Chinese (zh)
Other versions
CN113626560A (en)
Inventor
陈廷伟
侯昊辰
魏家富
胡玥
Current Assignee
Liaoning University
Original Assignee
Liaoning University
Priority date
Filing date
Publication date
Application filed by Liaoning University
Priority to CN202110885428.9A
Publication of CN113626560A
Application granted
Publication of CN113626560B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps: 1) given an input dialogue history, collect sets of replies that differ in semantics or in expression under the same history; the two sets are used to build the semantic and expression latent spaces of dialogue replies; each latent space is sampled, and a final reply is generated by combining the encoded dialogue history; 2) two dedicated discriminators judge the semantic and expression diversity of the generated sentences, ensuring that generated replies are diverse in both respects; 3) the model is trained iteratively until high-quality, diverse dialogue samples are obtained, thereby achieving data enhancement. The invention provides a reinforcement-learning-based diversity dialogue data enhancement method that ensures consistency with the dialogue history while enhancing the diversity of dialogue samples in both expression and semantics.

Description

Diversity dialogue data enhancement method based on reinforcement learning
Technical Field
The invention belongs to the field of text generation in natural language processing, and particularly relates to a diversity dialogue data enhancement method based on reinforcement learning.
Background
With the development of deep learning, the construction of human-machine dialogue robots has made great progress. However, building a dialogue robot currently requires a large amount of high-quality data covering diverse dialogue scenarios, and such data is scarce. Work on building dialogue robots under limited data falls into two lines: few-shot learning (Few-Shot Learning) and data augmentation (Data Augmentation). Few-shot learning refers to methods that train a model rapidly from a small number of high-quality samples; it mirrors the human learning process, in which a new task can be learned from only a few examples. But in a real scenario, both humans and machines produce rich and varied responses in the same dialogue state. A model that learns from only a small number of samples can be trained quickly but is not robust.
Compared with few-shot learning, data augmentation not only enlarges the training data but also balances the distribution of the training corpus and reduces distributional bias. Existing dialogue data augmentation mainly paraphrases semantically similar sentences to enrich their modes of expression. However, current dialogue data augmentation methods do not simultaneously maintain consistency with the dialogue history, nor do they sufficiently address the diversity of generated replies in both expression and semantics, so the generated samples fail to fit real dialogue scenarios and bias the training of the model.
Disclosure of Invention
The invention aims to provide a reinforcement-learning-based diversity dialogue data enhancement method that not only ensures the generated replies are consistent with the dialogue history, but also enhances the diversity of dialogue samples in both expression and semantics.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A diversity dialogue data enhancement method based on reinforcement learning comprises the following steps:
1) Given an input dialogue history, collecting sets of replies that differ in semantics or in expression under the same dialogue history, wherein the two sets are used to build the semantic and expression latent spaces of dialogue replies; sampling each latent space, and generating the final reply by combining the encoded dialogue history;
2) Using two dedicated discriminators to judge the semantic diversity and the expression diversity of the generated sentences;
3) Training the model iteratively until high-quality, diverse dialogue samples are finally obtained.
The specific method of the step 1) is as follows:
Let H denote the dialogue history and R the corresponding ground-truth reply. The dialogue history, its semantics, and its expression are first encoded by three different encoders, each a Transformer:
H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
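Equation (1) can be sketched in numpy, substituting a single attention head for MultiHead and taking G as a single linear map; these simplifications, and all shapes and names below, are illustrative assumptions rather than the patented architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(e_h):
    # Single-head dot-product self-attention standing in for
    # MultiHead(E(H), E(H), E(H)); multiple heads are omitted for brevity.
    d = e_h.shape[-1]
    weights = softmax(e_h @ e_h.T / np.sqrt(d))
    return weights @ e_h

def encode_history(e_h, w_g, b_g):
    # Eq. (1): H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H))),
    # with G taken here as one linear layer (an assumption).
    inner = e_h + self_attention(e_h)
    return e_h + inner @ w_g + b_g
```

Here `e_h` plays the role of E(H), the word-embedding matrix of the history; the residual additions mirror the two `+` terms in the formula.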
Given the learned semantic and expression vector representations of the dialogue, the latent semantic distribution z_1 and the latent expression distribution z_2 are obtained in the manner of a variational auto-encoder. Specifically, z_1 and z_2 are assumed to be Gaussian:
z ~ N(μ, σ²)    (2)
where H_x denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean (expectation), and σ the standard deviation.
For μ and σ, the calculation is as follows, where W denotes a weight matrix and b a bias:
μ = H_x · W_μ + b_μ    (3)
log σ² = H_x · W_σ + b_σ    (4)
z = μ + σ ⊙ ε    (5)
where ε is drawn from a standard normal distribution.
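The reparameterised sampling in Eqs. (3)-(5) can be sketched as follows; the weight and bias names mirror W and b in the text, but their shapes are illustrative assumptions:

```python
import numpy as np

def sample_latent(h_x, w_mu, b_mu, w_sigma, b_sigma, rng):
    # h_x is the semantic or expression vector of the dialogue history.
    mu = h_x @ w_mu + b_mu                 # Eq. (3): mean of the latent Gaussian
    log_var = h_x @ w_sigma + b_sigma      # Eq. (4): log sigma^2
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)    # eps drawn from N(0, I)
    return mu + sigma * eps                # Eq. (5): z = mu + sigma ⊙ eps
```

Because the noise ε is sampled outside the deterministic path, gradients can flow through μ and σ during training, which is the point of the reparameterisation.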
Given z_1, z_2, and H_c, the final reply is generated. The semantic latent representation and the expression latent representation are kept as far apart as possible, so that the model learns decoupled representations; this decoupling constraint enters the final objective function. At the same time, this step yields the dialogue reply generated at this stage.
The specific method of the step 2) is as follows:
The replies generated in step 1) are scored by a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and corresponding semantic labels of the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed:
If the semantic label of the generated reply matches that of the ground-truth reply, the score is 0; if the labels differ and the cosine similarity between the dialogue history and the generated reply is higher than that between the history and the ground-truth reply, the score is 1; if the labels differ and that similarity is lower, the score is -1.
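The three-way scoring rule can be sketched directly; the sentence vectors and labels are assumed to come from an upstream encoder and the pre-trained classifier, which are not shown:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_diversity_score(gold_label, gen_label, history_vec, gold_vec, gen_vec):
    # Scoring rule of the semantic diversity discriminator.
    if gen_label == gold_label:
        return 0                                              # no new semantics
    if cosine(history_vec, gen_vec) > cosine(history_vec, gold_vec):
        return 1                                              # new semantics, still on-topic
    return -1                                                 # new semantics, drifted off-topic
```

The comparison against the ground-truth reply's similarity is what keeps a semantically novel reply anchored to the dialogue history.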
(2) Expression diversity scoring function:
The reply-diversity metric distinct is used for scoring: the degree of expression diversity is computed from the number of distinct words in the generated reply, normalised by sentence length so that over-long sentences are penalised; this serves as the final expression diversity scoring function.
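A minimal distinct-style score is sketched below; using distinct-1 only (unique unigrams over sentence length) rather than a combination of distinct-1 and distinct-2 is a simplifying assumption:

```python
def expression_diversity_score(tokens):
    # Number of distinct words in the generated reply, normalised by
    # sentence length: repeating words or padding the reply with extra
    # tokens lowers rather than inflates the score.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For example, a reply with all-distinct words scores 1.0, while one that repeats itself scores proportionally lower.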
The semantic diversity and expression diversity scores are continuously fed back to the model of the previous step, and the parameters θ are updated.
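One plausible reading of this feedback loop is a REINFORCE-style policy-gradient step, with the combined diversity scores serving as the reward; the baseline and learning rate below are standard additions assumed for illustration, not details given in the text:

```python
import numpy as np

def reinforce_update(theta, grad_log_prob, reward, baseline=0.0, lr=1e-3):
    # One policy-gradient step:
    #   theta <- theta + lr * (r - b) * grad log p(reply | history; theta)
    # where r is the scalar reward from the two diversity scoring functions.
    return theta + lr * (reward - baseline) * grad_log_prob
```

A reward above the baseline nudges θ to make the sampled reply more likely; a reward below it does the opposite.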
The invention has the beneficial effects that:
The improved model learns latent spaces of dialogue replies in both semantics and expression, and achieves diversity along these two dimensions through sampling. Compared with learning a single implicit vector representation, this makes it easier to generate high-quality dialogue samples with wide coverage. The invention also designs dedicated scoring models for each dimension, which consider not only the authenticity of a generated sample but also its consistency with the dialogue history relative to the real sample, yielding high-quality dialogue samples.
Drawings
FIG. 1 is a framework diagram of the reinforcement-learning-based diversity dialogue data enhancement method.
Detailed Description
Embodiments of the invention are described below with reference to the accompanying drawings; the embodiments described are only some, not all, of the possible embodiments of the invention.
Example 1:
In order to ensure that generated replies are consistent with the dialogue history while enhancing the diversity of dialogue samples in expression and semantics, the present disclosure provides a reinforcement-learning-based diversity dialogue data enhancement technique, whose framework is shown in fig. 1. The specific scheme is as follows:
Given an input dialogue history, sets of replies with different semantics or different expressions under the same history are collected; the two sets are used to build the semantic and expression latent spaces of dialogue replies; each latent space is sampled, and the final reply is generated by combining the encoded dialogue history. To ensure that generated replies are diverse in both semantics and expression, two dedicated discriminators are designed to judge the semantic and expression diversity of the generated sentences.
As shown in fig. 1, H and R denote the dialogue history (dialogue history) and the corresponding true response (response), respectively. The dialogue history, its semantics, and its expression are first encoded by three different encoders (Encoder), each a Transformer:
H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
E(·) denotes the word embedding (embedding) vector representation corresponding to the dialogue history. Specifically, Transformer operations yield the semantic, expression, and overall vector representations of the dialogue history.
Given the learned semantic and expression vector representations of the dialogue, the latent semantic distribution z_1 and the latent expression distribution z_2 are obtained in the manner of a variational auto-encoder. Specifically, z_1 and z_2 are assumed to be Gaussian:
z ~ N(μ, σ²)    (2)
where μ denotes the mean (expectation) and σ the standard deviation.
For μ and σ, the calculation is as follows, where W denotes a weight matrix and b a bias:
μ = H_x · W_μ + b_μ    (3)
log σ² = H_x · W_σ + b_σ    (4)
z = μ + σ ⊙ ε    (5)
where ε is drawn from a standard normal distribution.
Given z_1, z_2, and H_c, the final reply is generated. The semantic latent representation and the expression latent representation are kept as far apart as possible, so that the model learns decoupled representations; this decoupling constraint enters the final objective function. At the same time, this step yields the dialogue reply generated at this stage.
The specific method of the step 2) is as follows:
The replies generated in the previous step are scored by a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
First, a classifier is pre-trained on the sentences and corresponding semantic labels of the training set. In the scoring stage, the pre-trained classifier labels both the ground-truth reply and the generated reply, and the cosine similarity between the dialogue history and each reply is computed. If the semantic label of the generated reply matches that of the ground-truth reply, the score is 0. If the labels differ and the similarity between the dialogue history and the generated reply is higher than that of the correct reply, the score is 1. Otherwise, if the labels differ and that similarity is lower, the score is -1.
(2) Expression diversity scoring function:
The distinct reply-diversity metric is used for scoring, computing the degree of expression diversity from the number of distinct words in the generated reply. To penalise over-long sentences, the count is regularised by sentence length, giving the final expression diversity scoring function.
The semantic diversity and expression diversity scores are continuously fed back to the model of the previous step, and the parameters are updated.

Claims (1)

1. A diversity dialogue data enhancement method based on reinforcement learning, characterized by comprising the following steps:
1) Given an input dialogue history, collecting sets of replies that differ in semantics or in expression under the same dialogue history, wherein the two sets are used to build the semantic and expression latent spaces of dialogue replies; sampling each latent space, and generating the final reply by combining the encoded dialogue history;
letting H denote the dialogue history and R the corresponding ground-truth reply, first encoding the dialogue history, its semantics, and its expression with three different encoders, each a Transformer:
H_c = E(H) + G(E(H) + MultiHead(E(H), E(H), E(H)))    (1)
given the learned semantic and expression vector representations of the dialogue, obtaining the latent semantic distribution z_1 and the latent expression distribution z_2 in the manner of a variational auto-encoder; specifically, z_1 and z_2 are assumed to be Gaussian:
z ~ N(μ, σ²)    (2)
where H_x denotes the semantic or expression vector representation corresponding to the dialogue history, μ denotes the mean (expectation), and σ the standard deviation;
for μ and σ, the calculation is as follows, where W denotes a weight matrix and b a bias:
μ = H_x · W_μ + b_μ    (3)
log σ² = H_x · W_σ + b_σ    (4)
z = μ + σ ⊙ ε    (5)
where ε is drawn from a standard normal distribution;
given z_1, z_2, and H_c, generating the final reply; keeping the semantic latent representation and the expression latent representation as far apart as possible, so that the model learns decoupled representations, with this decoupling constraint entering the final objective function; at the same time, this step yields the dialogue reply generated at this stage;
2) Using two dedicated discriminators to judge the semantic diversity and the expression diversity of the generated sentences;
scoring the replies generated in step 1) by a semantic diversity scoring function and an expression diversity scoring function:
(1) Semantic diversity scoring function:
first, pre-training a classifier on the sentences and corresponding semantic labels of the training set; in the scoring stage, labelling both the ground-truth reply and the generated reply with the pre-trained classifier, and computing the cosine similarity between the dialogue history and each reply:
if the semantic label of the generated reply matches that of the ground-truth reply, the score is 0; if the labels differ and the cosine similarity between the dialogue history and the generated reply is higher than that between the history and the ground-truth reply, the score is 1; if the labels differ and that similarity is lower, the score is -1;
(2) Expression diversity scoring function:
scoring with the reply-diversity metric distinct: computing the degree of expression diversity from the number of distinct words in the generated reply, normalised by sentence length so that over-long sentences are penalised, which serves as the final expression diversity scoring function;
continuously feeding the semantic diversity and expression diversity scores back to the model of the previous step, and updating the parameters θ;
3) Training the model iteratively until high-quality, diverse dialogue samples are finally obtained.
CN202110885428.9A 2021-08-03 2021-08-03 Diversity dialogue data enhancement method based on reinforcement learning Active CN113626560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110885428.9A CN113626560B (en) 2021-08-03 2021-08-03 Diversity dialogue data enhancement method based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN113626560A (en) 2021-11-09
CN113626560B (en) 2024-05-07

Family

ID=78382406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110885428.9A Active CN113626560B (en) 2021-08-03 2021-08-03 Diversity dialogue data enhancement method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113626560B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992657A (en) * 2019-04-03 2019-07-09 Zhejiang University An interactive question generation method based on reinforced dynamic inference
CN110737764A (en) * 2019-10-24 2020-01-31 Northwestern Polytechnical University Personalized dialogue content generation method
CN112199481A (en) * 2020-09-30 2021-01-08 Renmin University of China Single-user personalized dialogue method and system using a PCC dialogue model
CN113158665A (en) * 2021-04-02 2021-07-23 Xi'an Jiaotong University Method for generating text summaries and improved dialogue text based on bidirectional corpus generation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885756B (en) * 2016-09-30 2020-05-08 华为技术有限公司 Deep learning-based dialogue method, device and equipment


Non-Patent Citations (1)

Title
An intent recognition method for multi-turn dialogue based on memory networks; Yang Chengbiao; Lü Rongrong; Wu Gang; Electronic Technology & Software Engineering (10); full text *

Also Published As

Publication number Publication date
CN113626560A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN111859978B (en) Deep learning-based emotion text generation method
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN110321418B (en) Deep learning-based field, intention recognition and groove filling method
CN111783459A (en) Laos named entity recognition method based on improved transform + CRF
CN111897933A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN112417894B (en) Conversation intention identification method and system based on multi-task learning
CN111966800A (en) Emotional dialogue generation method and device and emotional dialogue model training method and device
CN113065331A (en) Entity emotion recognition method and system based on entity context discrimination
CN113032601A (en) Zero sample sketch retrieval method based on discriminant improvement
CN115563314A (en) Knowledge graph representation learning method for multi-source information fusion enhancement
CN114444519A (en) Emotional dialogue generation method based on Seq2Seq model
CN112651225B (en) Multi-item selection machine reading understanding method based on multi-stage maximum attention
CN116561325B (en) Multi-language fused media text emotion analysis method
CN113626560B (en) Diversity dialogue data enhancement method based on reinforcement learning
CN113222002A (en) Zero sample classification method based on generative discriminative contrast optimization
CN116958700A (en) Image classification method based on prompt engineering and contrast learning
CN113946670B (en) Contrast type context understanding enhancement method for dialogue emotion recognition
CN113901172B (en) Case-related microblog evaluation object extraction method based on keyword structural coding
CN116028606A (en) Human-machine multi-round dialogue rewriting method based on transform pointer extraction
CN115422329A (en) Knowledge-driven multi-channel screening fusion dialogue generation method
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN114548117A (en) Cause-and-effect relation extraction method based on BERT semantic enhancement
CN114239575A (en) Statement analysis model construction method, statement analysis method, device, medium and computing equipment
CN110619118B (en) Automatic text generation method
CN115587909A (en) Judicial text data amplification method based on generating type confrontation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant