CN116127051A - Dialogue generation method based on deep learning, electronic equipment and storage medium - Google Patents

Dialogue generation method based on deep learning, electronic equipment and storage medium

Info

Publication number
CN116127051A
Authority
CN
China
Prior art keywords
response
interference
generator
skeleton
layer
Prior art date
Legal status
Granted
Application number
CN202310428793.6A
Other languages
Chinese (zh)
Other versions
CN116127051B (en)
Inventor
万之蕴
何向南
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310428793.6A
Publication of CN116127051A
Application granted
Publication of CN116127051B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a dialogue generation method based on deep learning, an electronic device, and a storage medium. The method comprises the following steps: 1. construct a dialogue generation dataset based on retrieval and editing; 2. construct and train a dialogue generation model consisting of a skeleton generator, a skeleton response generator, an interference response generator, and a response fusion module; 3. use the trained model to generate a reply to any query input by the user. According to the invention, a template response is obtained by retrieval and a response skeleton is constructed from it to eliminate the interference of useless information in the template response; the response skeleton is then edited to generate the final reply, so that the dialogue system produces responses that fit the context more closely and are semantically richer, alleviating the "safe response" problem.

Description

Dialogue generation method based on deep learning, electronic equipment and storage medium
Technical Field
The invention belongs to the field of natural language processing, relates to dialogue systems, deep learning, and related technical fields, and particularly relates to a dialogue generation method based on deep learning, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence and human-computer interaction, dialogue systems (dialogue robots) are being applied in more and more service scenarios and have replaced human service to a certain extent. Current dialogue systems can be classified by usage scenario into open-domain dialogue systems and task-oriented dialogue systems. Task-oriented dialogue systems are designed to accomplish a specific task or goal, such as customer service robots and intelligent assistants like Siri; open-domain dialogue is generally chit-chat, aimed not at completing a specific task but at natural, fluent communication with humans.
Compared with task-oriented dialogue systems, the topics of an open-domain dialogue system are open-ended, covering a wider range of subjects and more complex sentence patterns. By construction method, existing open-domain dialogue systems can be classified into two types: generation-based and retrieval-based. Retrieval-based approaches select a response from an existing corpus, so their performance is severely limited by predefined indexing rules. With the development of deep learning, generation-based dialogue systems have become increasingly popular in recent years. Deep learning models based on the sequence-to-sequence (seq2seq) architecture have found wide application in single-turn dialogue generation. However, conventional seq2seq dialogue generation models often fail to produce responses that are lexically varied, content-rich, and informative. In practice, such models tend to generate popular but dull replies, such as "I don't know" or "I think so too". This is known as the "safe response" problem.
Recent efforts have attempted to use information retrieval techniques to compensate for the lack of information in dialogue generation. In conventional retrieval-based dialogue systems, the datasets are built from human dialogues, so the retrieved replies are generally grammatically correct and semantically rich. For a given context, similar dialogues are retrieved from the corpus and treated as an additional information source for the generative dialogue system, introducing richer semantics and sentence patterns; the generated replies can thus mitigate the "safe response" problem of the generative model to a certain extent. However, when the retrieved reply is similar to the reference reply, the generative model tends to copy it without making the necessary modifications. In the opposite case, when the retrieved reply is unrelated to the reference reply, much of the acquired information introduces interference irrelevant to the current dialogue context, degrading model performance.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a dialogue generation method based on deep learning, an electronic device, and a storage medium that combine a retrieval dialogue model and a generative dialogue model to introduce external information into dialogue generation, thereby alleviating the "safe response" problem of generative dialogue systems and producing fluent, informative responses.
To achieve the above aim, the invention adopts the following technical scheme:
The dialogue generation method based on deep learning of the invention comprises the following steps:
Step 1, construct a dialogue generation dataset based on retrieval and editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D;
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s';
Step 2.2 response-basedrVocabulary of interferences’QueryqUsing an interference response generatorG S Obtaining a response generation result interference responser s’
Step 2.3 response-basedrResponse skeletontQueryqUsing a generator of interference responses with the transmitterG S Skeletal response generator with identical structureG T Obtaining a response generation result skeleton responser t
Step 2.4, the response fusion module utilizes (8) to respond to interferencer s’ And skeleton responser t Fusion is carried out to obtain fusion responser s,t
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product;
Step 2.5, construct the loss function R_DIR of the skeleton generator G and the skeleton response generator G_T using formula (9):
R_DIR = E[L(r_{s,t}, r)] + λ·Var[L(r_{s,t}, r)]    (9)
In formula (9), E denotes expectation, Var denotes variance, λ is a hyperparameter, and L(·) denotes the cross-entropy loss;
Step 2.6, construct the loss function R_S of the interference response generator G_S using formula (10):
R_S = E[L(r_{s'}, r)]    (10)
Step 2.7 training the dialog generation model by random gradient descent method and calculating the loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR And when the loss function converges or reaches the maximum training times, stopping training and obtaining a dialogue generating model of the optimal parameters for generating corresponding replies to any query input by the user.
The dialogue generation method based on deep learning is further characterized in that the skeleton generator G in step 2.1 consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
The interference response generator G_S in step 2.2 consists of a Transformer decoder and a controller, wherein the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
The electronic device of the invention comprises a memory and a processor; the memory stores a program that supports the processor in executing any of the above dialogue generation methods, and the processor is configured to execute the program stored in the memory.
The invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of any of the above dialogue generation methods.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a new dialogue generation method: a template response is obtained by retrieval, and response generation is then performed on its basis. Introducing the template response lets the dialogue system exploit external information and use historical human dialogue as a reference, which improves the fluency and informativeness of the generated result and alleviates the "safe response" problem to a certain extent.
2. The invention provides a two-stage dialogue generation model based on retrieval and editing: a template response is obtained by retrieval and a response skeleton is constructed from it to eliminate the interference of its useless information, and the skeleton is then edited to generate the final reply. Compared with previous work, the invention retains the flexibility of a generative model while inheriting the fluency and rich information of retrieval results. Meanwhile, generating a response skeleton lets the generative model eliminate the interference of irrelevant information in the retrieved template response and produce a reply that fits the context more closely.
3. The invention introduces a causal intervention method so that the model learns causal patterns that are invariant across environments. The model can extract a response skeleton from the retrieved template response to aid response generation, remedying the inability of previous research to use the information in template responses properly, and it can eliminate the influence of interference vocabulary in the template response, thereby making better use of external information and improving the relevance between the generated response and the query.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an interference response generator according to the present invention;
FIG. 3 is a causal graph used by the response fusion module of the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a dialogue generation method based on deep learning is performed according to the following steps:
Step 1, construct a dialogue generation dataset based on retrieval and editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q. In this embodiment, the data sources are Douban and Weibo, two large Chinese social networking platforms;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D.
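For illustration, step 1.2 can be sketched as follows. The patent does not specify the retrieval model, so the TF-IDF retriever and the function name below are assumptions for illustration only, not the claimed construction:

```python
# Minimal sketch of step 1.2: for each (q, r) pair, retrieve the most similar
# template response r' and its paired query q'. The TF-IDF retriever and all
# names here are illustrative assumptions; the patent does not specify them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_retrieval_dataset(queries, responses):
    """Return the dialogue dataset D as a list of quadruples (r, q, r', q')."""
    vectorizer = TfidfVectorizer()
    resp_matrix = vectorizer.fit_transform(responses)  # one row per response
    dataset = []
    for idx, (q, r) in enumerate(zip(queries, responses)):
        sims = cosine_similarity(resp_matrix[idx], resp_matrix).ravel()
        sims[idx] = -1.0          # exclude the response itself from retrieval
        j = int(sims.argmax())    # index of the most similar template response
        dataset.append((r, q, responses[j], queries[j]))
    return dataset
```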
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s'.
The skeleton generator G consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
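A minimal PyTorch sketch of steps 2.1.1-2.1.3 follows. Because formulas (3) and (4) are not legible in the published text, the thresholded split on each template character's peak attention weight is an assumption, as are the class name and the threshold tau:

```python
# Sketch of the skeleton generator's cross-attention separation (steps
# 2.1.1-2.1.3). Formulas (3)/(4) are images in the published text, so the
# thresholded split below is an assumption: characters of r' that attend
# strongly to the query are kept as skeleton, the rest as interference.
import torch
import torch.nn as nn

class SkeletonSeparator(nn.Module):
    def __init__(self, d_model, tau=0.5):
        super().__init__()
        self.W_att = nn.Parameter(torch.randn(d_model, d_model) * 0.02)
        self.tau = tau                        # assumed skeleton threshold

    def forward(self, H_q, H_r):              # H_q: (m, d), H_r: (n, d)
        # Formula (2): score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q
        scores = H_r @ self.W_att @ H_q.T     # (n, m)
        # Formula (1): normalize the scores over the query positions k
        M = torch.softmax(scores, dim=-1)     # M[j, i]: attention of r'_j to q_i
        # Assumed realization of formulas (3)/(4): mask by peak attention
        keep = (M.max(dim=-1).values > self.tau).float().unsqueeze(-1)  # (n, 1)
        H_t = H_r * keep                      # skeleton hidden vectors
        H_s = H_r * (1.0 - keep)              # interference hidden vectors
        return H_t, H_s, M
```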
Step 2.2, as shown in FIG. 2, based on the response r, the interference word s', and the query q, use the interference response generator G_S to obtain the interference response r_{s'}.
The interference response generator G_S consists of a Transformer decoder and a controller; the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
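The controller fusion of formulas (5)-(7) can be sketched as follows; treating β as a per-position scalar gate and assuming that H^{s'} and H^{r,q} share the same sequence length are illustrative choices, not details fixed by the patent:

```python
# Sketch of the controller (formulas (5)-(6)) and the response generator head
# (formula (7)) of G_S. The per-position scalar gate and equal sequence
# lengths for H_sp (H^{s'}) and H_rq (H^{r,q}) are assumptions.
import torch
import torch.nn as nn

class Controller(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.W_s = nn.Linear(2 * d_model, 1)       # gate over [H^{s'}; H^{r,q}]

    def forward(self, H_sp, H_rq):                 # both: (seq_len, d_model)
        # Formula (6): beta = sigmoid(W_s [H^{s'}; H^{r,q}])
        beta = torch.sigmoid(self.W_s(torch.cat([H_sp, H_rq], dim=-1)))
        # Formula (5): gated fusion of the normalized interference vectors
        return beta * self.ln(H_sp) + (1.0 - beta) * H_rq

class ResponseGeneratorHead(nn.Module):
    def __init__(self, d_model, d_ff, vocab_size):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln = nn.LayerNorm(d_model)
        self.linear = nn.Linear(d_model, vocab_size)

    def forward(self, H):
        # Formula (7): r_{s'} = Linear(LN'(FFN(H) + H)), as vocabulary logits
        return self.linear(self.ln(self.ffn(H) + H))
```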
Step 2.3, based on the response r, the response skeleton t, and the query q, use a skeleton response generator G_T with the same structure as the interference response generator G_S to obtain the skeleton response r_t.
Step 2.4, based on the causal graph shown in FIG. 3, the response fusion module performs a causal intervention on the skeleton response r_t; specifically, it fuses the interference response r_{s'} and the skeleton response r_t using formula (8) to obtain the fused response r_{s,t}:
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product. A causal graph is a directed acyclic graph consisting of nodes that represent variables and edges that represent the causal relationships between them; causal graphs are generally used to describe the interaction mechanism of a set of variables and reveal the causal relationships behind the data. The dialogue generation process of the invention can be represented by the causal graph shown in FIG. 3. In step 2.4, the interference response r_{s'} is assigned artificially through a causal intervention, while everything else still follows the original data-generating process shown in FIG. 3.
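A minimal sketch of the fusion in formula (8), assuming r_t and r_{s'} are per-position vocabulary logits of identical shape (the patent does not state the tensor form):

```python
# Sketch of step 2.4, formula (8): the sigmoid of the interference response
# gates the skeleton response element-wise. Treating both inputs as logit
# tensors of shape (seq_len, vocab_size) is an assumption for illustration.
import torch

def fuse_responses(r_t, r_sp):
    """Formula (8): r_{s,t} = r_t ⊙ σ(r_{s'})."""
    return r_t * torch.sigmoid(r_sp)
```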
Step 2.5, constructing a skeleton generator by using the formula (9)GSkeleton response generatorG T Is a loss function of (2)R DIR
R DIR = E[L(r s , t , r)] +λ Var[L(r s , t , r)] (9)
In the formula (9), the amino acid sequence of the compound,Eit is indicated that the desire is to be met,Varthe variance is represented as a function of the variance,λis the parameter of the ultrasonic wave to be used as the ultrasonic wave,L() represents cross entropy loss; loss functionR DIR Meaning that the model is reducing the response generatedr s , t Simultaneously with the error of response r, attempts are made to reduce the effect of external disturbance information on the generated result.
Step 2.6, constructing an interference response generator using (10)G S Is a loss function of (2)R S
R S = E[L(r s’ , r)] (10)
The loss function R_S obtained from formula (10) updates only the parameters of the interference response generator G_S. Separating the training of this module from the training of the other modules prevents it from interfering with representation learning; at the same time, this parameter-update scheme also encourages the interference response generator G_S to learn only the non-causal features induced by the given interference word.
Step 2.7, training the dialogue generating model by utilizing a random gradient descent method, and calculating a loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR And when the loss function converges or reaches the maximum training times, stopping training and obtaining a dialogue generating model of the optimal parameters for generating corresponding replies to any query input by the user.
The test results of the invention are further described with reference to the following table:
To verify the effectiveness of the proposed method, a comparative experiment was performed; the results are shown in Table 1, where Retrieval is a retrieval system, Seq2Seq is a basic sequence-to-sequence model, BART is a pretrained language model, BART-cat is BART taking the template response as additional input, SkeRe is a dialogue generation model based on a retrieval system, TSLF is a Transformer-based dialogue generation model that introduces external knowledge, and DG is the method of the invention. In Table 1, BLEU-1 is a word-overlap evaluation metric that measures the character overlap between the generated text and the reference text, and dist-1 and dist-2 are two metrics of lexical diversity.
TABLE 1
[Table 1 is rendered as an image in the published text.]
The experimental results in Table 1 show that the method of the invention outperforms the other models on all metrics; it acquires information from the retrieved template responses more effectively and generates more diverse text.

Claims (5)

1. A dialogue generation method based on deep learning, characterized by comprising the following steps:
step 1, constructing a dialogue generation data set based on retrieval editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D;
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s';
Step 2.2 response-basedrVocabulary of interferences’QueryqUsing an interference response generatorG S Obtaining a response generation result interference responser s’
Step 2.3 response-basedrResponse skeletontQueryqUsing a noise with the interferenceStress generatorG S Skeletal response generator with identical structureG T Obtaining a response generation result skeleton responser t
Step 2.4, the response fusion module utilizes (8) to respond to interferencer s’ And skeleton responser t Fusion is carried out to obtain fusion responser s,t
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product;
Step 2.5, construct the loss function R_DIR of the skeleton generator G and the skeleton response generator G_T using formula (9):
R_DIR = E[L(r_{s,t}, r)] + λ·Var[L(r_{s,t}, r)]    (9)
In formula (9), E denotes expectation, Var denotes variance, λ is a hyperparameter, and L(·) denotes the cross-entropy loss;
Step 2.6, construct the loss function R_S of the interference response generator G_S using formula (10):
R_S = E[L(r_{s'}, r)]    (10)
Step 2.7 training the dialog generation model by random gradient descent method and calculating the loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR To update network parameters, stopping training and obtaining dialogue generation model of optimal parameters when the loss function converges or reaches maximum training times, for generating corresponding replies to any query input by user。
2. The dialogue generation method based on deep learning of claim 1, wherein the skeleton generator G in step 2.1 consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
3. The dialogue generation method based on deep learning of claim 1, wherein the interference response generator G_S in step 2.2 consists of a Transformer decoder and a controller; the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor in performing the dialogue generation method of any one of claims 1-3, and the processor is configured to execute the program stored in the memory.
5. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program performs the steps of the dialogue generation method of any one of claims 1-3.
CN202310428793.6A 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium Active CN116127051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310428793.6A CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310428793.6A CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116127051A (en) 2023-05-16
CN116127051B CN116127051B (en) 2023-07-11

Family

ID=86303166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310428793.6A Active CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116127051B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844335A (en) * 2016-12-21 2017-06-13 海航生态科技集团有限公司 Natural language processing method and device
CN107506823A (en) * 2017-08-22 2017-12-22 南京大学 Construction method of a hybrid generative model for dialogue generation
US20200097814A1 (en) * 2018-09-26 2020-03-26 MedWhat.com Inc. Method and system for enabling interactive dialogue session between user and virtual medical assistant
CN109829038A (en) * 2018-12-11 2019-05-31 平安科技(深圳)有限公司 Question and answer feedback method, device, equipment and storage medium based on deep learning
US20200226475A1 (en) * 2019-01-14 2020-07-16 Cambia Health Solutions, Inc. Systems and methods for continual updating of response generation by an artificial intelligence chatbot
WO2021077974A1 (en) * 2019-10-24 2021-04-29 西北工业大学 Personalized dialogue content generating method
US20210141798A1 (en) * 2019-11-08 2021-05-13 PolyAI Limited Dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
CN111858931A (en) * 2020-07-08 2020-10-30 华中师范大学 Text generation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAI D et al.: "Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory", Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies *
陆兴武: "Research on Open-Domain Multi-Turn Dialogue Systems Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN116127051B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
Bakhtin et al. Real or fake? learning to discriminate machine from human generated text
CN109992657B (en) Dialogue type problem generation method based on enhanced dynamic reasoning
CN111651557B (en) Automatic text generation method and device and computer readable storage medium
CN104598611B (en) The method and system being ranked up to search entry
CN108628935A (en) A kind of answering method based on end-to-end memory network
CN107679225A (en) A kind of reply generation method based on keyword
KR102654480B1 (en) Knowledge based dialogue system and method for language learning
CN110334196A (en) Neural network Chinese charater problem based on stroke and from attention mechanism generates system
CN116186216A (en) Question generation method and system based on knowledge enhancement and double-graph interaction
CN110516053A (en) Dialog process method, equipment and computer storage medium
CN114387537A (en) Video question-answering method based on description text
CN112463935B (en) Open domain dialogue generation method and system with generalized knowledge selection
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
CN116127051B (en) Dialogue generation method based on deep learning, electronic equipment and storage medium
Lin et al. A hierarchical structured multi-head attention network for multi-turn response generation
CN111414466A (en) Multi-round dialogue modeling method based on depth model fusion
CN116561251A (en) Natural language processing method
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
CN113051897B (en) GPT2 text automatic generation method based on Performer structure
CN113065324A (en) Text generation method and device based on structured triples and anchor templates
CN113626566B (en) Knowledge dialogue cross-domain learning method based on synthetic data
Szymanski et al. Semantic memory knowledge acquisition through active dialogues
CN116244419B (en) Knowledge enhancement dialogue generation method and system based on character attribute
CN117035064B (en) Combined training method for retrieving enhanced language model and storage medium
Ma et al. Cascaded LSTMs based deep reinforcement learning for goal-driven dialogue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant