CN116127051A - Dialogue generation method based on deep learning, electronic equipment and storage medium - Google Patents

Dialogue generation method based on deep learning, electronic equipment and storage medium

Info

Publication number
CN116127051A
Authority
CN
China
Prior art keywords
response
interference
generator
skeleton
layer
Prior art date
Legal status
Granted
Application number
CN202310428793.6A
Other languages
Chinese (zh)
Other versions
CN116127051B (en)
Inventor
万之蕴
何向南
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310428793.6A
Publication of CN116127051A
Application granted
Publication of CN116127051B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a dialogue generation method based on deep learning, an electronic device, and a storage medium. The method comprises the following steps: 1. construct a dialogue generation dataset based on retrieval and editing; 2. construct and train a dialogue generation model consisting of a skeleton generator, a skeleton response generator, an interference response generator, and a response fusion module; 3. use the trained model to generate a reply to any query input by the user. According to the invention, a template response is obtained by retrieval and a response skeleton is constructed from it to eliminate the interference of useless information in the template response; the response skeleton is then edited to generate the final reply, so that the dialogue system produces responses that fit the context more closely and are semantically richer, alleviating the "safe response" problem.

Description

Dialogue generation method based on deep learning, electronic equipment and storage medium
Technical Field
The invention belongs to the field of natural language processing, relates to dialogue systems, deep learning, and related technical fields, and particularly relates to a dialogue generation method based on deep learning, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence and human-computer interaction, dialogue systems (dialogue robots) are being applied in more and more service scenarios and have replaced human service to a certain extent. Current dialogue systems can be classified by usage scenario into open-domain dialogue systems and task-oriented dialogue systems. Task-oriented dialogue systems are designed to accomplish a specific task or goal, such as customer service robots and intelligent assistants like Siri; open-domain dialogue is generally chit-chat, aimed not at completing a specific task but at natural, fluent communication with humans.
Compared with task-oriented dialogue systems, the topics of an open-domain dialogue system are open-ended, covering a wider range of subjects and more complex sentence patterns. By construction method, existing open-domain dialogue systems can be classified into two types: generation-based and retrieval-based. Retrieval-based approaches select a response from an existing corpus, so their performance is severely limited by predefined indexing rules. With the development of deep learning, generation-based dialogue systems have become increasingly popular in recent years. Deep learning models based on the sequence-to-sequence (seq2seq) architecture have found wide application in single-turn dialogue generation. However, conventional seq2seq dialogue generation models often fail to produce responses that are lexically varied, content-rich, and informative. In practice, such models tend to generate popular but dull replies, such as "I don't know" or "I think so too". This is known as the "safe response" problem.
Recent efforts have attempted to use information retrieval techniques to compensate for the lack of information in dialogue generation. In conventional retrieval-based dialogue systems, the datasets are built from human dialogues, so the retrieved replies are generally grammatically correct and semantically rich. For a given context, similar dialogues are retrieved from the corpus and treated as an additional information source for the generative dialogue system, introducing richer semantics and sentence patterns; the generated replies can thus mitigate the "safe response" problem of the generative model to a certain extent. However, when the retrieved reply is similar to the reference reply, the generative model tends to copy it without making the necessary modifications. In the opposite case, when the retrieved reply is unrelated to the reference reply, much of the acquired information introduces interference irrelevant to the current dialogue context, degrading model performance.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a dialogue generation method based on deep learning, an electronic device, and a storage medium that combine a retrieval dialogue model and a generative dialogue model to introduce external information into dialogue generation, thereby alleviating the "safe response" problem of generative dialogue systems and producing fluent, informative responses.
To achieve the above aim, the invention adopts the following technical scheme:
The dialogue generation method based on deep learning of the invention comprises the following steps:
Step 1, construct a dialogue generation dataset based on retrieval and editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D;
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s';
Step 2.2 response-basedrVocabulary of interferences’QueryqUsing an interference response generatorG S Obtaining a response generation result interference responser s’
Step 2.3 response-basedrResponse skeletontQueryqUsing a generator of interference responses with the transmitterG S Skeletal response generator with identical structureG T Obtaining a response generation result skeleton responser t
Step 2.4, the response fusion module utilizes (8) to respond to interferencer s’ And skeleton responser t Fusion is carried out to obtain fusion responser s,t
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product;
Step 2.5, construct the loss function R_DIR of the skeleton generator G and the skeleton response generator G_T using formula (9):
R_DIR = E[L(r_{s,t}, r)] + λ·Var[L(r_{s,t}, r)]    (9)
In formula (9), E denotes expectation, Var denotes variance, λ is a hyperparameter, and L(·) denotes the cross-entropy loss;
Step 2.6, construct the loss function R_S of the interference response generator G_S using formula (10):
R_S = E[L(r_{s'}, r)]    (10)
Step 2.7 training the dialog generation model by random gradient descent method and calculating the loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR And when the loss function converges or reaches the maximum training times, stopping training and obtaining a dialogue generating model of the optimal parameters for generating corresponding replies to any query input by the user.
The dialogue generation method based on deep learning is further characterized in that the skeleton generator G in step 2.1 consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
The interference response generator G_S in step 2.2 consists of a Transformer decoder and a controller, wherein the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
The electronic device of the invention comprises a memory and a processor; the memory stores a program that supports the processor in executing any of the above dialogue generation methods, and the processor is configured to execute the program stored in the memory.
The invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of any of the above dialogue generation methods.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a new dialogue generation method: a template response is obtained by retrieval, and response generation is then performed on its basis. Introducing the template response lets the dialogue system exploit external information and use historical human dialogue as a reference, which improves the fluency and informativeness of the generated result and alleviates the "safe response" problem to a certain extent.
2. The invention provides a two-stage dialogue generation model based on retrieval and editing: a template response is obtained by retrieval and a response skeleton is constructed from it to eliminate the interference of its useless information, and the skeleton is then edited to generate the final reply. Compared with previous work, the invention retains the flexibility of a generative model while inheriting the fluency and rich information of retrieval results. Meanwhile, generating a response skeleton lets the generative model eliminate the interference of irrelevant information in the retrieved template response and produce a reply that fits the context more closely.
3. The invention introduces a causal intervention method so that the model learns causal patterns that are invariant across environments. The model can extract a response skeleton from the retrieved template response to aid response generation, remedying the inability of previous research to use the information in template responses properly, and it can eliminate the influence of interference vocabulary in the template response, thereby making better use of external information and improving the relevance between the generated response and the query.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of an interference response generator according to the present invention;
FIG. 3 is a causal graph used by the response fusion module of the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a dialogue generation method based on deep learning is performed according to the following steps:
Step 1, construct a dialogue generation dataset based on retrieval and editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q. In this embodiment, the data sources are Douban and Weibo, two large Chinese social networking platforms;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D.
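For illustration, step 1.2 can be sketched as follows. The patent does not specify the retrieval model, so the TF-IDF retriever and the function name below are assumptions for illustration only, not the claimed construction:

```python
# Minimal sketch of step 1.2: for each (q, r) pair, retrieve the most similar
# template response r' and its paired query q'. The TF-IDF retriever and all
# names here are illustrative assumptions; the patent does not specify them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_retrieval_dataset(queries, responses):
    """Return the dialogue dataset D as a list of quadruples (r, q, r', q')."""
    vectorizer = TfidfVectorizer()
    resp_matrix = vectorizer.fit_transform(responses)  # one row per response
    dataset = []
    for idx, (q, r) in enumerate(zip(queries, responses)):
        sims = cosine_similarity(resp_matrix[idx], resp_matrix).ravel()
        sims[idx] = -1.0          # exclude the response itself from retrieval
        j = int(sims.argmax())    # index of the most similar template response
        dataset.append((r, q, responses[j], queries[j]))
    return dataset
```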
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s'.
The skeleton generator G consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
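A minimal PyTorch sketch of steps 2.1.1-2.1.3 follows. Because formulas (3) and (4) are not legible in the published text, the thresholded split on each template character's peak attention weight is an assumption, as are the class name and the threshold tau:

```python
# Sketch of the skeleton generator's cross-attention separation (steps
# 2.1.1-2.1.3). Formulas (3)/(4) are images in the published text, so the
# thresholded split below is an assumption: characters of r' that attend
# strongly to the query are kept as skeleton, the rest as interference.
import torch
import torch.nn as nn

class SkeletonSeparator(nn.Module):
    def __init__(self, d_model, tau=0.5):
        super().__init__()
        self.W_att = nn.Parameter(torch.randn(d_model, d_model) * 0.02)
        self.tau = tau                        # assumed skeleton threshold

    def forward(self, H_q, H_r):              # H_q: (m, d), H_r: (n, d)
        # Formula (2): score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q
        scores = H_r @ self.W_att @ H_q.T     # (n, m)
        # Formula (1): normalize the scores over the query positions k
        M = torch.softmax(scores, dim=-1)     # M[j, i]: attention of r'_j to q_i
        # Assumed realization of formulas (3)/(4): mask by peak attention
        keep = (M.max(dim=-1).values > self.tau).float().unsqueeze(-1)  # (n, 1)
        H_t = H_r * keep                      # skeleton hidden vectors
        H_s = H_r * (1.0 - keep)              # interference hidden vectors
        return H_t, H_s, M
```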
Step 2.2, as shown in FIG. 2, based on the response r, the interference word s', and the query q, use the interference response generator G_S to obtain the interference response r_{s'}.
The interference response generator G_S consists of a Transformer decoder and a controller; the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
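The controller fusion of formulas (5)-(7) can be sketched as follows; treating β as a per-position scalar gate and assuming that H^{s'} and H^{r,q} share the same sequence length are illustrative choices, not details fixed by the patent:

```python
# Sketch of the controller (formulas (5)-(6)) and the response generator head
# (formula (7)) of G_S. The per-position scalar gate and equal sequence
# lengths for H_sp (H^{s'}) and H_rq (H^{r,q}) are assumptions.
import torch
import torch.nn as nn

class Controller(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.W_s = nn.Linear(2 * d_model, 1)       # gate over [H^{s'}; H^{r,q}]

    def forward(self, H_sp, H_rq):                 # both: (seq_len, d_model)
        # Formula (6): beta = sigmoid(W_s [H^{s'}; H^{r,q}])
        beta = torch.sigmoid(self.W_s(torch.cat([H_sp, H_rq], dim=-1)))
        # Formula (5): gated fusion of the normalized interference vectors
        return beta * self.ln(H_sp) + (1.0 - beta) * H_rq

class ResponseGeneratorHead(nn.Module):
    def __init__(self, d_model, d_ff, vocab_size):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln = nn.LayerNorm(d_model)
        self.linear = nn.Linear(d_model, vocab_size)

    def forward(self, H):
        # Formula (7): r_{s'} = Linear(LN'(FFN(H) + H)), as vocabulary logits
        return self.linear(self.ln(self.ffn(H) + H))
```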
Step 2.3, based on the response r, the response skeleton t, and the query q, use a skeleton response generator G_T with the same structure as the interference response generator G_S to obtain the skeleton response r_t.
Step 2.4, based on the causal graph shown in FIG. 3, the response fusion module performs a causal intervention on the skeleton response r_t; specifically, it fuses the interference response r_{s'} and the skeleton response r_t using formula (8) to obtain the fused response r_{s,t}:
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product. A causal graph is a directed acyclic graph consisting of nodes that represent variables and edges that represent the causal relationships between them; causal graphs are generally used to describe the interaction mechanism of a set of variables and reveal the causal relationships behind the data. The dialogue generation process of the invention can be represented by the causal graph shown in FIG. 3. In step 2.4, the interference response r_{s'} is assigned artificially through a causal intervention, while everything else still follows the original data-generating process shown in FIG. 3.
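A minimal sketch of the fusion in formula (8), assuming r_t and r_{s'} are per-position vocabulary logits of identical shape (the patent does not state the tensor form):

```python
# Sketch of step 2.4, formula (8): the sigmoid of the interference response
# gates the skeleton response element-wise. Treating both inputs as logit
# tensors of shape (seq_len, vocab_size) is an assumption for illustration.
import torch

def fuse_responses(r_t, r_sp):
    """Formula (8): r_{s,t} = r_t ⊙ σ(r_{s'})."""
    return r_t * torch.sigmoid(r_sp)
```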
Step 2.5, constructing a skeleton generator by using the formula (9)GSkeleton response generatorG T Is a loss function of (2)R DIR
R DIR = E[L(r s , t , r)] +λ Var[L(r s , t , r)] (9)
In the formula (9), the amino acid sequence of the compound,Eit is indicated that the desire is to be met,Varthe variance is represented as a function of the variance,λis the parameter of the ultrasonic wave to be used as the ultrasonic wave,L() represents cross entropy loss; loss functionR DIR Meaning that the model is reducing the response generatedr s , t Simultaneously with the error of response r, attempts are made to reduce the effect of external disturbance information on the generated result.
Step 2.6, constructing an interference response generator using (10)G S Is a loss function of (2)R S
R S = E[L(r s’ , r)] (10)
The loss function R_S obtained from formula (10) updates only the parameters of the interference response generator G_S. Separating the training of this module from the training of the other modules prevents it from interfering with representation learning; at the same time, this parameter-update scheme also encourages the interference response generator G_S to learn only the non-causal features induced by the given interference word.
Step 2.7, training the dialogue generating model by utilizing a random gradient descent method, and calculating a loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR And when the loss function converges or reaches the maximum training times, stopping training and obtaining a dialogue generating model of the optimal parameters for generating corresponding replies to any query input by the user.
The test results of the invention are further described with reference to the following table:
To verify the effectiveness of the proposed method, a comparative experiment was performed; the results are shown in Table 1, where Retrieval is a retrieval system, Seq2Seq is a basic sequence-to-sequence model, BART is a pretrained language model, BART-cat is BART taking the template response as additional input, SkeRe is a dialogue generation model based on a retrieval system, TSLF is a Transformer-based dialogue generation model that introduces external knowledge, and DG is the method of the invention. In Table 1, BLEU-1 is a word-overlap evaluation metric that measures the character overlap between the generated text and the reference text, and dist-1 and dist-2 are two metrics of lexical diversity.
TABLE 1
[Table 1 is rendered as an image in the published text.]
The experimental results in Table 1 show that the method of the invention outperforms the other models on all metrics; it acquires information from the retrieved template responses more effectively and generates more diverse text.

Claims (5)

1. A dialogue generation method based on deep learning, characterized by comprising the following steps:
step 1, constructing a dialogue generation data set based on retrieval editing;
Step 1.1, obtain a query text set Q and the corresponding response text set R; let q denote any query in Q, and let r denote the response corresponding to q;
Step 1.2, retrieve a template response r' similar to the response r and obtain the template query q' corresponding to r', thereby forming a quadruple (r, q, r', q') of the dialogue dataset D;
Step 2, construct and train a dialogue generation model composed of a skeleton generator G, a skeleton response generator G_T, an interference response generator G_S, and a response fusion module;
Step 2.1, use the skeleton generator G to separate the response skeleton t and the interference vocabulary s from the template response r', thereby obtaining the interference vocabulary of all template responses and forming the vector representation set S; then randomly select from S the vector representation H_{s'} of an interference word s';
Step 2.2 response-basedrVocabulary of interferences’QueryqUsing an interference response generatorG S Obtaining a response generation result interference responser s’
Step 2.3 response-basedrResponse skeletontQueryqUsing a noise with the interferenceStress generatorG S Skeletal response generator with identical structureG T Obtaining a response generation result skeleton responser t
Step 2.4, the response fusion module utilizes (8) to respond to interferencer s’ And skeleton responser t Fusion is carried out to obtain fusion responser s,t
r_{s,t} = r_t ⊙ σ(r_{s'})    (8)
In formula (8), σ denotes the sigmoid function and ⊙ denotes element-wise product;
Step 2.5, construct the loss function R_DIR of the skeleton generator G and the skeleton response generator G_T using formula (9):
R_DIR = E[L(r_{s,t}, r)] + λ·Var[L(r_{s,t}, r)]    (9)
In formula (9), E denotes expectation, Var denotes variance, λ is a hyperparameter, and L(·) denotes the cross-entropy loss;
Step 2.6, construct the loss function R_S of the interference response generator G_S using formula (10):
R_S = E[L(r_{s'}, r)]    (10)
Step 2.7 training the dialog generation model by random gradient descent method and calculating the loss functionR S A kind of electronic device with high-pressure air-conditioning systemR DIR To update network parameters, stopping training and obtaining dialogue generation model of optimal parameters when the loss function converges or reaches maximum training times, for generating corresponding replies to any query input by user。
2. The dialogue generation method based on deep learning of claim 1, wherein the skeleton generator G in step 2.1 consists of a Transformer encoder and a cross-attention layer and separates the response skeleton t and the interference vocabulary s as follows:
Step 2.1.1, the Transformer encoder processes the query q and the template response r' separately, obtaining the vector representation H^q = {h_1^q, …, h_i^q, …, h_m^q} of the query q and the vector representation H^{r'} = {h_1^{r'}, …, h_j^{r'}, …, h_n^{r'}} of the template response r', where h_i^q is the hidden vector of the i-th character in the query q, m is the number of characters in the query q, h_j^{r'} is the hidden vector of the j-th character in the template response r', and n is the number of characters in the template response r';
Step 2.1.2, the cross-attention layer computes the attention weight M_{i,j} of the j-th character in the template response r' to the i-th character in the query q using formula (1):
M_{i,j} = exp(score(h_j^{r'}, h_i^q)) / Σ_{k=1}^{m} exp(score(h_j^{r'}, h_k^q))    (1)
In formula (1), h_k^q is the hidden vector of the k-th character in the query q, and score(·) is the attention score, where:
score(h_j^{r'}, h_k^q) = (h_j^{r'})^T W_att h_k^q    (2)
In formula (2), W_att is a learnable parameter of the cross-attention layer and T denotes transposition;
Step 2.1.3, the cross-attention layer computes the vector representation H^t = {h_1^t, …, h_j^t, …, h_n^t} of the response skeleton t and the vector representation H^s = {h_1^s, …, h_j^s, …, h_n^s} of the interference vocabulary s using formulas (3) and (4):
[Formulas (3) and (4) appear only as images in the published text; they derive the skeleton hidden vectors h_j^t and the interference hidden vectors h_j^s from the template-response hidden vectors h_j^{r'} using the attention weights M_{i,j}.]
In formulas (3) and (4), h_j^t and h_j^s denote the hidden vectors of the j-th character in the response skeleton t and in the interference vocabulary s, respectively.
3. The dialogue generation method based on deep learning of claim 1, wherein the interference response generator G_S in step 2.2 consists of a Transformer decoder and a controller; the Transformer decoder is composed of an encoding layer, a positional encoding layer, a self-attention layer, a cross-attention layer, two normalization layers, the controller, and a response generator; the Transformer decoder obtains the interference response r_{s'} as follows:
Step 2.2.1, the Transformer decoder processes the response r with the encoding layer, the positional encoding layer, the self-attention layer, and the first normalization layer, obtaining the vector representation H^r of the response r;
Step 2.2.2, the Transformer decoder fuses the vector representation H^r of the response r with the query q through the cross-attention layer and the second normalization layer, obtaining the fused response-query vector representation H^{r,q};
Step 2.2.3, the controller fuses the vector representation H^{s'} of the interference word s' with H^{r,q} using formula (5), obtaining the interference fusion vector representation H^{s'}_{r,q}:
H^{s'}_{r,q} = β ⊙ LN(H^{s'}) + (1 - β) ⊙ H^{r,q}    (5)
In formula (5), LN(·) denotes the normalization layer in the controller, and β denotes the fusion weight, obtained by formula (6):
β = σ(W_s · [H^{s'}; H^{r,q}])    (6)
In formula (6), W_s is a learnable parameter of the controller and σ denotes the sigmoid function;
Step 2.2.4, the response generator of the Transformer decoder obtains the interference response r_{s'} using formula (7):
r_{s'} = Linear(LN'(FFN(H^{s'}_{r,q}) + H^{s'}_{r,q}))    (7)
In formula (7), Linear(·) denotes the linear layer in the response generator, LN'(·) denotes the normalization layer in the response generator, and FFN(·) denotes the feed-forward layer in the response generator.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor in performing the dialogue generation method of any one of claims 1-3, and the processor is configured to execute the program stored in the memory.
5. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program performs the steps of the dialogue generation method of any one of claims 1-3.
CN202310428793.6A 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium Active CN116127051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310428793.6A CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310428793.6A CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116127051A (en) 2023-05-16
CN116127051B CN116127051B (en) 2023-07-11

Family

ID=86303166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310428793.6A Active CN116127051B (en) 2023-04-20 2023-04-20 Dialogue generation method based on deep learning, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116127051B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844335A (en) * 2016-12-21 2017-06-13 海航生态科技集团有限公司 Natural language processing method and device
CN107506823A (en) * 2017-08-22 2017-12-22 南京大学 Construction method of a hybrid generative model for dialogue generation
US20200097814A1 (en) * 2018-09-26 2020-03-26 MedWhat.com Inc. Method and system for enabling interactive dialogue session between user and virtual medical assistant
CN109829038A (en) * 2018-12-11 2019-05-31 平安科技(深圳)有限公司 Question and answer feedback method, device, equipment and storage medium based on deep learning
US20200226475A1 (en) * 2019-01-14 2020-07-16 Cambia Health Solutions, Inc. Systems and methods for continual updating of response generation by an artificial intelligence chatbot
WO2021077974A1 (en) * 2019-10-24 2021-04-29 西北工业大学 Personalized dialogue content generating method
US20210141798A1 (en) * 2019-11-08 2021-05-13 PolyAI Limited Dialogue system, a method of obtaining a response from a dialogue system, and a method of training a dialogue system
CN111858931A (en) * 2020-07-08 2020-10-30 华中师范大学 Text generation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAI D et al.: "Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory", Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies *
陆兴武: "Research on Open-Domain Multi-Turn Dialogue Systems Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN116127051B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
Bakhtin et al. Real or fake? learning to discriminate machine from human generated text
CN109992657B (en) Dialogue type problem generation method based on enhanced dynamic reasoning
CN111651557B (en) Automatic text generation method and device and computer readable storage medium
CN104598611B (en) The method and system being ranked up to search entry
CN108628935A (en) A kind of answering method based on end-to-end memory network
CN107679225A (en) A kind of reply generation method based on keyword
KR102654480B1 (en) Knowledge based dialogue system and method for language learning
CN110334196A (en) Neural network Chinese charater problem based on stroke and from attention mechanism generates system
CN116186216A (en) Question generation method and system based on knowledge enhancement and double-graph interaction
CN110516053A (en) Dialog process method, equipment and computer storage medium
CN114387537A (en) Video question-answering method based on description text
CN112463935B (en) Open domain dialogue generation method and system with generalized knowledge selection
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
CN116127051B (en) Dialogue generation method based on deep learning, electronic equipment and storage medium
Lin et al. A hierarchical structured multi-head attention network for multi-turn response generation
CN111414466A (en) Multi-round dialogue modeling method based on depth model fusion
CN116561251A (en) Natural language processing method
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
CN113051897B (en) GPT2 text automatic generation method based on Performer structure
CN113065324A (en) Text generation method and device based on structured triples and anchor templates
CN113626566B (en) Knowledge dialogue cross-domain learning method based on synthetic data
Szymanski et al. Semantic memory knowledge acquisition through active dialogues
CN116244419B (en) Knowledge enhancement dialogue generation method and system based on character attribute
CN117035064B (en) Combined training method for retrieving enhanced language model and storage medium
Ma et al. Cascaded LSTMs based deep reinforcement learning for goal-driven dialogue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant