CN111143522A - Domain adaptation method of end-to-end task type dialog system - Google Patents


Info

Publication number
CN111143522A
Authority
CN
China
Prior art keywords
field
model
dialogue
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911199141.XA
Other languages
Chinese (zh)
Other versions
CN111143522B (en)
Inventor
贺樑
郁建峰
陈成才
杨燕
胡佳颖
陈培华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
East China Normal University
Shanghai Xiaoi Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-29
Filing date
2019-11-29
Publication date
2020-05-12
Application filed by East China Normal University and Shanghai Xiaoi Robot Technology Co Ltd
Priority to CN201911199141.XA
Publication of CN111143522A: 2020-05-12
Application granted; publication of CN111143522B: 2023-08-01
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a domain adaptation method for an end-to-end task-oriented dialogue system, comprising the following steps: constructing an end-to-end task-oriented dialogue system with an encoder-decoder model that combines an attention mechanism and a copy mechanism to learn how to generate a reply from the dialogue context; training the dialogue system in the source domain with a small-sample (few-shot) learning method to obtain generalizable prior knowledge; and using the limited corpus of the target domain to strengthen target-domain features, thereby achieving domain adaptation. Compared with the prior art, the method is simple, requires no manual annotation effort, and is efficient. It reduces, to a certain extent, the dependence of neural end-to-end dialogue systems on large amounts of annotated in-domain data while maintaining the task success rate of the dialogue system, and thus broadens the application scenarios of dialogue systems.

Description

Domain adaptation method of end-to-end task type dialog system
Technical Field
The invention relates to the technical field of task-oriented dialogue systems, and in particular to a domain adaptation method for an end-to-end task-oriented dialogue system that uses prior knowledge obtained by training a dialogue model on a source-domain corpus, combined with small-sample (few-shot) learning models and algorithms.
Background
With the development of modern information technology, human-machine dialogue systems are increasingly used in daily life, and voice assistants such as Siri, Cortana and Alexa have become commonplace. Demand is growing for task-oriented dialogue systems, such as intelligent customer service and intelligent personal assistants, which can handle services such as booking airline tickets, finding restaurants and planning trips. Existing intelligent customer service is already well established on e-commerce platforms, in mobile banking and in similar settings, where it effectively saves labor costs. However, such systems still fall well short of humans in intelligence: they can usually handle only fixed, template-like questions, cannot interact well with users, and therefore cannot replace human customer service. On the other hand, frequent interactions between users and customer service generate large amounts of data rich in valuable information, and this interaction data makes it possible to build task-oriented dialogue systems.
Conventional task-oriented dialogue systems typically require a great deal of manual work, which makes them difficult to extend to new application domains. A recently popular approach treats task-oriented dialogue as a partially observable Markov decision process and uses reinforcement learning to optimize the dialogue policy through interaction with users; however, the dialogue state and action spaces must be carefully designed for reinforcement-learning-based policy learning to be tractable.
Neural network-based methods perform well in building chat-oriented dialogue systems, so researchers have proposed building end-to-end task-oriented dialogue systems with neural networks. The end-to-end dialogue model abandons the idea of solving the problem with separate sub-modules: the whole dialogue system uses a single encoder-decoder model, in which the encoder encodes the dialogue context into a hidden vector representing the current dialogue state, and the decoder then decodes this state to generate the system's reply. An end-to-end model avoids training the system's components independently. Independent training can leave the optimization objective of each component misaligned with the evaluation criteria of the whole system, and errors produced by upstream components can propagate to downstream components and be amplified.
Existing end-to-end dialogue systems still face challenges in real-world scenarios, an important one being that neural network-based dialogue systems need sufficient annotated corpora. The task-oriented dialogue systems built in current work can be applied only to a single domain or to several similar domains, and training them requires large amounts of annotated in-domain corpora. Such methods therefore incur manual annotation costs, are inefficient, and depend on large in-domain dialogue corpora, which limits the applicability of dialogue systems in domains that lack dialogue corpora.
Disclosure of Invention
The invention aims to address the defects of the prior art by providing a domain adaptation method for an end-to-end task-oriented dialogue system. The method adopts transfer learning: a task-oriented dialogue system is trained on the basis of an encoder-decoder model, and domain adaptation is then achieved with small-sample (few-shot) learning models and algorithms, so that the dialogue system trained in the source domain is adapted to a target domain that lacks corpora. The ample training corpus of the source domain is used to learn as much prior knowledge as possible, and the limited corpus of the target domain is used to adapt the model to the target domain as well as possible. The method is simple, requires no manual annotation effort, and is efficient. It reduces, to a certain extent, the dependence of neural end-to-end dialogue systems on large amounts of annotated in-domain data while maintaining the task success rate of the dialogue system, and thus broadens the application scenarios of dialogue systems.
The specific technical scheme of the invention is as follows: a domain adaptation method for an end-to-end task-oriented dialogue system, characterized in that a task-oriented dialogue model is trained on a source-domain dialogue corpus and is then transferred across domains so that the model adapts to the target domain. The method comprises the following steps:
Step 1: train a task-oriented dialogue model in the source domain, using an encoder-decoder model to learn how to generate a reply from the dialogue context;
Step 2: during source-domain training, use a feature attenuation module to filter and weaken source-domain features so that the model generalizes;
Step 3: transfer the dialogue model to the target domain, and use a feature enhancement module to strengthen target-domain features so that the model adapts better to the target domain;
Step 4: test the transferred model in the target domain.
In step 2, during decoding in the source-domain model, the feature attenuation module multiplies the hidden vector used in decoding element-wise by a mask vector. The mask vector has the same dimension as the hidden vector, and each of its elements lies between 0 and 1, so that source-domain features in the hidden vector are filtered out or weakened.
In step 3, during decoding in the target-domain model, the feature enhancement module applies a self-attention operation to the hidden vector used in decoding, which increases the weights of features that are effective in the target domain and makes the model better suited to the specific tasks of the target domain.
Compared with the prior art, the invention has the following advantages:
1) The method achieves domain adaptation of the dialogue system with small-sample (few-shot) learning, so that the system can be put to use with prior knowledge and only a small target-domain training corpus. It can be applied in a wide range of domains, which broadens the application scenarios of dialogue systems.
2) By combining the advantages of the end-to-end model and small-sample learning, the dialogue system can be trained in a short time, with high training efficiency.
3) The system can carry out multiple rounds of interaction with the user, progressively clarify the user's intent, and finally complete the user's goal or offer relevant suggestions, giving a good human-computer interaction experience.
4) The method can be applied to a variety of intelligent customer service and intelligent assistant scenarios, such as restaurant, hotel and airline ticket booking, travel planning and scenic spot recommendation. It effectively reduces dependence on annotated corpora and is of broad practical significance.
Drawings
FIG. 1 is a schematic diagram of a model trained in the source domain according to the present invention;
FIG. 2 is a schematic diagram of a model trained in the target domain according to the present invention.
Detailed Description
The invention trains a task-oriented dialogue model on a source-domain dialogue corpus and transfers the dialogue model across domains so that the model adapts to the target domain. The method comprises the following steps:
Step 1: train a task-oriented dialogue model in the source domain, using an encoder-decoder model to learn how to generate a reply from the dialogue context;
Step 2: during source-domain training, use a feature attenuation module to filter and weaken source-domain features so that the model generalizes;
Step 3: transfer the dialogue model to the target domain, and use a feature enhancement module to strengthen target-domain features so that the model adapts better to the target domain;
Step 4: test the transferred model in the target domain.
Referring to FIG. 1, a feature attenuation module is added between the encoder and the decoder. The module computes a mask vector m and multiplies it element-wise with the hidden vector h to obtain h', which is then used for subsequent decoding. Decoding proceeds in two steps: the first generates the Bspan label, and the second uses the Bspan label to generate the reply.
Referring to FIG. 2, a feature enhancement module is added between the encoder and the decoder. The module uses a self-attention mechanism to compute a weight matrix for the hidden vector h, applies a row-wise softmax to the weight matrix, and multiplies the result element-wise with h to obtain h', which is then used for subsequent decoding.
The present invention is further illustrated by the following specific examples.
Example 1
The invention uses an encoder-decoder model. Through a recurrent neural network structure, the encoder captures fine-grained semantic information from a dialogue sentence and encodes the sentence into a hidden vector representation; the decoder then decodes the hidden vector to generate a reply sentence.
Let X denote an input sentence consisting of n words, X = x_1 x_2 ... x_n. The recurrent neural network maps each word of the sentence into a low-dimensional dense space, generating the hidden-layer representation [Figure BDA0002295425560000041] and thereby an overall representation h(x) of the sentence.
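As an illustration only, the encoding step described above can be sketched in Python with a plain tanh recurrent cell. The text does not specify the cell type or how the overall representation is formed, so the cell, the use of the last hidden state as the sentence representation, the names rnn_encode, W_in, W_rec and bias, and all numeric values are assumptions.

```python
import math

def rnn_encode(embeddings, W_in, W_rec, bias):
    """Run a plain tanh recurrent cell over a sentence.

    embeddings: list of n word vectors (each a list of e floats)
    W_in: d x e input weights, W_rec: d x d recurrent weights, bias: d floats
    Returns the list of n hidden vectors, one per word.
    """
    d = len(bias)
    h = [0.0] * d                      # initial hidden state
    states = []
    for x in embeddings:
        # The new state is built from the previous state h before h is rebound.
        h = [
            math.tanh(
                sum(W_in[k][t] * x[t] for t in range(len(x)))
                + sum(W_rec[k][t] * h[t] for t in range(d))
                + bias[k]
            )
            for k in range(d)
        ]
        states.append(h)
    return states

# Toy usage: 3 words, embedding size 2, hidden size 2 (all values hypothetical).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_in = [[0.5, -0.5], [0.25, 0.75]]
W_rec = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]
hidden_states = rnn_encode(sentence, W_in, W_rec, bias)
sentence_repr = hidden_states[-1]      # assumption: last state summarises the sentence
```

A gated cell such as a GRU or LSTM could be substituted without changing the surrounding description.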
The decoder decodes the hidden-layer representation h(x) to generate the reply word sequence Y = y_1 y_2 ... y_m; the model is trained by maximum likelihood estimation of Y on the training set.
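Maximum likelihood training of the reply is commonly implemented by minimising the negative log-likelihood of the reference words. The short Python sketch below shows that quantity for one reply; the function name sequence_nll and the toy probabilities are hypothetical and are not taken from the patent.

```python
import math

def sequence_nll(step_distributions, target_ids):
    """Negative log-likelihood of a reference reply.

    step_distributions[j][w] is the model probability of word w at step j;
    target_ids[j] is the vocabulary index of the reference word y_j.
    """
    return -sum(math.log(dist[y]) for dist, y in zip(step_distributions, target_ids))

# Toy usage: 3-word vocabulary, 2-word reference reply (values hypothetical).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = sequence_nll(probs, [0, 1])     # equals -(log 0.7 + log 0.8)
```

Minimising this loss over the training set is equivalent to maximising the likelihood of the reference replies.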
The encoder-decoder model described above consists of two recurrent neural networks. To generate y_j, the decoder first obtains the hidden vector representation of y_{j-1}, [Figure BDA0002295425560000042]. It then computes the attention weight between each [Figure BDA0002295425560000043] and [Figure BDA0002295425560000044], and takes the weighted sum of all [Figure BDA0002295425560000045] to obtain [Figure BDA0002295425560000046]. Finally, [Figure BDA0002295425560000047] and [Figure BDA0002295425560000051] are concatenated and fed into the output layer, and the output word y_j is generated after a softmax operation. The process is given by the following formulas:
[Figure BDA0002295425560000052]
[Figure BDA0002295425560000053]
[Figure BDA0002295425560000054]
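The decoding step just described (attention weights, weighted sum, concatenation, output layer, softmax) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the attention score is a plain dot product, which may differ from the scoring function in the patent's formulas, and the names attention_step, enc_states, dec_prev and W_out are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_step(enc_states, dec_prev, W_out):
    """One decoding step: attend, build a context vector, score the vocabulary.

    enc_states: n encoder hidden vectors (each d floats)
    dec_prev:   decoder hidden vector for y_{j-1} (d floats)
    W_out:      V x 2d output-layer weights (hypothetical name)
    """
    # Attention score of each encoder state (dot product: an assumption).
    scores = [sum(a * b for a, b in zip(h, dec_prev)) for h in enc_states]
    weights = softmax(scores)
    # Weighted sum of all encoder states gives the context vector.
    d = len(dec_prev)
    context = [sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(d)]
    # Concatenate context and decoder state, then apply output layer and softmax.
    joined = context + dec_prev
    logits = [sum(w * v for w, v in zip(row, joined)) for row in W_out]
    return weights, softmax(logits)

# Toy usage: 2 encoder states, hidden size 2, vocabulary of 3 words.
enc = [[1.0, 0.0], [0.0, 1.0]]
W_out = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
weights, word_probs = attention_step(enc, [1.0, 0.0], W_out)
```

The output word y_j would then be chosen from word_probs, for example by taking the most probable entry.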
During source-domain training, a feature attenuation module is added to the model so that the dialogue model generalizes and adapts better to the target domain. Specifically, before the attention weights are calculated, the feature attenuation module computes a mask vector m_domain from [Figure BDA0002295425560000055] and [Figure BDA0002295425560000056], and then multiplies m_domain element-wise with [Figure BDA0002295425560000057] and [Figure BDA0002295425560000058] respectively. The calculation formulas are as follows:
[Figure BDA0002295425560000059]
[Figure BDA00022954255600000510]
[Figure BDA00022954255600000511]
The invention uses [Figure BDA00022954255600000512] and [Figure BDA00022954255600000513] in place of the original [Figure BDA00022954255600000514] and [Figure BDA00022954255600000515] for the subsequent attention weight calculation and decoding, where W_x and W_y are parameters to be learned.
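The feature attenuation step can be illustrated with a small Python sketch. It assumes that the mask is a sigmoid of a linear combination of an encoder-side and a decoder-side hidden vector, which is one common way to obtain elements between 0 and 1; the exact formulas are not reproduced in this text, and the names attenuate, h_x and h_y are hypothetical. Only W_x, W_y and m_domain come from the description.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attenuate(h_x, h_y, W_x, W_y):
    """Compute m_domain and apply it element-wise to both hidden vectors."""
    # Sigmoid gate is an assumption; it keeps every mask element in (0, 1).
    pre = [a + b for a, b in zip(matvec(W_x, h_x), matvec(W_y, h_y))]
    m_domain = [sigmoid(z) for z in pre]
    h_x_masked = [m * v for m, v in zip(m_domain, h_x)]
    h_y_masked = [m * v for m, v in zip(m_domain, h_y)]
    return m_domain, h_x_masked, h_y_masked

# Toy usage with hidden size 2 (all values hypothetical).
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_y = [[0.0, 0.0], [0.0, -1.0]]
m, hx2, hy2 = attenuate([2.0, -1.0], [0.5, 4.0], W_x, W_y)
```

In this toy case the second mask element is close to 0, so the second feature of both vectors is almost filtered out, while the first feature passes through largely intact.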
Through neural network training, the model learns prior knowledge from the source-domain training corpus. The next consideration is how to make the dialogue model adapt better to the target domain with a limited corpus, and how to give it the ability to handle problems specific to the target domain. For this purpose, a feature enhancement module is added to the model during training in the target domain. Like the feature attenuation module, the feature enhancement module operates before the attention weights are calculated: it computes a self-attention weight matrix for each of [Figure BDA00022954255600000516] and [Figure BDA00022954255600000517], applies a row-wise softmax to each weight matrix to obtain weight vectors, and multiplies the weight vectors element-wise with the corresponding [Figure BDA00022954255600000518] and [Figure BDA00022954255600000519]. The products replace [Figure BDA00022954255600000520] and [Figure BDA00022954255600000521] in the subsequent attention weight calculation and decoding. The self-attention weights are calculated as follows:
[Figure BDA0002295425560000061]
where W_1 and W_2 are parameters to be learned and M_ij is a position mask, expressed as follows:
[Figure BDA0002295425560000062]
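The row-wise softmax with an additive position mask can be sketched in Python as below. The sketch covers only that normalisation step. The definition of the mask is not reproduced in this text, so the mask used here (a position may not attend to itself) is purely a hypothetical choice, as are the function name masked_row_softmax and the toy scores.

```python
import math

NEG_INF = float("-inf")

def masked_row_softmax(scores, mask):
    """Add a position mask to a score matrix, then normalise each row.

    Each row must keep at least one unmasked entry.
    """
    out = []
    for score_row, mask_row in zip(scores, mask):
        shifted = [s + m for s, m in zip(score_row, mask_row)]
        top = max(shifted)
        exps = [math.exp(v - top) for v in shifted]   # math.exp(-inf) is 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical mask: entry (i, i) is -inf, all other entries are 0.
n = 3
mask = [[NEG_INF if i == j else 0.0 for j in range(n)] for i in range(n)]
scores = [[0.0] * n for _ in range(n)]
weights = masked_row_softmax(scores, mask)
```

With uniform scores, each row of weights assigns 0 to the masked position and splits the remaining probability evenly across the other positions.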
The invention achieves domain adaptation by training a task-oriented dialogue system based on an encoder-decoder model and then applying small-sample (few-shot) learning models and algorithms. The domain adaptation has two parts. First, during source-domain training, the dialogue system aims to filter out as many domain-specific features of the dialogue corpus as possible, such as domain-specific words and particular question patterns, so that it concentrates on learning what the dialogue process has in common across domains; to this end, a mask vector filters and weakens features of the hidden vector during decoding. Second, to make full use of the limited target-domain corpus, the invention uses a self-attention mechanism to strengthen target-domain features, so that the dialogue system adapts better to the target domain. The above merely describes the invention in further detail and is not intended to limit its scope; all equivalent embodiments are intended to fall within the scope of the following claims.

Claims (3)

1. A domain adaptation method for an end-to-end task-oriented dialogue system, characterized in that a task-oriented dialogue model is trained on a source-domain dialogue corpus and is transferred across domains so that the model adapts to a target domain, the method comprising the following steps:
Step 1: training a task-oriented dialogue model in the source domain, using an encoder-decoder model to learn how to generate a reply from the dialogue context;
Step 2: during source-domain training, using a feature attenuation module to filter and weaken source-domain features so that the model generalizes;
Step 3: transferring the dialogue model to the target domain, and using a feature enhancement module to strengthen target-domain features so that the model adapts better to the target domain;
Step 4: testing the transferred model in the target domain.
2. The domain adaptation method for an end-to-end task-oriented dialogue system of claim 1, wherein, during decoding in the source-domain model, the feature attenuation module in step 2 multiplies the hidden vector used in decoding element-wise by a mask vector, the mask vector has the same dimension as the hidden vector, and each element of the mask vector lies between 0 and 1, so that source-domain features in the hidden vector are filtered out or weakened.
3. The domain adaptation method for an end-to-end task-oriented dialogue system of claim 1, wherein, during decoding in the target-domain model, the feature enhancement module in step 3 applies a self-attention operation to the hidden vector used in decoding, so as to increase the weights of features that are effective in the target domain, thereby making the model better suited to the specific tasks of the target domain.
CN201911199141.XA 2019-11-29 2019-11-29 Domain adaptation method of end-to-end task type dialogue system Active CN111143522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911199141.XA CN111143522B (en) 2019-11-29 2019-11-29 Domain adaptation method of end-to-end task type dialogue system

Publications (2)

Publication Number Publication Date
CN111143522A (en) 2020-05-12
CN111143522B (en) 2023-08-01

Family

ID=70517451

Country Status (1)

Country Link
CN (1) CN111143522B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528667A (en) * 2020-11-27 2021-03-19 北京大学 Domain migration method and device on semantic analysis

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060301A1 (en) * 2016-08-31 2018-03-01 Microsoft Technology Licensing, Llc End-to-end learning of dialogue agents for information access
WO2018044633A1 (en) * 2016-08-31 2018-03-08 Microsoft Technology Licensing, Llc End-to-end learning of dialogue agents for information access
CN108563640A (en) * 2018-04-24 2018-09-21 中译语通科技股份有限公司 A kind of multilingual pair of neural network machine interpretation method and system
CN108681610A (en) * 2018-05-28 2018-10-19 山东大学 Production takes turns more and chats dialogue method, system and computer readable storage medium
CN108804611A (en) * 2018-05-30 2018-11-13 浙江大学 A kind of dialogue reply generation method and system based on self comment Sequence Learning
CN109726276A (en) * 2018-12-29 2019-05-07 中山大学 A kind of Task conversational system based on depth e-learning
CN109918560A (en) * 2019-01-09 2019-06-21 平安科技(深圳)有限公司 A kind of answering method and device based on search engine
CN109918493A (en) * 2019-03-19 2019-06-21 重庆邮电大学 A kind of dialogue generation method based on shot and long term Memory Neural Networks
CN110032636A (en) * 2019-04-30 2019-07-19 合肥工业大学 Emotion based on intensified learning talks with the method that asynchronous generation model generates text
CN110297887A (en) * 2019-06-26 2019-10-01 山东大学 Service robot personalization conversational system and method based on cloud platform

Also Published As

Publication number Publication date
CN111143522B (en) 2023-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant