CN111382257A - Method and system for generating dialog context - Google Patents

Method and system for generating dialog context

Info

Publication number
CN111382257A
CN111382257A
Authority
CN
China
Prior art keywords
knowledge
vector
current time
generating
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010470216.XA
Other languages
Chinese (zh)
Inventor
简葳玙
王太峰
何建杉
林谢雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010470216.XA priority Critical patent/CN111382257A/en
Publication of CN111382257A publication Critical patent/CN111382257A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of this specification disclose a method and a system for generating the following text of a dialog, hereinafter referred to as a dialog reply. The method comprises the following steps: acquiring a dialog context; acquiring, according to the dialog context, at least one knowledge text related to it, and generating at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; generating a knowledge fusion vector $g_t$ for the current time using a first attention model, according to the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time; and generating the reply word $y_t$ of the current time based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time. The reply words $y_1 \sim y_t$ make up the dialog reply, where $y_1$ denotes the reply word at t = 1.

Description

Method and system for generating dialog context
Technical Field
The present description relates to the field of Natural Language Processing (NLP), and more particularly to a method and system for generating the following text of a dialog, hereinafter referred to as a dialog reply.
Background
In recent years, with the rise of artificial intelligence, human-machine dialog has received wide attention from academia and industry as an important challenge for artificial intelligence. Task-independent chat-style dialog has gradually become a focus of research, because it can provide users with a more intelligent and lifelike dialog experience and meet their emotional needs. At present, approaches to generating chat dialog can be roughly divided into retrieval-based and generation-based approaches.

A retrieval-based dialog system uses information retrieval technology to recall relevant candidate replies from a knowledge base according to the user's input. Unlike retrieval-based methods, generation-based methods do not select historical replies from a corpus but generate entirely new replies. Because chat dialog is open-ended, with no definite goal and no limited knowledge range, two challenges arise: how to smoothly introduce knowledge from an external knowledge base into a generated dialog, and how to reduce safe replies such as "yes" and "I know".

Therefore, a method for generating a dialog reply is desired that can dynamically select knowledge while the system generates a reply, so that the dialog system has the ability to switch topics naturally during human-machine dialog.
Disclosure of Invention
One embodiment of the present specification provides a method for generating a dialog reply. The method comprises the following steps:

acquiring a dialog context; acquiring, according to the dialog context, at least one knowledge text related to it, and generating at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; generating a knowledge fusion vector $g_t$ for the current time using a first attention model, according to the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time; and generating the reply word $y_t$ of the current time based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time. The reply words $y_1 \sim y_t$ make up the dialog reply, where $y_1$ denotes the reply word at t = 1.
One of the embodiments of the present specification provides a system for generating a dialog reply, the system comprising:

a knowledge vector generation module, configured to obtain a dialog context, obtain at least one knowledge text related to the dialog context according to the dialog context, and generate at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base; a knowledge fusion vector generation module, configured to generate a knowledge fusion vector $g_t$ for the current time using a first attention model, based on the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time; a reply word generation module, configured to generate the reply word $y_t$ of the current time based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time; and a dialog reply generation module, configured to form the dialog reply from the reply words $y_1 \sim y_t$, where $y_1$ denotes the reply word at t = 1.
One of the embodiments of the present specification provides an apparatus for generating a dialog reply, the apparatus comprising:

at least one processor and at least one memory, the at least one memory storing computer instructions, and the at least one processor being configured to execute at least some of the computer instructions to implement the method of generating a dialog reply.

One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions; when the computer instructions in the storage medium are read by a computer, the computer executes at least some of the instructions to implement the method of generating a dialog reply.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a flowchart of a method of generating a dialog reply, according to some embodiments of the present description;

FIG. 2 is a schematic structural diagram of a first attention model, according to some embodiments of the present description;

FIG. 3 is a diagram of an application scenario for generating a dialog reply, according to some embodiments of the present description;

FIG. 4 is an exemplary conversation between a chat robot and a user, according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flowcharts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed exactly in the order shown; instead, the steps may be processed in reverse order or simultaneously. Moreover, other operations may be added to the processes, or one or more steps may be removed from them.
FIG. 1 is a flowchart of a method of generating a dialog reply, according to some embodiments of the present description.
Step 110, obtaining the dialog context, and generating a sequence of encoding hidden states $h_1 \sim h_n$ from the dialog context using a dialog encoder.
In some embodiments, the dialog context may be the dialog utterances of a user chatting with the dialog system. A chat dialog system is a kind of human-machine conversation system that enables a machine to understand and use natural language to communicate with a user, including emotionally. Unlike the human-machine dialog systems commonly used for information query and retrieval in specific business domains, dialog through a chat dialog system is open-ended: the user's utterances can be any reasonable natural-language sentences, such as "I want to go mountain climbing" or "Do you like watching movies?". The user's dialog may be entered into the dialog system through a human-machine interface in ways that include, but are not limited to, voice input and text input. In some embodiments, in order to use more of the preceding information when generating the system's reply, the dialog context may include the entire history of the user's and the system's dialog during a conversation.
In some embodiments, the dialog system may be implemented with a Sequence-to-Sequence (Seq2Seq) network structure based on a context attention mechanism. A Seq2Seq model consists of a dialog encoder and a dialog decoder: the dialog encoder converts the variable-length dialog context into a fixed-length vector representation, and the dialog decoder converts that fixed-length vector into a variable-length dialog reply. In some embodiments, the dialog encoder may be constructed based on a Bi-directional Long Short-Term Memory (Bi-directional LSTM) model. A Bi-directional LSTM is composed of two LSTM models: the first processes the input sentence sequence from left to right, the other processes it from right to left, and at each encoding step the hidden states obtained by the two LSTMs are combined and output as the hidden state of the whole model. Because a Bi-directional LSTM takes the information of the whole context into account when encoding, it achieves a better encoding effect than a unidirectional LSTM. In some embodiments, the dialog encoder may also be constructed based on other sequence models, such as a bidirectional GRU, and is not limited by the description herein.
In some embodiments, all tokens in the dialog context are converted into word embedding vectors using a word embedding model and then input to the dialog encoder, which outputs the sequence of encoding hidden states $h_1 \sim h_n$ over the encoding time steps, where $h_n$ contains all of the information of the dialog context. The word embedding model may include, but is not limited to, a word2vec model, a Term Frequency-Inverse Document Frequency (TF-IDF) model, an SSWE-C (a skip-gram based word embedding) model, and the like.
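For illustration only, a minimal sketch of such a dialog encoder is given below. It assumes PyTorch and placeholder dimensions; the patent itself does not name a framework or fix any sizes:

```python
import torch
import torch.nn as nn

class DialogEncoder(nn.Module):
    """Bi-directional LSTM dialog encoder: token ids -> encoding hidden states h_1..h_n."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # word embedding model
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        emb = self.embed(token_ids)          # (batch, n, emb_dim)
        h, _ = self.lstm(emb)                # left-to-right and right-to-left states, concatenated
        return h                             # (batch, n, 2 * hidden_dim); h[:, -1] corresponds to h_n

encoder = DialogEncoder()
tokens = torch.randint(0, 10000, (1, 12))    # a 12-token dialog context
h = encoder(tokens)
print(h.shape)                               # torch.Size([1, 12, 512])
```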
Step 120, obtaining the decoding hidden state $S_t$ of the current time.
In some embodiments, the encoding hidden state $h_n$ generated by the dialog encoder at the last encoding step in step 110 is used as the initial intermediate semantic vector between the dialog encoder and the dialog decoder; that is, the encoding hidden state $h_n$ serves as the initial decoding hidden state $S_0$ of the dialog decoder. At time t = 1, the dialog decoder decodes from the initial decoding hidden state $S_0$ to generate the first decoding hidden state $S_1$.
In some embodiments, because the decoding process of the dialog reply is unidirectional, the dialog decoder may be constructed based on a unidirectional LSTM model. In some embodiments, the dialog decoder may also be constructed based on other sequence models, and is not limited by the description herein.
In some embodiments, at the current time (any time t after the first), the decoding hidden state $S_{t-1}$ of the previous time (time t-1), the decoding input $x_t$ of the current time, and the knowledge fusion vector $g_{t-1}$ of the previous time are used as the inputs of the dialog decoder to generate the decoding hidden state $S_t$ of the current time:

$$S_t = \mathrm{LSTM}\left(S_{t-1},\ \left[x_t;\ \gamma_t \odot g_{t-1}\right]\right) \qquad (1)$$

where $g_{t-1}$ is the knowledge fusion vector of the previous time (the knowledge fusion vector $g_t$ is described in detail in step 150), and $\gamma_t$ is a knowledge gate that determines the proportion of the knowledge fusion vector $g_{t-1}$ in the input. It is obtained by the following formula:

$$\gamma_t = \sigma\left(W_\gamma\left[x_t;\ g_{t-1}\right] + b_\gamma\right) \qquad (2)$$

where $W_\gamma$ and $b_\gamma$ are learnable parameters: they apply a linear transformation to the concatenation of $x_t$ and $g_{t-1}$, and the result is mapped into the range (0, 1) by the sigmoid function $\sigma$. The sigmoid function, also called the logistic function, is commonly used for hidden-layer neuron outputs.
The decoding input $x_t$ in formula (1) is obtained from the following formula:

$$x_t = W_x\left[e(y_{t-1});\ c_{t-1}\right] + b_x \qquad (3)$$

where $W_x$ and $b_x$ are learnable parameters, $c_{t-1}$ is the context vector generated at the previous time (the context vector $c_t$ is described in detail in step 160), and $e(y_{t-1})$ is the word embedding vector of the reply word $y_{t-1}$ generated at the previous time; $x_t$ is obtained by applying the linear transformation given by the parameters $W_x$ and $b_x$ to the concatenation of $e(y_{t-1})$ and $c_{t-1}$.
As can be seen from formulas (1) to (3), when the decoding hidden state $S_t$ is generated at the current time, not only are the context vector $c_{t-1}$ of the previous time and the reply word $y_{t-1}$ generated at the previous time used as inputs of the dialog decoder (as in existing decoders based on a context attention mechanism), but the knowledge fusion vector $g_{t-1}$ of the previous time also dynamically integrates external knowledge into the decoding hidden state $S_t$ generated by the dialog decoder. As described in step 150, the knowledge fusion vector $g_t$ is generated dynamically using an attention mechanism, so the dialog decoder can focus on different information in the external knowledge at each decoding step.
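A hedged sketch of one such decoding step follows, in PyTorch (an assumption). The exact gate placement in formula (1), the LSTM-cell realization, and all dimensions are the reconstruction's assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class KnowledgeGatedDecoderStep(nn.Module):
    """One decoding step following formulas (1)-(3) as reconstructed above."""
    def __init__(self, emb_dim=128, ctx_dim=512, know_dim=256, hidden_dim=256):
        super().__init__()
        self.W_x = nn.Linear(emb_dim + ctx_dim, hidden_dim)            # formula (3): x_t
        self.W_g = nn.Linear(hidden_dim + know_dim, know_dim)          # formula (2): knowledge gate
        self.cell = nn.LSTMCell(hidden_dim + know_dim, hidden_dim)     # formula (1): S_t

    def forward(self, e_prev, c_prev, g_prev, state):
        x_t = self.W_x(torch.cat([e_prev, c_prev], dim=-1))                  # (3)
        gamma_t = torch.sigmoid(self.W_g(torch.cat([x_t, g_prev], dim=-1)))  # (2)
        s_t, mem_t = self.cell(torch.cat([x_t, gamma_t * g_prev], dim=-1), state)  # (1)
        return s_t, (s_t, mem_t)

step = KnowledgeGatedDecoderStep()
state = (torch.zeros(1, 256), torch.zeros(1, 256))   # (S_0, initial cell memory)
s_t, state = step(torch.randn(1, 128),    # e(y_{t-1})
                  torch.randn(1, 512),    # c_{t-1}
                  torch.randn(1, 256),    # g_{t-1}
                  state)
print(s_t.shape)   # torch.Size([1, 256])
```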
Step 130, generating word vectors.

In some embodiments, at least one knowledge text related to the dialog context may be obtained according to the dialog context. Specifically, at least one knowledge text related to the dialog context is recalled by querying a knowledge base.
In some embodiments, the knowledge text may be in triple format, i.e., the text format is subject + predicate + object, for example: "Zhang San, height, 226" or "Li Si, representative work, drama a". In some embodiments, the knowledge text may also be a passage of text or another format, and is not limited by the description herein. In some embodiments, the language of the knowledge text may be Chinese, English, or another language, and is not limited by the description of the present specification. In some embodiments, the knowledge base may be queried by a search system according to keywords or entities in the dialog context, the related knowledge texts may be recalled, and the top N knowledge texts may be selected by ranking. The value of N may be chosen by weighing the computational load of the model against the richness of the acquired knowledge; for example, N may be 30 or another value. In some embodiments, the content of the knowledge base may come from open-source data sets such as Wizard-of-Wikipedia and DuConv.
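As an illustration only, a minimal Python sketch of the recall step. The word-overlap scoring rule is an assumption; the patent only requires that a search system recall and rank knowledge texts by relevance to the dialog context:

```python
def recall_knowledge(dialog_context, knowledge_base, top_n=30):
    """Recall the top_n knowledge texts most related to the dialog context."""
    context_words = set(dialog_context.lower().split())
    scored = []
    for triple in knowledge_base:                        # triple text: "subject, predicate, object"
        triple_words = set(triple.lower().replace(",", " ").split())
        scored.append((len(context_words & triple_words), triple))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # rank by overlap, keep top N
    return [text for _, text in scored[:top_n]]

kb = ["F, profession, director", "F, date of birth, May 12 1925", "movie a, director, F"]
print(recall_knowledge("The director F sounds great", kb, top_n=2))
```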
In some embodiments, for any knowledge text, a knowledge encoder may be used to encode each word in the knowledge text, generating a plurality of word vectors. Specifically, a word segmentation model may be used to split the knowledge text into a token sequence; the resulting token sequence is input into the knowledge encoder, which encodes each token and generates the word vector sequence $Z_j = (z_1, \ldots, z_{L_j})$ corresponding to the knowledge text. In this way, m word vector sequences $Z_1 \sim Z_m$ corresponding to the m knowledge texts are finally obtained. In some embodiments, a Transformer encoder may be used as the knowledge encoder. The Transformer is a classic model in natural language processing; its encoder does not adopt the sequential structure of an RNN model but processes every input token in parallel through self-attention layers and feed-forward layers, with residual connections between the sub-layers, finally encoding each input token into a word vector representation that carries global information.
In some embodiments, a third attention model may be used to combine the m obtained word vector sequences $Z_1 \sim Z_m$ into m knowledge vectors $k_1 \sim k_m$. See step 140 for a detailed description of the third attention model. In some embodiments, other methods may also be used to obtain the at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text; for example, the encoding may be performed by a knowledge graph embedding method, and is not limited by the description of the present specification.
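A sketch of such a knowledge encoder using PyTorch's built-in Transformer encoder layers (an assumption; the layer count and dimensions are placeholders):

```python
import torch
import torch.nn as nn

class KnowledgeEncoder(nn.Module):
    """Transformer knowledge encoder: token ids of one knowledge text -> word vectors Z_j."""
    def __init__(self, vocab_size=10000, dim=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)  # self-attention + feed-forward with residuals

    def forward(self, token_ids):                     # (batch, L)
        return self.encoder(self.embed(token_ids))    # (batch, L, dim): one word vector per token

Z_j = KnowledgeEncoder()(torch.randint(0, 10000, (1, 7)))
print(Z_j.shape)    # torch.Size([1, 7, 256])
```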
Step 140, generating knowledge vectors.

In some embodiments, there may be many knowledge texts obtained in step 130 (e.g., 30 knowledge texts). In order to make all the information contained in these texts available to the decoder during decoding, each knowledge text is first represented by one knowledge vector, and the knowledge fusion vector is then generated from the knowledge vectors of all the knowledge texts according to the method described in step 150. Thus, in some embodiments, a knowledge vector may be generated from the generated plurality of word vectors using the third attention model.
In some embodiments, the input of the third attention model may be the j-th of the m word vector sequences $Z_1 \sim Z_m$ obtained in step 130, namely $Z_j$, which comprises $L_j$ word vectors $z_1, \ldots, z_{L_j}$; the output may be the j-th of the m knowledge vectors $k_1 \sim k_m$ described in step 130, namely the knowledge vector $k_j$. The third attention model is implemented as follows:

(1) For each word vector $z_i$ of the plurality of word vectors $z_1, \ldots, z_{L_j}$, a weighting operation is performed, and the operation result is processed with an activation function to generate the word attention vector $w$, whose $i$-th element is computed as:

$$w_i = V_z^{\top} \tanh\left(W_z z_i\right) \qquad (4)$$

where $V_z$ and $W_z$ are learnable parameters. In some embodiments, training may make the word attention value $w_i$ corresponding to a word vector $z_i$ that is more relevant to the dialog context correspondingly higher.
(2) Based on the word attention vector $w$, a scoring function generates the word attention weights $\beta_1, \ldots, \beta_{L_j}$ corresponding to the word vectors:

$$\beta_i = \mathrm{softmax}(w)_i = \frac{\exp(w_i)}{\sum_{l=1}^{L_j} \exp(w_l)} \qquad (5)$$

where the softmax function normalizes the elements of $w$, yielding values $\beta_i$ in the range (0, 1). In some embodiments, the word attention weights $\beta_i$ may be obtained by computing cosine similarity. In some embodiments, other ways of computing the word attention weights $\beta_i$ may also be used, and are not limited by the description herein.
(3) Each word vector $z_i$ is multiplied by its word attention weight $\beta_i$, and the products are summed to generate the knowledge vector $k_j$:

$$k_j = \sum_{i=1}^{L_j} \beta_i z_i \qquad (6)$$

By using an attention mechanism, the word vectors $z_1, \ldots, z_{L_j}$ are combined by importance into one knowledge vector $k_j$; the knowledge vector $k_j$ has the same dimension as the word vectors $z_i$ and contains the information of all of them, which is convenient for subsequent computation.
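A sketch of the third attention model, formulas (4)-(6), in PyTorch (an assumption), pooling the word vectors of one knowledge text into a knowledge vector; the parameter names follow the reconstruction above and the dimension is a placeholder:

```python
import torch
import torch.nn as nn

class WordAttentionPooling(nn.Module):
    """Third attention model: combine word vectors z_1..z_L into a knowledge vector k_j."""
    def __init__(self, dim=256):
        super().__init__()
        self.W_z = nn.Linear(dim, dim, bias=False)
        self.V_z = nn.Linear(dim, 1, bias=False)

    def forward(self, Z_j):                                   # Z_j: (L, dim)
        w = self.V_z(torch.tanh(self.W_z(Z_j))).squeeze(-1)   # formula (4): one score per word
        beta = torch.softmax(w, dim=-1)                       # formula (5): word attention weights
        k_j = (beta.unsqueeze(-1) * Z_j).sum(dim=0)           # formula (6): weighted sum
        return k_j

Z_j = torch.randn(7, 256)            # 7 word vectors from the knowledge encoder
k_j = WordAttentionPooling()(Z_j)
print(k_j.shape)                     # torch.Size([256]), same dimension as a word vector
```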
Step 150, generating the knowledge fusion vector $g_t$ of the current time.

In some embodiments, the knowledge fusion vector $g_t$ of the current time may be generated using a first attention model, based on the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time.

In some embodiments, as shown in FIG. 2, the input of the first attention model may be the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time, and the output may be the knowledge fusion vector $g_t$ of the current time. For the knowledge vectors $k_1 \sim k_m$, see step 130; for the decoding hidden state $S_t$ of the current time, see step 120. At each decoding time, the dialog decoder generates a decoding hidden state according to the method described in step 120, so t-1 decoding hidden states $S_1 \sim S_{t-1}$ have already been generated before the current time t. The first attention model is implemented as follows:
(1) For each knowledge vector $k_i$ among $k_1 \sim k_m$, a weighted summation of the knowledge vector and the decoding hidden state $S_t$ of the current time is performed, and the operation result is processed with an activation function to generate the knowledge attention vector $u_t$, whose $i$-th element is computed as:

$$u_t^i = V_d^{\top} \tanh\left(W_d\left[k_i;\ S_t\right]\right) \qquad (7)$$

where $V_d$ and $W_d$ are learnable parameters. The obtained $u_t$ is a vector of m real numbers, representing the degree of correlation between the knowledge vectors $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time. In some embodiments, the parameters $V_d$ and $W_d$ are trained so that a knowledge vector $k_i$ highly correlated with the current decoding hidden state $S_t$ yields a higher computed $u_t^i$.
(2) Based on the knowledge attention vector $u_t$, a scoring function generates the knowledge attention weights $d_t^1, \ldots, d_t^m$ corresponding to the knowledge vectors $k_1 \sim k_m$:

$$d_t^i = \mathrm{softmax}(u_t)_i = \frac{\exp(u_t^i)}{\sum_{l=1}^{m} \exp(u_t^l)} \qquad (8)$$

where the softmax function normalizes each element of $u_t$, yielding values $d_t^i$ in the range (0, 1). In some embodiments, the knowledge attention weights $d_t^i$ may be obtained by computing cosine similarity. In some embodiments, other ways of computing the knowledge attention weights $d_t^i$ may also be used, and are not limited by the description herein.
(3) Each knowledge vector among $k_1 \sim k_m$ is multiplied by its knowledge attention weight $d_t^i$, and the products are summed to generate the knowledge fusion vector of the current time:

$$g_t = \sum_{i=1}^{m} d_t^i k_i \qquad (9)$$

In the embodiments described in this specification, by introducing the knowledge fusion vector $g_t$ into the decoding process, the dialog decoder can access the knowledge vectors $k_1 \sim k_m$ at every decoding step, which avoids losing part of the knowledge information during decoding. Moreover, because the knowledge attention weights $d_t$ are computed from the current decoding hidden state $S_t$, the knowledge vectors $k_1 \sim k_m$ can be given different attention at different times. For example, at the current time, if the knowledge information contained in the knowledge vector $k_2$ is highly correlated with the current decoding hidden state $S_t$, the corresponding knowledge attention weight $d_2$ will also be high; at the next time, if the knowledge information contained in the knowledge vector $k_3$ is highly correlated with the decoding hidden state $S_{t+1}$ of the next time, the corresponding knowledge attention weight $d_3$ will also be high. This makes it possible for the dialog decoder to focus on different information in the external knowledge at different decoding steps; in the example above, the decoder may attend to $k_2$ among $k_1 \sim k_m$ at the current time and to $k_3$ at the next time.
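A sketch of the first attention model, formulas (7)-(9), under the same PyTorch and dimension assumptions as above:

```python
import torch
import torch.nn as nn

class KnowledgeAttention(nn.Module):
    """First attention model, formulas (7)-(9): k_1..k_m and S_t -> g_t."""
    def __init__(self, know_dim=256, hidden_dim=256, attn_dim=256):
        super().__init__()
        self.W_d = nn.Linear(know_dim + hidden_dim, attn_dim, bias=False)
        self.V_d = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, K, s_t):                                # K: (m, know_dim), s_t: (hidden_dim,)
        s_rep = s_t.unsqueeze(0).expand(K.size(0), -1)        # pair S_t with every k_i
        u_t = self.V_d(torch.tanh(self.W_d(torch.cat([K, s_rep], dim=-1)))).squeeze(-1)  # (7)
        d_t = torch.softmax(u_t, dim=-1)                      # formula (8): knowledge attention weights
        g_t = (d_t.unsqueeze(-1) * K).sum(dim=0)              # formula (9): knowledge fusion vector
        return g_t, d_t

K = torch.randn(8, 256)              # knowledge vectors k_1..k_8
s_t = torch.randn(256)               # decoding hidden state S_t
g_t, d_t = KnowledgeAttention()(K, s_t)
print(g_t.shape, d_t.shape)          # torch.Size([256]) torch.Size([8])
```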
Step 160, generating the context vector $c_t$ of the current time.

In step 120, the encoding hidden state $h_n$, which stores the semantic information of the dialog context, is used as the initial intermediate semantic vector C between the dialog encoder and the dialog decoder. Because the length of this vector is fixed, when the dialog context is long the intermediate semantic vector C cannot hold all of the semantic information, which limits the comprehension ability of the dialog decoder. Therefore, in some embodiments, a dynamic attention mechanism is used to generate a context vector $c_t$ for the current time as the intermediate semantic vector of the current time between the dialog encoder and the dialog decoder. In some embodiments, the context vector $c_t$ of the current time may be generated using a second attention model, according to the sequence of encoding hidden states $h_1 \sim h_n$ and the decoding hidden state $S_t$ of the current time.

In some embodiments, the input of the second attention model may be the decoding hidden state $S_t$ of the current time and the sequence of encoding hidden states $h_1 \sim h_n$, and the output may be the context vector $c_t$ of the current time. The second attention model is implemented as follows:
(1) For each encoding hidden state $h_i$ in the sequence $h_1 \sim h_n$, a weighted summation of the encoding hidden state and the decoding hidden state $S_t$ of the current time is performed, and the operation result is processed with an activation function to generate the context attention vector $e_t$, whose $i$-th element is computed as:

$$e_t^i = V_c^{\top} \tanh\left(W_c\left[h_i;\ S_t\right]\right) \qquad (10)$$

where $V_c$ and $W_c$ are learnable parameters. The obtained $e_t$ is a vector of n real numbers, representing the correlation between the sequence of encoding hidden states $h_1 \sim h_n$ and the decoding hidden state $S_t$ of the current time; an encoding hidden state $h_i$ with high correlation corresponds to a higher $e_t^i$.
(2) Based on the context attention vector $e_t$, a scoring function generates the context attention weights $a_t^1, \ldots, a_t^n$ corresponding to the sequence of encoding hidden states $h_1 \sim h_n$:

$$a_t^i = \mathrm{softmax}(e_t)_i = \frac{\exp(e_t^i)}{\sum_{l=1}^{n} \exp(e_t^l)} \qquad (11)$$

where the softmax function normalizes the elements of $e_t$, yielding values $a_t^i$ in the range (0, 1). In some embodiments, the context attention weights $a_t^i$ may be obtained by computing cosine similarity. In some embodiments, other ways of computing the context attention weights $a_t^i$ may also be used, and are not limited by the description herein.
(3) Each encoding hidden state in the sequence $h_1 \sim h_n$ is multiplied by its context attention weight $a_t^i$, and the products are summed to generate the context vector of the current time:

$$c_t = \sum_{i=1}^{n} a_t^i h_i \qquad (12)$$

Using $c_t$, the dialog decoder can access all of the information in the sequence of encoding hidden states $h_1 \sim h_n$ at every decoding step. At the same time, because the context attention weights $a_t$ are computed from the current decoding hidden state $S_t$, the sequence of encoding hidden states $h_1 \sim h_n$ is given different attention at different times, so the dialog decoder can focus on different information in the dialog context at each decoding step. For example, suppose the dialog encoder generates the sequence of encoding hidden states $h_1 \sim h_3$ from the dialog context "Tom chases Jerry". At the decoding time t = 1, the context attention weights $a_1$ give $h_1$ the highest attention, so the dialog decoder may focus through $c_1$ on the information of the word "Tom" in the dialog context; by analogy, at the decoding time t = 2 the dialog decoder may focus through $c_2$ on the information of the word "chases" corresponding to $h_2$, and at the decoding time t = 3 it may focus through $c_3$ on the information of the word "Jerry" corresponding to $h_3$. If the sequence of encoding hidden states $h_1 \sim h_n$ were not given different attention at different times, the information available to the dialog decoder would be the same at every decoding step.
Step 170, generating the reply word.

In some embodiments, the reply word $y_t$ of the current time may be generated based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time. Specifically, the concatenation of the context vector $c_t$ of the current time, the decoding hidden state $S_t$ of the current time, and the knowledge fusion vector $g_t$ of the current time is input into a word selection model, which predicts the reply word $y_t$ of the current time:

$$P(y_t) = \mathrm{softmax}\left(V_2\left(V_1\left[S_t;\ c_t;\ g_t\right] + b_1\right) + b_2\right) \qquad (13)$$
where $V_1$, $V_2$, $b_1$, and $b_2$ are learnable parameters. In formula (13), the concatenation $[S_t; c_t; g_t]$ of $S_t$, $c_t$, and $g_t$ undergoes two linear transformations with these parameters: (1) $[S_t; c_t; g_t]$ is multiplied by $V_1$, and the bias vector $b_1$ is added to the resulting vector; (2) the result of the first linear transformation is multiplied by $V_2$, and the bias vector $b_2$ is added, finally yielding a vector of M real numbers. M is the size of the vocabulary used by the dialog system, and the M real numbers represent the similarity between the predicted reply word and each of the M words in the vocabulary. This vector is then normalized by a softmax function into M probabilities (scores in the range 0 to 1), and the vocabulary word corresponding to the highest probability is selected as the reply word $y_t$ of the current time.
In some embodiments, the reply words $y_1 \sim y_t$ constitute the dialog reply, where $y_1$ denotes the reply word at t = 1. The embodiments described in this specification add the knowledge fusion vector $g_t$ on top of the existing Seq2Seq network structure based on a context attention mechanism, realizing dynamic interaction between external knowledge and the dialog decoder, so that the dialog decoder can dynamically introduce external knowledge during decoding and the dialog system can switch to new topics naturally and smoothly. The following takes the set of human-machine dialogs and knowledge texts shown in FIG. 4 as an example:
In the history dialog shown in FIG. 4, A denotes the utterances of the chat robot and B the utterances of the user; the dialog so far is: "A: Let me recommend a well-received movie to you.", "A: OK. Director F of movie a is truly excellent, and the movie is inspiring.", "B: The director sounds great!". At the start of the conversation, the chat robot recalls the set of knowledge texts shown in FIG. 4 from the knowledge base: "F, ancestry, USA", "F, gender, male", "F, representative work, movie b", "F, profession, director", "F, date of birth, May 12, 1925", "movie a, director, F", "movie a, public praise, a romance movie with good word of mouth", and "movie a, awards, American Golden Globe nomination, television, best miniseries", eight knowledge texts in total; see step 130 for the related description of knowledge texts. Knowledge vectors $k_1 \sim k_8$ corresponding to the above eight knowledge texts can then be generated according to the methods described in steps 130 and 140 of this specification. The generated dialog shown in FIG. 4 has already integrated the knowledge texts "movie a, public praise, a romance movie with good word of mouth" and "movie a, director, F". The last turn in FIG. 4 is "B: The director sounds great!", which focuses the dialog on the director F. The dialog decoder generates the decoding hidden state $S_1$ at time t = 1 from the intermediate semantic vector generated from the dialog context (see step 120 for details); then, using the dynamic attention mechanism described in step 150, it generates the knowledge fusion vector $g_1$ from the knowledge vectors $k_1 \sim k_8$ and the decoding hidden state $S_1$. Because of the dialog context, in the process of generating the knowledge fusion vector $g_1$, the knowledge vectors corresponding to the knowledge texts "F, date of birth, May 12, 1925" and "F, representative work, movie b" are given higher attention; among them, "F, date of birth, May 12, 1925" matches the dialog text relatively better, so the attention given to its knowledge vector is the highest. Then, as described in step 170, the knowledge fusion vector $g_1$ serves as one of the inputs of the word selection model, so while generating the reply words the word selection model can extract the knowledge information with the highest current attention: "F, date of birth, May 12, 1925". At time t = 2, the knowledge fusion vector $g_1$ is again one of the inputs of the dialog decoder, so the decoding hidden state $S_2$ generated by the dialog decoder at time t = 2 contains the currently most attended knowledge information: "F, date of birth, May 12, 1925". By analogy, the process of repeated interaction between the knowledge fusion vector $g_t$ and the dialog decoder can coherently integrate the knowledge texts "F, date of birth, May 12, 1925" and "F, representative work, movie b" into the dialog reply, yielding the reply "He was born on May 12, 1925, and he also has a representative work, movie b."
In some embodiments, a training set consisting of dialog contexts, the at least one knowledge text, and dialog replies may be obtained, and the dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function may be trained using a back-propagation algorithm. Specifically, the dialog replies can be used as labels, and the model can be trained in an end-to-end manner to obtain a trained dialog system.
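A schematic end-to-end training step is sketched below. Here `dialog_system` is a hypothetical module that maps a (dialog context, knowledge texts) pair to per-step vocabulary logits, and `reply_ids` is the gold dialog reply used as the label; teacher forcing and the loss choice are assumptions, as the patent only specifies back propagation with the dialog replies as labels:

```python
import torch
import torch.nn as nn

def train_step(dialog_system, optimizer, context_ids, knowledge_ids, reply_ids):
    """One back-propagation step over the whole dialog system."""
    logits = dialog_system(context_ids, knowledge_ids)       # (T, vocab_size), one row per reply word
    loss = nn.functional.cross_entropy(logits, reply_ids)    # gold reply words as labels
    optimizer.zero_grad()
    loss.backward()     # gradients flow through both encoders and all three attention models
    optimizer.step()
    return loss.item()
```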
It should be noted that the above description of process 100 is for illustration only and does not limit the scope of application of the present disclosure. Various modifications and alterations to process 100 will occur to those skilled in the art in light of the present description; such modifications and alterations remain within the scope of the present description. For example, step 130 and step 140 may be combined into one step, with the word vectors corresponding to a knowledge text and the knowledge vector based on those word vectors generated in the same step.
FIG. 3 is a diagram of an application scenario for generating a dialog reply, according to some embodiments of the present description.

As shown in FIG. 3, during a chat between the chat robot and the user, the following dialog occurs: "Let me recommend a TV drama to you.", "OK, please recommend one." Using the method described in this specification, the chat robot generates replies and carries on a smooth conversation with the user: "The TV drama a, starring the actor Zhang San.", "I like Zhang San very much.", and so on. For the detailed method of generating a dialog reply, please refer to FIG. 1; it is not repeated here.

The method described in this specification can also be applied to other application scenarios, and is not limited by the description of this specification.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufactures, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be embodied as a computer product, including computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the software may run in a cloud computing environment or be offered as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that, in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, embodiments may have fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, the entire contents are hereby incorporated by reference into this specification, except for application history documents that are inconsistent with or conflict with the contents of this specification, and except for documents (currently or later attached to this specification) that limit the broadest scope of the claims of this specification. It should be noted that if there is any inconsistency or conflict between the descriptions, definitions, and/or use of terms in the accompanying materials of this specification and the contents of this specification, the descriptions, definitions, and/or use of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (24)

1. A method of generating a dialog reply, the method comprising:

acquiring a dialog context, acquiring at least one knowledge text related to the dialog context according to the dialog context, and generating at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text, the knowledge texts being stored in a knowledge base;

generating a knowledge fusion vector $g_t$ of the current time using a first attention model, according to the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time;

generating the reply word $y_t$ of the current time based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time;

the reply words $y_1 \sim y_t$ making up the dialog reply, where $y_1$ denotes the reply word at t = 1.
2. The method of claim 1, wherein the context vector $c_t$ of the current time is generated in the following manner:

generating a sequence of encoding hidden states $h_1 \sim h_n$ from the dialog context using a dialog encoder;

obtaining the decoding hidden state $S_t$ of the current time;

generating the context vector $c_t$ of the current time using a second attention model, according to the sequence of encoding hidden states $h_1 \sim h_n$ and the decoding hidden state $S_t$ of the current time.
3. The method of claim 2, wherein the generating at least one knowledge vector $k_1 \sim k_m$ corresponding to the at least one knowledge text comprises:

for any of the knowledge texts, encoding each word in the knowledge text using a knowledge encoder, and generating the knowledge vector from the generated plurality of word vectors using a third attention model.
4. The method of claim 3, wherein the generating the knowledge vector from the generated plurality of word vectors using a third attention model comprises:

for each of the plurality of word vectors, performing a weighting operation on the word vector, and processing the operation result with an activation function to generate a word attention vector;

generating a plurality of word attention weights corresponding to the plurality of word vectors using a scoring function, based on the word attention vector;

generating the knowledge vector based on the plurality of word vectors and the plurality of word attention weights.
5. The method of claim 4, wherein the generating a knowledge fusion vector $g_t$ of the current time using a first attention model according to the at least one knowledge vector $k_1 \sim k_m$ and the decoding hidden state $S_t$ of the current time comprises:

for each of the at least one knowledge vector $k_1 \sim k_m$, performing a weighted summation operation on the knowledge vector and the decoding hidden state $S_t$ of the current time, and processing the operation result with an activation function to generate a knowledge attention vector;

generating at least one knowledge attention weight corresponding to the at least one knowledge vector using a scoring function, based on the knowledge attention vector;

generating the knowledge fusion vector $g_t$ of the current time based on the at least one knowledge vector and the at least one knowledge attention weight.
6. The method of claim 5, wherein the obtaining the decoding hidden state $S_t$ of the current time comprises:

at the current time, generating the decoding hidden state $S_t$ of the current time by using the decoding hidden state $S_{t-1}$ of the previous time, the decoding input $x_t$ of the current time, and the knowledge fusion vector $g_{t-1}$ of the previous time as the inputs of a dialog decoder, wherein a knowledge gate $\gamma_t$ determines the ratio of the decoding input $x_t$ of the current time to the knowledge fusion vector $g_{t-1}$ of the previous time;

the decoding input $x_t$ of the current time being generated by combining the word embedding representation $e(y_{t-1})$ of the reply word $y_{t-1}$ generated at the previous time with the context vector $c_{t-1}$ of the previous time.
7. The method of claim 6, wherein the generating the context vector $c_t$ of the current time using a second attention model according to the sequence of encoding hidden states $h_1 \sim h_n$ and the decoding hidden state $S_t$ of the current time comprises:

for each encoding hidden state in the sequence of encoding hidden states $h_1 \sim h_n$, performing a weighted summation operation on the encoding hidden state and the decoding hidden state $S_t$ of the current time, and processing the operation result with an activation function to generate a context attention vector;

generating a plurality of context attention weights corresponding to the encoding hidden states $h_1 \sim h_n$ using a scoring function, based on the context attention vector;

generating the context vector $c_t$ of the current time based on the encoding hidden states $h_1 \sim h_n$ and the plurality of context attention weights.
8. The method of claim 7, wherein the generating the reply word $y_t$ of the current time based on the knowledge fusion vector $g_t$ of the current time, the context vector $c_t$ of the current time, and the decoding hidden state $S_t$ of the current time comprises:

generating a concatenation vector of the context vector $c_t$ of the current time, the decoding hidden state $S_t$ of the current time, and the knowledge fusion vector $g_t$ of the current time, and predicting the reply word $y_t$ of the current time from the concatenation vector using a softmax function.
9. The method of claim 8, wherein the acquiring at least one knowledge text according to the dialog context comprises:

querying the knowledge base and recalling the at least one knowledge text related to the dialog context.
10. The method of claim 9, wherein the knowledge encoder is a Transformer encoder.
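Claim 10's knowledge encoder can be sketched with PyTorch's built-in Transformer modules; the layer count, model width, and head count below are assumptions.

```python
# Hypothetical knowledge encoder; hyperparameters are illustrative only.
enc_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
knowledge_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)

knowledge_tokens = torch.randn(1, 12, 256)          # embedded words of one knowledge text
word_vectors = knowledge_encoder(knowledge_tokens)  # (1, 12, 256): one vector per word
```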
11. The method of claim 10, wherein the method further comprises:
obtaining a training set consisting of the dialog context, the at least one knowledge text, and the corresponding target dialog context; and training a dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function using a back propagation algorithm.
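A compact sketch of the training step of claim 11, assuming teacher forcing, a cross-entropy loss, and an Adam optimizer; none of these choices are specified by the claim, and the parameter list is a placeholder for the full dialog system.

```python
optimizer = torch.optim.Adam(
    list(first_attention.parameters())
    + list(second_attention.parameters())
    + list(project.parameters()),
    lr=1e-3,
)
loss_fn = nn.CrossEntropyLoss()

def train_step(logits, target_ids):
    # logits: (seq_len, vocab_size) decoder outputs computed with the modules above;
    # target_ids: (seq_len,) word indices of the reference dialog context.
    loss = loss_fn(logits, target_ids)
    optimizer.zero_grad()
    loss.backward()   # back propagation through the whole dialog system
    optimizer.step()
    return loss.item()
```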
12. A system for generating a dialog context, the system comprising:
a knowledge vector generation module, configured to obtain a dialog context, obtain at least one knowledge text related to the dialog context according to the dialog context, and generate at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text, the knowledge text being stored in a knowledge base;
a knowledge fusion vector generation module, configured to generate the knowledge fusion vector for the current time using the first attention model, based on the at least one knowledge vector k_1~k_m and the decoding hidden state S_t of the current time;
a dialog following word generation module, configured to generate the dialog context word y_t of the current time based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state S_t of the current time; and
a dialog context generation module, configured to compose the dialog context from the dialog context words y_1~y_t, where y_1 represents the dialog context word at t = 1.
13. The system of claim 12, further comprising:
an encoding module, configured to generate a sequence h_1~h_n of encoded hidden states using a dialog encoder based on the dialog context;
a decoding module, configured to obtain the decoding hidden state S_t at the current time; and
a context vector generation module, configured to generate the context vector for the current time using a second attention model, based on the sequence h_1~h_n of encoded hidden states and the decoding hidden state S_t of the current time.
14. The system of claim 13, wherein the generating of the at least one knowledge vector k_1~k_m corresponding to the at least one knowledge text comprises:
for any of the knowledge texts, encoding each word in the knowledge text using a knowledge encoder, and generating the knowledge vector using a third attention model based on the generated plurality of word vectors.
15. The system of claim 14, wherein the generating of the knowledge vector using a third attention model from the generated plurality of word vectors comprises:
for each of the plurality of word vectors, performing a weighted operation on the word vector, and processing the operation result with an activation function to generate a word attention vector;
generating a plurality of word attention weights corresponding to the plurality of word vectors using a scoring function, based on the word attention vectors; and
generating the knowledge vector based on the plurality of word vectors and the plurality of word attention weights.
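The word-level pooling of claim 15 again follows the additive-attention pattern, but without an external query. The sketch below continues the earlier imports and assumes tanh and a linear scoring layer; the exact form is not specified by the claims.

```python
class WordAttentionPooling(nn.Module):
    """Hypothetical third attention model: pools word vectors into one knowledge vector."""
    def __init__(self, dim: int, attn_dim: int):
        super().__init__()
        self.w = nn.Linear(dim, attn_dim)            # weighted operation on each word vector
        self.v = nn.Linear(attn_dim, 1, bias=False)  # scoring function

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (num_words, dim) produced by the knowledge encoder
        attn_vecs = torch.tanh(self.w(word_vectors))                    # word attention vectors
        weights = torch.softmax(self.v(attn_vecs).squeeze(-1), dim=0)   # word attention weights
        return weights @ word_vectors                                   # one knowledge vector
```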
16. The system of claim 15, wherein the generating of the knowledge fusion vector for the current time using the first attention model, based on the at least one knowledge vector k_1~k_m and the decoding hidden state S_t at the current time, comprises:
for each of the at least one knowledge vector k_1~k_m, performing a weighted summation operation on the knowledge vector and the decoding hidden state S_t at the current time, and processing the operation result with an activation function to generate a knowledge attention vector;
generating at least one knowledge attention weight corresponding to the at least one knowledge vector using a scoring function, based on the knowledge attention vector; and
generating the knowledge fusion vector for the current time based on the at least one knowledge vector and the at least one knowledge attention weight.
17. The system of claim 16, wherein the obtaining of the decoding hidden state S_t at the current time comprises:
at the current time, generating the decoding hidden state S_t by taking the decoding hidden state S_{t-1} of the previous time, the decoded input of the current time, and the knowledge fusion vector of the previous time as the input of a dialog decoder; wherein a knowledge gate determines the ratio of the decoded input of the current time to the knowledge fusion vector of the previous time; and
the decoded input of the current time is generated by combining the word-embedding representation e(y_{t-1}) of the dialog context word y_{t-1} generated at the previous time with the context vector of the previous time.
18. The system of claim 17, wherein the generating of the context vector for the current time using the second attention model, based on the sequence h_1~h_n of encoded hidden states and the decoding hidden state S_t of the current time, comprises:
for each encoded hidden state in the sequence h_1~h_n, performing a weighted summation operation on the encoded hidden state and the decoding hidden state S_t at the current time, and processing the operation result with an activation function to generate a context attention vector;
generating a plurality of context attention weights corresponding to the encoded hidden states h_1~h_n using a scoring function, based on the context attention vector; and
generating the context vector for the current time based on the encoded hidden states h_1~h_n and the plurality of context attention weights.
19. The system of claim 18, wherein the generating of the dialog context word y_t at the current time, based on the knowledge fusion vector of the current time, the context vector of the current time, and the decoding hidden state S_t of the current time, comprises:
concatenating the context vector of the current time, the decoding hidden state S_t of the current time, and the knowledge fusion vector of the current time to generate a concatenation vector; and
predicting the dialog context word y_t at the current time from the concatenation vector using a softmax function.
20. The system of claim 19, wherein said obtaining at least one knowledge text from the dialog context comprises:
querying the knowledge base and recalling the at least one knowledge text related to the conversation.
21. The system of claim 20, wherein the knowledge encoder is a Transformer encoder.
22. The system of claim 21, wherein the system further comprises:
a training module, configured to obtain a training set consisting of the dialog context, the at least one knowledge text, and the corresponding target dialog context, and to train a dialog system composed of the dialog encoder, the dialog decoder, the knowledge encoder, the first attention model, the second attention model, the third attention model, and the softmax function using a back propagation algorithm.
23. An apparatus to generate a dialog context, wherein the apparatus comprises at least one processor and at least one memory;
the at least one memory is for storing computer instructions;
the at least one processor is configured to execute at least some of the computer instructions to implement the method of any of claims 1-11.
24. A computer-readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the method of any one of claims 1 to 11.
CN202010470216.XA 2020-05-28 2020-05-28 Method and system for generating dialog context Pending CN111382257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010470216.XA CN111382257A (en) 2020-05-28 2020-05-28 Method and system for generating dialog context


Publications (1)

Publication Number Publication Date
CN111382257A 2020-07-07

Family

ID=71217697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010470216.XA Pending CN111382257A (en) 2020-05-28 2020-05-28 Method and system for generating dialog context

Country Status (1)

Country Link
CN (1) CN111382257A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110858215A (en) * 2018-08-23 2020-03-03 广东工业大学 End-to-end target guiding type dialogue method based on deep learning
CN111125333A (en) * 2019-06-06 2020-05-08 北京理工大学 Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN110399460A (en) * 2019-07-19 2019-11-01 腾讯科技(深圳)有限公司 Dialog process method, apparatus, equipment and storage medium
CN110851575A (en) * 2019-09-23 2020-02-28 上海深芯智能科技有限公司 Dialogue generating system and dialogue realizing method
CN111159368A (en) * 2019-12-12 2020-05-15 华南理工大学 Reply generation method for personalized dialogue
CN111191015A (en) * 2019-12-27 2020-05-22 上海大学 Neural network movie knowledge intelligent dialogue method
CN111159467A (en) * 2019-12-31 2020-05-15 青岛海信智慧家居系统股份有限公司 Method and equipment for processing information interaction

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HAO ZHOU et al.: "Commonsense Knowledge Aware Conversation Generation with Graph Attention", International Joint Conference on Artificial Intelligence 2018 *
JANE: "A conversation with Zhou Hao of Tsinghua University: the IJCAI distinguished paper and the story behind it", https://juejin.im/post/5b6a9e085188251a8d37136d *
PETAR VELICKOVIC et al.: "Graph Attention Networks", published as a conference paper at ICLR 2018 *
学习ML的皮皮虾: "A dialog model based on a commonsense knowledge graph (reading notes)", https://zhuanlan.zhihu.com/p/50502922 *
LI Shaobo et al.: "A generative dialog model based on a knowledge copy mechanism", The 18th Chinese National Conference on Computational Linguistics (CCL 2019) *
CHEN Chen et al.: "A survey of deep-learning-based open-domain dialog systems", Chinese Journal of Computers *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084314A (en) * 2020-08-20 2020-12-15 电子科技大学 Knowledge-introducing generating type session system
CN112084314B (en) * 2020-08-20 2023-02-21 电子科技大学 Knowledge-introducing generating type session system
CN112328756A (en) * 2020-10-13 2021-02-05 山东师范大学 Context-based dialog generation method and system
CN112214591A (en) * 2020-10-29 2021-01-12 腾讯科技(深圳)有限公司 Conversation prediction method and device
CN112214591B (en) * 2020-10-29 2023-11-07 腾讯科技(深圳)有限公司 Dialog prediction method and device
CN113032545A (en) * 2021-05-29 2021-06-25 成都晓多科技有限公司 Method and system for conversation understanding and answer configuration based on unsupervised conversation pre-training
CN114942986A (en) * 2022-06-21 2022-08-26 平安科技(深圳)有限公司 Text generation method and device, computer equipment and computer readable storage medium
CN114942986B (en) * 2022-06-21 2024-03-19 平安科技(深圳)有限公司 Text generation method, text generation device, computer equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US11423233B2 (en) On-device projection neural networks for natural language understanding
CN110782870B (en) Speech synthesis method, device, electronic equipment and storage medium
Kamath et al. Deep learning for NLP and speech recognition
CN110326002B (en) Sequence processing using online attention
CN111382257A (en) Method and system for generating dialog context
CN111312245B (en) Voice response method, device and storage medium
CN109508377A (en) Text feature, device, chat robots and storage medium based on Fusion Model
CN112214591B (en) Dialog prediction method and device
Deng et al. Foundations and Trends in Signal Processing: DEEP LEARNING–Methods and Applications
US11961515B2 (en) Contrastive Siamese network for semi-supervised speech recognition
US11132994B1 (en) Multi-domain dialog state tracking
WO2023231513A1 (en) Conversation content generation method and apparatus, and storage medium and terminal
JP7229345B2 (en) Sentence processing method, sentence decoding method, device, program and device
CN109637527A (en) The semantic analytic method and system of conversation sentence
CN112364148A (en) Deep learning method-based generative chat robot
Pieraccini AI assistants
Mathur et al. A scaled‐down neural conversational model for chatbots
Hsueh et al. A Task-oriented Chatbot Based on LSTM and Reinforcement Learning
CN116611459B (en) Translation model training method and device, electronic equipment and storage medium
Wu et al. Multistate encoding with end-to-end speech rnn transducer network
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN112150103B (en) Schedule setting method, schedule setting device and storage medium
CN117980915A (en) Contrast learning and masking modeling for end-to-end self-supervised pre-training
CN115204181A (en) Text detection method and device, electronic equipment and computer readable storage medium
CN117521674B (en) Method, device, computer equipment and storage medium for generating countermeasure information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200707