CN109255020A - Method for solving the dialogue generation task with a convolutional dialogue generation model - Google Patents

Method for solving the dialogue generation task with a convolutional dialogue generation model Download PDF

Info

Publication number
CN109255020A
CN109255020A
Authority
CN
China
Prior art keywords
word
vector
convolution
output
dimension value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811057115.9A
Other languages
Chinese (zh)
Other versions
CN109255020B (en)
Inventor
赵洲
章璇
孟令涛
梁伟欣
金志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201811057115.9A priority Critical patent/CN109255020B/en
Publication of CN109255020A publication Critical patent/CN109255020A/en
Application granted granted Critical
Publication of CN109255020B publication Critical patent/CN109255020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention discloses a method for solving the dialogue generation task with a convolutional dialogue generation model, comprising the following steps: for the context of the next word of the dialogue to be generated, add the meaning vector of each word to its position vector to obtain the word's combined representation vector; feed these vectors into an encoding network that combines convolutional layers with gated linear units to obtain a combined representation of the context; convert the last word of the context into its meaning vector, combine it with its position vector, and add the two to obtain the combined representation of the last word; feed this into an encoding network that combines convolutional layers with gated linear units, together with the combined context representation, to obtain the representation of the next word to be generated. By using a convolutional dialogue generation model, the invention overcomes two drawbacks of prior-art recurrent neural networks: their inability to exploit the parallelism of GPUs, and their susceptibility to vanishing gradients.

Description

Method for solving the dialogue generation task with a convolutional dialogue generation model
Technical field
The present invention relates to the technical field of dialogue generation, and in particular to a method for solving the dialogue generation task with a convolutional dialogue generation model.
Background technique
Recently, non-task-oriented dialogue generation has attracted wide attention and become an important service, but the quality of current systems remains unsatisfactory.
Existing approaches are mainly based on recurrent neural networks, relying on the sequential nature of these networks to generate dialogue. However, because recurrent neural networks process tokens sequentially, they cannot exploit the parallelism of GPUs (Graphics Processing Units). Moreover, the chain-rule differentiation through a recurrent network makes it prone to vanishing gradients. To overcome these defects, the present method uses a convolutional dialogue generation model to complete the dialogue generation task.
The present invention first obtains a representation of the current dialogue context with a convolutional neural network equipped with an attention module, then feeds this representation into a decoder module to obtain the next word of the dialogue; repeating this process generates the entire dialogue.
Summary of the invention
The object of the invention is to solve the problems of the prior art. To overcome the inability of prior-art recurrent neural networks to exploit GPU parallelism, as well as their susceptibility to vanishing gradients, the present invention provides a method for solving the dialogue generation task with a convolutional dialogue generation model.
The specific technical solution of the present invention is as follows:
A method for solving the dialogue generation task with a convolutional dialogue generation model comprises the following steps:
1) For the context of the next word of the dialogue to be generated, map each context word to its corresponding meaning vector (the word's contextual representation) and obtain its position vector; then add the meaning vector of the word to its position vector to obtain the combined representation vector of the word.
Feed the combined representation vectors of the words into an encoding network that combines convolutional layers with gated linear units, obtaining a combined representation of the context.
2) Convert the last word of the context of the next word of the dialogue to be generated (the word generated in the previous step, hereafter "the last word") into the meaning vector of the last word, combine it with the position vector of the last word, and add the two to obtain the combined representation of the last word.
Feed the combined representation of the last word into the encoding network that combines convolutional layers with gated linear units, together with the combined context representation from step 1), to obtain the representation of the next word to be generated (from which the next word is read off).
3) Through training, the final convolutional dialogue generation model is obtained; with this model, the dialogue required for a given context can be generated.
In step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 its 1st dimension, and w_cn its n-th dimension;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 its 1st dimension, and p_cn its n-th dimension;
the combined representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the combined representation vector of the c-th word, o_c1 its 1st dimension, and o_cn its n-th dimension.
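The addition of meaning (embedding) and position vectors described above can be sketched in a few lines of numpy. The vocabulary size, context length, and embedding width below are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

# Illustrative sizes -- the patent does not fix the vocabulary size V,
# context length c, or embedding width n.
V, c, n = 1000, 8, 16
rng = np.random.default_rng(0)

W_embed = rng.normal(size=(V, n))   # meaning-vector (word-embedding) table
P_embed = rng.normal(size=(c, n))   # learned position-vector table

token_ids = rng.integers(0, V, size=c)   # indices of the context words
w = W_embed[token_ids]                   # meaning vectors w_c
p = P_embed[np.arange(c)]                # position vectors p_c
o = w + p                                # combined representations o_c
print(o.shape)                           # (8, 16)
```

In a trained model both tables would be learned parameters; here they are random placeholders so the shapes can be checked.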
In step 1), feeding the acquired combined representation vectors of the words into the encoding network that combines convolutional layers with gated linear units to obtain the combined context representation specifically comprises:
1.1) Feed the combined representation vectors o_c = {o_c1, ..., o_cn} of the words successively through m convolution modules, and use these m convolution modules to obtain the combined context representation vector q^m. Each of the m convolution modules consists of one convolution operation and one non-linear operation. The convolution operation produces two d-dimensional column vectors Y = [A, B] ∈ R^2d according to
Y = f_conv(X) = W_m X + b_m
where A is the first d-dimensional column, B is the second d-dimensional column, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is its bias vector. The computation thus yields the two d-dimensional columns Y = [A, B] ∈ R^2d.
1.2) Apply the non-linear operation to the output Y = [A, B] ∈ R^2d of the convolution in step 1.1): from the second column B, using the gating function δ, obtain the gate g = δ(B) that controls the flow of information through the network; this output is passed to the next neuron. Combining the first column A of the convolution output with the gate g = δ(B), the output of the encoder convolution module is obtained as
q_i^m = A ⊗ δ(B) + q_i^(m-1),  with [A, B] = f_conv(q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1))
where q_i^m denotes the i-th dimension of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1) denote dimensions (i-k/2) through (i+k/2) of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (e.g. 3, 5, or 7), and q_i^(m-1) denotes the i-th dimension of the output of the (m-1)-th encoder convolution module, added as a residual connection.
Through the successive operation of the m convolution modules, the combined context representation q^m is obtained.
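One such convolution module (width-k convolution, gated linear unit, residual connection) can be sketched in numpy as below; this is a minimal illustration of the Y = [A, B] and g = δ(B) computation, with hypothetical sizes, not the patent's implementation:

```python
import numpy as np

def glu_conv_block(x, W, b):
    """One convolution module: a width-k 1-D convolution maps each window
    of the d-dimensional input to 2d values Y = [A, B] = W_m X + b_m;
    the gate g = sigmoid(B) controls information flow, and the gated
    output A * g is added to the input as a residual connection."""
    T, d = x.shape
    k = W.shape[0] // d                 # kernel width, e.g. 3, 5, 7
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    y = np.stack([xp[t:t + k].reshape(-1) @ W + b for t in range(T)])
    A, B = y[:, :d], y[:, d:]
    g = 1.0 / (1.0 + np.exp(-B))        # gate delta(B)
    return A * g + x                    # gated output plus residual

rng = np.random.default_rng(1)
T, d, k = 8, 16, 3
x = rng.normal(size=(T, d))                 # combined representations o_c
W = rng.normal(size=(k * d, 2 * d)) * 0.1   # weight matrix W_m
b = np.zeros(2 * d)                         # bias vector b_m
q = glu_conv_block(x, W, b)                 # stack m of these to get q^m
print(q.shape)                              # (8, 16)
```

Stacking m of these blocks, each with its own W_m and b_m, yields the encoder described above.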
In step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 its 1st dimension, and w_wn its n-th dimension;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 its 1st dimension, and p_wn its n-th dimension;
the combined representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the combined representation vector of the last word, o_w1 its 1st dimension, and o_wn its n-th dimension.
Feeding the combined representation of the last word into the encoding network that combines convolutional layers with gated linear units, together with the combined context representation from step 1), to obtain the representation of the next word to be generated (from which the next word is read off) specifically comprises:
2.1) Feed the combined representation o_w = {o_w1, ..., o_wn} of the last word successively through m convolution modules identical in form to those of the encoder, and use these m convolution modules to obtain the predicted representation r^m of the next word to be generated. Each convolution module consists of one convolution operation and one non-linear operation. The convolution operation produces two d-dimensional column vectors Y = [A, B] ∈ R^2d according to
Y = f_conv(X) = W_m X + b_m
where A is the first d-dimensional column, B is the second d-dimensional column, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is its bias vector. The computation yields the two d-dimensional columns Y = [A, B] ∈ R^2d.
2.2) Apply the non-linear operation to the output Y = [A, B] ∈ R^2d of the convolution in step 2.1): from the second column B, using the gating function δ, obtain the gate g = δ(B) that controls the flow of information through the network; this output is passed to the next neuron.
Combining the first column A of the convolution output with the gate g = δ(B), the output of the decoder convolution module is obtained as
r_i^m = A ⊗ δ(B) + r_i^(m-1),  with [A, B] = f_conv(r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1))
where r_i^m denotes the i-th dimension of the output of the m-th decoder convolution module, f_conv(·) denotes the convolution operation, r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1) denote dimensions (i-k/2) through (i+k/2) of the output of the (m-1)-th decoder convolution module, k is a predefined parameter (e.g. 3, 5, or 7), and r_i^(m-1) denotes the i-th dimension of the output of the (m-1)-th decoder convolution module.
2.3) Using the following formula, combine the i-th dimension r_i^m of the output of the m-th decoder convolution module to obtain the i-th dimension d_i^m of the attention query of that decoder convolution module:
d_i^m = W_d^m r_i^m + b_d^m + g_i
where W_d^m is a weight matrix, b_d^m is a bias vector, and g_i is a parameter coefficient (g_i can be set manually).
Then, using the following formula, combine the i-th dimension d_i^m of the attention query of the m-th decoder convolution module with the j-th dimension q_j^m of the combined context representation q^m from step 1) (the output of the m-th encoder convolution module) to obtain the corresponding activation (attention) weight α_ij^m:
α_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)
Then combine the j-th dimension q_j^m of the overall encoder output with the j-th dimension o_cj of the combined representation vectors o_c = {o_c1, ..., o_cn} of the words from step 1) (o_cj being the j-th dimension of the combined representation of the c-th word) to obtain the attention contribution a_i^m to the i-th dimension of the output of the m-th decoder convolution module:
a_i^m = Σ_j α_ij^m (q_j^m + o_cj)
Add the attention contribution a_i^m to the i-th dimension r_i^m of the output of the m-th decoder convolution module; through the cyclic processing of the m convolution modules, the final decoder output r^m is obtained.
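The attention computation of step 2.3) can be sketched as below. This is a hedged reading based on the variables the patent names (a projected decoder query scored against the encoder summary, with the word representations added back in); all sizes and names are illustrative:

```python
import numpy as np

def conv_attention(r, q, o, W_d, b_d, g):
    """Attention sketch: project decoder states (d_i = W_d r_i + b_d + g),
    score them against the encoder outputs q by dot product, normalize
    with softmax, and sum the encoder outputs augmented with the
    combined word representations o."""
    d = r @ W_d + b_d + g                        # attention queries d_i
    scores = d @ q.T                             # dot-product scores
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)    # weights alpha_ij
    return alpha @ (q + o)                       # contexts sum_j alpha_ij (q_j + o_j)

rng = np.random.default_rng(2)
Td, Te, dim = 4, 8, 16                  # decoder/encoder lengths, width
r = rng.normal(size=(Td, dim))          # decoder module outputs
q = rng.normal(size=(Te, dim))          # encoder outputs q^m
o = rng.normal(size=(Te, dim))          # combined word representations o_c
c = conv_attention(r, q, o, rng.normal(size=(dim, dim)) * 0.1,
                   np.zeros(dim), np.zeros(dim))
print(c.shape)                          # (4, 16)
```

The returned contexts would then be added to the decoder outputs r, as the text describes.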
2.4) Feed the decoder output r^m into a softmax function and obtain the probability of the next word to be generated according to
p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r^m + b_o)
where W_o is a weight matrix, b_o is a bias vector, and softmax(·) is the softmax function. Using this probability output, the word with the highest probability is emitted as the next word of the generated dialogue. p(y_(i+1) | y_1, ..., y_i) is the probability of the next word, where y_(i+1) denotes the (i+1)-th word, y_1 the 1st word, and y_i the i-th word.
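The output layer p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r^m + b_o) is a standard linear-plus-softmax head; a minimal sketch with an assumed width and vocabulary size:

```python
import numpy as np

def next_word_probs(r_m, W_o, b_o):
    """softmax(W_o r^m + b_o): a probability distribution over the
    vocabulary for the next word; the argmax is emitted as the
    generated word."""
    z = r_m @ W_o + b_o
    z -= z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(3)
dim, V = 16, 50                  # illustrative width and vocabulary size
p = next_word_probs(rng.normal(size=dim),
                    rng.normal(size=(dim, V)), np.zeros(V))
print(round(float(p.sum()), 6))  # 1.0
```

At generation time, `int(p.argmax())` gives the index of the word to emit.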
Compared with the prior art, the present invention has the following advantages:
The present invention solves the dialogue generation task with a convolutional dialogue generation model. Compared with general dialogue generation methods, it overcomes the inability of prior-art recurrent neural networks to exploit the parallelism of GPUs, as well as their susceptibility to vanishing gradients. The effect achieved by the present invention on the dialogue generation task is better than that of traditional methods.
Detailed description of the invention
Fig. 1 is a flow diagram of the method of the present invention for solving the dialogue generation task with a convolutional dialogue generation model.
Specific embodiment
As shown in Fig. 1, a method for solving the dialogue generation task with a convolutional dialogue generation model comprises the following steps:
1) For the context of the next word of the dialogue to be generated, map each context word to its corresponding meaning vector (the word's contextual representation) and obtain its position vector; then add the meaning vector of the word to its position vector to obtain the combined representation vector of the word.
Feed the combined representation vectors of the words into an encoding network that combines convolutional layers with gated linear units, obtaining a combined representation of the context.
In step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 its 1st dimension, and w_cn its n-th dimension;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 its 1st dimension, and p_cn its n-th dimension;
the combined representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the combined representation vector of the c-th word, o_c1 its 1st dimension, and o_cn its n-th dimension.
Feeding the acquired combined representation vectors of the words into the encoding network that combines convolutional layers with gated linear units to obtain the combined context representation specifically comprises:
1.1) Feed the combined representation vectors o_c = {o_c1, ..., o_cn} of the words successively through m convolution modules, and use these m convolution modules to obtain the combined context representation vector q^m. Each of the m convolution modules consists of one convolution operation and one non-linear operation. The convolution operation produces two d-dimensional column vectors Y = [A, B] ∈ R^2d according to
Y = f_conv(X) = W_m X + b_m
where A is the first d-dimensional column, B is the second d-dimensional column, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is its bias vector. The computation thus yields the two d-dimensional columns Y = [A, B] ∈ R^2d.
1.2) Apply the non-linear operation to the output Y = [A, B] ∈ R^2d of the convolution in step 1.1): from the second column B, using the gating function δ, obtain the gate g = δ(B) that controls the flow of information through the network; this output is passed to the next neuron. Combining the first column A of the convolution output with the gate g = δ(B), the output of the encoder convolution module is obtained as
q_i^m = A ⊗ δ(B) + q_i^(m-1),  with [A, B] = f_conv(q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1))
where q_i^m denotes the i-th dimension of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_(i-k/2)^(m-1), ..., q_(i+k/2)^(m-1) denote dimensions (i-k/2) through (i+k/2) of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (e.g. 3, 5, or 7), and q_i^(m-1) denotes the i-th dimension of the output of the (m-1)-th encoder convolution module, added as a residual connection.
Through the successive operation of the m convolution modules, the combined context representation q^m is obtained.
2) Convert the last word of the context of the next word of the dialogue to be generated (the word generated in the previous step, hereafter "the last word") into the meaning vector of the last word, combine it with the position vector of the last word, and add the two to obtain the combined representation of the last word.
Feed the combined representation of the last word into the encoding network that combines convolutional layers with gated linear units, together with the combined context representation from step 1), to obtain the representation of the next word to be generated (from which the next word is read off).
In step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 its 1st dimension, and w_wn its n-th dimension;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 its 1st dimension, and p_wn its n-th dimension;
the combined representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the combined representation vector of the last word, o_w1 its 1st dimension, and o_wn its n-th dimension.
Feeding the combined representation of the last word into the encoding network that combines convolutional layers with gated linear units, together with the combined context representation from step 1), to obtain the representation of the next word to be generated specifically comprises:
2.1) Feed the combined representation o_w = {o_w1, ..., o_wn} of the last word successively through m convolution modules identical in form to those of the encoder, and use these m convolution modules to obtain the predicted representation r^m of the next word to be generated. Each convolution module consists of one convolution operation and one non-linear operation. The convolution operation produces two d-dimensional column vectors Y = [A, B] ∈ R^2d according to
Y = f_conv(X) = W_m X + b_m
where A is the first d-dimensional column, B is the second d-dimensional column, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W_m is the weight matrix of the m-th convolution operation, and b_m is its bias vector. The computation yields the two d-dimensional columns Y = [A, B] ∈ R^2d.
2.2) Apply the non-linear operation to the output Y = [A, B] ∈ R^2d of the convolution in step 2.1): from the second column B, using the gating function δ, obtain the gate g = δ(B) that controls the flow of information through the network; this output is passed to the next neuron.
Combining the first column A of the convolution output with the gate g = δ(B), the output of the decoder convolution module is obtained as
r_i^m = A ⊗ δ(B) + r_i^(m-1),  with [A, B] = f_conv(r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1))
where r_i^m denotes the i-th dimension of the output of the m-th decoder convolution module, f_conv(·) denotes the convolution operation, r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1) denote dimensions (i-k/2) through (i+k/2) of the output of the (m-1)-th decoder convolution module, k is a predefined parameter (e.g. 3, 5, or 7), and r_i^(m-1) denotes the i-th dimension of the output of the (m-1)-th decoder convolution module.
2.3) Using the following formula, combine the i-th dimension r_i^m of the output of the m-th decoder convolution module to obtain the i-th dimension d_i^m of the attention query of that decoder convolution module:
d_i^m = W_d^m r_i^m + b_d^m + g_i
where W_d^m is a weight matrix, b_d^m is a bias vector, and g_i is a parameter coefficient (g_i can be set manually).
Then, using the following formula, combine the i-th dimension d_i^m of the attention query of the m-th decoder convolution module with the j-th dimension q_j^m of the combined context representation q^m from step 1) (the output of the m-th encoder convolution module) to obtain the corresponding activation (attention) weight α_ij^m:
α_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)
Then combine the j-th dimension q_j^m of the overall encoder output with the j-th dimension o_cj of the combined representation vectors o_c = {o_c1, ..., o_cn} of the words from step 1) (o_cj being the j-th dimension of the combined representation of the c-th word) to obtain the attention contribution a_i^m to the i-th dimension of the output of the m-th decoder convolution module:
a_i^m = Σ_j α_ij^m (q_j^m + o_cj)
Add the attention contribution a_i^m to the i-th dimension r_i^m of the output of the m-th decoder convolution module; through the cyclic processing of the m convolution modules, the final decoder output r^m is obtained.
2.4) Feed the decoder output r^m into a softmax function and obtain the probability of the next word to be generated according to
p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r^m + b_o)
where W_o is a weight matrix, b_o is a bias vector, and softmax(·) is the softmax function. Using this probability output, the word with the highest probability is emitted as the next word of the generated dialogue. In p(y_(i+1) | y_1, ..., y_i), y_(i+1) denotes the (i+1)-th word, y_1 the 1st word, and y_i the i-th word.
3) Through training, the final convolutional dialogue generation model is obtained; with this model, the dialogue required for a given context can be generated.
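The generation procedure of step 3) — repeatedly emitting the most probable next word and feeding it back as the new "last word" — can be sketched as a greedy loop. `step_fn` below is a placeholder for the full encoder/decoder stack described above, and the toy distribution exists only to make the loop runnable:

```python
import numpy as np

def generate_reply(step_fn, context, max_len=10, eos=0):
    """Greedy decoding: step_fn(context, generated) returns next-word
    probabilities; pick the argmax, append it, and stop at the
    end-of-sentence word or the length limit."""
    generated = []
    for _ in range(max_len):
        probs = step_fn(context, generated)
        word = int(np.argmax(probs))
        generated.append(word)
        if word == eos:
            break
    return generated

def toy_step(context, generated):
    # Deterministic stand-in for the convolutional model's softmax output.
    p = np.zeros(5)
    p[(len(generated) + 1) % 5] = 1.0
    return p

print(generate_reply(toy_step, context=None, max_len=8))  # [1, 2, 3, 4, 0]
```

A trained model would replace `toy_step` with a function that re-encodes the context plus the words generated so far and returns the softmax probabilities of step 2.4).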
The above method is applied in the following example to demonstrate the technical effect of the invention; the specific steps of the embodiment are as described above and are not repeated.
Embodiment
The present invention is tested on the DailyDialog dataset. To evaluate the performance of the algorithm objectively, four evaluation criteria are used on the selected test set: Average, Greedy, Extrema, and Training Time. Following the steps described in the specific embodiment, the experimental results are shown in Table 1, which gives the test results of the present method (denoted ConvTalker) under the four criteria.
Table 1

Claims (5)

1. A method for solving the dialogue generation task with a convolutional dialogue generation model, characterized by comprising the following steps:
1) for the context of the next word of the dialogue to be generated, map each context word to its corresponding meaning vector and obtain the position vector of the word; then add the meaning vector of the word to its position vector to obtain the combined representation vector of the word;
feed the acquired combined representation vectors of the words into an encoding network that combines convolutional layers with gated linear units, obtaining a combined representation of the context;
2) convert the last word of the context of the next word of the dialogue to be generated into the meaning vector of the last word, combine it with the position vector of the last word, and add the two to obtain the combined representation of the last word;
feed the combined representation of the last word into the encoding network that combines convolutional layers with gated linear units, together with the combined context representation from step 1), to obtain the representation of the next word to be generated;
3) through training, obtain the final convolutional dialogue generation model, with which the dialogue required for a given context is generated.
2. The method for solving the dialogue generation task with a convolutional dialogue generation model according to claim 1, characterized in that in step 1) the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 its 1st dimension, and w_cn its n-th dimension;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 its 1st dimension, and p_cn its n-th dimension;
the combined representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the combined representation vector of the c-th word, o_c1 its 1st dimension, and o_cn its n-th dimension.
3. according to claim 1 talk with the method for generating model and solving dialogue generation task using convolution, feature exists In in step 1), the Integrative expression vector for the word that will acquire, which is input to, combines convolutional layer in conjunction with gate-type linear unit Coding network obtains Integrative expression above, specifically includes:
1.1) by the Integrative expression vector o of wordc={ oc1,...,ocnCirculation is input in m convolution module, utilize this m a Convolution module obtains Integrative expression vector q abovem;Each convolution module is grasped by a convolutional calculation in m convolution module Make to form with a NONLINEAR CALCULATION operation, convolutional calculation operation can generate two column d dimensional vector Y=[A, B] according to following formula ∈R2d,
Y=fconv(X)=WmX+bm
Wherein, A is first row d dimensional vector, and B is secondary series d dimensional vector, R2dFor all vector set of 2d dimension, fconv(X) it represents Convolution operation, X represent the input mapping expression vector of convolutional calculation operation, WmRepresent the weight in m-th of convolutional calculation operation Matrix, bmRepresent the bias vector in m-th of convolutional calculation operation;
Two column d dimensional vector Y=[A, B] ∈ R are obtained by calculation2d
1.2) output Y=[A, B] the ∈ R that NONLINEAR CALCULATION operation can be generated using step 1.1) convolution operation2dIn secondary series d Dimensional vector B obtains the output g=δ (B) of information flow momentum in control network in conjunction with door operation function δ (B), which will transmit To next neuron;
Output Y=[A, B] the ∈ R that convolution operation is generated2dIn first row d dimensional vector A, believe in the control network in conjunction with generation The output g=δ (B) for ceasing amount of flow is exported according to the convolution module that following formula obtains encoder,
Wherein,Represent the i-th dimension value of the output of m-th of encoder convolution module, fconv() represents convolution operation,(i-k/2) to (i+k/2) dimension of the output of the m-1 encoder convolution module is represented, k is fixed The good parameter (such as 3,5,7 etc. can be determined) of justice,Represent the i-th dimension of the output of the m-1 encoder convolution module Value;
Through the successive operation of the m convolution modules, the integrated context representation q^m is obtained.
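The convolution module of steps 1.1)-1.2) can be sketched as follows. This is an illustrative reading, not the patented implementation: the window width k = 3, hidden size d = 4, module count m = 5, sigmoid as the gate function δ, and the random weights are all assumptions.

```python
import numpy as np

d, k, n = 4, 3, 6                     # hidden size, window width, sequence length
rng = np.random.default_rng(0)
W = rng.standard_normal((2 * d, k * d)) * 0.1   # W_m: maps a k*d window to 2d values
b = np.zeros(2 * d)                              # b_m: bias vector

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_module(h_prev):
    """One module: h_i^m = sigmoid(B) * A + h_i^{m-1}, with zero padding at the edges."""
    n_pos = h_prev.shape[0]
    padded = np.pad(h_prev, ((k // 2, k // 2), (0, 0)))  # keep output length n
    out = np.empty_like(h_prev)
    for i in range(n_pos):
        window = padded[i:i + k].reshape(-1)   # positions (i-k/2) .. (i+k/2)
        Y = W @ window + b                     # Y = f_conv(X) = W_m X + b_m
        A, B = Y[:d], Y[d:]                    # the two columns of d values
        out[i] = sigmoid(B) * A + h_prev[i]    # gate g = δ(B), plus residual
    return out

h = rng.standard_normal((n, d))        # stands in for o_c = {o_c1, ..., o_cn}
for _ in range(5):                     # m = 5 stacked modules -> q^m
    h = conv_module(h)
print(h.shape)
```

Stacking the modules this way keeps the sequence length fixed while each position's receptive field grows with depth.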
4. The method for solving a dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that, in step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 is the 1st dimension value of the meaning vector of the last word, and w_wn is the n-th dimension value of the meaning vector of the last word;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 is the 1st dimension value of the position vector of the last word, and p_wn is the n-th dimension value of the position vector of the last word;
the integrated representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the integrated representation vector of the last word, o_w1 is the 1st dimension value of the integrated representation of the last word, and o_wn is the n-th dimension value of the integrated representation vector of the last word.
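A minimal sketch of the three vectors in claim 4. The combination rule shown here, element-wise addition of the meaning vector and the position vector, is the usual choice in convolutional sequence models and is an assumption; claim 4 itself only defines the three vectors.

```python
import numpy as np

n = 5
w_w = np.array([0.2, -0.1, 0.5, 0.0, 0.3])   # w_w1 .. w_wn: meaning vector of the last word
p_w = np.array([0.0, 0.1, 0.0, -0.1, 0.0])   # p_w1 .. p_wn: position vector of the last word
o_w = w_w + p_w                               # o_w1 .. o_wn: integrated representation (assumed sum)
print(o_w)
```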
5. The method for solving a dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that the integrated representation of the last word is input into the coding network that combines convolutional layers with gated linear units, and, together with the integrated context representation obtained in step 1), the representation of the next word to be generated is obtained, specifically comprising:
2.1) Input the integrated representation of the last word o_w = {o_w1, ..., o_wn} successively into m convolution modules identical to those of the encoder, and use these m convolution modules to obtain the predicted representation r^m of the next word to be generated. Each convolution module consists of one convolution operation and one nonlinear operation; the convolution operation generates a two-column d-dimensional vector Y = [A, B] ∈ R^{2d} according to the following formula,
Y = f_conv(X) = W_m X + b_m
where A is the first column of d dimension values, B is the second column of d dimension values, R^{2d} is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X denotes the input representation vector of the convolution operation, W_m denotes the weight matrix of the m-th convolution operation, and b_m denotes the bias vector of the m-th convolution operation;
This calculation yields the two-column d-dimensional vector Y = [A, B] ∈ R^{2d};
2.2) Take the second-column d-dimensional vector B of the output Y = [A, B] ∈ R^{2d} generated by the convolution operation in step 2.1), and apply the gate function δ to obtain g = δ(B), the gate that controls the amount of information flowing through the network; this output is passed on to the next neuron;
Combine the first-column d-dimensional vector A of the convolution output Y = [A, B] ∈ R^{2d} with the generated gate g = δ(B) to obtain the output of the decoder convolution module according to the following formula,
r_i^m = g ⊙ f_conv(r_{i-k/2 : i+k/2}^{m-1}) + r_i^{m-1}
where r_i^m denotes the i-th dimension value of the output of the m-th decoder convolution module, f_conv(·) denotes the convolution operation, r_{i-k/2 : i+k/2}^{m-1} denotes dimensions (i-k/2) through (i+k/2) of the output of the (m-1)-th decoder convolution module, k is a predefined parameter, r_i^{m-1} denotes the i-th dimension value of the output of the (m-1)-th decoder convolution module, and ⊙ denotes element-wise multiplication;
2.3) Using the following formula, combined with the i-th dimension value r_i^m of the output of the m-th decoder convolution module, obtain the i-th dimension value z_i^m of the attention output corresponding to the decoder convolution module,
z_i^m = W_z^m r_i^m + b_z^m + g_i
where W_z^m denotes the weight matrix, b_z^m denotes the bias vector, and g_i denotes the parameter coefficient;
Next, using the following formula, combine the i-th dimension value z_i^m of the attention output corresponding to the m-th decoder convolution module with the j-th dimension value q_j^m of the output of the m-th encoder convolution module, i.e. the j-th dimension value of the integrated context representation vector q^m from step 1), to obtain the corresponding activation parameter a_ij^m,
a_ij^m = exp(z_i^m · q_j^m) / Σ_{t=1}^{n} exp(z_i^m · q_t^m)
Then combine the j-th dimension value q_j^m of the overall encoder output with the j-th dimension value o_cj of the integrated word representation vectors o_c = {o_c1, ..., o_cn} from encoder step 1), where o_cj is the j-th dimension value of the integrated representation vector of the c-th word, to obtain the additive attention term c_i^m for the i-th dimension value of the output of the m-th decoder convolution module,
c_i^m = Σ_{j=1}^{n} a_ij^m (q_j^m + o_cj)
Add the generated additive attention term c_i^m to the i-th dimension value r_i^m of the output of the m-th decoder convolution module; through the cyclic processing of the m convolution modules, the final decoder output r^m is obtained;
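The attention computation of step 2.3) can be sketched as follows. This is an illustrative reading under assumptions: a softmax over dot-product scores (the standard convolutional seq2seq attention), random weights, and small illustrative sizes; the symbol names W_z, b_z, z, a, c stand in for the quantities the claim describes in words.

```python
import numpy as np

d, n_src, n_tgt = 4, 6, 3
rng = np.random.default_rng(1)
W_z = rng.standard_normal((d, d)) * 0.1      # weight matrix of the attention projection
b_z = np.zeros(d)                            # bias vector
g = rng.standard_normal((n_tgt, d)) * 0.1    # g_i: parameter coefficients
r = rng.standard_normal((n_tgt, d))          # decoder module outputs r_i^m
q = rng.standard_normal((n_src, d))          # encoder outputs q_j^m
o_c = rng.standard_normal((n_src, d))        # encoder input representations o_cj

z = r @ W_z.T + b_z + g                      # z_i^m = W_z r_i^m + b_z + g_i
scores = z @ q.T                             # dot-product relevance z_i^m . q_j^m
a = np.exp(scores)
a /= a.sum(axis=1, keepdims=True)            # a_ij^m: normalized activation parameters
c = a @ (q + o_c)                            # c_i^m = sum_j a_ij^m (q_j^m + o_cj)
out = r + c                                  # additive term added to the decoder output
print(out.shape)
```

Attending over q_j^m + o_cj rather than q_j^m alone lets the decoder retrieve both the convolved context and the original word representation at each source position.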
2.4) Input the decoder output r^m into the softmax function, and obtain the probability of the next word to be generated according to the following formula,
p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)
where W_o denotes the weight matrix, b_o denotes the bias vector, and softmax(·) denotes the softmax function; using this probability output, the word with the highest probability is selected as the next word output of the generated dialogue.
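The output step can be sketched as below. The vocabulary, hidden size, and random weights are illustrative assumptions; only the projection-plus-softmax-plus-argmax structure comes from the claim.

```python
import numpy as np

vocab = ["hello", "how", "are", "you", "<eos>"]   # illustrative vocabulary
rng = np.random.default_rng(2)
d = 4
W_o = rng.standard_normal((len(vocab), d)) * 0.5  # W_o: output weight matrix
b_o = np.zeros(len(vocab))                        # b_o: output bias vector
r_m = rng.standard_normal(d)                      # final decoder output r^m

logits = W_o @ r_m + b_o                          # W_o r^m + b_o
p = np.exp(logits - logits.max())
p /= p.sum()                                      # p(y_{i+1} | y_1, ..., y_i)
next_word = vocab[int(np.argmax(p))]              # highest-probability word is emitted
print(next_word)
```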
CN201811057115.9A 2018-09-11 2018-09-11 Method for solving dialogue generation task by using convolution dialogue generation model Active CN109255020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811057115.9A CN109255020B (en) 2018-09-11 2018-09-11 Method for solving dialogue generation task by using convolution dialogue generation model


Publications (2)

Publication Number Publication Date
CN109255020A true CN109255020A (en) 2019-01-22
CN109255020B CN109255020B (en) 2022-04-01

Family

ID=65046678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811057115.9A Active CN109255020B (en) 2018-09-11 2018-09-11 Method for solving dialogue generation task by using convolution dialogue generation model

Country Status (1)

Country Link
CN (1) CN109255020B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110196928A (en) * 2019-05-17 2019-09-03 Beijing University of Posts and Telecommunications Fully parallelized end-to-end multi-turn dialogue system and method with domain expansibility

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140201126A1 (en) * 2012-09-15 2014-07-17 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107273487A (en) * 2017-06-13 2017-10-20 北京百度网讯科技有限公司 Generation method, device and the computer equipment of chat data based on artificial intelligence
CN107506823A (en) * 2017-08-22 2017-12-22 南京大学 Construction method of a hybrid generative model for dialogue generation
CN107590153A (en) * 2016-07-08 2018-01-16 微软技术许可有限责任公司 Use the dialogue correlation modeling of convolutional neural networks
US20180060301A1 (en) * 2016-08-31 2018-03-01 Microsoft Technology Licensing, Llc End-to-end learning of dialogue agents for information access
CN107980130A (en) * 2017-11-02 2018-05-01 深圳前海达闼云端智能科技有限公司 Automatic question answering method and apparatus, storage medium and electronic device
CN108388944A (en) * 2017-11-30 2018-08-10 中国科学院计算技术研究所 LSTM neural network chips and its application method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MIN YANG ET AL.: "Investigating Deep Reinforcement Learning Techniques in Personalized Dialogue Generation", Proceedings of the 2018 SIAM International Conference on Data Mining *
XIAOYU SHEN ET AL.: "Improving Variational Encoder-Decoders in Dialogue Generation", The Thirty-Second AAAI Conference on Artificial Intelligence *
JIA XIBIN ET AL.: "A Survey of Research on Intelligent Dialogue Systems", Journal of Beijing University of Technology *


Also Published As

Publication number Publication date
CN109255020B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
Shlezinger et al. UVeQFed: Universal vector quantization for federated learning
Liu et al. Deep neural network architectures for modulation classification
WO2020258668A1 (en) Facial image generation method and apparatus based on adversarial network model, and nonvolatile readable storage medium and computer device
CN109919204B (en) Noise image-oriented deep learning clustering method
CN109271646A (en) Text translation method and apparatus, readable storage medium, and computer device
CN109906460A (en) Dynamic cooperation attention network for question and answer
CN108846323A (en) Convolutional neural network optimization method for underwater target recognition
CN108829756B (en) Method for solving multi-turn video question and answer by using hierarchical attention context network
Xu et al. Deep neural network self-distillation exploiting data representation invariance
CN110349588A (en) LSTM network voiceprint recognition method based on word embedding
CN108446766A (en) Method for quickly training a stacked autoencoder deep neural network
CN112289338B (en) Signal processing method and device, computer equipment and readable storage medium
Hahne et al. Attention on abstract visual reasoning
CN112233012A (en) Face generation system and method
CN114445420A (en) Image segmentation model with coding and decoding structure combined with attention mechanism and training method thereof
Li et al. Detection of multiple steganography methods in compressed speech based on code element embedding, Bi-LSTM and CNN with attention mechanisms
CN107547088A (en) Enhanced self-adapted segmentation orthogonal matching pursuit method based on compressed sensing
CN113283577A (en) Industrial parallel data generation method based on meta-learning and generation countermeasure network
Kim et al. WaveNODE: A continuous normalizing flow for speech synthesis
CN109255020A (en) Method for solving dialogue generation task by using convolution dialogue generation model
CN115054270A (en) Sleep staging method and system for extracting sleep spectrogram features based on GCN
CN112380843B (en) Random disturbance network-based open answer generation method
CN116306780B (en) Dynamic graph link generation method
Le et al. Data selection for acoustic emotion recognition: Analyzing and comparing utterance and sub-utterance selection strategies
CN113361505B (en) Non-specific human sign language translation method and system based on contrast decoupling element learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant