CN109255020A - A method for solving the dialogue generation task using a convolutional dialogue generation model - Google Patents
A method for solving the dialogue generation task using a convolutional dialogue generation model
- Publication number
- CN109255020A (application number CN201811057115.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- convolution
- output
- dimension value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a method for solving the dialogue generation task using a convolutional dialogue generation model, comprising the following steps: for the context of the next word of the dialogue to be generated, add the meaning vector of each word to its position vector to obtain the integrative representation vector of the word; input these vectors into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context; convert the last word of the context into its meaning vector and combine it with its position vector, their sum being the integrative representation of the last word; input this into a network combining convolutional layers with gated linear units and, together with the integrative representation of the context, obtain the representation of the next word to be generated. Because the invention uses a convolutional dialogue generation model, it overcomes two problems of the recurrent neural networks used in the prior art: their inability to exploit the parallelism of GPUs, and their tendency to suffer from vanishing gradients.
Description
Technical field
The present invention relates to the technical field of dialogue generation tasks, and in particular to a method for solving the dialogue generation task using a convolutional dialogue generation model.
Background technique
Non-task-oriented dialogue generation has recently attracted wide attention and become an important service, but the quality of existing systems is still unsatisfactory.
Existing techniques are mainly based on recurrent neural networks, which exploit the sequential structure of the network to generate a dialogue word by word. However, because a recurrent neural network processes its input sequentially, it cannot exploit the parallelism of a GPU (Graphics Processing Unit). Moreover, the chain-rule differentiation through the recurrent network makes it prone to vanishing gradients. To overcome these defects, the present method uses a convolutional dialogue generation model to complete the dialogue generation task.
The present invention first obtains a representation of the current dialogue context using a convolutional neural network with an attention mechanism module; this representation is then fed into a decoder module to obtain the next word of the dialogue, and the procedure is repeated to generate the entire dialogue.
Summary of the invention
It is an object of the present invention to solve the problems of the prior art: recurrent neural networks cannot exploit the parallelism of GPUs and suffer from vanishing gradients. To this end, the present invention provides a method for solving the dialogue generation task using a convolutional dialogue generation model.
The specific technical solution of the present invention is as follows:
A method for solving the dialogue generation task using a convolutional dialogue generation model, comprising the following steps:
1) For the context of the next word of the dialogue to be generated, map each word of the context to its corresponding meaning vector (the representation of the word) and obtain the position vector of the word; then add the meaning vector of the word to its position vector to obtain the integrative representation vector of the word.
Input the acquired integrative representation vectors of the words into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context.
2) Convert the last word of the context (the most recently generated word, hereinafter "the last word") into the meaning vector of the last word, and combine it with the position vector of the last word; their sum is the integrative representation of the last word.
Input the integrative representation of the last word into a decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtain the representation of the next word to be generated (from which the next word is obtained).
3) Through training, obtain the final convolutional dialogue generation model; with this model, the dialogue required for a given context can be generated.
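The overall procedure of steps 1) to 3) can be sketched as a simple generation loop: encode the context once, then repeatedly feed the last generated word to the decoder and append the predicted next word until an end marker is produced. The `encode` and `decode_step` stand-ins below are toy placeholders, not the trained convolutional modules of the invention.

```python
def generate(context, encode, decode_step, end_id, max_len=20):
    q = encode(context)              # integrative representation of the context
    out = [context[-1]]              # start from the last word of the context
    while len(out) < max_len:
        w = decode_step(out[-1], q)  # next word from the last word + context code
        if w == end_id:
            break
        out.append(w)
    return out[1:]

# Toy stand-ins so the sketch runs end to end.
reply = generate([1, 2, 3], encode=sum, decode_step=lambda w, q: w + 1, end_id=7)
print(reply)  # [4, 5, 6]
```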
In step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 is the 1st dimension value of the meaning vector of the c-th word, and w_cn is its n-th dimension value;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 is its 1st dimension value, and p_cn its n-th dimension value;
the integrative representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the integrative representation vector of the c-th word, o_c1 is its 1st dimension value, and o_cn its n-th dimension value.
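As an illustration of this notation, assuming the meaning vector w_c comes from an embedding table and the position vector p_c from a position table (both tables and their sizes below are hypothetical), the integrative representation is the element-wise sum o_c = w_c + p_c:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n = 100, 8
meaning_table = rng.normal(size=(vocab_size, n))   # one meaning vector per word
position_table = rng.normal(size=(50, n))          # one vector per position

def integrative_expression(word_ids):
    """Return o_c = w_c + p_c for every word c of the context."""
    w = meaning_table[word_ids]                    # meaning vectors w_c
    p = position_table[np.arange(len(word_ids))]   # position vectors p_c
    return w + p

o = integrative_expression(np.array([3, 17, 42]))
print(o.shape)  # (3, 8): one n-dimensional integrative vector per context word
```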
In step 1), inputting the acquired integrative representation vectors of the words into the encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context specifically includes:
1.1) Feed the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words cyclically through m convolution modules, and use these m convolution modules to obtain the integrative representation vector q^m of the context. Each of the m convolution modules consists of one convolution operation and one nonlinear operation; the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation.
The calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d.
1.2) From the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 1.1), take the second column vector B and apply the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron. Combine the first column vector A with the gate output g = δ(B) to obtain the output of the encoder convolution module according to the following formula:
q_i^m = A ⊗ δ(B) + q_i^{m-1}, where [A, B] = f_conv(q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1})
where q_i^m is the i-th element of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and q_i^{m-1} is the i-th element of the output of the (m-1)-th encoder convolution module.
Through the successive operation of the m convolution modules, the integrative representation q^m of the context is obtained.
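One such encoder convolution module (steps 1.1 and 1.2) can be sketched as follows, assuming a ConvS2S-style gated linear unit: the convolution maps a window of k input vectors to Y = [A, B] in R^(2d), and the module output is A gated by δ(B) plus a residual connection to the previous module. The weights W, b and the sizes k, d below are illustrative stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_module(h_prev, W, b, k):
    """h_prev: (T, d) output of module m-1; returns (T, d) output of module m."""
    T, d = h_prev.shape
    pad = k // 2
    padded = np.pad(h_prev, ((pad, pad), (0, 0)))
    out = np.empty_like(h_prev)
    for i in range(T):
        window = padded[i:i + k].reshape(-1)  # q_{i-k/2}^{m-1} .. q_{i+k/2}^{m-1}
        Y = W @ window + b                    # Y = f_conv(X) = W X + b, in R^(2d)
        A, B = Y[:d], Y[d:]                   # the two d-dimensional columns
        g = sigmoid(B)                        # gate controlling information flow
        out[i] = A * g + h_prev[i]            # gated output plus residual
    return out

rng = np.random.default_rng(1)
T, d, k = 5, 4, 3
W = 0.1 * rng.normal(size=(2 * d, k * d))
h0 = rng.normal(size=(T, d))
h1 = conv_module(h0, W, np.zeros(2 * d), k)
print(h1.shape)  # (5, 4): same shape as the input, so modules stack m times
```

Because the output has the same shape as the input, the module can be applied m times in sequence, as the text describes.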
In step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 is the 1st dimension value of the meaning vector of the last word, and w_wn is its n-th dimension value;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 is its 1st dimension value, and p_wn its n-th dimension value;
the integrative representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the integrative representation vector of the last word, o_w1 is its 1st dimension value, and o_wn its n-th dimension value.
Inputting the integrative representation of the last word into the decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtaining the representation of the next word to be generated (from which the next word is obtained) specifically includes:
2.1) Feed the integrative representation o_w = {o_w1, ..., o_wn} of the last word cyclically through m convolution modules identical in structure to those of the encoder, and use these m convolution modules to obtain the prediction representation r^m of the next word to be generated. Each convolution module consists of one convolution operation and one nonlinear operation; the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation.
The calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d.
2.2) From the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 2.1), take the second column vector B and apply the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron.
Combine the first column vector A with the gate output g = δ(B) to obtain the output of the decoder convolution module according to the following formula:
r_i^m = A ⊗ δ(B) + r_i^{m-1}, where [A, B] = f_conv(r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1})
where r_i^m is the i-th element of the output of the m-th decoder convolution module, f_conv(·) denotes the convolution operation, r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th decoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and r_i^{m-1} is the i-th element of the output of the (m-1)-th decoder convolution module.
2.3) Using the following formula, combine the i-th element r_i^m of the output of the m-th decoder convolution module to obtain the i-th element d_i^m of the attention state of that decoder convolution module:
d_i^m = W_a^m r_i^m + b_a^m + g_i
where W_a^m is a weight matrix, b_a^m is a bias vector, and g_i is a parameter coefficient (which can be set manually).
Then, using the following formula, combine the attention state d_i^m of the m-th decoder convolution module with the j-th element q_j^m of the output of the m-th encoder convolution module (the j-th element of the integrative representation vector q^m of step 1)) to obtain the corresponding activation parameter a_ij^m:
a_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)
Afterwards, combine the j-th element q_j^m of the overall encoder output with the j-th element o_cj of the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words from step 1) (o_cj being the j-th dimension value of the integrative representation vector of the c-th word) to obtain the attention additive term c_i^m of the output of the m-th decoder convolution module:
c_i^m = Σ_j a_ij^m (q_j^m + o_cj)
Add the additive term c_i^m to the i-th element r_i^m of the output of the m-th decoder convolution module; through the cyclic processing of the m convolution modules, the final decoder output r^m is obtained.
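The attention computation of this step can be sketched as follows, under the assumptions stated above: the decoder output r_i is projected with W_a, b_a and the coefficient g_i; the resulting state is scored against every encoder output q_j; and the weighted sum of q_j plus the input embedding o_j is added back to r_i. All names and shapes here are illustrative stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(r_i, g_i, q, o, W_a, b_a):
    """r_i: (d,) decoder output; q, o: (T, d) encoder outputs and embeddings."""
    d_i = W_a @ r_i + b_a + g_i   # attention state d_i^m
    a = softmax(q @ d_i)          # activation parameters a_ij^m over positions j
    c_i = a @ (q + o)             # additive term c_i^m
    return r_i + c_i              # added back to the decoder output

rng = np.random.default_rng(2)
T, d = 6, 4
r_i, g_i = rng.normal(size=d), rng.normal(size=d)
q, o = rng.normal(size=(T, d)), rng.normal(size=(T, d))
out = attention_step(r_i, g_i, q, o, np.eye(d), np.zeros(d))
print(out.shape)  # (4,)
```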
2.4) The decoder output r^m is fed into a softmax function, and the probability of the next word to be generated is obtained according to the following formula:
p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)
where W_o is a weight matrix, b_o is a bias vector, and softmax(·) denotes the softmax function. The word with the highest probability under this output distribution is output as the next word of the generated dialogue. Here p(y_{i+1} | y_1, ..., y_i) is the probability of the next word; y_{i+1} denotes the (i+1)-th word, y_1 the 1st word, and y_i the i-th word.
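The output layer described above can be sketched directly: the decoder output r^m is mapped to vocabulary logits by W_o and b_o, a softmax turns them into a probability distribution, and the arg-max word is emitted. The sizes below are illustrative.

```python
import numpy as np

def next_word(r, W_o, b_o):
    logits = W_o @ r + b_o
    e = np.exp(logits - logits.max())
    p = e / e.sum()              # p(y_{i+1} | y_1, ..., y_i) over the vocabulary
    return int(np.argmax(p)), p  # the word with the highest probability

rng = np.random.default_rng(3)
vocab, d = 10, 4
W_o = rng.normal(size=(vocab, d))
word, p = next_word(rng.normal(size=d), W_o, np.zeros(vocab))
print(0 <= word < vocab, round(float(p.sum()), 6))  # True 1.0
```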
Compared with the prior art, the present invention has the following advantages:
The present invention solves the dialogue generation task with a convolutional dialogue generation model. Compared with general dialogue generation methods, it overcomes two problems of the recurrent neural networks used in the prior art: the inability to exploit the parallelism of GPUs, and the vanishing-gradient problem. On the dialogue generation task, the present invention achieves better results than traditional methods.
Detailed description of the drawings
Fig. 1 is a flow diagram of the method of the present invention for solving the dialogue generation task using a convolutional dialogue generation model.
Specific embodiment
As shown in Figure 1, a method for solving the dialogue generation task using a convolutional dialogue generation model comprises the following steps:
Steps 1) to 3), together with all sub-steps and the notation for the meaning vectors, position vectors, and integrative representation vectors, are carried out exactly as set forth in the Summary of the invention above and are not repeated here.
The above method is applied in the following example to demonstrate the technical effect of the invention; the specific steps are not repeated.
Embodiment
The present invention was tested on the DailyDialog data set. To evaluate the performance of the algorithm objectively, four evaluation criteria were used on the selected test set: Average, Greedy, Extrema, and Training Time. Following the steps described in the specific embodiment, the experimental results for these four criteria are shown in Table 1; the present method is denoted ConvTalker.
Table 1
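For illustration only: the "Average" score in dialogue evaluation is commonly computed as the cosine similarity between the mean word embedding of the generated reply and that of the reference reply. The random embeddings below are stand-ins; the patent does not spell out the exact formulas it used.

```python
import numpy as np

def average_score(gen_vecs, ref_vecs):
    """Cosine similarity of the mean embeddings of two replies."""
    g, r = gen_vecs.mean(axis=0), ref_vecs.mean(axis=0)
    return float(g @ r / (np.linalg.norm(g) * np.linalg.norm(r)))

rng = np.random.default_rng(4)
gen, ref = rng.normal(size=(5, 8)), rng.normal(size=(6, 8))
print(round(average_score(gen, gen), 6))       # 1.0: identical replies score 1
print(-1.0 <= average_score(gen, ref) <= 1.0)  # True: cosine stays in [-1, 1]
```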
Claims (5)
1. A method for solving the dialogue generation task using a convolutional dialogue generation model, characterized by comprising the following steps:
1) for the context of the next word of the dialogue to be generated, map each word of the context to its corresponding meaning vector and obtain the position vector of the word; then add the meaning vector of the word to its position vector to obtain the integrative representation vector of the word;
input the acquired integrative representation vectors of the words into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context;
2) convert the last word of the context of the next word of the dialogue to be generated into the meaning vector of the last word, and combine it with the position vector of the last word, their sum being the integrative representation of the last word;
input the integrative representation of the last word into a decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtain the representation of the next word to be generated;
3) through training, obtain the final convolutional dialogue generation model, with which the dialogue required for a given context is generated.
2. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 is its 1st dimension value, and w_cn its n-th dimension value;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 is its 1st dimension value, and p_cn its n-th dimension value;
the integrative representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the integrative representation vector of the c-th word, o_c1 is its 1st dimension value, and o_cn its n-th dimension value.
3. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 1), inputting the acquired integrative representation vectors of the words into the encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context specifically includes:
1.1) feeding the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words cyclically through m convolution modules, and using these m convolution modules to obtain the integrative representation vector q^m of the context; each of the m convolution modules consists of one convolution operation and one nonlinear operation, the convolution operation generating two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation;
the calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d;
1.2) from the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 1.1), taking the second column vector B and applying the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron;
combining the first column vector A with the gate output g = δ(B) to obtain the output of the encoder convolution module according to the following formula:
q_i^m = A ⊗ δ(B) + q_i^{m-1}, where [A, B] = f_conv(q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1})
where q_i^m is the i-th element of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and q_i^{m-1} is the i-th element of the output of the (m-1)-th encoder convolution module;
through the successive operation of the m convolution modules, the integrative representation q^m of the context is obtained.
4. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 is its 1st dimension value, and w_wn its n-th dimension value;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 is its 1st dimension value, and p_wn its n-th dimension value;
the integrative representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the integrative representation vector of the last word, o_w1 is its 1st dimension value, and o_wn its n-th dimension value.
5. The method of claim 1 for solving a dialogue generation task using a convolutional dialogue generation model, wherein inputting the integrated representation of the last word into the coding network combining convolutional layers with gated linear units, together with the integrated representation of the context obtained in step 1), to obtain the representation of the next word to be generated specifically includes:
2.1) inputting the integrated representation o_w = {o_w1, ..., o_wn} of the last word in turn into m convolution modules identical to those of the encoder, and using these m convolution modules to obtain the prediction representation r^m of the next word to be generated; each convolution module consists of one convolution operation and one nonlinear operation, and the convolution operation generates two columns of d-dimensional vectors Y = [A, B] ∈ R^2d according to the following formula:

Y = f_conv(X) = W_m X + b_m

where A is the first column d-dimensional vector, B is the second column d-dimensional vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) represents the convolution operation, X represents the input mapping representation vector of the convolution operation, W_m represents the weight matrix in the m-th convolution operation, and b_m represents the bias vector in the m-th convolution operation;
the two columns of d-dimensional vectors Y = [A, B] ∈ R^2d are obtained by this calculation;
2.2) the nonlinear operation uses the second column d-dimensional vector B of the output Y = [A, B] ∈ R^2d generated by the convolution operation of step 2.1), combined with the gate function δ(B), to obtain the output g = δ(B) that controls the amount of information flowing through the network; this output is transmitted to the next neuron;
the first column d-dimensional vector A of the output Y = [A, B] ∈ R^2d generated by the convolution operation is combined with the generated gating output g = δ(B) that controls the information flow, and the output of the decoder convolution module is obtained according to the following formula:

r_i^m = g ⊙ A + r_i^(m-1), with [A, B] = f_conv(r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1))

where r_i^m represents the i-th dimension value of the output of the m-th decoder convolution module, f_conv(·) represents the convolution operation, the window r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1) covers dimensions (i - k/2) through (i + k/2) of the output of the (m-1)-th decoder convolution module, k is a predefined parameter, and r_i^(m-1) represents the i-th dimension value of the output of the (m-1)-th decoder convolution module;
2.3) using the following formula, the i-th dimension value r_i^m of the output of the m-th decoder convolution module is used to obtain the i-th dimension value d_i^m of the attention mechanism output corresponding to that decoder convolution module:

d_i^m = W_d^m r_i^m + b_d^m + g_i

where W_d^m represents a weight matrix, b_d^m represents a bias vector, and g_i represents a parameter coefficient;
the following formula then combines the i-th dimension value d_i^m of the attention output corresponding to the m-th decoder convolution module with the j-th dimension value q_j^m of the output of the m-th encoder convolution module (the j-th dimension value of the integrated representation vector q^m from step 1)) to obtain the corresponding activation parameter α_ij^m:

α_ij^m = exp(d_i^m · q_j^m) / Σ_(t=1..n) exp(d_i^m · q_t^m)

the j-th dimension value q_j^m of the overall encoder output is then combined with the j-th dimension value o_cj of the integrated representation vector o_c = {o_c1, ..., o_cn} of the words in the encoder from step 1), where o_cj is the j-th dimension value of the integrated representation vector of the c-th word, to obtain the attention addend c_i^m for the i-th dimension value of the output of the m-th decoder convolution module:

c_i^m = Σ_(j=1..n) α_ij^m (q_j^m + o_cj)

the attention addend c_i^m is added to the i-th dimension value r_i^m of the output of the m-th decoder convolution module; through the iterative processing of the m convolution modules, the final decoder output r^m is obtained;
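Step 2.3) follows the standard ConvS2S-style attention pattern: project the decoder state, take a softmax over dot products with the encoder output, and sum a context over the encoder outputs plus the encoder-side input representations. A NumPy sketch follows; where the patent's formula images are missing, the dot-product/softmax form is an assumption:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def decoder_attention(r, q, o_c, W_d, b_d, g):
    """Attention of step 2.3) for one decoder convolution module.

    r:   (T, d) outputs r^m of the decoder convolution module
    q:   (S, d) encoder outputs q^m from step 1)
    o_c: (S, d) integrated representations o_c of the encoder words
    W_d: (d, d) weight matrix, b_d: (d,) bias, g: (T, d) coefficients g_i
    """
    d_att = r @ W_d.T + b_d + g   # d_i^m = W_d^m r_i^m + b_d^m + g_i
    alpha = softmax(d_att @ q.T)  # activation parameters alpha_ij^m
    c = alpha @ (q + o_c)         # c_i^m = sum_j alpha_ij^m (q_j^m + o_cj)
    return r + c                  # add the attention addend to r^m
```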
2.4) the final decoder output r^m is input into a softmax function, and the probability of the next word to be generated is obtained according to the following formula:

p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r^m + b_o)

where W_o represents a weight matrix, b_o represents a bias vector, and softmax(·) represents the softmax function; using this probability output, the word with the maximum probability is found and output as the next word of the generated dialogue.
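The final step maps the decoder output r^m to a distribution over the vocabulary and picks the argmax; a small sketch with made-up dimensions:

```python
import numpy as np

def next_word_distribution(r_m, W_o, b_o):
    """p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)."""
    logits = W_o @ r_m + b_o
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d, vocab = 6, 12                       # hypothetical sizes
p = next_word_distribution(rng.standard_normal(d),
                           rng.standard_normal((vocab, d)),
                           np.zeros(vocab))
next_word_id = int(np.argmax(p))       # maximum-probability word is output
```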
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811057115.9A CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109255020A (en) | 2019-01-22 |
CN109255020B CN109255020B (en) | 2022-04-01 |
Family
ID=65046678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811057115.9A Active CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109255020B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140201126A1 (en) * | 2012-09-15 | 2014-07-17 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107273487A (en) * | 2017-06-13 | 2017-10-20 | 北京百度网讯科技有限公司 | Generation method, device and the computer equipment of chat data based on artificial intelligence |
CN107506823A (en) * | 2017-08-22 | 2017-12-22 | 南京大学 | A kind of construction method for being used to talk with the hybrid production style of generation |
CN107590153A (en) * | 2016-07-08 | 2018-01-16 | 微软技术许可有限责任公司 | Use the dialogue correlation modeling of convolutional neural networks |
US20180060301A1 (en) * | 2016-08-31 | 2018-03-01 | Microsoft Technology Licensing, Llc | End-to-end learning of dialogue agents for information access |
CN107980130A (en) * | 2017-11-02 | 2018-05-01 | 深圳前海达闼云端智能科技有限公司 | It is automatic to answer method, apparatus, storage medium and electronic equipment |
CN108388944A (en) * | 2017-11-30 | 2018-08-10 | 中国科学院计算技术研究所 | LSTM neural network chips and its application method |
Non-Patent Citations (3)
Title |
---|
MIN YANG ET AL.: "Investigating Deep Reinforcement Learning Techniques in Personalized Dialogue Generation", 《PROCEEDINGS OF THE 2018 SIAM INTERNATIONAL CONFERENCE ON DATA MINING》 * |
XIAOYU SHEN ET AL.: "Improving Variational Encoder-Decoders in Dialogue Generation", 《THE THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
JIA Xibin et al.: "A Survey of Research on Intelligent Dialogue Systems" (in Chinese), Journal of Beijing University of Technology *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110196928A (en) * | 2019-05-17 | 2019-09-03 | 北京邮电大学 | Fully parallelized end-to-end more wheel conversational systems and method with field scalability |
CN110196928B (en) * | 2019-05-17 | 2021-03-30 | 北京邮电大学 | Fully parallelized end-to-end multi-turn dialogue system with domain expansibility and method |
Also Published As
Publication number | Publication date |
---|---|
CN109255020B (en) | 2022-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shlezinger et al. | UVeQFed: Universal vector quantization for federated learning | |
Liu et al. | Deep neural network architectures for modulation classification | |
WO2020258668A1 (en) | Facial image generation method and apparatus based on adversarial network model, and nonvolatile readable storage medium and computer device | |
CN109919204B (en) | Noise image-oriented deep learning clustering method | |
CN109271646A (en) | Text interpretation method, device, readable storage medium storing program for executing and computer equipment | |
CN109906460A (en) | Dynamic cooperation attention network for question and answer | |
CN108846323A (en) | A kind of convolutional neural networks optimization method towards Underwater Targets Recognition | |
CN108829756B (en) | Method for solving multi-turn video question and answer by using hierarchical attention context network | |
Xu et al. | Deep neural network self-distillation exploiting data representation invariance | |
CN110349588A (en) | A kind of LSTM network method for recognizing sound-groove of word-based insertion | |
CN108446766A (en) | A kind of method of quick trained storehouse own coding deep neural network | |
CN112289338B (en) | Signal processing method and device, computer equipment and readable storage medium | |
Hahne et al. | Attention on abstract visual reasoning | |
CN112233012A (en) | Face generation system and method | |
CN114445420A (en) | Image segmentation model with coding and decoding structure combined with attention mechanism and training method thereof | |
Li et al. | Detection of multiple steganography methods in compressed speech based on code element embedding, Bi-LSTM and CNN with attention mechanisms | |
CN107547088A (en) | Enhanced self-adapted segmentation orthogonal matching pursuit method based on compressed sensing | |
CN113283577A (en) | Industrial parallel data generation method based on meta-learning and generation countermeasure network | |
Kim et al. | WaveNODE: A continuous normalizing flow for speech synthesis | |
CN109255020A (en) | A method of talked with using convolution and generates model solution dialogue generation task | |
CN115054270A (en) | Sleep staging method and system for extracting sleep spectrogram features based on GCN | |
CN112380843B (en) | Random disturbance network-based open answer generation method | |
CN116306780B (en) | Dynamic graph link generation method | |
Le et al. | Data selection for acoustic emotion recognition: Analyzing and comparing utterance and sub-utterance selection strategies | |
CN113361505B (en) | Non-specific human sign language translation method and system based on contrast decoupling element learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |