CN109255020A - A method for solving the dialogue generation task using a convolutional dialogue generation model - Google Patents
A method for solving the dialogue generation task using a convolutional dialogue generation model
- Publication number
- CN109255020A (application number CN201811057115.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- convolution
- output
- dimension value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a method for solving the dialogue generation task using a convolutional dialogue generation model, comprising the following steps: for the context of the next word of the dialogue to be generated, add the meaning vector of each word to its position vector to obtain the integrative representation vector of the word; input these vectors into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context; convert the last word of the context into its meaning vector and combine it with its position vector, their sum being the integrative representation of the last word; input this into a network combining convolutional layers with gated linear units and, together with the integrative representation of the context, obtain the representation of the next word to be generated. Because the invention uses a convolutional dialogue generation model, it overcomes two problems of the recurrent neural networks used in the prior art: their inability to exploit the parallelism of GPUs, and their tendency to suffer from vanishing gradients.
Description
Technical field
The present invention relates to the technical field of dialogue generation tasks, and in particular to a method for solving the dialogue generation task using a convolutional dialogue generation model.
Background technique
Non-task-oriented dialogue generation has recently attracted wide attention and become an important service, but the quality of existing systems is still unsatisfactory.
Existing techniques are mainly based on recurrent neural networks, which exploit the sequential structure of the network to generate a dialogue word by word. However, because a recurrent neural network processes its input sequentially, it cannot exploit the parallelism of a GPU (Graphics Processing Unit). Moreover, the chain-rule differentiation through the recurrent network makes it prone to vanishing gradients. To overcome these defects, the present method uses a convolutional dialogue generation model to complete the dialogue generation task.
The present invention first obtains a representation of the current dialogue context using a convolutional neural network with an attention mechanism module; this representation is then fed into a decoder module to obtain the next word of the dialogue, and the procedure is repeated to generate the entire dialogue.
Summary of the invention
It is an object of the present invention to solve the problems of the prior art: recurrent neural networks cannot exploit the parallelism of GPUs and suffer from vanishing gradients. To this end, the present invention provides a method for solving the dialogue generation task using a convolutional dialogue generation model.
The specific technical solution of the present invention is as follows:
A method for solving the dialogue generation task using a convolutional dialogue generation model, comprising the following steps:
1) For the context of the next word of the dialogue to be generated, map each word of the context to its corresponding meaning vector (the representation of the word) and obtain the position vector of the word; then add the meaning vector of the word to its position vector to obtain the integrative representation vector of the word.
Input the acquired integrative representation vectors of the words into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context.
2) Convert the last word of the context (the most recently generated word, hereinafter "the last word") into the meaning vector of the last word, and combine it with the position vector of the last word; their sum is the integrative representation of the last word.
Input the integrative representation of the last word into a decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtain the representation of the next word to be generated (from which the next word is obtained).
3) Through training, obtain the final convolutional dialogue generation model; with this model, the dialogue required for a given context can be generated.
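The overall procedure of steps 1) to 3) can be sketched as a simple generation loop: encode the context once, then repeatedly feed the last generated word to the decoder and append the predicted next word until an end marker is produced. The `encode` and `decode_step` stand-ins below are toy placeholders, not the trained convolutional modules of the invention.

```python
def generate(context, encode, decode_step, end_id, max_len=20):
    q = encode(context)              # integrative representation of the context
    out = [context[-1]]              # start from the last word of the context
    while len(out) < max_len:
        w = decode_step(out[-1], q)  # next word from the last word + context code
        if w == end_id:
            break
        out.append(w)
    return out[1:]

# Toy stand-ins so the sketch runs end to end.
reply = generate([1, 2, 3], encode=sum, decode_step=lambda w, q: w + 1, end_id=7)
print(reply)  # [4, 5, 6]
```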
In step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 is the 1st dimension value of the meaning vector of the c-th word, and w_cn is its n-th dimension value;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 is its 1st dimension value, and p_cn its n-th dimension value;
the integrative representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the integrative representation vector of the c-th word, o_c1 is its 1st dimension value, and o_cn its n-th dimension value.
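As an illustration of this notation, assuming the meaning vector w_c comes from an embedding table and the position vector p_c from a position table (both tables and their sizes below are hypothetical), the integrative representation is the element-wise sum o_c = w_c + p_c:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, n = 100, 8
meaning_table = rng.normal(size=(vocab_size, n))   # one meaning vector per word
position_table = rng.normal(size=(50, n))          # one vector per position

def integrative_expression(word_ids):
    """Return o_c = w_c + p_c for every word c of the context."""
    w = meaning_table[word_ids]                    # meaning vectors w_c
    p = position_table[np.arange(len(word_ids))]   # position vectors p_c
    return w + p

o = integrative_expression(np.array([3, 17, 42]))
print(o.shape)  # (3, 8): one n-dimensional integrative vector per context word
```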
In step 1), inputting the acquired integrative representation vectors of the words into the encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context specifically includes:
1.1) Feed the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words cyclically through m convolution modules, and use these m convolution modules to obtain the integrative representation vector q^m of the context. Each of the m convolution modules consists of one convolution operation and one nonlinear operation; the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation.
The calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d.
1.2) From the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 1.1), take the second column vector B and apply the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron. Combine the first column vector A with the gate output g = δ(B) to obtain the output of the encoder convolution module according to the following formula:
q_i^m = A ⊗ δ(B) + q_i^{m-1}, where [A, B] = f_conv(q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1})
where q_i^m is the i-th element of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and q_i^{m-1} is the i-th element of the output of the (m-1)-th encoder convolution module.
Through the successive operation of the m convolution modules, the integrative representation q^m of the context is obtained.
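One such encoder convolution module (steps 1.1 and 1.2) can be sketched as follows, assuming a ConvS2S-style gated linear unit: the convolution maps a window of k input vectors to Y = [A, B] in R^(2d), and the module output is A gated by δ(B) plus a residual connection to the previous module. The weights W, b and the sizes k, d below are illustrative stand-ins.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_module(h_prev, W, b, k):
    """h_prev: (T, d) output of module m-1; returns (T, d) output of module m."""
    T, d = h_prev.shape
    pad = k // 2
    padded = np.pad(h_prev, ((pad, pad), (0, 0)))
    out = np.empty_like(h_prev)
    for i in range(T):
        window = padded[i:i + k].reshape(-1)  # q_{i-k/2}^{m-1} .. q_{i+k/2}^{m-1}
        Y = W @ window + b                    # Y = f_conv(X) = W X + b, in R^(2d)
        A, B = Y[:d], Y[d:]                   # the two d-dimensional columns
        g = sigmoid(B)                        # gate controlling information flow
        out[i] = A * g + h_prev[i]            # gated output plus residual
    return out

rng = np.random.default_rng(1)
T, d, k = 5, 4, 3
W = 0.1 * rng.normal(size=(2 * d, k * d))
h0 = rng.normal(size=(T, d))
h1 = conv_module(h0, W, np.zeros(2 * d), k)
print(h1.shape)  # (5, 4): same shape as the input, so modules stack m times
```

Because the output has the same shape as the input, the module can be applied m times in sequence, as the text describes.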
In step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 is the 1st dimension value of the meaning vector of the last word, and w_wn is its n-th dimension value;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 is its 1st dimension value, and p_wn its n-th dimension value;
the integrative representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the integrative representation vector of the last word, o_w1 is its 1st dimension value, and o_wn its n-th dimension value.
Inputting the integrative representation of the last word into the decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtaining the representation of the next word to be generated (from which the next word is obtained) specifically includes:
2.1) Feed the integrative representation o_w = {o_w1, ..., o_wn} of the last word cyclically through m convolution modules identical in structure to those of the encoder, and use these m convolution modules to obtain the prediction representation r^m of the next word to be generated. Each convolution module consists of one convolution operation and one nonlinear operation; the convolution operation generates two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation.
The calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d.
2.2) From the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 2.1), take the second column vector B and apply the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron.
Combine the first column vector A with the gate output g = δ(B) to obtain the output of the decoder convolution module according to the following formula:
r_i^m = A ⊗ δ(B) + r_i^{m-1}, where [A, B] = f_conv(r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1})
where r_i^m is the i-th element of the output of the m-th decoder convolution module, f_conv(·) denotes the convolution operation, r_{i-k/2}^{m-1}, ..., r_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th decoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and r_i^{m-1} is the i-th element of the output of the (m-1)-th decoder convolution module.
2.3) Using the following formula, combine the i-th element r_i^m of the output of the m-th decoder convolution module to obtain the i-th element d_i^m of the attention state of that decoder convolution module:
d_i^m = W_a^m r_i^m + b_a^m + g_i
where W_a^m is a weight matrix, b_a^m is a bias vector, and g_i is a parameter coefficient (which can be set manually).
Then, using the following formula, combine the attention state d_i^m of the m-th decoder convolution module with the j-th element q_j^m of the output of the m-th encoder convolution module (the j-th element of the integrative representation vector q^m of step 1)) to obtain the corresponding activation parameter a_ij^m:
a_ij^m = exp(d_i^m · q_j^m) / Σ_t exp(d_i^m · q_t^m)
Afterwards, combine the j-th element q_j^m of the overall encoder output with the j-th element o_cj of the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words from step 1) (o_cj being the j-th dimension value of the integrative representation vector of the c-th word) to obtain the attention additive term c_i^m of the output of the m-th decoder convolution module:
c_i^m = Σ_j a_ij^m (q_j^m + o_cj)
Add the additive term c_i^m to the i-th element r_i^m of the output of the m-th decoder convolution module; through the cyclic processing of the m convolution modules, the final decoder output r^m is obtained.
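The attention computation of this step can be sketched as follows, under the assumptions stated above: the decoder output r_i is projected with W_a, b_a and the coefficient g_i; the resulting state is scored against every encoder output q_j; and the weighted sum of q_j plus the input embedding o_j is added back to r_i. All names and shapes here are illustrative stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(r_i, g_i, q, o, W_a, b_a):
    """r_i: (d,) decoder output; q, o: (T, d) encoder outputs and embeddings."""
    d_i = W_a @ r_i + b_a + g_i   # attention state d_i^m
    a = softmax(q @ d_i)          # activation parameters a_ij^m over positions j
    c_i = a @ (q + o)             # additive term c_i^m
    return r_i + c_i              # added back to the decoder output

rng = np.random.default_rng(2)
T, d = 6, 4
r_i, g_i = rng.normal(size=d), rng.normal(size=d)
q, o = rng.normal(size=(T, d)), rng.normal(size=(T, d))
out = attention_step(r_i, g_i, q, o, np.eye(d), np.zeros(d))
print(out.shape)  # (4,)
```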
2.4) The decoder output r^m is fed into a softmax function, and the probability of the next word to be generated is obtained according to the following formula:
p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)
where W_o is a weight matrix, b_o is a bias vector, and softmax(·) denotes the softmax function. The word with the highest probability under this output distribution is output as the next word of the generated dialogue. Here p(y_{i+1} | y_1, ..., y_i) is the probability of the next word; y_{i+1} denotes the (i+1)-th word, y_1 the 1st word, and y_i the i-th word.
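The output layer described above can be sketched directly: the decoder output r^m is mapped to vocabulary logits by W_o and b_o, a softmax turns them into a probability distribution, and the arg-max word is emitted. The sizes below are illustrative.

```python
import numpy as np

def next_word(r, W_o, b_o):
    logits = W_o @ r + b_o
    e = np.exp(logits - logits.max())
    p = e / e.sum()              # p(y_{i+1} | y_1, ..., y_i) over the vocabulary
    return int(np.argmax(p)), p  # the word with the highest probability

rng = np.random.default_rng(3)
vocab, d = 10, 4
W_o = rng.normal(size=(vocab, d))
word, p = next_word(rng.normal(size=d), W_o, np.zeros(vocab))
print(0 <= word < vocab, round(float(p.sum()), 6))  # True 1.0
```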
Compared with the prior art, the present invention has the following advantages:
The present invention solves the dialogue generation task with a convolutional dialogue generation model. Compared with general dialogue generation methods, it overcomes two problems of the recurrent neural networks used in the prior art: the inability to exploit the parallelism of GPUs, and the vanishing-gradient problem. On the dialogue generation task, the present invention achieves better results than traditional methods.
Detailed description of the drawings
Fig. 1 is a flow diagram of the method of the present invention for solving the dialogue generation task using a convolutional dialogue generation model.
Specific embodiment
As shown in Figure 1, a method for solving the dialogue generation task using a convolutional dialogue generation model comprises the following steps:
Steps 1) to 3), together with all sub-steps and the notation for the meaning vectors, position vectors, and integrative representation vectors, are carried out exactly as set forth in the Summary of the invention above and are not repeated here.
The above method is applied in the following example to demonstrate the technical effect of the invention; the specific steps are not repeated.
Embodiment
The present invention was tested on the DailyDialog data set. To evaluate the performance of the algorithm objectively, four evaluation criteria were used on the selected test set: Average, Greedy, Extrema, and Training Time. Following the steps described in the specific embodiment, the experimental results for these four criteria are shown in Table 1; the present method is denoted ConvTalker.
Table 1
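For illustration only: the "Average" score in dialogue evaluation is commonly computed as the cosine similarity between the mean word embedding of the generated reply and that of the reference reply. The random embeddings below are stand-ins; the patent does not spell out the exact formulas it used.

```python
import numpy as np

def average_score(gen_vecs, ref_vecs):
    """Cosine similarity of the mean embeddings of two replies."""
    g, r = gen_vecs.mean(axis=0), ref_vecs.mean(axis=0)
    return float(g @ r / (np.linalg.norm(g) * np.linalg.norm(r)))

rng = np.random.default_rng(4)
gen, ref = rng.normal(size=(5, 8)), rng.normal(size=(6, 8))
print(round(average_score(gen, gen), 6))       # 1.0: identical replies score 1
print(-1.0 <= average_score(gen, ref) <= 1.0)  # True: cosine stays in [-1, 1]
```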
Claims (5)
1. A method for solving the dialogue generation task using a convolutional dialogue generation model, characterized by comprising the following steps:
1) for the context of the next word of the dialogue to be generated, map each word of the context to its corresponding meaning vector and obtain the position vector of the word; then add the meaning vector of the word to its position vector to obtain the integrative representation vector of the word;
input the acquired integrative representation vectors of the words into an encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context;
2) convert the last word of the context of the next word of the dialogue to be generated into the meaning vector of the last word, and combine it with the position vector of the last word, their sum being the integrative representation of the last word;
input the integrative representation of the last word into a decoding network combining convolutional layers with gated linear units and, combined with the integrative representation of the context obtained in step 1), obtain the representation of the next word to be generated;
3) through training, obtain the final convolutional dialogue generation model, with which the dialogue required for a given context is generated.
2. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 1), the meaning vector of a word is w_c = {w_c1, ..., w_cn}, where w_c is the meaning vector of the c-th word, w_c1 is its 1st dimension value, and w_cn its n-th dimension value;
the position vector of a word is p_c = {p_c1, ..., p_cn}, where p_c is the position vector of the c-th word, p_c1 is its 1st dimension value, and p_cn its n-th dimension value;
the integrative representation vector of a word is o_c = {o_c1, ..., o_cn}, where o_c is the integrative representation vector of the c-th word, o_c1 is its 1st dimension value, and o_cn its n-th dimension value.
3. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 1), inputting the acquired integrative representation vectors of the words into the encoding network combining convolutional layers with gated linear units to obtain the integrative representation of the context specifically includes:
1.1) feeding the integrative representation vectors o_c = {o_c1, ..., o_cn} of the words cyclically through m convolution modules, and using these m convolution modules to obtain the integrative representation vector q^m of the context; each of the m convolution modules consists of one convolution operation and one nonlinear operation, the convolution operation generating two d-dimensional column vectors Y = [A, B] ∈ R^2d according to the following formula:
Y = f_conv(X) = W^m X + b^m
where A is the first d-dimensional column vector, B is the second d-dimensional column vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) denotes the convolution operation, X is the input representation vector of the convolution operation, W^m is the weight matrix of the m-th convolution operation, and b^m is the bias vector of the m-th convolution operation;
the calculation yields the two d-dimensional column vectors Y = [A, B] ∈ R^2d;
1.2) from the output Y = [A, B] ∈ R^2d produced by the convolution operation of step 1.1), taking the second column vector B and applying the gate function δ(·) to obtain the gate output g = δ(B), which controls the amount of information that flows to the next neuron;
combining the first column vector A with the gate output g = δ(B) to obtain the output of the encoder convolution module according to the following formula:
q_i^m = A ⊗ δ(B) + q_i^{m-1}, where [A, B] = f_conv(q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1})
where q_i^m is the i-th element of the output of the m-th encoder convolution module, f_conv(·) denotes the convolution operation, q_{i-k/2}^{m-1}, ..., q_{i+k/2}^{m-1} are the (i-k/2)-th to (i+k/2)-th elements of the output of the (m-1)-th encoder convolution module, k is a predefined parameter (for example, 3, 5 or 7), and q_i^{m-1} is the i-th element of the output of the (m-1)-th encoder convolution module;
through the successive operation of the m convolution modules, the integrative representation q^m of the context is obtained.
4. The method for solving the dialogue generation task using a convolutional dialogue generation model according to claim 1, characterized in that in step 2), the meaning vector of the last word is w_w = {w_w1, ..., w_wn}, where w_w is the meaning vector of the last word, w_w1 is its 1st dimension value, and w_wn its n-th dimension value;
the position vector of the last word is p_w = {p_w1, ..., p_wn}, where p_w is the position vector of the last word, p_w1 is its 1st dimension value, and p_wn its n-th dimension value;
the integrative representation of the last word is o_w = {o_w1, ..., o_wn}, where o_w is the integrative representation vector of the last word, o_w1 is its 1st dimension value, and o_wn its n-th dimension value.
5. The method of claim 1 for solving a dialogue generation task using a convolutional dialogue generation model, wherein inputting the integrated representation of the last word into the coding network combining convolutional layers with gated linear units, together with the integrated representation of the context obtained in step 1), to obtain the representation of the next word to be generated specifically includes:
2.1) inputting the integrated representation o_w = {o_w1, ..., o_wn} of the last word in turn into m convolution modules identical to those of the encoder, and using these m convolution modules to obtain the prediction representation r^m of the next word to be generated; each convolution module consists of one convolution operation and one nonlinear operation, and the convolution operation generates two columns of d-dimensional vectors Y = [A, B] ∈ R^2d according to the following formula:

Y = f_conv(X) = W_m X + b_m

where A is the first column d-dimensional vector, B is the second column d-dimensional vector, R^2d is the set of all 2d-dimensional vectors, f_conv(X) represents the convolution operation, X represents the input mapping representation vector of the convolution operation, W_m represents the weight matrix in the m-th convolution operation, and b_m represents the bias vector in the m-th convolution operation;
the two columns of d-dimensional vectors Y = [A, B] ∈ R^2d are obtained by this calculation;
2.2) the nonlinear operation uses the second column d-dimensional vector B of the output Y = [A, B] ∈ R^2d generated by the convolution operation of step 2.1), combined with the gate function δ(B), to obtain the output g = δ(B) that controls the amount of information flowing through the network; this output is transmitted to the next neuron;
the first column d-dimensional vector A of the output Y = [A, B] ∈ R^2d generated by the convolution operation is combined with the generated gating output g = δ(B) that controls the information flow, and the output of the decoder convolution module is obtained according to the following formula:

r_i^m = g ⊙ A + r_i^(m-1), with [A, B] = f_conv(r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1))

where r_i^m represents the i-th dimension value of the output of the m-th decoder convolution module, f_conv(·) represents the convolution operation, the window r_(i-k/2)^(m-1), ..., r_(i+k/2)^(m-1) covers dimensions (i - k/2) through (i + k/2) of the output of the (m-1)-th decoder convolution module, k is a predefined parameter, and r_i^(m-1) represents the i-th dimension value of the output of the (m-1)-th decoder convolution module;
2.3) using the following formula, the i-th dimension value r_i^m of the output of the m-th decoder convolution module is used to obtain the i-th dimension value d_i^m of the attention mechanism output corresponding to that decoder convolution module:

d_i^m = W_d^m r_i^m + b_d^m + g_i

where W_d^m represents a weight matrix, b_d^m represents a bias vector, and g_i represents a parameter coefficient;
the following formula then combines the i-th dimension value d_i^m of the attention output corresponding to the m-th decoder convolution module with the j-th dimension value q_j^m of the output of the m-th encoder convolution module (the j-th dimension value of the integrated representation vector q^m from step 1)) to obtain the corresponding activation parameter α_ij^m:

α_ij^m = exp(d_i^m · q_j^m) / Σ_(t=1..n) exp(d_i^m · q_t^m)

the j-th dimension value q_j^m of the overall encoder output is then combined with the j-th dimension value o_cj of the integrated representation vector o_c = {o_c1, ..., o_cn} of the words in the encoder from step 1), where o_cj is the j-th dimension value of the integrated representation vector of the c-th word, to obtain the attention addend c_i^m for the i-th dimension value of the output of the m-th decoder convolution module:

c_i^m = Σ_(j=1..n) α_ij^m (q_j^m + o_cj)

the attention addend c_i^m is added to the i-th dimension value r_i^m of the output of the m-th decoder convolution module; through the iterative processing of the m convolution modules, the final decoder output r^m is obtained;
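Step 2.3) follows the standard ConvS2S-style attention pattern: project the decoder state, take a softmax over dot products with the encoder output, and sum a context over the encoder outputs plus the encoder-side input representations. A NumPy sketch follows; where the patent's formula images are missing, the dot-product/softmax form is an assumption:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def decoder_attention(r, q, o_c, W_d, b_d, g):
    """Attention of step 2.3) for one decoder convolution module.

    r:   (T, d) outputs r^m of the decoder convolution module
    q:   (S, d) encoder outputs q^m from step 1)
    o_c: (S, d) integrated representations o_c of the encoder words
    W_d: (d, d) weight matrix, b_d: (d,) bias, g: (T, d) coefficients g_i
    """
    d_att = r @ W_d.T + b_d + g   # d_i^m = W_d^m r_i^m + b_d^m + g_i
    alpha = softmax(d_att @ q.T)  # activation parameters alpha_ij^m
    c = alpha @ (q + o_c)         # c_i^m = sum_j alpha_ij^m (q_j^m + o_cj)
    return r + c                  # add the attention addend to r^m
```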
2.4) the final decoder output r^m is input into a softmax function, and the probability of the next word to be generated is obtained according to the following formula:

p(y_(i+1) | y_1, ..., y_i) = softmax(W_o r^m + b_o)

where W_o represents a weight matrix, b_o represents a bias vector, and softmax(·) represents the softmax function; using this probability output, the word with the maximum probability is found and output as the next word of the generated dialogue.
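The final step maps the decoder output r^m to a distribution over the vocabulary and picks the argmax; a small sketch with made-up dimensions:

```python
import numpy as np

def next_word_distribution(r_m, W_o, b_o):
    """p(y_{i+1} | y_1, ..., y_i) = softmax(W_o r^m + b_o)."""
    logits = W_o @ r_m + b_o
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d, vocab = 6, 12                       # hypothetical sizes
p = next_word_distribution(rng.standard_normal(d),
                           rng.standard_normal((vocab, d)),
                           np.zeros(vocab))
next_word_id = int(np.argmax(p))       # maximum-probability word is output
```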
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811057115.9A CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109255020A (en) | 2019-01-22 |
CN109255020B CN109255020B (en) | 2022-04-01 |
Family
ID=65046678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811057115.9A Active CN109255020B (en) | 2018-09-11 | 2018-09-11 | Method for solving dialogue generation task by using convolution dialogue generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109255020B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140201126A1 (en) * | 2012-09-15 | 2014-07-17 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107273487A (en) * | 2017-06-13 | 2017-10-20 | 北京百度网讯科技有限公司 | Generation method, device and the computer equipment of chat data based on artificial intelligence |
CN107506823A (en) * | 2017-08-22 | 2017-12-22 | 南京大学 | A kind of construction method for being used to talk with the hybrid production style of generation |
CN107590153A (en) * | 2016-07-08 | 2018-01-16 | 微软技术许可有限责任公司 | Use the dialogue correlation modeling of convolutional neural networks |
US20180060301A1 (en) * | 2016-08-31 | 2018-03-01 | Microsoft Technology Licensing, Llc | End-to-end learning of dialogue agents for information access |
CN107980130A (en) * | 2017-11-02 | 2018-05-01 | 深圳前海达闼云端智能科技有限公司 | It is automatic to answer method, apparatus, storage medium and electronic equipment |
CN108388944A (en) * | 2017-11-30 | 2018-08-10 | 中国科学院计算技术研究所 | LSTM neural network chips and its application method |
Non-Patent Citations (3)
Title |
---|
MIN YANG ET AL.: "Investigating Deep Reinforcement Learning Techniques in Personalized Dialogue Generation", 《PROCEEDINGS OF THE 2018 SIAM INTERNATIONAL CONFERENCE ON DATA MINING》 * |
XIAOYU SHEN ET AL.: "Improving Variational Encoder-Decoders in Dialogue Generation", 《THE THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
JIA Xibin et al.: "A Survey of Research on Intelligent Dialogue Systems" (in Chinese), Journal of Beijing University of Technology *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110196928A (en) * | 2019-05-17 | 2019-09-03 | 北京邮电大学 | Fully parallelized end-to-end more wheel conversational systems and method with field scalability |
CN110196928B (en) * | 2019-05-17 | 2021-03-30 | 北京邮电大学 | Fully parallelized end-to-end multi-turn dialogue system with domain expansibility and method |
Also Published As
Publication number | Publication date |
---|---|
CN109255020B (en) | 2022-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shlezinger et al. | UVeQFed: Universal vector quantization for federated learning | |
Liu et al. | Deep neural network architectures for modulation classification | |
WO2020258668A1 (en) | Facial image generation method and apparatus based on adversarial network model, and nonvolatile readable storage medium and computer device | |
CN109919204B (en) | Noise image-oriented deep learning clustering method | |
CN109271646A (en) | Text interpretation method, device, readable storage medium storing program for executing and computer equipment | |
CN109906460A (en) | Dynamic cooperation attention network for question and answer | |
CN108846323A (en) | A kind of convolutional neural networks optimization method towards Underwater Targets Recognition | |
CN108829756B (en) | Method for solving multi-turn video question and answer by using hierarchical attention context network | |
Xu et al. | Deep neural network self-distillation exploiting data representation invariance | |
CN110349588A (en) | A kind of LSTM network method for recognizing sound-groove of word-based insertion | |
CN108446766A (en) | A kind of method of quick trained storehouse own coding deep neural network | |
CN112289338B (en) | Signal processing method and device, computer equipment and readable storage medium | |
Hahne et al. | Attention on abstract visual reasoning | |
CN112233012A (en) | Face generation system and method | |
CN114445420A (en) | Image segmentation model with coding and decoding structure combined with attention mechanism and training method thereof | |
Li et al. | Detection of multiple steganography methods in compressed speech based on code element embedding, Bi-LSTM and CNN with attention mechanisms | |
CN107547088A (en) | Enhanced self-adapted segmentation orthogonal matching pursuit method based on compressed sensing | |
CN113283577A (en) | Industrial parallel data generation method based on meta-learning and generation countermeasure network | |
Kim et al. | WaveNODE: A continuous normalizing flow for speech synthesis | |
CN109255020A (en) | A method of talked with using convolution and generates model solution dialogue generation task | |
CN115054270A (en) | Sleep staging method and system for extracting sleep spectrogram features based on GCN | |
CN112380843B (en) | Random disturbance network-based open answer generation method | |
CN116306780B (en) | Dynamic graph link generation method | |
Le et al. | Data selection for acoustic emotion recognition: Analyzing and comparing utterance and sub-utterance selection strategies | |
CN113361505B (en) | Non-specific human sign language translation method and system based on contrast decoupling element learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |