CN106933785A - A summary generation method based on recurrent neural networks - Google Patents
A summary generation method based on recurrent neural networks — Download PDF / Info

- Publication number: CN106933785A
- Application number: CN201710099638.9A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F40/126 — Physics; computing; electric digital data processing; handling natural language data; text processing; use of codes for handling textual entities; character encoding
- G06F40/151 — Physics; computing; electric digital data processing; handling natural language data; text processing; use of codes for handling textual entities; transformation
- G06N3/02 — Physics; computing arrangements based on specific computational models; computing arrangements based on biological models; neural networks
Abstract
The present invention relates to a summary generation method based on recurrent neural networks. At the current time t, the state vector s_t of the recurrent neural network decoder is compared with the state vector of the recurrent neural network encoder at each moment to find the encoder state vector H most strongly associated with s_t. A state vector c_t is then computed from H and the d state vectors on either side of H; a new state vector d_t is computed from c_t and s_t; and the next character or word of the output sequence is decoded from d_t, where d is an integer greater than or equal to 1.
Description
Technical field

The present invention relates to the field of neural networks, and more particularly to a summary generation method based on recurrent neural networks.
Background art

Summary generation is an important problem in natural language processing. It takes two main forms: generating the gist of a source document, and generating a title for it. The former is generally longer and may contain dozens of characters or words, while the latter is short, typically only about ten characters. A summary is a condensed account of the source document and must express its meaning concisely and clearly. Traditional summary generation methods proceed in three steps: first, the source document is divided into many small segments according to some criterion (such as word segmentation); second, the segments with comparatively large weights are selected according to a per-segment weight (such as tf-idf); third, the selected segments are combined into new sentences by some algorithm to form the summary of the source document.
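The three-step traditional pipeline described above can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes sentence-level segments, whitespace tokenization, and a smoothed tf-idf weight; the function name and the smoothing constant are illustrative choices.

```python
import math
from collections import Counter

def extractive_summary(doc_sentences, corpus, k=2):
    """Traditional extractive sketch: score each segment (sentence) by the
    sum of tf-idf weights of its words, then keep the k highest-scoring
    segments in document order as the summary."""
    n_docs = len(corpus)
    # document frequency of each word across the reference corpus
    df = Counter()
    for d in corpus:
        df.update(set(w for s in d for w in s.split()))

    def score(sentence):
        tf = Counter(sentence.split())
        # smoothed idf so unseen words do not divide by zero
        return sum(c * math.log((1 + n_docs) / (1 + df[w]))
                   for w, c in tf.items())

    ranked = sorted(range(len(doc_sentences)),
                    key=lambda i: score(doc_sentences[i]),
                    reverse=True)[:k]
    return [doc_sentences[i] for i in sorted(ranked)]

corpus = [["the cat sat", "the dog ran"],
          ["neural networks learn", "the cat sat"]]
doc = ["the cat sat", "neural networks learn representations", "the the the"]
summary = extractive_summary(doc, corpus, k=2)
```

Segments whose words are rare in the corpus score highest, which is exactly why such methods can only recombine existing segments rather than produce new wording.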
The prior art provides a summary generation method based on a recurrent neural network encoder and decoder. The method is in fact a sequence-to-sequence machine learning process: its input can be a sentence, a paragraph, or an article, and its output is the gist or title of that input, so both input and output can be regarded as time series composed of characters or words. Compared with traditional summary generation methods, this is an abstractive summarization process: given an input sequence, the method searches the whole vocabulary for keywords and recursively composes, from front to back, a new sentence as the output sequence, i.e., the summary.
The role of the recurrent neural network encoder is to convert, or map, the given input sequence into an intermediate representation, i.e., to convert the input paragraph or article into a vector representation H. Assume the input sequence is X = {x_1, x_2, ..., x_n}, where n denotes the length of the input sequence. As shown in Fig. 1, the encoder can be described by the following expression:

h_t = f(x_t, h_{t-1})

where x_t is the vector corresponding to the t-th element of the input sequence, h_t is the encoder state vector at time t, and f is a nonlinear mapping function. H denotes the vector representation of the input sequence; usually H = h_n is taken, i.e., the state vector of the recurrent neural network encoder at the last moment serves as the intermediate vector representation of the input sequence. Eos is a special token that marks the end of the input sequence, the end of the encoder's work, and the start of the decoder's.
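The encoder recursion h_t = f(x_t, h_{t-1}) can be sketched with a vanilla tanh RNN cell. The weight names W_xh, W_hh, b_h are illustrative assumptions, not the patent's parameterization.

```python
import numpy as np

def rnn_encode(xs, W_xh, W_hh, b_h):
    """Vanilla RNN encoder: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    Returns every state h_1..h_n plus H = h_n, the intermediate
    representation of the whole input sequence."""
    h = np.zeros(W_hh.shape[0])   # h_0
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states, states[-1]     # all h_t, and H = h_n

rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(5)]   # n = 5 input vectors
W_xh = rng.standard_normal((3, 4))
W_hh = rng.standard_normal((3, 3))
b_h = np.zeros(3)
states, H = rnn_encode(xs, W_xh, W_hh, b_h)
```

Because H = h_n, everything the decoder sees about the input is squeezed into the final state, which is exactly the bottleneck the invention later addresses.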
Correspondingly, the role of the recurrent neural network decoder is to generate the output sequence from the intermediate vector representation H produced by the encoder. Assume the output sequence is Y = {y_1, y_2, ..., y_m}, where m denotes the length of the output sequence. Note that the decoder does not generate the whole output sequence at once; rather, it generates one character or word at each moment, in order from front to back, until the entire output sequence is produced. As shown in Fig. 2, the decoder can be described by the following formulas:

P(y_t | y_1, ..., y_{t-1}, H) = g(s_t, H)
s_t = f(y_{t-1}, s_{t-1})

where P(Y|X) denotes the probability of producing the output Y given the input X, y_t is the character or word decoded at time t, and s_t is the decoder state vector at time t. f and g are nonlinear transformation functions; here g is the softmax function.
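A single decoder step combining s_t = f(y_{t-1}, s_{t-1}) with the softmax g(s_t, H) might look as follows. Concatenating s_t and H before the output projection, and the weight names, are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def decode_step(y_prev, s_prev, H, W_ys, W_ss, W_out, b_out):
    """One decoder step: update the state from the previous output
    embedding and previous state (f), then map (s_t, H) through a
    linear layer and softmax (g) to a vocabulary distribution."""
    s_t = np.tanh(W_ys @ y_prev + W_ss @ s_prev)       # s_t = f(y_{t-1}, s_{t-1})
    logits = W_out @ np.concatenate([s_t, H]) + b_out  # g(s_t, H)
    return s_t, softmax(logits)

rng = np.random.default_rng(1)
k, e, V = 4, 3, 10   # state dim, embedding dim, vocabulary size
y_prev = rng.standard_normal(e)
s_prev = rng.standard_normal(k)
H = rng.standard_normal(k)
W_ys = rng.standard_normal((k, e))
W_ss = rng.standard_normal((k, k))
W_out = rng.standard_normal((V, 2 * k))
b_out = np.zeros(V)
s_t, probs = decode_step(y_prev, s_prev, H, W_ys, W_ss, W_out, b_out)
```

At generation time the argmax (or a beam search) over `probs` picks the next character or word, and the loop repeats until an end token is produced.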
In the above scheme, the summary generation method based on a recurrent neural network encoder and decoder has a shortcoming: the generation of the output sequence depends only on the encoder's state vector at the last moment, not on the encoder's other state vectors. As the length of the recurrent neural network increases, the feature vector it extracts becomes increasingly related to the later states of the input sequence and less related to the earlier ones, which can cause information to decay. Decoding only from the encoder's final state can therefore weaken the association between the output sequence and the input sequence.
Content of the invention

To overcome the information-decay defect of the above prior art during decoding, the present invention provides a summary generation method based on recurrent neural networks.

To achieve the above object of the invention, the adopted technical scheme is:

A summary generation method based on recurrent neural networks: at the current time t, the state vector s_t of the recurrent neural network decoder is compared with the state vector of the recurrent neural network encoder at each moment to find the state vector H most strongly associated with s_t; a state vector c_t is then computed from H and the d state vectors on either side of H; a new state vector d_t is obtained from c_t and s_t; and the next character or word of the output sequence is decoded from d_t, where d is an integer greater than or equal to 1.
In the above scheme, when the next character or word of the output sequence is to be generated, the method provided by the present invention does not decode directly from the decoder's current state vector. Instead, the decoder's current state vector s_t is compared with the encoder's state vector at each moment to find the encoder state vector H most similar to s_t; a state vector c_t is computed from H and several state vectors on either side of it; and a new state vector d_t is computed from c_t and s_t. d_t reflects the alignment information of the character or word about to be generated, i.e., which part of the input the generated token should correspond to. Finally, the next character or word of the output sequence is decoded from d_t. By finding H, generating c_t from it, computing the new state vector d_t from c_t and s_t, and decoding the next character or word from d_t, the method resolves the alignment relation between the output sequence and the input sequence, improves the quality and efficiency of the output sequence, and keeps the association between the output sequence and the input sequence at a high level.
Preferably, the detailed process of finding the state vector H most strongly associated with s_t is as follows:

where p_t denotes the position of the state vector H in the encoder's state-vector sequence, n denotes the length of the input sequence, and w_p denotes a parameter to be learned. The sigmoid function is defined as:

sigmoid(x) = 1 / (1 + e^{-x})

The tanh function is defined as:

tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
Preferably, the detailed process of computing the state vector c_t is as follows:

c_t = Σ_i α_ti h_i

where α_ti denotes a weight and h_i denotes a state vector of the recurrent neural network encoder.

Preferably, α_ti is obtained as follows:

α_ti = exp(e_ti) / Σ_j exp(e_tj)

where e_ti denotes the association weight between the decoder state vector and an encoder state vector:

e_ti = s_t * h_i.
Preferably, the calculation of the state vector d_t is as follows:

d_t = sigmoid(c_t * s_t)

where the sigmoid function is defined as:

sigmoid(x) = 1 / (1 + e^{-x}).
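The whole preferred procedure (locate H, form c_t over a window of d neighbours, combine c_t and s_t into d_t) can be sketched as below. Where the patent's formula images are missing, a dot product for the association score and a softmax over the local window are assumptions; the elementwise product inside the sigmoid is one reading of c_t * s_t.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(s_t, encoder_states, d=1):
    """Local attention sketch: find the encoder state H most associated
    with the decoder state s_t, form c_t as a weighted sum over H and
    its d neighbours on each side, then combine c_t and s_t into
    d_t = sigmoid(c_t * s_t)."""
    hs = np.asarray(encoder_states)
    e = hs @ s_t                          # e_ti = s_t . h_i (assumed score)
    p = int(np.argmax(e))                 # position of H, the strongest match
    lo, hi = max(0, p - d), min(len(hs), p + d + 1)
    w = np.exp(e[lo:hi] - e[lo:hi].max())
    alpha = w / w.sum()                   # softmax over the local window
    c_t = alpha @ hs[lo:hi]               # c_t = sum_i alpha_ti h_i
    d_t = sigmoid(c_t * s_t)              # alignment-aware new state
    return c_t, d_t

rng = np.random.default_rng(2)
hs = [rng.standard_normal(4) for _ in range(6)]   # encoder states h_1..h_6
s_t = rng.standard_normal(4)
c_t, d_t = attend(s_t, hs, d=1)
```

Restricting the weighted sum to the 2d+1 states around H is what distinguishes this local scheme from global attention over all n encoder states.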
In the above scheme, the state vector d_t is in fact a locally optimal strategy for handling the alignment relation. The alignment relation between the output sequence and the input sequence should be globally, not merely locally, correlated, which means that the alignment information at the current moment should affect not only the decoding at the current moment but also subsequent decoding. Therefore, the method provided here feeds the alignment information d_t at the current moment into the next decoder state as extra input.

Preferably, if the recurrent neural network encoder and decoder have a multi-layer structure, the state vector c_t and/or d_t is input directly to the first layer or the last layer of the decoder and applied in the decoder's decoding at the next moment; if the recurrent neural network encoder and decoder have a single-layer structure, the state vector c_t and/or d_t is input directly to the decoder.
Compared with the prior art, the beneficial effects of the invention are as follows:

When generating the next character or word of the output sequence, the method provided by the present invention does not decode directly from the decoder's current state vector. Instead, it compares the decoder's current state vector s_t with the encoder's state vector at each moment, finds the encoder state vector H most similar to s_t, computes a state vector c_t from H and several state vectors on either side of it, computes a new state vector d_t from c_t and s_t, and decodes the next character or word of the output sequence from d_t. d_t reflects the alignment information of the token about to be generated. In this way the method resolves the alignment relation between the output sequence and the input sequence, improves the quality and efficiency of the output sequence, and keeps the association between the output sequence and the input sequence at a high level.
Brief description of the drawings
Fig. 1 is a schematic diagram of the encoder.

Fig. 2 is a schematic diagram of the decoder.

Fig. 3 is a schematic diagram of the attention mechanism.

Fig. 4 (a), (b), (c), and (d) are schematic diagrams of the four different feed mechanisms.
Specific embodiments

The accompanying drawings are for illustration only and shall not be construed as limiting this patent. The present invention is further elaborated below in conjunction with the drawings and embodiments.
Embodiment 1

The main contribution of the method provided by the present invention is the added attention mechanism, shown in Fig. 3, which works as follows:

At the current time t, the state vector s_t of the recurrent neural network decoder is compared with the encoder's state vector at each moment to find the state vector H most strongly associated with s_t; a state vector c_t is computed from H and the d state vectors on either side of H; a new state vector d_t is obtained from c_t and s_t; and the next character or word of the output sequence is decoded from d_t, where d is an integer greater than or equal to 1.

In this scheme, when the next character or word of the output sequence is to be generated, the method does not decode directly from the decoder's current state vector: d_t reflects the alignment information of the token about to be generated, i.e., which part of the input it should correspond to. Decoding the next character or word from d_t resolves the alignment relation between the output sequence and the input sequence, improves the quality and efficiency of the output sequence, and keeps the association between the output sequence and the input sequence at a high level.
Embodiment 2

The attention mechanism resolves, to a certain extent, the alignment relation between the output sequence and the input sequence and thereby improves the quality and efficiency of the output sequence. It is, however, a locally optimal strategy for handling alignment. To further improve the quality of the output sequence, i.e., of the source-document summary, this embodiment uses a new globally optimal strategy: the feed mechanism.

There are in fact two kinds of alignment information in the attention mechanism: the direct alignment information c_t and the indirect alignment information d_t. Both reflect the position and content with which the next character or word of the output sequence should align. The alignment relation between the output sequence and the input sequence should be globally, not merely locally, correlated, which means that the alignment information at the current moment should affect not only the decoding at the current moment but also subsequent decoding. The alignment information at the current moment can therefore be fed into the decoder's next state as extra input.

The recurrent neural network encoder and decoder may be single-layer or multi-layer, as shown in Fig. 4. This embodiment employs two distribution choices: first, which kind of alignment information is input to the decoder at the next moment, the direct or the indirect; second, into which layer of the decoder it is fed at the next moment, the last layer or the first layer. Based on these two choices, four different feed mechanisms are obtained, as shown in Fig. 4: indirect alignment information fed to the last layer, direct alignment information fed to the last layer, indirect alignment information fed to the first layer, and direct alignment information fed to the first layer.
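One way to sketch the four feed variants: a two-layer decoder step that concatenates the alignment vector of the previous step (direct c_t or indirect d_t) into the input of either the first or the last layer. The zero-padding of the unused slot and the weight shapes are illustrative choices, not the patent's parameterization.

```python
import numpy as np

def feed_decoder_step(y_prev, h1, h2, align, W1, W2, layer="last"):
    """One step of a two-layer RNN decoder with a feed mechanism.
    `align` is the alignment information from the previous step
    (c_t or d_t); it is concatenated to the input of the chosen
    layer, and the other layer receives zeros in that slot."""
    zero = np.zeros_like(align)
    in1 = np.concatenate([y_prev, h1, align if layer == "first" else zero])
    h1_new = np.tanh(W1 @ in1)                  # first decoder layer
    in2 = np.concatenate([h1_new, h2, align if layer == "last" else zero])
    h2_new = np.tanh(W2 @ in2)                  # last decoder layer
    return h1_new, h2_new

rng = np.random.default_rng(3)
k = 3
W1 = rng.standard_normal((k, 3 * k))
W2 = rng.standard_normal((k, 3 * k))
y = rng.standard_normal(k)
h1 = rng.standard_normal(k)
h2 = rng.standard_normal(k)
a = rng.standard_normal(k)                      # pass c_t or d_t here
first = feed_decoder_step(y, h1, h2, a, W1, W2, layer="first")
last = feed_decoder_step(y, h1, h2, a, W1, W2, layer="last")
```

Choosing `align = c_t` or `align = d_t` together with `layer = "first"` or `"last"` yields the four feed mechanisms of Fig. 4.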
Embodiment 3

The sequence-to-sequence learning method for summary generation is essentially a machine learning method, so it can be trained with the general training methods of machine learning. To speed up training, this embodiment trains with mini-batch gradient descent. Because the Chinese vocabulary is very large, the decoding process would otherwise take a long time; to reduce the decoding time, this embodiment uses a character table, i.e., only the 4000 most frequently used Chinese characters, and replaces all other characters with a special mark. Tables 1 and 2 test and evaluate the generation method on a test data set.
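The character-table truncation described above can be sketched as follows; the token name `<unk>` is an illustrative stand-in for the patent's unspecified special mark.

```python
from collections import Counter

def build_char_table(corpus, size=4000, unk="<unk>"):
    """Keep only the `size` most frequent characters across the corpus;
    return an encoder that maps every other character to the special
    token, as in the embodiment's 4000-character table."""
    counts = Counter(ch for text in corpus for ch in text)
    keep = {ch for ch, _ in counts.most_common(size)}
    def encode(text):
        return [ch if ch in keep else unk for ch in text]
    return encode

encode = build_char_table(["aabbbc", "bbcd"], size=2)
tokens = encode("abcd")
```

Truncating the output vocabulary shrinks the softmax in the decoder, which is where the decoding-time savings come from.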
Table 1

| Model | R-1 | R-2 | R-L | BLEU |
|---|---|---|---|---|
| Multilayer RNN | 30.4 | 16.0 | 27.3 | 9.5 |
| Multilayer RNN + attention | 32.2 | 17.8 | 28.8 | 11.4 |
| Multilayer RNN + attention + feed1 | 32.2 | 17.7 | 28.7 | 11.3 |
| Multilayer RNN + attention + feed2 | 33.1 | 18.3 | 29.5 | 12.0 |
| Multilayer RNN + attention + feed3 | 30.1 | 16.7 | 27.6 | 10.3 |
| Multilayer RNN + attention + feed4 | 31.1 | 17.1 | 27.9 | 10.9 |
Table 2

| Model | Informativeness | Grammar | Terseness |
|---|---|---|---|
| Multilayer RNN + attention | 2.87 | 3.95 | 3.28 |
| Multilayer RNN + attention + feed1 | 2.85 | 4.01 | 2.93 |
| Multilayer RNN + attention + feed2 | 3.02 | 4.23 | 3.30 |
| Multilayer RNN + attention + feed3 | 3.00 | 3.93 | 3.10 |
| Multilayer RNN + attention + feed4 | 2.88 | 4.03 | 3.03 |
| Manual | 3.88 | 4.42 | 3.80 |
Table 1 evaluates the generation method with two automatic metrics, ROUGE and BLEU. ROUGE assesses from the angle of recall, while BLEU assesses from the angle of precision. Table 1 shows that with the second feed strategy, in which the direct alignment information is input to the last layer of the decoder, the test results are best, exceeding the generation method that uses only the attention mechanism; this demonstrates the validity of the generation method.

Table 2 evaluates the generation method with three human metrics. Informativeness reflects the amount of information the summary covers, grammar reflects how grammatical the summary is, and terseness reflects how concise it is; the "Manual" row represents manually written summaries. Table 2 yields a conclusion similar to Table 1: the feed mechanism outperforms the attention mechanism, and the strategy of feeding the direct alignment information into the last layer of the decoder again performs outstandingly.

From the comparison of the two tables it can be concluded that the summary generation method is efficient and accurate, especially when the second feed mechanism is used.
Obviously, the above embodiments of the present invention are merely examples given for the clear illustration of the present invention and are not a limitation of its implementations. For those of ordinary skill in the field, other changes in different forms may also be made on the basis of the above description. There is no need, and no way, to be exhaustive of all implementations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.
Claims (7)
1. A summary generation method based on recurrent neural networks, characterized in that: at the current time t, the state vector s_t of the recurrent neural network decoder is compared with the state vector of the recurrent neural network encoder at each moment to find the state vector H most strongly associated with s_t; a state vector c_t is computed from H and the d state vectors on either side of H; a new state vector d_t is computed from c_t and s_t; and the next character or word of the output sequence is decoded from d_t, where d is an integer greater than or equal to 1.
2. The summary generation method based on recurrent neural networks according to claim 1, characterized in that the detailed process of finding the state vector H most strongly associated with s_t is as follows:

where p_t denotes the position of the state vector H in the encoder's state-vector sequence, n denotes the length of the input sequence, and w_p denotes a parameter to be learned. The sigmoid function is defined as:

sigmoid(x) = 1 / (1 + e^{-x})

The tanh function is defined as:

tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
3. The summary generation method based on recurrent neural networks according to claim 2, characterized in that the detailed process of computing the state vector c_t is as follows:

c_t = Σ_i α_ti h_i

where α_ti denotes a weight and h_i denotes a state vector of the recurrent neural network encoder.
4. The summary generation method based on recurrent neural networks according to claim 3, characterized in that α_ti is obtained as follows:

α_ti = exp(e_ti) / Σ_j exp(e_tj)

where e_ti denotes the association weight between the decoder state vector and an encoder state vector:

e_ti = s_t * h_i.
5. The summary generation method based on recurrent neural networks according to claim 1, characterized in that the calculation of the state vector d_t is as follows:

d_t = sigmoid(c_t * s_t)

where the sigmoid function is defined as:

sigmoid(x) = 1 / (1 + e^{-x}).
6. The summary generation method based on recurrent neural networks according to claim 1, characterized in that the state vector c_t and/or d_t is added into the decoder's decoding at the next moment.
7. The summary generation method based on recurrent neural networks according to claim 6, characterized in that: if the recurrent neural network encoder and decoder have a multi-layer structure, the state vector c_t and/or d_t is input directly to the first layer or the last layer of the decoder and applied in the decoder's decoding at the next moment; if the recurrent neural network encoder and decoder have a single-layer structure, the state vector c_t and/or d_t is input directly to the decoder.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710099638.9A | 2017-02-23 | 2017-02-23 | A summary generation method based on recurrent neural networks |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN106933785A | 2017-07-07 |

Family ID: 59423748
Patent Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105868829A | 2015-02-06 | 2016-08-17 | 谷歌公司 | Recurrent neural networks for data item generation |
| CN105930314A | 2016-04-14 | 2016-09-07 | 清华大学 | Text summarization generation system and method based on coding-decoding deep neural networks |

Non-Patent Citations (1)

ALEX ALIFIMOFF: "Abstractive Sentence Summarization with Attentive Recurrent Neural Networks", Proceedings of NAACL-HLT 2016
Cited By (15)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107766506A | 2017-10-20 | 2018-03-06 | 哈尔滨工业大学 | A multi-turn dialogue model construction method based on a hierarchical attention mechanism |
| CN111386537A | 2017-10-27 | 2020-07-07 | 谷歌有限责任公司 | Decoder-only attention-based sequence-transduction neural network |
| CN110598779A (B granted 2022-04-08) | 2017-11-30 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Abstract description generation method and device, computer equipment and storage medium |
| CN108153864A | 2017-12-25 | 2018-06-12 | 北京牡丹电子集团有限责任公司数字电视技术中心 | Method for generating text summaries based on a neural network |
| CN108319668A (B granted 2021-04-20) | 2018-01-23 | 2018-07-24 | 义语智能科技(上海)有限公司 | Method and equipment for generating text summaries |
| CN108984524A | 2018-07-05 | 2018-12-11 | 北京理工大学 | A title generation method based on a variational neural network topic model |
| CN109325110A (B granted 2021-06-25) | 2018-08-24 | 2019-02-12 | 广东外语外贸大学 | Indonesian document summary generation method, device, storage medium and terminal device |
| CN111192576A (B granted 2024-08-27) | 2018-11-14 | 2020-05-22 | 三星电子株式会社 | Decoding method, speech recognition device and system |
| CN110032729A | 2019-02-13 | 2019-07-19 | 北京航空航天大学 | An automatic summary generation method based on a neural Turing machine |
| CN109947930A | 2019-03-12 | 2019-06-28 | 上海秘塔网络科技有限公司 | Summary generation method, device, terminal and computer-readable storage medium |
| CN109948162A | 2019-03-25 | 2019-06-28 | 北京理工大学 | A generative text summarization method fusing a sequence grammar annotation framework |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| 2020-10-13 | AD01 | Patent right deemed abandoned |