CN109508457A - Transfer learning method based on machine reading to sequence models - Google Patents
Transfer learning method based on machine reading to sequence models
- Publication number
- CN109508457A CN109508457A CN201811284309.2A CN201811284309A CN109508457A CN 109508457 A CN109508457 A CN 109508457A CN 201811284309 A CN201811284309 A CN 201811284309A CN 109508457 A CN109508457 A CN 109508457A
- Authority
- CN
- China
- Prior art keywords
- model
- machine
- layer
- vector
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a transfer learning method based on machine reading to sequence models, comprising the following steps: (1) pre-train a machine reading model, the machine reading model comprising a coding layer and a model layer based on recurrent neural networks; (2) establish a sequence model, the sequence model comprising an encoder and a decoder based on recurrent neural networks; (3) extract the parameters of the coding layer and model layer of the trained machine reading model, and transfer them into the sequence model to be trained as initialization for part of its parameters; (4) train the sequence model until it converges; (5) use the trained sequence model to perform text-sequence prediction tasks. With the present invention, the information latent in text can be mined more deeply, improving the quality of the generated text sequences.
Description
Technical field
The invention belongs to the field of natural language processing technology, and relates in particular to a transfer learning method based on machine reading to sequence models.
Background technique
Machine reading is one of the most popular and challenging problems in natural language processing: it requires a model to understand natural language and to make use of existing knowledge. In the currently most popular task setting, an article and a question are given, and the model must find the answer in the article according to the question. With the release of several high-quality datasets in recent years, the performance of neural-network-based models on machine reading has steadily improved, even surpassing humans on some datasets. An efficient machine reading model can be widely applied in fields that rely on semantic understanding, such as dialogue robots, question answering systems and search engines.
A sequence model with an attention mechanism mainly consists of an encoder and a decoder: the encoder encodes the input sequence, and the decoder then outputs the generated sequence step by step. Such structures have achieved great success in natural language generation tasks such as machine translation, text summarization and dialogue systems. However, when training such an encoder-decoder, we optimize simply by comparing the output against a fixed reference sample, and it is difficult to deeply understand the latent semantic information contained in the text.
Transfer learning refers to combining the knowledge or features of multiple domains to establish new models and probability distributions. Transfer learning is widely used in the field of natural language processing. For example, "Natural Language Processing (almost) from Scratch", published in 2011 in the top international machine learning journal Journal of Machine Learning Research, discloses a unified neural network structure and unsupervised learning applicable to numerous natural language processing tasks such as part-of-speech tagging and named entity recognition; "Learned in Translation: Contextualized Word Vectors", published in 2017 at the top international conference Annual Conference on Neural Information Processing Systems, discloses pre-training an encoder on machine translation and then transferring it into text classification tasks and question answering systems as a new kind of word vector, enriching the original word vectors; "Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference", published in 2018 at the top international natural language processing conference in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, discloses a training method based on conjunctions: an encoder is first trained on a conjunction prediction task, and is then embedded into a natural language inference model to improve the model's reasoning ability.
However, few existing transfer learning methods in natural language processing transfer multi-layer neural networks to other tasks; transferring only the coding layer loses a large amount of the information in the original pre-trained model.
Summary of the invention
The present invention provides a transfer learning method based on machine reading to sequence models, which can more deeply mine the information contained in text and improve the quality of the generated text sequences.
The technical solution adopted by the invention is as follows:
A transfer learning method based on machine reading to sequence models, comprising the following steps:
(1) pre-train a machine reading model, the machine reading model comprising a coding layer and a model layer based on recurrent neural networks;
(2) establish a sequence model, the sequence model comprising an encoder, a decoder and an attention mechanism based on recurrent neural networks;
(3) extract the parameters of the coding layer and model layer of the trained machine reading model, and transfer them into the sequence model to be trained as initialization for part of its parameters;
(4) train the sequence model until it converges;
(5) use the trained sequence model to perform text-sequence prediction tasks.
The present invention first pre-trains a machine reading model comprising a coding layer and a model layer as the transfer source, then embeds its coding layer and model layer into the sequence model and merges them with the existing encoding results, finally outputting the probability distribution of labels. This method can help the sequence model better understand the latent meaning of the text and generate more natural text.
In step (1), the recurrent neural network in the coding layer is a bidirectional long short-term memory network, and the recurrent neural network in the model layer is a unidirectional long short-term memory network.
In step (1), the specific steps of pre-training the machine reading model are as follows:
(1-1) select training data, apply word embedding to the input text using GloVe word vectors, and feed the result into the bidirectional long short-term memory network of the coding layer;
(1-2) concatenate the hidden units of each direction to form the representation of the entire sentence in that direction, and merge the sentence representations of the two directions as the final representation of the input sequence;
(1-3) feed the combined final representations of the article sequence and the question sequence into the model's attention mechanism, and output the attention matrix;
(1-4) input the attention matrix into the unidirectional long short-term memory network of the model layer, regularize using the hidden units of the network, and output the predicted probability distribution;
(1-5) repeat the above steps until the machine reading model converges.
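As an illustration of steps (1-1) to (1-4), the shape bookkeeping of the coding layer and attention mechanism can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: random arrays stand in for trained BiLSTM outputs, and a plain dot-product similarity replaces the trainable attention function of the cited BiDAF paper; all dimensions are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along an axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8          # hidden size of the coding-layer BiLSTM (illustrative)
T, J = 5, 3    # article length and question length (illustrative)

# Stand-ins for the final coding-layer representations of step (1-2)
H = rng.normal(size=(T, d))   # article words
U = rng.normal(size=(J, d))   # question words

# Similarity between every article word and every question word;
# BiDAF uses a trainable similarity function here, not a raw dot product
S = H @ U.T                   # (T, J)

# Article-to-question attention: each article word attends over the question
A = softmax(S, axis=1) @ U    # (T, d)

# Attention matrix of step (1-3): one attention vector per article word,
# here formed by concatenating each word's encoding with its attended context
G = np.concatenate([H, A], axis=1)   # (T, 2d)
assert G.shape == (T, 2 * d)
```

In the full model, `G` would then be fed into the model layer's unidirectional LSTM as in step (1-4).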
In step (2), the sequence model mainly consists of an encoder and a decoder. To keep the parameters consistent with the transfer source, long short-term memory networks are likewise used as the main parameter components of the sequence model; the recurrent neural network in the encoder is a bidirectional long short-term memory network.
In step (3), the extracted coding layer and model layer parameters are those of the recurrent neural networks in the coding layer and the model layer. The network of the coding layer and the network of the model layer are extracted separately and transferred into the sequence model to be trained, as initialization for part of the sequence model's parameters.
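Step (3) amounts to copying the trained recurrent-network parameters into the new model by name, while leaving the remaining parameters for random initialization. A minimal sketch, assuming dictionary-style parameter stores; the name prefixes `encoding_layer.` and `model_layer.` and the 4×4 shapes are illustrative assumptions, not identifiers from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Trained machine reading model: only its recurrent networks are transferred
reader_params = {
    "encoding_layer.lstm.W": rng.normal(size=(4, 4)),
    "model_layer.lstm.W":    rng.normal(size=(4, 4)),
    "output.softmax.W":      rng.normal(size=(4, 2)),  # NOT transferred
}

# Sequence model to be trained: same shapes for the shared sub-networks
seq_params = {
    "encoding_layer.lstm.W": np.zeros((4, 4)),          # will be overwritten
    "model_layer.lstm.W":    np.zeros((4, 4)),          # will be overwritten
    "decoder.lstm.W":        rng.normal(size=(4, 4)),   # stays randomly initialized
}

TRANSFER_PREFIXES = ("encoding_layer.", "model_layer.")

def transfer(src, dst, prefixes):
    """Copy every source parameter whose name starts with one of `prefixes`
    into the destination: the partial initialization of step (3)."""
    for name, value in src.items():
        if name.startswith(prefixes) and name in dst:
            dst[name] = value.copy()
    return dst

transfer(reader_params, seq_params, TRANSFER_PREFIXES)
```

With a real framework the same idea is usually expressed by filtering one model's state dictionary before loading it into another.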
The specific steps of step (4) are as follows:
(4-1) feed the input word sequence simultaneously into the encoder of the sequence model and into the coding layer of the transferred machine reading model, and obtain the merged vector after encoding;
(4-2) feed the merged vector into a unidirectional long short-term memory network for integration, and obtain the coding vector that integrates the input text sequence;
(4-3) use the integrated coding vector as the initialization vector of the decoder, and perform attention interaction between the hidden units of the decoder and the units of the integrated vector to obtain the attention vector a_t, where t denotes the t-th decoded word;
(4-4) input the attention vector a_t into the transferred machine reading model layer, then integrate the model layer's output vector r_t with the attention vector a_t via a linear function and feed the result into a softmax function to obtain the probability distribution of the predicted sequence; the formula of the softmax function is:
P(y_t | y_<t, x) = softmax(W_p a_t + W_q r_t + b_p)
where W_p, W_q and b_p are all parameters to be trained, and y_t is the t-th word output by the decoder;
(4-5) repeat the above steps until the model converges.
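The output distribution of step (4-4) can be checked numerically. A minimal sketch with random stand-ins for a_t, r_t and the trainable parameters W_p, W_q and b_p; the hidden size and vocabulary size are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d, V = 6, 10   # hidden size and vocabulary size (illustrative)

a_t = rng.normal(size=d)        # attention vector at decoding step t
r_t = rng.normal(size=d)        # output of the transferred model layer
W_p = rng.normal(size=(V, d))   # trainable
W_q = rng.normal(size=(V, d))   # trainable
b_p = rng.normal(size=V)        # trainable

# P(y_t | y_<t, x) = softmax(W_p a_t + W_q r_t + b_p)
p = softmax(W_p @ a_t + W_q @ r_t + b_p)
y_t = int(p.argmax())           # greedy choice of the t-th output word
```

Note that setting W_q to zero recovers the ordinary sequence-model output softmax(W_p a_t + b_p), so the transferred model layer acts as an additive correction to the standard attention-based distribution.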
The invention has the following advantages:
1. Using transfer learning, the present invention transfers knowledge learned in other question answering systems into the text generation task, improving the accuracy of the encoder-decoder structure while keeping the whole model simple and intuitive.
2. The present invention makes full use of the high performance of existing machine reading models, and the transferred parameters contain multiple layers of neural networks. Replacing random initialization of the sequence model's parameters with trained machine reading model parameters helps the sequence model mine the information contained in text more deeply and improves the quality of the generated text sequences, making the content more substantial.
Detailed description of the invention
Fig. 1 is a schematic flow diagram of the transfer learning method based on machine reading to sequence models of the present invention;
Fig. 2 is a schematic diagram of the overall structure of the machine reading model and the sequence model in the present invention.
Specific embodiment
In order to make the purpose, technical solution and beneficial effects of the present invention clearer, the technical content and specific embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described in this specification are only intended to explain the present invention, not to limit it.
As shown in Fig. 1, a transfer learning method based on machine reading to sequence models comprises the following steps:
S01, pre-train a machine reading model.
We use the Stanford question answering dataset SQuAD, a large-scale high-quality corpus, as the training set. Our task is: given an article and a question, predict the answer, which is a continuous span of the article.
Referring to Fig. 2 for the structure of the machine reading model: we apply word embedding to the input text with the existing GloVe word vectors, and then feed the result into the bidirectional long short-term memory network (BiLSTM) of the coding layer (Encoding Layer). We concatenate all the hidden units of each direction to form the representation of the entire sentence in that direction, and merge the representations of the two directions as the final representation of the input sequence. Then we combine the representation of the article sequence and the representation of the question sequence and feed them into the attention mechanism (Attention Mechanism). The attention mechanism is a function composed of a series of regularized linear operations and logical operations; for details, see pages 3 to 4 of "Bi-directional Attention Flow for Machine Comprehension", published in 2017 at the top representation learning conference International Conference on Learning Representations. The output of our attention mechanism is a matrix composed of one attention vector per article word. Finally, we input the attention matrix into the unidirectional long short-term memory network (LSTM) of the model layer (Modeling Layer), regularize using the hidden units of this network, and use a softmax function to output the predicted probability distribution.
S02, extract the coding layer and model layer parameters of the machine reading model. In step S01 we referred to the long short-term memory network, which is a kind of recurrent neural network and is the parameter set we will extract. We extract the network of the coding layer and the network of the model layer separately, preparing them as the initialization parameters for the next task.
S03, embed the parameters extracted in step S02 into the sequence model, as the initialization of the corresponding part of its parameters.
Referring to Fig. 2 for the structure of the sequence model: the sequence model mainly consists of an encoder (Encoder) and a decoder (Decoder). To stay consistent with the parameters of the transfer source, we likewise use long short-term memory networks as the main parameter components of the sequence model. We first feed the input word sequence simultaneously into the encoder of the sequence model and into the encoder of the transferred machine reading model, obtaining the merged vector after encoding; we then feed the merged vector into a unidirectional long short-term memory network for integration, obtaining a coding vector that integrates the input text sequence from the two encoders of separate origin. We use the integrated coding vector as the initialization vector of the decoder, and perform attention interaction between the hidden units of the decoder and the units of the integrated vector, obtaining the attention vector a_t, where t denotes the t-th decoded word. For a general sequence model, the attention vector is finally sent into a softmax function to be regularized and to generate the predicted distribution:
P(y_t | y_<t, x) = softmax(W_p a_t + b_p)
where W_p and b_p are parameters to be trained, and y_t is the t-th word output by the decoder. In the method of the present invention, however, we first input the attention vector into the transferred machine reading model layer, then integrate the model layer's output vector r_t with the original attention vector via a linear function and feed the result into the softmax function to obtain the probability distribution of the predicted sequence:
P(y_t | y_<t, x) = softmax(W_p a_t + W_q r_t + b_p)
where W_q is a parameter to be trained.
S04, train the sequence model with the transferred trained parameters as initialization and the remaining parameters randomly initialized, until convergence.
S05, use the trained model to perform text-sequence prediction tasks, such as machine translation and text summarization.
To demonstrate the validity of the method of the present invention, comparative experiments were carried out on two tasks: neural machine translation and abstractive text summarization. For the machine translation task, we use the WMT2014 and WMT2015 English-to-German corpora; for the text summarization task, we use the CNN/Daily Mail and Gigaword datasets. After preprocessing, CNN/Daily Mail contains 287k training pairs and Gigaword contains 3.8M training pairs.
The comparative experimental results of the machine translation task are shown in Table 1. In Table 1, the first column is the basic model, the middle columns add the components of this method one by one, and the last column is the full method. It can be seen that on the machine translation task the method of the present invention (MacNet) clearly improves over the basic model (Baseline), and the component-by-component comparison demonstrates the validity of each detail.
Table 1
The comparative experimental results of the text summarization task are shown in Table 2. This experiment compares against the best-performing existing methods on the text summarization test sets. The method of the present invention (Pointer-Generator + MacNet) achieves higher accuracy overall than the other methods, and reaches the current best results on most metrics on both datasets.
Table 2
In addition, we present several examples demonstrating the visible effect of adding the method of the present invention on the generated text summaries, as shown in Table 3.
Table 3
In the table above, PG is the abbreviation of the basic model, the pointer-generator; Reference is the reference answer in the dataset; and PG+MacNet is the model with the method of the present invention added. It can be seen that when rare words appear in the source text, the original basic model struggles to produce a well-formed subject-predicate-object structure, and when the source text is long and structurally complex, the original basic model even produces ungrammatical sentences. After the method of the present invention is added, the generated summary sentences are fluent and natural, and the main idea is expressed essentially in place.
The content described in the embodiments of this specification is only an explanation of the invention and is not a limitation on it. The protection scope of the present invention should not be construed as limited to the specific content described in the embodiments; any modification, replacement or change made within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (6)
1. A transfer learning method based on machine reading to sequence models, characterized by comprising the following steps:
(1) pre-train a machine reading model, the machine reading model comprising a coding layer and a model layer based on recurrent neural networks;
(2) establish a sequence model, the sequence model comprising an encoder, a decoder and an attention mechanism based on recurrent neural networks;
(3) extract the parameters of the coding layer and model layer of the trained machine reading model, and transfer them into the sequence model to be trained as initialization for part of its parameters;
(4) train the sequence model until it converges;
(5) use the trained sequence model to perform text-sequence prediction tasks.
2. The transfer learning method based on machine reading to sequence models according to claim 1, characterized in that in step (1), the recurrent neural network in the coding layer is a bidirectional long short-term memory network, and the recurrent neural network in the model layer is a unidirectional long short-term memory network.
3. The transfer learning method based on machine reading to sequence models according to claim 2, characterized in that in step (1), the specific steps of pre-training are as follows:
(1-1) select training data, apply word embedding to the input text using GloVe word vectors, and feed the result into the bidirectional long short-term memory network of the coding layer;
(1-2) concatenate the hidden units of each direction to form the representation of the entire sentence in that direction, and merge the sentence representations of the two directions as the final representation of the input sequence;
(1-3) feed the combined final representations of the article sequence and the question sequence into the model's attention mechanism, and output the attention matrix;
(1-4) input the attention matrix into the unidirectional long short-term memory network of the model layer, regularize using the hidden units of the network, and output the predicted probability distribution;
(1-5) repeat the above steps until the machine reading model converges.
4. The transfer learning method based on machine reading to sequence models according to claim 1, characterized in that in step (2), the recurrent neural network in the encoder is a bidirectional long short-term memory network.
5. The transfer learning method based on machine reading to sequence models according to claim 1, characterized in that the specific steps of step (4) are as follows:
(4-1) feed the input word sequence simultaneously into the encoder of the sequence model and into the coding layer of the transferred machine reading model, and obtain the merged vector after encoding;
(4-2) feed the merged vector into a unidirectional long short-term memory network for integration, and obtain the coding vector that integrates the input text sequence;
(4-3) use the integrated coding vector as the initialization vector of the decoder, and perform attention interaction between the hidden units of the decoder and the units of the integrated vector to obtain the attention vector a_t, where t denotes the t-th decoded word;
(4-4) input the attention vector a_t into the transferred machine reading model layer, then integrate the model layer's output vector r_t with the attention vector a_t via a linear function and feed the result into a softmax function to obtain the probability distribution of the predicted sequence;
(4-5) repeat the above steps until the model converges.
6. The transfer learning method based on machine reading to sequence models according to claim 5, characterized in that in step (4-4), the formula of the softmax function is:
P(y_t | y_<t, x) = softmax(W_p a_t + W_q r_t + b_p)
where W_p, W_q and b_p are all parameters to be trained, and y_t is the t-th word output by the decoder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811284309.2A CN109508457B (en) | 2018-10-31 | 2018-10-31 | Transfer learning method based on machine reading to sequence model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508457A true CN109508457A (en) | 2019-03-22 |
CN109508457B CN109508457B (en) | 2020-05-29 |
Family
ID=65747209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811284309.2A Active CN109508457B (en) | 2018-10-31 | 2018-10-31 | Transfer learning method based on machine reading to sequence model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508457B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188331A (en) * | 2019-06-03 | 2019-08-30 | 腾讯科技(深圳)有限公司 | Model training method, conversational system evaluation method, device, equipment and storage medium |
CN110415702A (en) * | 2019-07-04 | 2019-11-05 | 北京搜狗科技发展有限公司 | Training method and device, conversion method and device |
CN111950695A (en) * | 2019-05-15 | 2020-11-17 | 辉达公司 | Syntax migration using one or more neural networks |
US20210342551A1 (en) * | 2019-05-31 | 2021-11-04 | Shenzhen Institutes Of Advanced Technology, Chinese Academy Of Sciences | Method, apparatus, device, and storage medium for training model and generating dialog |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521656A (en) * | 2011-12-29 | 2012-06-27 | 北京工商大学 | Integrated transfer learning method for classification of unbalance samples |
CN105787560A (en) * | 2016-03-18 | 2016-07-20 | 北京光年无限科技有限公司 | Dialogue data interaction processing method and device based on recurrent neural network |
US20160350653A1 (en) * | 2015-06-01 | 2016-12-01 | Salesforce.Com, Inc. | Dynamic Memory Network |
US20170262433A1 (en) * | 2016-03-08 | 2017-09-14 | Shutterstock, Inc. | Language translation based on search results and user interaction data |
CN107341146A (en) * | 2017-06-23 | 2017-11-10 | 上海交通大学 | The semantic resolution system of transportable spoken language and its implementation based on semantic groove internal structure |
CN107590138A (en) * | 2017-08-18 | 2018-01-16 | 浙江大学 | A kind of neural machine translation method based on part of speech notice mechanism |
CN108228571A (en) * | 2018-02-01 | 2018-06-29 | 北京百度网讯科技有限公司 | Generation method, device, storage medium and the terminal device of distich |
US20180260474A1 (en) * | 2017-03-13 | 2018-09-13 | Arizona Board Of Regents On Behalf Of The University Of Arizona | Methods for extracting and assessing information from literature documents |
-
2018
- 2018-10-31 CN CN201811284309.2A patent/CN109508457B/en active Active
Non-Patent Citations (3)
Title |
---|
BOYUAN PAN et al.: "MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models", NIPS '18: Proceedings of the 32nd International Conference on Neural Information Processing Systems * |
BOYUAN PAN et al.: "MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension", arXiv:1707.09098v1 * |
LI Hao: "Transfer learning for general-purpose text generation based on high-level semantics", Wanfang Data Knowledge Service Platform * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950695A (en) * | 2019-05-15 | 2020-11-17 | 辉达公司 | Syntax migration using one or more neural networks |
US20210342551A1 (en) * | 2019-05-31 | 2021-11-04 | Shenzhen Institutes Of Advanced Technology, Chinese Academy Of Sciences | Method, apparatus, device, and storage medium for training model and generating dialog |
US11875126B2 (en) * | 2019-05-31 | 2024-01-16 | Shenzhen Institutes Of Advanced Technology, Chinese Academy Of Sciences | Method, apparatus, device, and storage medium for training model and generating dialog |
CN110188331A (en) * | 2019-06-03 | 2019-08-30 | 腾讯科技(深圳)有限公司 | Model training method, conversational system evaluation method, device, equipment and storage medium |
CN110415702A (en) * | 2019-07-04 | 2019-11-05 | 北京搜狗科技发展有限公司 | Training method and device, conversion method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109508457B (en) | 2020-05-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |