CN109508457B - Transfer learning method based on machine reading to sequence model - Google Patents

Transfer learning method based on machine reading to sequence model

Info

Publication number
CN109508457B
CN109508457B
Authority
CN
China
Prior art keywords
model
sequence
vector
layer
machine reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811284309.2A
Other languages
Chinese (zh)
Other versions
CN109508457A (en)
Inventor
潘博远
蔡登
李昊
陈哲乾
赵洲
何晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201811284309.2A
Publication of CN109508457A
Application granted
Publication of CN109508457B
Active legal-status Current
Anticipated expiration legal-status


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a transfer learning method based on machine reading to a sequence model, which comprises the following steps: (1) pre-training a machine reading model, wherein the machine reading model comprises a coding layer and a model layer based on a recurrent neural network; (2) establishing a sequence model, wherein the sequence model comprises an encoder and a decoder based on a recurrent neural network; (3) extracting the parameters of the coding layer and the model layer of the trained machine reading model and transferring them into the sequence model to be trained, where they serve as part of the initialization parameters for training the sequence model; (4) training the sequence model until it converges; (5) performing text sequence prediction tasks with the trained sequence model. With this method, the information contained in a text can be mined more deeply and the quality of the generated text sequence is improved.

Description

Transfer learning method based on machine reading to sequence model
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a transfer learning method based on machine reading to a sequence model.
Background
Machine reading is one of the most popular and most challenging problems in natural language processing: it requires a model to understand natural language and to exploit existing knowledge. In the most common task setting, given an article and a question, the model must find the answer to the question within the article. With the recent release of several high-quality datasets, neural-network-based models have performed better and better on machine reading, even surpassing humans on some datasets. An effective machine reading model can be widely applied in many fields that depend on semantic understanding, such as dialogue robots, question-answering systems and search engines.
A sequence model with an attention mechanism mainly consists of an encoder and a decoder: the encoder encodes the input sequence, and the decoder then generates the output sequence token by token. Such structures have enjoyed tremendous success in natural language generation tasks such as machine translation, text summarization and dialogue systems. However, when training such an encoder-decoder, the output can only be optimized against a fixed reference sample, and it is difficult to deeply understand the latent semantic information contained in the text.
Transfer learning refers to combining knowledge or features from different domains to build a new model or probability distribution, and it is widely applied in natural language processing. For example, "Natural Language Processing (Almost) from Scratch", published in 2011 in the top international machine learning journal, the Journal of Machine Learning Research, discloses a unified neural network structure that applies unsupervised learning to several natural language processing tasks such as part-of-speech tagging and named entity recognition; "Learned in Translation: Contextualized Word Vectors", published in 2017 at the top international conference on neural information processing, the Conference on Neural Information Processing Systems, discloses a method that migrates the encoder of a pre-trained machine translation model into text classification tasks and question-answering systems as new word vectors to enrich the original word vectors; and "Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference", published in 2018 in the Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, a top international natural language processing conference, discloses a training method based on discourse connectives.
However, existing transfer learning methods in natural language processing rarely transfer multi-layer neural networks to other tasks, and migrating only the coding layer loses a large amount of information from the original pre-trained model.
Disclosure of Invention
The invention provides a transfer learning method based on machine reading to sequence models, which mines the information contained in a text more deeply and improves the quality of the generated text sequence.
The technical scheme adopted by the invention is as follows:
A transfer learning method based on machine reading to sequence model comprises the following steps:
(1) pre-training a machine reading model, wherein the machine reading model comprises a coding layer and a model layer based on a recurrent neural network;
(2) establishing a sequence model, wherein the sequence model comprises an encoder, a decoder and an attention mechanism based on a recurrent neural network;
(3) extracting the parameters of the coding layer and the model layer of the trained machine reading model, transferring them into the sequence model to be trained, and using them as part of the initialization parameters for training the sequence model;
(4) training the sequence model until the model converges;
(5) and performing a text sequence prediction task by using the trained sequence model.
The method pre-trains a machine reading model comprising a coding layer and a model layer as the migration source, embeds the coding layer and the model layer into a sequence model where they are fused with the existing encoding results, and finally outputs the probability distribution over labels. This helps the sequence model understand the meaning of the text more deeply and generate more natural text.
In the step (1), the recurrent neural network in the coding layer is a bidirectional long short-term memory network, and the recurrent neural network in the model layer is a unidirectional long short-term memory network.
In the step (1), pre-training the machine reading model comprises the following specific steps (a code sketch follows the list):
(1-1) selecting training data, embedding the input text with pre-trained GloVe word vectors, and feeding the embeddings into the bidirectional long short-term memory network of the coding layer;
(1-2) concatenating the hidden units side by side to form the representation of the whole sentence in each direction, and combining the representations of the two directions as the final representation of the input sequence;
(1-3) feeding the final representation of the article sequence and the final representation of the question sequence into the attention mechanism of the model, which outputs an attention matrix;
(1-4) inputting the attention matrix into the unidirectional long short-term memory network of the model layer, normalizing with the hidden units of the network, and outputting the predicted probability distribution;
(1-5) repeating the above steps until the machine reading model converges.
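A minimal PyTorch sketch of this pre-training architecture is given below. Only the two components that are later migrated, the coding layer (a bidirectional LSTM over GloVe embeddings) and the model layer (a unidirectional LSTM over the attention matrix), follow the description closely; the class name, hidden sizes, the simplified bilinear attention and the span-prediction head are illustrative assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn

class MachineReadingModel(nn.Module):
    """Sketch of the pre-trained machine reading model (steps 1-1 to 1-4).

    The coding layer (bidirectional LSTM) and the model layer (unidirectional
    LSTM) mirror the description; the attention step is reduced to a simple
    bilinear interaction, and all sizes are illustrative assumptions.
    """

    def __init__(self, glove_weights: torch.Tensor, hidden_size: int = 128):
        super().__init__()
        _, embed_dim = glove_weights.shape
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        # (1-1) coding layer: bidirectional LSTM over GloVe embeddings
        self.encoding_layer = nn.LSTM(embed_dim, hidden_size,
                                      batch_first=True, bidirectional=True)
        # (1-3) simplified attention: bilinear article-question interaction
        self.att_bilinear = nn.Linear(2 * hidden_size, 2 * hidden_size, bias=False)
        # (1-4) model layer: unidirectional LSTM over the attention matrix
        self.model_layer = nn.LSTM(4 * hidden_size, hidden_size, batch_first=True)
        self.span_head = nn.Linear(hidden_size, 2)  # start/end logits per article word

    def forward(self, article_ids, question_ids):
        # (1-1)/(1-2) encode article and question; nn.LSTM already concatenates
        # the hidden states of the two directions.
        c, _ = self.encoding_layer(self.embedding(article_ids))      # [B, Lc, 2H]
        q, _ = self.encoding_layer(self.embedding(question_ids))     # [B, Lq, 2H]
        # (1-3) attention matrix: one attended question summary per article word
        scores = torch.bmm(self.att_bilinear(c), q.transpose(1, 2))  # [B, Lc, Lq]
        attended = torch.bmm(torch.softmax(scores, dim=-1), q)       # [B, Lc, 2H]
        g = torch.cat([c, attended], dim=-1)                         # [B, Lc, 4H]
        # (1-4) model layer and softmax-normalized span prediction
        m, _ = self.model_layer(g)                                   # [B, Lc, H]
        return torch.softmax(self.span_head(m), dim=1)               # start/end distributions
```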
In the step (2), the sequence model mainly consists of an encoder and a decoder; to remain consistent with the parameters of the migration source, long short-term memory networks are likewise adopted as the main parameterized components of the sequence model, and the recurrent neural network in the encoder is a bidirectional long short-term memory network.
In the step (3), the extracted parameters of the coding layer and the model layer are those of the recurrent neural networks in the coding layer and the model layer. The network of the coding layer and the network of the model layer are extracted separately and transferred into the sequence model to be trained, serving as part of the initialization parameters for training the sequence model, as in the sketch below.
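In code, the migration reduces to copying the weights of the two LSTMs into the corresponding modules of the sequence model before training begins. The sketch below assumes the module names `encoding_layer`, `model_layer`, `migrated_encoder` and `migrated_model_layer`; these names, like the surrounding model classes, are assumptions for illustration.

```python
import torch.nn as nn

def migrate_parameters(reading_model: nn.Module, seq_model: nn.Module) -> None:
    """Copy the coding-layer and model-layer LSTM weights of a trained machine
    reading model into a sequence model as part of its initialization. All
    other sequence-model parameters keep their random initialization.
    Module names are illustrative assumptions."""
    # coding layer -> migrated encoder branch of the sequence model
    seq_model.migrated_encoder.load_state_dict(
        reading_model.encoding_layer.state_dict())
    # model layer -> migrated model layer applied on top of the decoder attention
    seq_model.migrated_model_layer.load_state_dict(
        reading_model.model_layer.state_dict())
```

For the copy to succeed, the migrated modules of the sequence model must be constructed with the same hyperparameters (input and hidden sizes) as the corresponding layers of the machine reading model.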
The specific steps of the step (4) are as follows (a code sketch of one decoding step follows the list):
(4-1) feeding the input word sequence simultaneously into the encoder of the sequence model and the coding layer of the migrated machine reading model to obtain a merged encoding vector;
(4-2) feeding the merged vector into a unidirectional long short-term memory network for integration, obtaining an encoding vector that integrates the input text sequence;
(4-3) taking the integrated encoding vector as the initialization vector of the decoder, and performing attention interaction between the hidden units of the decoder and the units of the integrated vector to obtain an attention vector a_t, where t denotes the t-th decoded word;
(4-4) inputting the attention vector a_t into the model layer of the migrated machine reading model, then integrating the output vector r_t of the model layer with the attention vector a_t using a linear function and feeding the result into a softmax function to obtain the probability distribution of the predicted sequence; the softmax function is formulated as:
P(y_t | y_<t, x) = softmax(W_p·a_t + W_q·r_t + b_p)
where W_p, W_q and b_p are parameters to be trained, and y_t is the t-th word output by the decoder;
(4-5) repeating the above steps until the model converges.
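The decisive difference from an ordinary attention sequence model is the extra term W_q·r_t in the output distribution. The PyTorch sketch below shows one decoding step of step (4): computing the attention vector a_t, passing it through the migrated model layer to obtain r_t, and combining the two with softmax(W_p·a_t + W_q·r_t + b_p). The dot-product attention, the dimension-bridging projection and all module names are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TransferDecoder(nn.Module):
    """One decoding step of the sequence model with the migrated model layer.

    Computes P(y_t | y_<t, x) = softmax(W_p*a_t + W_q*r_t + b_p), where a_t is
    the attention vector and r_t the output of the migrated model layer.
    Attention form, sizes and names are illustrative assumptions.
    """

    def __init__(self, vocab_size: int, hidden_size: int, migrated_model_layer: nn.LSTM):
        super().__init__()
        # assumes the target-word embedding size equals hidden_size
        self.decoder_cell = nn.LSTMCell(hidden_size, hidden_size)
        # unidirectional LSTM taken from the reading model (built with batch_first=True)
        self.migrated_model_layer = migrated_model_layer
        # adapter so a_t matches the migrated LSTM's expected input size (assumption)
        self.bridge = nn.Linear(hidden_size, migrated_model_layer.input_size)
        self.W_p = nn.Linear(hidden_size, vocab_size, bias=True)                        # W_p, b_p
        self.W_q = nn.Linear(migrated_model_layer.hidden_size, vocab_size, bias=False)  # W_q

    def step(self, y_emb, dec_state, enc_states, model_state):
        # (4-3) decoder step, then dot-product attention over the integrated
        # encoder states enc_states of shape [B, L, hidden_size]
        h, c = self.decoder_cell(y_emb, dec_state)                   # h: [B, H]
        scores = torch.bmm(enc_states, h.unsqueeze(-1)).squeeze(-1)  # [B, L]
        alpha = torch.softmax(scores, dim=-1)
        a_t = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)   # attention vector a_t
        # (4-4) pass a_t through the migrated model layer to obtain r_t
        r_t, model_state = self.migrated_model_layer(
            self.bridge(a_t).unsqueeze(1), model_state)
        r_t = r_t.squeeze(1)
        # linear combination of a_t and r_t, normalized by softmax
        p_t = torch.softmax(self.W_p(a_t) + self.W_q(r_t), dim=-1)   # P(y_t | y_<t, x)
        return p_t, (h, c), model_state
```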
The invention has the following beneficial effects:
1. The invention uses transfer learning to transfer knowledge learned in question-answering systems to text generation tasks, improving the accuracy of the encoder-decoder structure while keeping the whole model simple and intuitive.
2. The method makes full use of the high performance of existing machine reading models. The migrated parameters comprise multi-layer neural networks, and the trained machine reading model parameters replace the random initialization of part of the sequence model parameters, which helps the sequence model mine the information contained in the text more deeply, produce richer content, and improve the quality of the generated text sequence.
Drawings
FIG. 1 is a flow chart of a transfer learning method based on machine reading to sequence model according to the present invention;
FIG. 2 is a schematic diagram of the overall structure of the machine reading model and the sequence model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more clearly apparent, the technical contents and specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, a transfer learning method based on machine reading to sequence model includes the following steps:
s01, pre-training a machine reading model.
We use the Stanford question-answering dataset SQuAD, a large-scale, high-quality corpus, as the training set. The task is: given an article and a question, predict the answer, which is a continuous span in the article.
Referring to fig. 2, the input text is embedded with pre-trained GloVe word vectors and then fed into the bidirectional long short-term memory network (BiLSTM) of the Encoding Layer. We concatenate the hidden units side by side to form the representation of the whole sentence in each direction, and merge the representations of the two directions as the final representation of the input sequence. Subsequently, we feed the representation of the article sequence and the representation of the question sequence into the Attention Mechanism. The attention mechanism is a function composed of a series of normalized linear and logical operations; for details see pages 3 to 4 of "Bi-Directional Attention Flow for Machine Comprehension", published at the International Conference on Learning Representations in 2017 (a simplified sketch follows). The output of the attention mechanism is a matrix with one attention vector per article word. Finally, we input the attention matrix into the unidirectional long short-term memory network (LSTM) of the Modeling Layer, normalize with the hidden units of the network, and output the predicted probability distribution through a softmax function.
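The bidirectional attention of this layer can be sketched roughly as in the cited paper: a similarity score between every article word and every question word, an article-to-question attended vector per article word, a question-to-article summary vector, and their concatenation as the attention matrix. The following simplified reimplementation is an assumption-level sketch (a fuller version of the attention that was reduced to a bilinear form in the earlier sketch), not the patented code.

```python
import torch
import torch.nn as nn

class BiAttention(nn.Module):
    """BiDAF-style bidirectional attention, sketched after the cited paper.

    Given encoded article states h [B, T, D] and question states u [B, J, D]
    (D = 2H for a bidirectional encoder), returns one 4D-dimensional attended
    vector per article word - the attention matrix fed to the model layer.
    """

    def __init__(self, dim: int):
        super().__init__()
        # similarity(h, u) = w^T [h; u; h * u]
        self.w = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, h, u):
        B, T, D = h.shape
        J = u.size(1)
        h_exp = h.unsqueeze(2).expand(B, T, J, D)
        u_exp = u.unsqueeze(1).expand(B, T, J, D)
        s = self.w(torch.cat([h_exp, u_exp, h_exp * u_exp], dim=-1)).squeeze(-1)  # [B, T, J]
        # article-to-question: attended question vector for each article word
        u_tilde = torch.bmm(torch.softmax(s, dim=-1), u)                          # [B, T, D]
        # question-to-article: one attended article vector, tiled over positions
        b = torch.softmax(s.max(dim=-1).values, dim=-1)                           # [B, T]
        h_tilde = torch.bmm(b.unsqueeze(1), h).expand(B, T, D)                    # [B, T, D]
        return torch.cat([h, u_tilde, h * u_tilde, h * h_tilde], dim=-1)          # [B, T, 4D]
```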
S02, extracting the coding layer and model layer parameters of the machine reading model. The long short-term memory networks mentioned in step S01 are a type of recurrent neural network and are exactly the parameters we extract. The network of the coding layer and the network of the model layer are extracted separately, ready to be used as initialization parameters for the next task.
S03, embedding the parameters extracted in step S02 into the sequence model as the initialization of part of its parameters.
The structure of the sequence model is shown in fig. 2. The sequence model mainly consists of an Encoder and a Decoder; to remain consistent with the parameters of the migration source, we likewise use long short-term memory networks as its main parameterized components. We first feed the input word sequence simultaneously into the encoder of the sequence model and the coding layer of the migrated machine reading model to obtain a merged encoding vector; the merged vector is then fed into a unidirectional long short-term memory network for integration, obtaining an encoding vector that integrates the input text sequence from the two encoders of different origin. The integrated encoding vector is used as the initialization vector of the decoder, and attention interaction is performed between the hidden units of the decoder and the units of the integrated vector to obtain an attention vector a_t, where t denotes the t-th decoded word. In an ordinary sequence model, the attention vector is finally sent to a softmax function to be normalized into the predicted probability distribution:
P(y_t | y_<t, x) = softmax(W_p·a_t + b_p)
where W_p and b_p are parameters to be trained, and y_t is the t-th word output by the decoder. In the method of the present invention, however, we first input the attention vector into the migrated model layer of the machine reading model, then integrate the output vector r_t of the model layer with the original attention vector using a linear function and feed the result into a softmax function to obtain the probability distribution of the predicted sequence:
P(y_t | y_<t, x) = softmax(W_p·a_t + W_q·r_t + b_p)
where W_q is an additional parameter to be trained.
S04, training the sequence model, with the migrated parameters used as initialization and the remaining parameters randomly initialized, until convergence.
S05, performing text sequence prediction tasks such as machine translation and text summarization with the trained model.
To demonstrate the effectiveness of the method, comparative experiments were carried out on two tasks: neural machine translation and abstractive text summarization. For machine translation, the WMT2014 and WMT2015 English-to-German corpora were used; for text summarization, the CNN/Daily Mail and Gigaword datasets were used. After preprocessing, CNN/Daily Mail contains 287k training pairs and Gigaword contains 3.8M training pairs.
The results of the comparative experiments on the machine translation task are shown in table 1. In table 1, the first column is the base model, the middle columns add the components of the method one by one, and the last column is the full method. On the machine translation task, the method of the invention (MacNet) is clearly improved over the base model (Baseline), and the comparisons over the individual components confirm their effectiveness.
TABLE 1
(Table 1 is provided as an image in the original publication.)
The results of the comparative experiments on the text summarization task are shown in table 2. On the text summarization test sets, the method was compared with the best published methods available at the time. Overall, the method of the invention (Pointer-Generator + MacNet) achieves higher accuracy than the other methods and obtains the best results to date on most metrics on both datasets.
TABLE 2
(Table 2 is provided as an image in the original publication.)
In addition, several examples are shown that demonstrate the qualitative effect on generated text summaries before and after incorporating the method of the invention, as shown in table 3.
TABLE 3
(Table 3 is provided as an image in the original publication.)
In the table, PG is the abbreviation of the base model, the pointer-generator; Reference is the reference answer given in the dataset; and PG + MacNet is the model with the method of the invention added. It can be seen that when uncommon words appear in the source text, the original base model struggles to produce a well-formed subject-verb-object summary, and when the source text is long and its structure complex, the original base model even produces ungrammatical sentences. After adding the method of the invention, however, the generated summaries are fluent and natural, and the main idea they express is essentially accurate.
The embodiments described in this specification are for illustrative purposes only and are not intended to limit the invention; the scope of the invention should not be limited to the specific embodiments described here, and any modifications, substitutions, changes, etc. within the spirit and principle of the invention are included in the scope of the invention.

Claims (5)

1. A transfer learning method based on machine reading to sequence model, characterized by comprising the following steps:
(1) pre-training a machine reading model, wherein the machine reading model comprises a coding layer and a model layer based on a recurrent neural network;
(2) establishing a sequence model, wherein the sequence model comprises an encoder, a decoder and an attention mechanism based on a recurrent neural network;
(3) extracting parameters of a coding layer and a model layer in a trained machine reading model, transferring the parameters into a sequence model to be trained, and using the parameters as part of initialization parameters when the sequence model is trained;
(4) training a sequence model, specifically comprising the following steps:
(4-1) simultaneously sending the input word sequence into an encoder of the sequence model and a coding layer of the migrated machine reading model to obtain a coded merging vector;
(4-2) feeding the merged vector into a unidirectional long short-term memory network for integration to obtain an encoding vector that integrates the input text sequence;
(4-3) taking the integrated encoding vector as the initialization vector of the decoder, and performing attention interaction between the hidden units of the decoder and the units of the integrated vector to obtain an attention vector a_t, where t denotes the t-th decoded word;
(4-4) inputting the attention vector a_t into the model layer of the migrated machine reading model, then integrating the output vector r_t of the model layer with the attention vector a_t using a linear function and feeding the result into a softmax function to obtain the probability distribution of the predicted sequence;
(4-5) repeating the above steps until the model converges;
(5) and performing a text sequence prediction task by using the trained sequence model.
2. The method according to claim 1, wherein in step (1), the recurrent neural network in the coding layer is a bidirectional long short-term memory network, and the recurrent neural network in the model layer is a unidirectional long short-term memory network.
3. The transfer learning method based on machine reading to sequence model according to claim 2, wherein in the step (1), the pre-training comprises the following specific steps:
(1-1) selecting training data, embedding the input text with pre-trained GloVe word vectors, and feeding the embeddings into the bidirectional long short-term memory network of the coding layer;
(1-2) concatenating the hidden units side by side to form the representation of the whole sentence in each direction, and combining the representations of the two directions as the final representation of the input sequence;
(1-3) feeding the final representation of the article sequence and the final representation of the question sequence into the attention mechanism of the model, which outputs an attention matrix;
(1-4) inputting the attention matrix into the unidirectional long short-term memory network of the model layer, normalizing with the hidden units of the network, and outputting the predicted probability distribution;
(1-5) repeating the above steps until the machine reading model converges.
4. The method according to claim 1, wherein in step (2), the recurrent neural network in the encoder is a bidirectional long short-term memory network.
5. The transfer learning method based on machine reading to sequence model according to claim 1, wherein in step (4-4), the formula of the softmax function is:
P(y_t | y_<t, x) = softmax(W_p·a_t + W_q·r_t + b_p)
where W_p, W_q and b_p are parameters to be trained, and y_t is the t-th word output by the decoder.
CN201811284309.2A 2018-10-31 2018-10-31 Transfer learning method based on machine reading to sequence model Active CN109508457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811284309.2A CN109508457B (en) 2018-10-31 2018-10-31 Transfer learning method based on machine reading to sequence model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811284309.2A CN109508457B (en) 2018-10-31 2018-10-31 Transfer learning method based on machine reading to sequence model

Publications (2)

Publication Number Publication Date
CN109508457A CN109508457A (en) 2019-03-22
CN109508457B true CN109508457B (en) 2020-05-29

Family

ID=65747209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811284309.2A Active CN109508457B (en) 2018-10-31 2018-10-31 Transfer learning method based on machine reading to sequence model

Country Status (1)

Country Link
CN (1) CN109508457B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364303A1 (en) * 2019-05-15 2020-11-19 Nvidia Corporation Grammar transfer using one or more neural networks
CN110188182B (en) * 2019-05-31 2023-10-27 中国科学院深圳先进技术研究院 Model training method, dialogue generating method, device, equipment and medium
CN110188331B (en) * 2019-06-03 2023-05-26 腾讯科技(深圳)有限公司 Model training method, dialogue system evaluation method, device, equipment and storage medium
CN110415702A (en) * 2019-07-04 2019-11-05 北京搜狗科技发展有限公司 Training method and device, conversion method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228571A (en) * 2018-02-01 2018-06-29 北京百度网讯科技有限公司 Generation method, device, storage medium and the terminal device of distich

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521656B (en) * 2011-12-29 2014-02-26 北京工商大学 Integrated transfer learning method for classification of unbalance samples
US20160350653A1 (en) * 2015-06-01 2016-12-01 Salesforce.Com, Inc. Dynamic Memory Network
US10776707B2 (en) * 2016-03-08 2020-09-15 Shutterstock, Inc. Language translation based on search results and user interaction data
CN105787560B (en) * 2016-03-18 2018-04-03 北京光年无限科技有限公司 Dialogue data interaction processing method and device based on Recognition with Recurrent Neural Network
US20180260474A1 (en) * 2017-03-13 2018-09-13 Arizona Board Of Regents On Behalf Of The University Of Arizona Methods for extracting and assessing information from literature documents
CN107341146B (en) * 2017-06-23 2020-08-04 上海交大知识产权管理有限公司 Migratable spoken language semantic analysis system based on semantic groove internal structure and implementation method thereof
CN107590138B (en) * 2017-08-18 2020-01-31 浙江大学 neural machine translation method based on part-of-speech attention mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228571A (en) * 2018-02-01 2018-06-29 北京百度网讯科技有限公司 Generation method, device, storage medium and the terminal device of distich

Also Published As

Publication number Publication date
CN109508457A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
CN109508457B (en) Transfer learning method based on machine reading to sequence model
CN107357789B (en) Neural machine translation method fusing multi-language coding information
CN108717574B (en) Natural language reasoning method based on word connection marking and reinforcement learning
CN111783462A (en) Chinese named entity recognition model and method based on dual neural network fusion
WO2021022816A1 (en) Intent identification method based on deep learning network
CN109992669B (en) Keyword question-answering method based on language model and reinforcement learning
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN111723547A (en) Text automatic summarization method based on pre-training language model
CN111581962B (en) Text representation method based on subject word vector and hybrid neural network
CN108549644A (en) Omission pronominal translation method towards neural machine translation
CN110765264A (en) Text abstract generation method for enhancing semantic relevance
CN110874411A (en) Cross-domain emotion classification system based on attention mechanism fusion
CN116306652A (en) Chinese naming entity recognition model based on attention mechanism and BiLSTM
CN114881042B (en) Chinese emotion analysis method based on graph-convolution network fusion of syntactic dependency and part of speech
CN113407663B (en) Image-text content quality identification method and device based on artificial intelligence
Li et al. Cm-gen: A neural framework for chinese metaphor generation with explicit context modelling
KR20210058059A (en) Unsupervised text summarization method based on sentence embedding and unsupervised text summarization device using the same
CN113887251A (en) Mongolian Chinese machine translation method combining Meta-KD framework and fine-grained compression
CN113743095A (en) Chinese problem generation unified pre-training method based on word lattice and relative position embedding
CN117932066A (en) Pre-training-based 'extraction-generation' answer generation model and method
CN114997143B (en) Text generation model training method and system, text generation method and storage medium
CN114519353B (en) Model training method, emotion message generation method and device, equipment and medium
CN113377908B (en) Method for extracting aspect-level emotion triple based on learnable multi-word pair scorer
Cho Introduction to neural machine translation with GPUs (part 3)
Wang Text emotion detection based on Bi-LSTM network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant