CN111581350A - Multi-task learning, reading and understanding method based on pre-training language model - Google Patents
- Publication number
- CN111581350A CN111581350A CN202010365779.2A CN202010365779A CN111581350A CN 111581350 A CN111581350 A CN 111581350A CN 202010365779 A CN202010365779 A CN 202010365779A CN 111581350 A CN111581350 A CN 111581350A
- Authority
- CN
- China
- Prior art keywords
- question
- language model
- answer
- answered
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a machine reading comprehension method based on multi-task learning and a pre-trained language model. The method comprises the following steps: training on a corpus to establish a pre-trained language model, and using it to obtain context-aware representations of the input document and question; obtaining a vector representation of each word through an interaction layer composed of attention networks that fuses semantic information between the question and the document; and performing multi-task learning on the task of predicting whether the question can be answered and the task of obtaining the answer, so as to obtain both the answerability result and the answer to the question. By establishing the pre-trained language model, the entailment relation between sentence pairs can be obtained; by setting the interaction layer, semantic information between the question and the document can be fully fused, giving the model better expressive capability; and by performing multi-task learning, whether a question can be answered is predicted adaptively and the answer to the question is obtained.
Description
Technical Field
The invention belongs to the technical field of natural language understanding, and particularly relates to a machine reading comprehension method based on multi-task learning and a pre-trained language model.
Background
The availability of large-scale data has made machine reading comprehension a key task within natural language understanding. Current machine reading comprehension tasks can be divided into two kinds: cloze-style tasks and span-extraction tasks. Span-extraction machine reading comprehension requires a continuous piece of text to be extracted from the input document as the answer. However, most span-extraction tasks carry a strong assumption: that every question has an answer somewhere in the article. Under this assumption, only the boundaries of the answer need to be found by simple pattern matching, ignoring whether the question can really be answered, so true natural language understanding is still not achieved and the ability to predict answerability is missing. In the real world, however, unanswerable questions are ubiquitous.
Currently, there are two main methods for predicting whether a question can be answered. The first uses a simple classifier to classify the question as answerable or unanswerable; its disadvantage is the lack of interaction and entailment relations between the question and the document. The second uses a verification mechanism: a plausible answer is first extracted and then verified, and whether the question can be answered is judged on that basis. However, the plausible answer may be wrong; in particular, when the question is actually unanswerable, the plausible answer extracted by the model is necessarily a wrong answer, and it is not reasonable to verify on a wrong answer.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a machine reading comprehension method based on multi-task learning and a pre-trained language model.
In order to achieve the purpose, the invention adopts the following technical scheme:
A machine reading comprehension method based on multi-task learning and a pre-trained language model comprises the following steps:
Step 1: train on a corpus to establish a pre-trained language model, and use it to obtain context-aware representations of the input document and question, where the input document and question are represented by word vectors, position vectors, and paragraph vectors;
Step 2: obtain a vector representation of each word through an interaction layer composed of attention networks that fuses semantic information between the question and the document;
Step 3: perform multi-task learning on the answerability-prediction task and the answer-extraction task to obtain both the answerability result and the answer to the question.
Compared with the prior art, the invention has the following beneficial effects:
the pre-training language model is established by training based on the corpus, so that the implication relation between sentence pairs can be obtained; semantic information between the problems and the documents can be fully fused by setting an interaction layer, so that the model has better expression capability; by performing multitask learning, whether a question is answered or not can be adaptively predicted, and an answer to the question can be acquired.
Drawings
Fig. 1 is a flowchart of the machine reading comprehension method based on multi-task learning and a pre-trained language model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
An embodiment of the invention provides a machine reading comprehension method based on multi-task learning and a pre-trained language model. A flowchart is shown in Fig. 1, and the method comprises the following steps:
S101: train on a corpus to establish a pre-trained language model, and use it to obtain context-aware representations of the input document and question, where the input document and question are represented by word vectors, position vectors, and paragraph vectors;
S102: obtain a vector representation of each word through an interaction layer composed of attention networks that fuses semantic information between the question and the document;
S103: perform multi-task learning on the answerability-prediction task and the answer-extraction task to obtain both the answerability result and the answer to the question.
In this embodiment, step S101 establishes the pre-trained language model. Its input is the document and question, represented as word vectors, position vectors, and paragraph vectors; its output is a context-aware representation of the input document and question. A [CLS] vector, representing the entailment relation between the sentence pair, is added at the first position of the context, and a [SEP] vector separating the two sentences is added between them. Because the pre-trained language model is established by training on a large-scale corpus, it can fully capture external common-sense knowledge as well as lexical, syntactic, and grammatical relations, and can learn the entailment relation between sentence pairs.
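As an illustrative sketch only (the toy vocabulary, dimensions, and random embedding tables below are hypothetical, not part of the claimed method), the input representation of step S101, where [CLS] and [SEP] markers are inserted and the word, position, and paragraph (segment) vectors are summed, can be outlined as:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"[CLS]": 0, "[SEP]": 1, "what": 2, "is": 3, "bert": 4, "a": 5, "model": 6}
DIM = 8

# Hypothetical embedding tables: word, position, and paragraph (segment).
word_emb = rng.standard_normal((len(VOCAB), DIM))
pos_emb = rng.standard_normal((32, DIM))
seg_emb = rng.standard_normal((2, DIM))

def build_input(question, document):
    """Form [CLS] + question + [SEP] + document + [SEP] and sum the three vectors."""
    tokens = ["[CLS]"] + question + ["[SEP]"] + document + ["[SEP]"]
    # Segment 0 covers [CLS] + question + first [SEP]; segment 1 covers the rest.
    segments = [0] * (len(question) + 2) + [1] * (len(document) + 1)
    ids = [VOCAB[t] for t in tokens]
    x = word_emb[ids] + pos_emb[:len(ids)] + seg_emb[segments]
    return tokens, x

tokens, x = build_input(["what", "is", "bert"], ["bert", "is", "a", "model"])
```

In a real system the three embedding tables are learned during pre-training rather than drawn at random.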
In this embodiment, step S102 enables the question and document to be fused more deeply. An attention network is set up, and the attention operation is performed repeatedly on the question and document; with, for example, three attention iterations, the question and document are fully fused and a hidden state (represented as a vector) is obtained for each word in the document.
In this embodiment, step S103 predicts whether the question can be answered and outputs the answer, based on multi-task learning. Multi-task learning is a sub-field of machine learning that uses the useful information contained in several learning tasks to help each task learn a more accurate learner; by exploiting the similarity between different tasks, it learns a more general representation and thereby further improves model performance. Both experiment and theory show that when all or some of the tasks are related, jointly learning multiple tasks yields better performance than learning each task individually. In this embodiment there are two tasks: one predicts whether the question can be answered, and one obtains the answer to the question. If the prediction is unanswerable, "No Answer" is output; if the prediction is answerable, the answer to the question is output. The two tasks of predicting answerability and obtaining the answer are clearly related, so multi-task learning can improve model performance.
As an alternative embodiment, the pre-trained language model is a multi-layer bidirectional Transformer encoder.
This embodiment gives a specific structure for the pre-trained language model. A pre-trained language model built on a multi-layer bidirectional Transformer encoder is also known as BERT (Bidirectional Encoder Representations from Transformers), which achieved the best results on 11 natural language processing tasks. The multi-layer bidirectional Transformer encoder stacks several Transformer layers, each composed of a self-attention network and a feed-forward network connected through residual connections and layer normalization. During training, the model's parameters are optimized jointly with a masked language modeling objective and a next-sentence prediction objective. This training method captures contextual co-occurrence relations and learns the entailment relation between sentence pairs.
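A minimal sketch of one Transformer encoder layer as described above, self-attention plus a feed-forward network, each wrapped in a residual connection and layer normalization. The dimensions, random weights, and layer count are illustrative assumptions, and multi-head attention and bias terms are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 8

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical single-head projection and feed-forward weights.
W_q, W_k, W_v = (rng.standard_normal((DIM, DIM)) for _ in range(3))
W_1, W_2 = rng.standard_normal((DIM, 4 * DIM)), rng.standard_normal((4 * DIM, DIM))

def encoder_layer(x):
    """Self-attention and a feed-forward network, each with residual + layer norm."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    attn = softmax(q @ k.T / np.sqrt(DIM)) @ v
    x = layer_norm(x + attn)                 # residual connection + layer norm
    ff = np.maximum(x @ W_1, 0.0) @ W_2      # position-wise feed-forward (ReLU)
    return layer_norm(x + ff)

x = rng.standard_normal((5, DIM))
for _ in range(3):  # "multi-layer": stack several such layers
    x = encoder_layer(x)
```

A real BERT encoder additionally uses multiple attention heads, learned biases, and dropout; this sketch only shows the residual/normalization wiring described above.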
As an alternative embodiment, the method of step S103 for predicting whether the question can be answered and obtaining the answer comprises:
performing binary classification using the entailment relation between the question and the document, and scoring based on the interaction information and the entailment relation between them; if the score is above a set threshold, the question is answerable, otherwise the question is unanswerable;
scoring each word and then normalizing to obtain probability distributions over the start and end positions; the content between the indexes corresponding to the maxima of the two distributions is the answer.
This embodiment provides a specific scheme for predicting whether a question can be answered and obtaining the answer using multi-task learning. To facilitate understanding, a simple example follows. Assume the input document D and the corresponding question Q are represented respectively as:
D = ([CLS], q_1, q_2, …, q_m, [SEP], x_1, x_2, …, x_n, [SEP]) (1)
Q = (q_1, q_2, …, q_m) (2)
where m is the length of the question, n is the length of the article, [CLS] represents the entailment relation between the sentence pair, and [SEP] is the separator between the two sentences;
the input documents and questions are pre-trained with the language model BERT, resulting in intermediate representations d BERT (d) and q BERT (q). dtIs the t-th word vector in d, t is 1,2, …, m + n +3, d1Is [ CLS ]]The corresponding vector. Repeatedly performing attention operation (multiple iterations) on the problems and the documents through an interaction layer to obtain:
d′_t = q · Softmax(ReLU(d_t · q^T)) (3)
where ReLU and Softmax are activation functions.
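Formula (3) can be sketched as follows, with random matrices standing in for the BERT outputs d and q; the dimensions and the choice of three iterations are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relu(z):
    return np.maximum(z, 0.0)

def interaction_layer(d, q, iterations=3):
    """Formula (3): d'_t = q * Softmax(ReLU(d_t * q^T)), applied repeatedly.

    d: (m+n+3, dim) document+question representation; q: (m, dim) question representation.
    """
    for _ in range(iterations):
        attn = softmax(relu(d @ q.T))  # (m+n+3, m): attention of each word over question words
        d = attn @ q                   # re-express each word as a mixture of question vectors
    return d

rng = np.random.default_rng(1)
d = rng.standard_normal((12, 8))   # e.g. m = 4, n = 5, so m+n+3 = 12
q = rng.standard_normal((4, 8))
d_prime = interaction_layer(d, q)
```

Each iteration keeps the shape of d, so the operation can be repeated as many times as desired.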
The task of predicting whether the question can be answered uses the entailment relation between the question and the document. d′_1 is d_1 (the vector corresponding to [CLS]) after being processed by the interaction layer, i.e., formula (3); d′_1 therefore contains both the interaction information and the entailment relation between the question and the document. Binary classification is performed on d′_1 and a score is computed. If the score is above a set threshold, the question is considered answerable; otherwise the question is unanswerable. The scoring formula is:
score = σ(w_p · ReLU(d′_1 w_c + b_c)) (4)
where w_p, w_c, and b_c are training parameters, and σ is the sigmoid activation function, whose output lies between 0 and 1.
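A sketch of the answerability score of formula (4); the parameter shapes, random values, and the 0.5 threshold are assumptions, since the threshold is left unspecified above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def answerable_score(d1_prime, w_p, w_c, b_c):
    """Formula (4): score = sigma(w_p * ReLU(d'_1 w_c + b_c)), a value in (0, 1)."""
    hidden = np.maximum(d1_prime @ w_c + b_c, 0.0)
    return sigmoid(hidden @ w_p)

rng = np.random.default_rng(2)
dim, hid = 8, 4  # hypothetical sizes
d1_prime = rng.standard_normal(dim)   # the [CLS] vector after the interaction layer
w_c = rng.standard_normal((dim, hid))
b_c = rng.standard_normal(hid)
w_p = rng.standard_normal(hid)

score = answerable_score(d1_prime, w_p, w_c, b_c)
threshold = 0.5  # assumed; in practice tuned on validation data
print("answerable" if score > threshold else "No Answer")
```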
In the answer-extraction task, a score is calculated for each word vector after the interaction layer and then normalized by Softmax, giving the probability distribution s_t of the start position and the probability distribution e_t of the end position:
s_t = Softmax(ReLU(w_s(d′_t · q^T) + b_s)) (5)
e_t = Softmax(ReLU(w_e(d′_t · q^T) + b_e)) (6)
where w_s, b_s, w_e, and b_e are training parameters.
The segment between the two indexes t corresponding to the maxima of s and e is the answer to the question.
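The span-extraction step (formulas (5) and (6) followed by the argmax over the two distributions) can be sketched as below; the token list and raw scores are made up for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def extract_span(scores_start, scores_end, tokens):
    """Normalize the per-word scores and return the span between the two argmax indexes."""
    s = softmax(scores_start)
    e = softmax(scores_end)
    start, end = int(np.argmax(s)), int(np.argmax(e))
    if end < start:
        return []  # inconsistent prediction; a real system would search valid (start, end) pairs
    return tokens[start:end + 1]

tokens = ["bert", "is", "a", "language", "model"]
answer = extract_span(np.array([0.1, 0.2, 3.0, 0.1, 0.2]),
                      np.array([0.0, 0.1, 0.2, 0.3, 2.5]), tokens)
# answer == ["a", "language", "model"]
```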
When performing multi-task learning, the loss function Loss is a weighted sum of the loss functions Loss_1 and Loss_2 of the two tasks:
Loss = μ·Loss_1 + λ·Loss_2 (7)
where μ and λ are the weights of the two tasks' loss functions, and μ + λ = 1.
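Formula (7) reduces to a few lines; the weight μ = 0.4 below is an arbitrary illustration, since the description fixes only the constraint μ + λ = 1:

```python
# Formula (7): Loss = mu * Loss_1 + lambda * Loss_2, with mu + lambda = 1.
def combined_loss(loss_answerable, loss_span, mu=0.4):
    """Weighted sum of the answerability loss and the span-extraction loss."""
    lam = 1.0 - mu  # enforces mu + lambda = 1
    return mu * loss_answerable + lam * loss_span

total = combined_loss(0.8, 0.5, mu=0.4)  # 0.4*0.8 + 0.6*0.5 = 0.62
```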
A set of experimental results on the SQuAD 2.0 data set, comparing the method of the invention with the prior art, is given below. SQuAD 2.0 contains 150,000 questions, of which unanswerable questions account for one third. The evaluation metrics are the EM value (Exact Match) and the F1 score: EM measures whether the predicted answer exactly matches the reference answer, and F1 measures the similarity between them. The results are shown in Table 1. "Prior art 1" is the scheme that uses a simple classifier for binary answerability classification; "Prior art 2" is the verification-mechanism scheme; "BERT" uses the pre-trained language model BERT alone; "Interaction layer removed" is the invention without the interaction layer; and "Multi-task removed" is the invention without multi-task learning.
Table 1. Comparison of experimental results
| | Prior art 1 | Prior art 2 | BERT | Interaction layer removed | Multi-task removed | The invention |
|---|---|---|---|---|---|---|
| EM | 65.10% | 72.30% | 76.06% | 76.80% | 76.56% | 77.12% |
| F1 | 67.60% | 74.80% | 80.07% | 80.78% | 80.49% | 81.11% |
As can be seen from Table 1, the last four schemes improve both EM and F1 over the two prior-art schemes, and the scheme of the invention performs best.
The above description covers only a few embodiments of the present invention and should not be taken as limiting its scope; all equivalent changes and modifications made in accordance with the spirit of the present invention should be considered as falling within the scope of the present invention.
Claims (3)
1. A machine reading comprehension method based on multi-task learning and a pre-trained language model, characterized by comprising the following steps:
step 1: train on a corpus to establish a pre-trained language model, and use it to obtain context-aware representations of the input document and question, where the input document and question are represented by word vectors, position vectors, and paragraph vectors;
step 2: obtain a vector representation of each word through an interaction layer composed of attention networks that fuses semantic information between the question and the document;
step 3: perform multi-task learning on the answerability-prediction task and the answer-extraction task to obtain both the answerability result and the answer to the question.
2. The method of claim 1, wherein the pre-trained language model is a multi-layer bidirectional Transformer encoder.
3. The method of claim 1, wherein predicting whether the question can be answered and obtaining the answer in step 3 comprises:
performing binary classification using the entailment relation between the question and the document, and scoring based on the interaction information and the entailment relation between them; if the score is above a set threshold, the question is answerable, otherwise the question is unanswerable;
scoring each word and then normalizing to obtain probability distributions over the start and end positions; the content between the indexes corresponding to the maxima of the two distributions is the answer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010365779.2A CN111581350A (en) | 2020-04-30 | 2020-04-30 | Multi-task learning, reading and understanding method based on pre-training language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111581350A true CN111581350A (en) | 2020-08-25 |
Family
ID=72113324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010365779.2A Pending CN111581350A (en) | 2020-04-30 | 2020-04-30 | Multi-task learning, reading and understanding method based on pre-training language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111581350A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977199A (en) * | 2019-01-14 | 2019-07-05 | 浙江大学 | A kind of reading understanding method based on attention pond mechanism |
CA3089830A1 (en) * | 2018-01-29 | 2019-08-01 | EmergeX, LLC | System and method for facilitating affective-state-based artificial intelligence |
WO2019173280A1 (en) * | 2018-03-06 | 2019-09-12 | The Regents Of The University Of California | Compositions and methods for the diagnosis and detection of tumors and cancer prognosis |
CN110647629A (en) * | 2019-09-20 | 2020-01-03 | 北京理工大学 | Multi-document machine reading understanding method for multi-granularity answer sorting |
CN110688491A (en) * | 2019-09-25 | 2020-01-14 | 暨南大学 | Machine reading understanding method, system, device and medium based on deep learning |
CN110969014A (en) * | 2019-11-18 | 2020-04-07 | 南开大学 | Opinion binary group extraction method based on synchronous neural network |
Non-Patent Citations (2)
Title |
---|
Yang Rongqin: "Application of dense attention networks based on multi-task learning to machine reading" *
Su Lixin; Guo Jiafeng; Fan Yixing; Lan Yanyan; Xu Jun: "An extractive reading comprehension model for multi-span answers" *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112182151B (en) * | 2020-09-23 | 2021-08-17 | 清华大学 | Reading understanding task identification method and device based on multiple languages |
CN112182151A (en) * | 2020-09-23 | 2021-01-05 | 清华大学 | Reading understanding task identification method and device based on multiple languages |
CN112256765A (en) * | 2020-10-29 | 2021-01-22 | 浙江大华技术股份有限公司 | Data mining method, system and computer readable storage medium |
CN112613316A (en) * | 2020-12-31 | 2021-04-06 | 北京师范大学 | Method and system for generating ancient Chinese marking model |
CN113254575A (en) * | 2021-04-23 | 2021-08-13 | 中国科学院信息工程研究所 | Machine reading understanding method and system based on multi-step evidence reasoning |
CN113254575B (en) * | 2021-04-23 | 2022-07-22 | 中国科学院信息工程研究所 | Machine reading understanding method and system based on multi-step evidence reasoning |
CN113360606A (en) * | 2021-06-24 | 2021-09-07 | 哈尔滨工业大学 | Knowledge graph question-answer joint training method based on Filter |
CN113688876B (en) * | 2021-07-30 | 2023-08-22 | 华东师范大学 | Financial text machine reading and understanding method based on LDA and BERT |
CN113688876A (en) * | 2021-07-30 | 2021-11-23 | 华东师范大学 | Financial text machine reading understanding method based on LDA and BERT |
CN114218379A (en) * | 2021-11-23 | 2022-03-22 | 中国人民解放军国防科技大学 | Intelligent question-answering system-oriented method for attributing questions which cannot be answered |
CN114218379B (en) * | 2021-11-23 | 2024-02-06 | 中国人民解放军国防科技大学 | Attribution method for question answering incapacity of intelligent question answering system |
CN114444488A (en) * | 2022-01-26 | 2022-05-06 | 中国科学技术大学 | Reading understanding method, system, device and storage medium for few-sample machine |
CN115080715A (en) * | 2022-05-30 | 2022-09-20 | 重庆理工大学 | Span extraction reading understanding method based on residual error structure and bidirectional fusion attention |
CN115080715B (en) * | 2022-05-30 | 2023-05-30 | 重庆理工大学 | Span extraction reading understanding method based on residual structure and bidirectional fusion attention |
WO2024074100A1 (en) * | 2022-10-04 | 2024-04-11 | 阿里巴巴达摩院(杭州)科技有限公司 | Method and apparatus for natural language processing and model training, device and storage medium |
CN115587175B (en) * | 2022-12-08 | 2023-03-14 | 阿里巴巴达摩院(杭州)科技有限公司 | Man-machine conversation and pre-training language model training method and system and electronic equipment |
CN115587175A (en) * | 2022-12-08 | 2023-01-10 | 阿里巴巴达摩院(杭州)科技有限公司 | Man-machine conversation and pre-training language model training method and system and electronic equipment |
CN116074317A (en) * | 2023-02-20 | 2023-05-05 | 王春辉 | Service resource sharing method and server based on big data |
CN116074317B (en) * | 2023-02-20 | 2024-03-26 | 新疆八达科技发展有限公司 | Service resource sharing method and server based on big data |
CN116072119A (en) * | 2023-03-31 | 2023-05-05 | 北京华录高诚科技有限公司 | Voice control system, method, electronic equipment and medium for emergency command |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581350A (en) | Multi-task learning, reading and understanding method based on pre-training language model | |
CN110134771B (en) | Implementation method of multi-attention-machine-based fusion network question-answering system | |
CN107798140B (en) | Dialog system construction method, semantic controlled response method and device | |
Mairesse et al. | Controlling user perceptions of linguistic style: Trainable generation of personality traits | |
Wu et al. | Emotion recognition from text using semantic labels and separable mixture models | |
CN110032635A (en) | One kind being based on the problem of depth characteristic fused neural network to matching process and device | |
CN110390049B (en) | Automatic answer generation method for software development questions | |
CN113239666B (en) | Text similarity calculation method and system | |
CN113987147A (en) | Sample processing method and device | |
Johannsen et al. | More or less supervised supersense tagging of Twitter | |
CN108256968A (en) | A kind of electric business platform commodity comment of experts generation method | |
CN107679225A (en) | A kind of reply generation method based on keyword | |
CN111339772B (en) | Russian text emotion analysis method, electronic device and storage medium | |
CN111666400A (en) | Message acquisition method and device, computer equipment and storage medium | |
CN116010581A (en) | Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene | |
CN115905487A (en) | Document question and answer method, system, electronic equipment and storage medium | |
CN115062139A (en) | Automatic searching method for dialogue text abstract model | |
CN114282592A (en) | Deep learning-based industry text matching model method and device | |
Bharathi et al. | Machine Learning Based Approach for Sentiment Analysis on Multilingual Code Mixing Text. | |
CN114429121A (en) | Method for extracting emotion and reason sentence pairs of test corpus | |
CN114021658A (en) | Training method, application method and system of named entity recognition model | |
CN114239565A (en) | Deep learning-based emotion reason identification method and system | |
Cui et al. | Aspect level sentiment classification based on double attention mechanism | |
CN113641778A (en) | Topic identification method for dialog text | |
Gao et al. | Chinese short text classification method based on word embedding and Long Short-Term Memory Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |