CN114385805B - Small sample learning method for improving adaptability of deep text matching model - Google Patents
Small sample learning method for improving adaptability of deep text matching model
- Publication number
- CN114385805B (application CN202111534340.9A)
- Authority
- CN
- China
- Prior art keywords
- source domain
- sample
- model
- text matching
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a small sample learning method for improving the adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing. The method integrates small sample learning with a cross-domain adaptability method applied to the text matching model: the weights of the source-domain data are updated by gradient descent in the direction that minimizes the loss on the small target-domain sample set. This solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to various deep-learning-based text matching models.
Description
Technical Field
The invention relates to a small sample learning method, in particular to a small sample learning method for improving adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing.
Background
Text matching, which aims at identifying the relationship between two text fragments, has long been a key research problem in natural language processing and information retrieval. Many specific tasks can be regarded as particular forms of text matching, such as question answering, natural language inference, and synonym recognition.
With the rapid development of deep learning, many neural network models have been applied to text matching in recent years. Owing to their strong ability to learn text representations and to model the interaction between text pairs, deep text matching methods achieve impressive performance on the benchmark tasks. However, prior work has shown that deep-learning-based methods typically require a large amount of labeled data for training, i.e., they depend heavily on the scale of the labeled data. When the available labeled data is limited, model performance often degrades, which hinders the generalization and adaptability of deep text matching models. Effectively solving this problem is therefore key to further improving the practical applicability of deep learning.
For a small-sample text matching scenario, the classical solution at present is to invest substantial resources in acquiring or annotating relevant training data, so that the available labeled data is large enough for conventional deep learning model training. For example, the semantic matching function of a product search system must handle matching between common-sense text and product information text; if labeled data of this kind is insufficient, the product team has to spend a great deal of manpower and time collecting and annotating data. By contrast, an approach considered more efficient is to train the model on other, similar datasets while improving its adaptability to data from different fields, thereby solving the small-sample learning problem on the current dataset. The small-sample learning problem can thus be addressed in combination with a method for improving model adaptability.
Data from a domain different from that of the training data is referred to as out-of-domain data. In practical applications, a deep text matching model frequently has to make predictions on out-of-domain data, and its performance drops; a model adaptation method is therefore needed to alleviate this performance loss. Most existing model adaptation techniques rest on the premise that the target domain and the source domain are comparable in data scale. However, this premise is unrealistic in many cases, because in practice it is difficult to collect a correspondingly large labeled dataset for every out-of-domain task. How to jointly address small-sample learning and model adaptability for deep text matching models is therefore of great importance.
Disclosure of Invention
To address the shortcomings of the prior art, in particular the problem of improving the cross-domain adaptability of a deep text matching model under small sample learning, the invention provides a small sample learning method for improving the adaptability of deep text matching models.
The key innovation of the method is to integrate small sample learning with a cross-domain adaptability method applied to the text matching model, and to perform gradient descent on the weights of the source-domain data in the direction that minimizes the loss on the small target-domain sample set.
The invention is realized by adopting the following technical scheme.
A small sample learning method for improving adaptability of a deep text matching model comprises the following steps:
step 1: and establishing a calculation graph relation between the sample weight and the model parameter.
Specifically, step 1 includes the steps of:
Step 1.1: Forward-propagate the text matching model on a batch of source-domain training set data and calculate the corresponding loss values:
$$\mathrm{Cost}_s(y_i, l_i) = \mathrm{CE}_s(y_i, l_i) \tag{1}$$
where Cost_s denotes the loss value of the model on the source domain; CE_s denotes the cross-entropy loss function; l_i denotes the label value of the i-th sample; and y_i is the model's prediction for the i-th sample:
$$y_i = \mathrm{TMM}_s(a_i, b_i, \theta) \tag{2}$$
where TMM_s denotes a text matching model trained on the source-domain task or dataset; a_i and b_i denote the two sentences fed into the model for matching; and θ denotes the parameters of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Wherein Loss s represents a source domain Loss value, y represents a predicted value of the model on a source domain sample, and l represents a label value of the source domain sample; The weight value for the i-th sample in the source domain is initialized to 0, i e {1,2, …, N }.
Step 1.3: to connect the computation graph between the sample weights and the source domain Loss values, the model parameters θ are gradient-descent updated with the source domain Loss values Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples. /(I)Is an operator of the partial derivative.
This establishes a computation graph relationship between the sample weights and the model parameters. Up to this point, the computation graph connection is built without changing the values of the model parameters (since all weights are 0, the numerical update in formula (4) is zero).
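The computation-graph construction of step 1 can be illustrated with a short PyTorch sketch. This is only an illustration under assumptions, not the patent's implementation: the function name `step1_build_graph`, the generic `matcher` module (mapping a batch of sentence pairs to matching logits), and the learning rate `alpha` are placeholders introduced here.

```python
# Minimal sketch of step 1 (illustrative assumptions, not the patent's code).
import torch
import torch.nn.functional as F

def step1_build_graph(matcher, src_a, src_b, src_labels, alpha=1e-3):
    """Weight the per-sample source losses with zero-initialized weights w_s and take one
    *differentiable* gradient step, so the provisional parameters theta_hat stay connected
    to w_s in the autograd graph while no real parameter value changes."""
    logits = matcher(src_a, src_b)                                   # y_i = TMM_s(a_i, b_i, theta)
    cost_s = F.cross_entropy(logits, src_labels, reduction="none")   # per-sample loss, formula (1)
    w_s = torch.zeros(cost_s.shape[0], device=cost_s.device, requires_grad=True)
    loss_s = (w_s * cost_s).sum()                                    # weighted source loss, formula (3)

    params = dict(matcher.named_parameters())
    grads = torch.autograd.grad(loss_s, list(params.values()), create_graph=True)
    # Functional update theta_hat = theta - alpha * dLoss_s/dtheta (formula (4)); create_graph=True
    # preserves the dependence of theta_hat on w_s even though the numerical update is zero here.
    theta_hat = {name: p - alpha * g for (name, p), g in zip(params.items(), grads)}
    return w_s, theta_hat
```

Because the weights start at zero, theta_hat is numerically equal to theta, but its derivative with respect to w_s is non-zero; this is exactly the connection that step 2 exploits.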
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Specifically, step 2 includes the steps of:
Step 2.1: in order to compare the difference between the source domain distribution and the gradient descent direction of the model on the target domain distribution, training the current model on a target small sample set, and calculating the training loss:
Wherein Loss t represents a target domain Loss value; TMM t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples.
The weights of the target-domain samples are set to a constant 1. This is because, unlike the source-domain samples, the target-domain samples have no distribution difference with respect to the target task.
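Under the same assumptions as the previous sketch, step 2.1 can be written as follows; `torch.func.functional_call` (PyTorch 2.x) evaluates the matcher with the provisional parameters `theta_hat` from step 1.3, and since every target-domain weight equals 1, the plain mean cross-entropy suffices.

```python
# Minimal sketch of step 2.1 (illustrative assumptions, not the patent's code).
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

def target_fewshot_loss(matcher, theta_hat, tgt_a, tgt_b, tgt_labels):
    """Evaluate the model with the provisional parameters theta_hat on the small
    target-domain sample set; with all target weights fixed to 1 this reduces to the
    mean cross-entropy over the M target samples."""
    logits_t = functional_call(matcher, theta_hat, (tgt_a, tgt_b))
    return F.cross_entropy(logits_t, tgt_labels)
```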
Step 2.2: due to the formation of Loss t (y, l)When the second derivative for the source domain sample weight w s is calculated from the target domain Loss value Loss t (y, l), the gradient can naturally flow through/>Thus, the comparison information carried by the gradients is accumulated over the weight gradients of the source domain samples. The weight adjustment process of the source domain samples is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model over a small sample set of the target domain versus the source domain sample weight.
Step 2.3: inspired by a model independent element learning algorithm, the gradient descent direction is compared by adopting a second derivative, and the weight is updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{m} \max(\hat{w}_s^k, 0)} \tag{7}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch, m is the data batch size of the target-domain training set, and k indexes the k-th sample in the source-domain batch data.
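Steps 2.2 and 2.3 can then be sketched as below. The gradient with respect to `w_s` is a second-order quantity because the target loss reaches `w_s` only through `theta_hat`; the small `eps` added to the denominator is an assumption for numerical safety and is not mentioned in the patent.

```python
# Minimal sketch of steps 2.2-2.3 (illustrative assumptions, not the patent's code).
import torch

def meta_adjust_weights(loss_t, w_s, alpha=1e-3, eps=1e-8):
    """Meta-gradient descent on the source sample weights, followed by clipping of
    negative weights and normalization within the source batch."""
    grad_w = torch.autograd.grad(loss_t, w_s)[0]   # d Loss_t / d w_s, flowing through theta_hat
    w_hat = w_s - alpha * grad_w                   # meta-gradient step on the weights
    w_clipped = torch.clamp(w_hat, min=0.0)        # eliminate negative adjusted weights
    return w_clipped / (w_clipped.sum() + eps)     # batch normalization of the weights
```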
Step 3: a text matching model is trained on the weighted source domain samples.
Specifically, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i) \tag{8}$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}.
In this way, the source-domain data that are more similar to the target-domain data receive larger weights and therefore play a larger part in determining how the base model's parameters are updated, which ultimately improves the performance of the base model on the target-domain (question-answer matching) data.
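Step 3 then amounts to an ordinary optimizer step on the source batch with the meta-adjusted weights held fixed. The sketch below is again only one possible realization under assumptions; the weights are detached so that only the base model's parameters are updated.

```python
# Minimal sketch of step 3 (illustrative assumptions, not the patent's code).
import torch.nn.functional as F

def step3_weighted_update(matcher, optimizer, src_a, src_b, src_labels, w_tilde):
    """Train the base text matching model on the weighted source-domain samples."""
    optimizer.zero_grad()
    logits = matcher(src_a, src_b)
    cost_s = F.cross_entropy(logits, src_labels, reduction="none")
    loss_s = (w_tilde.detach() * cost_s).sum()     # final weighted source loss, formula (8)
    loss_s.backward()
    optimizer.step()
    return loss_s.item()
```

Repeating steps 1–3 over the source-domain batches, with the small target-domain sample set reused in step 2, gives the overall training procedure of Fig. 1.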
Advantageous effects
Compared with the prior art, the invention has the following advantages:
By adopting meta-weight adjustment, the invention solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to various deep-learning-based text matching models.
Comprehensive comparison experiments on a series of text matching datasets show that the method improves adaptability across different datasets and tasks under the small sample learning setting. The experimental results show that the method is clearly superior to existing methods and effectively improves the adaptability of the deep text matching model to a target task or dataset with only a few samples.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
The process according to the invention is described in further detail below with reference to the accompanying drawings.
Examples
A small sample learning method for improving adaptability of a deep text matching model, as shown in fig. 1, comprises the following steps:
Step 1: and (3) establishing a calculation graph relation between the natural language reasoning source domain data sample weight and the BERT model parameters.
Specifically, step 1 includes the steps of:
Step 1.1: using a natural language reasoning training set as a source domain, and using a text matching model BERT to forward propagate on one batch of data of the source domain so as to calculate a corresponding source domain loss value:
Costs(yi,li)=CEs(yi,li)
Where Cost s represents the loss value of the model over the source domain; CE s represents the cross entropy loss function; l i denotes the tag value of the i-th sample; y i is the model's predicted value for the i-th sample:
yi=BERTs(ai,bi,θ)
Wherein BERT s represents a text matching model BERT trained on natural language inference source domain tasks; a i、bi represents two sentences which are input into the model for text matching respectively; θ represents a parameter of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Wherein Loss s represents a source domain Loss value, y represents a predicted value of the model on a source domain sample, and l represents a label value of the source domain sample; The weight value for the i-th sample in the source domain is initialized to 0, i e {1,2, …, N }.
Step 1.3: to connect the computation graph between the sample weights and the source domain Loss values, the model parameters θ are gradient-descent updated with the source domain Loss values Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples.
In this way, a computation graph relationship is established between the weights of the natural language inference sentence pairs and the model parameters. Up to this point, the computation graph connection is built without changing the values of the BERT model parameters.
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Step 2.1: to compare the differences in the gradient descent direction of the BERT model on the distribution of natural language reasoning and the distribution of question-answer matching, the current BERT model is trained on a small sample set of question-answer matching and the training loss is calculated:
wherein Loss t represents a target domain Loss value; BERT t represents the deep text matching model BERT when trained on the target domain; m represents the number of target domain samples.
The weights of the target-domain samples are set to a constant 1. This is because, unlike the source-domain samples, the target-domain samples have no distribution difference with respect to the target task.
Step 2.2: due to the formation of Loss t (y, l)When the second derivative for the source domain sample weight w s is calculated from the target domain Loss value Loss t (y, l), the gradient can naturally flow through/>Thus, the comparison information carried by the gradients is accumulated over the weight gradients of the source domain samples. The weight adjustment process of the source domain samples is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model over a small sample set of the target domain versus the source domain sample weight.
Step 2.3: inspired by the model independent element learning MAML algorithm, the gradient descent direction is compared by adopting a second derivative, and the weight is updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{m} \max(\hat{w}_s^k, 0)}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch, m is the data batch size of the target-domain training set, and k indexes the k-th sample in the source-domain batch data.
Step 3: text matching BERT models are trained on weighted source domain samples.
Specifically, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching BERT model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}. As a result, within the natural language inference data, samples that are more similar to the question-answer matching data receive larger weights and play a larger part in determining how the BERT model's parameters are updated, which ultimately improves the performance of the BERT model on the question-answer matching data.
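As an illustration of this embodiment only, the base matcher could be instantiated with the Hugging Face transformers library as sketched below. The checkpoint name `bert-base-chinese`, the binary label setup, and the helper function are assumptions made here; the patent only specifies BERT as the base text matching model.

```python
# Minimal sketch of the BERT-based embodiment (illustrative assumptions, not the patent's code).
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
bert_s = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

def per_sample_matching_loss(sentences_a, sentences_b, labels):
    """Per-sample cross-entropy Cost_s(y_i, l_i) for sentence pairs; evaluating the same module
    through torch.func.functional_call with the provisional parameters gives the question-answer
    matching loss of step 2.1."""
    enc = tokenizer(list(sentences_a), list(sentences_b),
                    padding=True, truncation=True, return_tensors="pt")
    logits = bert_s(**enc).logits                   # y_i = BERT_s(a_i, b_i, theta)
    return F.cross_entropy(logits, labels, reduction="none")
```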
The foregoing is a preferred embodiment of the present invention, and the present invention should not be limited to the embodiment and the disclosure of the drawings. All equivalents and modifications that come within the spirit of the disclosure are desired to be protected.
Claims (3)
1. A small sample learning method for improving the adaptability of a deep text matching model, characterized by comprising the following steps:
Step 1: establishing a calculation graph relation between sample weights and model parameters, comprising the following steps:
step 1.1: forward-propagating the text matching model on a batch of source-domain training set data and calculating the corresponding loss values:
$$\mathrm{Cost}_s(y_i, l_i) = \mathrm{CE}_s(y_i, l_i) \tag{1}$$
where Cost_s denotes the loss value of the model on the source domain; CE_s denotes the cross-entropy loss function; l_i denotes the label value of the i-th sample; and y_i is the model's prediction for the i-th sample:
$$y_i = \mathrm{TMM}_s(a_i, b_i, \theta) \tag{2}$$
where TMM_s denotes a text matching model trained on the source-domain task or dataset; a_i and b_i denote the two sentences fed into the model for matching; and θ denotes the parameters of the deep text matching model;
step 1.2: assigning an initialization weight to each sample corresponding to the loss value, and setting the initial value of the sample weight to 0;
then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} w_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the source-domain loss value, y denotes the model's predictions on the source-domain samples, and l denotes the label values of the source-domain samples; w_s^i is the weight of the i-th source-domain sample, initialized to 0, with i ∈ {1, 2, …, N};
step 1.3: gradient descent updating is carried out on the model parameter theta by using the source domain Loss value Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples; /(I)Operators that are partial derivatives;
step 2: the weight of the sample is adjusted by meta-gradient descent, comprising the steps of:
Step 2.1: training a current model on a target small sample set, and calculating training loss:
Wherein Loss t represents a target domain Loss value; TMM t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples;
Step 2.2: the comparison information carried by the gradient is accumulated on the weight gradient of the source domain sample, and the weight adjustment process of the source domain sample is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model on the small sample set of the target domain to the sample weight of the source domain;
Step 2.3: comparing the gradient descending direction by adopting the second derivative, and updating the weight according to the comparison result;
Meta-weight adjustment first removes the negative values of the adjusted weights and then normalizes them in batches:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{n} \max(\hat{w}_s^k, 0)}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch data, n is the data batch size of the target-domain training set, and k denotes the serial number of the k-th sample in the source-domain batch data;
step 3: a text matching model is trained on the weighted source domain samples.
2. The small sample learning method for improving the adaptability of a deep text matching model as claimed in claim 1, wherein in step 2 the weight of the target-domain samples is set to 1.
3. The small sample learning method for improving the adaptability of a deep text matching model as claimed in claim 1, wherein in step 3, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534340.9A CN114385805B (en) | 2021-12-15 | 2021-12-15 | Small sample learning method for improving adaptability of deep text matching model |
Publications (2)
Publication Number | Publication Date |
---|---
CN114385805A (en) | 2022-04-22
CN114385805B (en) | 2024-05-10
Family
ID=81197910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534340.9A Active CN114385805B (en) | 2021-12-15 | 2021-12-15 | Small sample learning method for improving adaptability of deep text matching model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385805B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015184335A1 (en) * | 2014-05-30 | 2015-12-03 | Tootitaki Holdings Pte Ltd | Real-time audience segment behavior prediction |
CN111401928A (en) * | 2020-04-01 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and device for determining semantic similarity of text based on graph data |
CN112699966A (en) * | 2021-01-14 | 2021-04-23 | 中国人民解放军海军航空大学 | Radar HRRP small sample target recognition pre-training and fine-tuning method based on deep migration learning |
CN112925888A (en) * | 2019-12-06 | 2021-06-08 | 上海大岂网络科技有限公司 | Method and device for training question-answer response and small sample text matching model |
CN112926547A (en) * | 2021-04-13 | 2021-06-08 | 北京航空航天大学 | Small sample transfer learning method for classifying and identifying aircraft electric signals |
CN113705215A (en) * | 2021-08-27 | 2021-11-26 | 南京大学 | Meta-learning-based large-scale multi-label text classification method |
Also Published As
Publication number | Publication date |
---|---|
CN114385805A (en) | 2022-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Luan et al. | Scientific information extraction with semi-supervised neural tagging | |
CN108334891B (en) | Task type intention classification method and device | |
CN108932342A (en) | A kind of method of semantic matches, the learning method of model and server | |
CN110737758A (en) | Method and apparatus for generating a model | |
CN106844349B (en) | Comment spam recognition methods based on coorinated training | |
CN113254667A (en) | Scientific and technological figure knowledge graph construction method and device based on deep learning model and terminal | |
CN110362814B (en) | Named entity identification method and device based on improved loss function | |
CN113887643B (en) | New dialogue intention recognition method based on pseudo tag self-training and source domain retraining | |
CN111127246A (en) | Intelligent prediction method for transmission line engineering cost | |
CN113010683A (en) | Entity relationship identification method and system based on improved graph attention network | |
CN115270797A (en) | Text entity extraction method and system based on self-training semi-supervised learning | |
CN112328748A (en) | Method for identifying insurance configuration intention | |
CN109741824A (en) | A kind of medical way of inquisition based on machine learning | |
CN114462409A (en) | Audit field named entity recognition method based on countermeasure training | |
CN116912624A (en) | Pseudo tag unsupervised data training method, device, equipment and medium | |
CN114722176A (en) | Intelligent question answering method, device, medium and electronic equipment | |
CN108694176A (en) | Method, apparatus, electronic equipment and the readable storage medium storing program for executing of document sentiment analysis | |
CN117151069B (en) | Security scheme generation system | |
Li et al. | Dual pseudo supervision for semi-supervised text classification with a reliable teacher | |
CN112905750A (en) | Generation method and device of optimization model | |
CN114385805B (en) | Small sample learning method for improving adaptability of deep text matching model | |
CN109189915B (en) | Information retrieval method based on depth correlation matching model | |
CN116402025A (en) | Sentence breaking method, sentence creating method, training device, sentence breaking equipment and sentence breaking medium | |
CN114357166B (en) | Text classification method based on deep learning | |
CN115600595A (en) | Entity relationship extraction method, system, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |