CN114385805B - Small sample learning method for improving adaptability of deep text matching model - Google Patents
Small sample learning method for improving adaptability of deep text matching model
- Publication number
- CN114385805B (application CN202111534340.9A)
- Authority
- CN
- China
- Prior art keywords
- source domain
- sample
- model
- text matching
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a small sample learning method for improving the adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing. The method integrates small sample learning with a cross-domain adaptability method applied to the text matching model: the weights of the source-domain data are updated by gradient descent in the direction that minimizes the loss on the small target-domain sample set. This solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to various deep-learning-based text matching models.
Description
Technical Field
The invention relates to a small sample learning method, in particular to a small sample learning method for improving adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing.
Background
Text matching, which aims at identifying the relationship between two text fragments, has long been a key research problem in natural language processing and information retrieval. Many specific tasks can be regarded as particular forms of text matching, such as question answering, natural language inference, and synonym recognition.
With the rapid development of deep learning, many neural network models have been applied to text matching in recent years. Owing to their strong ability to learn text representations and to model the interaction between text pairs, deep text matching methods achieve impressive performance on the benchmark tasks. However, prior work has shown that deep-learning-based methods typically require a large amount of labeled data for training, i.e., they depend heavily on the scale of the labeled data. When the available labeled data is limited, model performance often degrades, which hinders the generalization and adaptability of deep text matching models. Effectively solving this problem is therefore key to further improving the practical applicability of deep learning.
For a small-sample text matching scenario, the classical solution at present is to invest substantial resources in acquiring or annotating relevant training data, so that the available labeled data is large enough for conventional deep learning model training. For example, the semantic matching function of a product search system must handle matching between common-sense text and product information text; if labeled data of this kind is insufficient, the product team has to spend a great deal of manpower and time collecting and annotating data. By contrast, an approach considered more efficient is to train the model on other, similar datasets while improving its adaptability to data from different fields, thereby solving the small-sample learning problem on the current dataset. The small-sample learning problem can thus be addressed in combination with a method for improving model adaptability.
Data from a domain different from that of the training data is referred to as out-of-domain data. In practical applications, a deep text matching model frequently has to make predictions on out-of-domain data, and its performance drops; a model adaptation method is therefore needed to alleviate this performance loss. Most existing model adaptation techniques rest on the premise that the target domain and the source domain are comparable in data scale. However, this premise is unrealistic in many cases, because in practice it is difficult to collect a correspondingly large labeled dataset for every out-of-domain task. How to jointly address small-sample learning and model adaptability for deep text matching models is therefore of great importance.
Disclosure of Invention
To address the shortcomings of the prior art, in particular the problem of improving the cross-domain adaptability of a deep text matching model under small sample learning, the invention provides a small sample learning method for improving the adaptability of deep text matching models.
The key innovation of the method is to integrate small sample learning with a cross-domain adaptability method applied to the text matching model, and to perform gradient descent on the weights of the source-domain data in the direction that minimizes the loss on the small target-domain sample set.
The invention is realized by adopting the following technical scheme.
A small sample learning method for improving adaptability of a deep text matching model comprises the following steps:
step 1: and establishing a calculation graph relation between the sample weight and the model parameter.
Specifically, step 1 includes the steps of:
Step 1.1: Forward-propagate the text matching model on a batch of source-domain training set data and calculate the corresponding loss values:
$$\mathrm{Cost}_s(y_i, l_i) = \mathrm{CE}_s(y_i, l_i) \tag{1}$$
where Cost_s denotes the loss value of the model on the source domain; CE_s denotes the cross-entropy loss function; l_i denotes the label value of the i-th sample; and y_i is the model's prediction for the i-th sample:
$$y_i = \mathrm{TMM}_s(a_i, b_i, \theta) \tag{2}$$
where TMM_s denotes a text matching model trained on the source-domain task or dataset; a_i and b_i denote the two sentences fed into the model for matching; and θ denotes the parameters of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Wherein Loss s represents a source domain Loss value, y represents a predicted value of the model on a source domain sample, and l represents a label value of the source domain sample; The weight value for the i-th sample in the source domain is initialized to 0, i e {1,2, …, N }.
Step 1.3: to connect the computation graph between the sample weights and the source domain Loss values, the model parameters θ are gradient-descent updated with the source domain Loss values Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples. /(I)Is an operator of the partial derivative.
This establishes a computation graph relationship between the sample weights and the model parameters. Up to this point, the computation graph connection is built without changing the values of the model parameters (since all weights are 0, the numerical update in formula (4) is zero).
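The computation-graph construction of step 1 can be illustrated with a short PyTorch sketch. This is only an illustration under assumptions, not the patent's implementation: the function name `step1_build_graph`, the generic `matcher` module (mapping a batch of sentence pairs to matching logits), and the learning rate `alpha` are placeholders introduced here.

```python
# Minimal sketch of step 1 (illustrative assumptions, not the patent's code).
import torch
import torch.nn.functional as F

def step1_build_graph(matcher, src_a, src_b, src_labels, alpha=1e-3):
    """Weight the per-sample source losses with zero-initialized weights w_s and take one
    *differentiable* gradient step, so the provisional parameters theta_hat stay connected
    to w_s in the autograd graph while no real parameter value changes."""
    logits = matcher(src_a, src_b)                                   # y_i = TMM_s(a_i, b_i, theta)
    cost_s = F.cross_entropy(logits, src_labels, reduction="none")   # per-sample loss, formula (1)
    w_s = torch.zeros(cost_s.shape[0], device=cost_s.device, requires_grad=True)
    loss_s = (w_s * cost_s).sum()                                    # weighted source loss, formula (3)

    params = dict(matcher.named_parameters())
    grads = torch.autograd.grad(loss_s, list(params.values()), create_graph=True)
    # Functional update theta_hat = theta - alpha * dLoss_s/dtheta (formula (4)); create_graph=True
    # preserves the dependence of theta_hat on w_s even though the numerical update is zero here.
    theta_hat = {name: p - alpha * g for (name, p), g in zip(params.items(), grads)}
    return w_s, theta_hat
```

Because the weights start at zero, theta_hat is numerically equal to theta, but its derivative with respect to w_s is non-zero; this is exactly the connection that step 2 exploits.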
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Specifically, step 2 includes the steps of:
Step 2.1: in order to compare the difference between the source domain distribution and the gradient descent direction of the model on the target domain distribution, training the current model on a target small sample set, and calculating the training loss:
Wherein Loss t represents a target domain Loss value; TMM t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples.
The weights of the target-domain samples are set to a constant 1. This is because, unlike the source-domain samples, the target-domain samples have no distribution difference with respect to the target task.
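Under the same assumptions as the previous sketch, step 2.1 can be written as follows; `torch.func.functional_call` (PyTorch 2.x) evaluates the matcher with the provisional parameters `theta_hat` from step 1.3, and since every target-domain weight equals 1, the plain mean cross-entropy suffices.

```python
# Minimal sketch of step 2.1 (illustrative assumptions, not the patent's code).
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

def target_fewshot_loss(matcher, theta_hat, tgt_a, tgt_b, tgt_labels):
    """Evaluate the model with the provisional parameters theta_hat on the small
    target-domain sample set; with all target weights fixed to 1 this reduces to the
    mean cross-entropy over the M target samples."""
    logits_t = functional_call(matcher, theta_hat, (tgt_a, tgt_b))
    return F.cross_entropy(logits_t, tgt_labels)
```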
Step 2.2: due to the formation of Loss t (y, l)When the second derivative for the source domain sample weight w s is calculated from the target domain Loss value Loss t (y, l), the gradient can naturally flow through/>Thus, the comparison information carried by the gradients is accumulated over the weight gradients of the source domain samples. The weight adjustment process of the source domain samples is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model over a small sample set of the target domain versus the source domain sample weight.
Step 2.3: inspired by a model independent element learning algorithm, the gradient descent direction is compared by adopting a second derivative, and the weight is updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{m} \max(\hat{w}_s^k, 0)} \tag{7}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch, m is the data batch size of the target-domain training set, and k indexes the k-th sample in the source-domain batch data.
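Steps 2.2 and 2.3 can then be sketched as below. The gradient with respect to `w_s` is a second-order quantity because the target loss reaches `w_s` only through `theta_hat`; the small `eps` added to the denominator is an assumption for numerical safety and is not mentioned in the patent.

```python
# Minimal sketch of steps 2.2-2.3 (illustrative assumptions, not the patent's code).
import torch

def meta_adjust_weights(loss_t, w_s, alpha=1e-3, eps=1e-8):
    """Meta-gradient descent on the source sample weights, followed by clipping of
    negative weights and normalization within the source batch."""
    grad_w = torch.autograd.grad(loss_t, w_s)[0]   # d Loss_t / d w_s, flowing through theta_hat
    w_hat = w_s - alpha * grad_w                   # meta-gradient step on the weights
    w_clipped = torch.clamp(w_hat, min=0.0)        # eliminate negative adjusted weights
    return w_clipped / (w_clipped.sum() + eps)     # batch normalization of the weights
```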
Step 3: a text matching model is trained on the weighted source domain samples.
Specifically, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i) \tag{8}$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}.
In this way, the source-domain data that are more similar to the target-domain data receive larger weights and therefore play a larger part in determining how the base model's parameters are updated, which ultimately improves the performance of the base model on the target-domain (question-answer matching) data.
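Step 3 then amounts to an ordinary optimizer step on the source batch with the meta-adjusted weights held fixed. The sketch below is again only one possible realization under assumptions; the weights are detached so that only the base model's parameters are updated.

```python
# Minimal sketch of step 3 (illustrative assumptions, not the patent's code).
import torch.nn.functional as F

def step3_weighted_update(matcher, optimizer, src_a, src_b, src_labels, w_tilde):
    """Train the base text matching model on the weighted source-domain samples."""
    optimizer.zero_grad()
    logits = matcher(src_a, src_b)
    cost_s = F.cross_entropy(logits, src_labels, reduction="none")
    loss_s = (w_tilde.detach() * cost_s).sum()     # final weighted source loss, formula (8)
    loss_s.backward()
    optimizer.step()
    return loss_s.item()
```

Repeating steps 1–3 over the source-domain batches, with the small target-domain sample set reused in step 2, gives the overall training procedure of Fig. 1.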
Advantageous effects
Compared with the prior art, the invention has the following advantages:
By adopting meta-weight adjustment, the invention solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to various deep-learning-based text matching models.
Comprehensive comparison experiments on a series of text matching datasets show that the method improves adaptability across different datasets and tasks under the small sample learning setting. The experimental results show that the method is clearly superior to existing methods and effectively improves the adaptability of the deep text matching model to a target task or dataset with only a few samples.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
The process according to the invention is described in further detail below with reference to the accompanying drawings.
Examples
A small sample learning method for improving adaptability of a deep text matching model, as shown in fig. 1, comprises the following steps:
Step 1: and (3) establishing a calculation graph relation between the natural language reasoning source domain data sample weight and the BERT model parameters.
Specifically, step 1 includes the steps of:
Step 1.1: using a natural language reasoning training set as a source domain, and using a text matching model BERT to forward propagate on one batch of data of the source domain so as to calculate a corresponding source domain loss value:
Costs(yi,li)=CEs(yi,li)
Where Cost s represents the loss value of the model over the source domain; CE s represents the cross entropy loss function; l i denotes the tag value of the i-th sample; y i is the model's predicted value for the i-th sample:
yi=BERTs(ai,bi,θ)
Wherein BERT s represents a text matching model BERT trained on natural language inference source domain tasks; a i、bi represents two sentences which are input into the model for text matching respectively; θ represents a parameter of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Wherein Loss s represents a source domain Loss value, y represents a predicted value of the model on a source domain sample, and l represents a label value of the source domain sample; The weight value for the i-th sample in the source domain is initialized to 0, i e {1,2, …, N }.
Step 1.3: to connect the computation graph between the sample weights and the source domain Loss values, the model parameters θ are gradient-descent updated with the source domain Loss values Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples.
In this way, a computation graph relationship is established between the weights of the natural language inference sentence pairs and the model parameters. Up to this point, the computation graph connection is built without changing the values of the BERT model parameters.
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Step 2.1: to compare the differences in the gradient descent direction of the BERT model on the distribution of natural language reasoning and the distribution of question-answer matching, the current BERT model is trained on a small sample set of question-answer matching and the training loss is calculated:
wherein Loss t represents a target domain Loss value; BERT t represents the deep text matching model BERT when trained on the target domain; m represents the number of target domain samples.
The weights of the target-domain samples are set to a constant 1. This is because, unlike the source-domain samples, the target-domain samples have no distribution difference with respect to the target task.
Step 2.2: due to the formation of Loss t (y, l)When the second derivative for the source domain sample weight w s is calculated from the target domain Loss value Loss t (y, l), the gradient can naturally flow through/>Thus, the comparison information carried by the gradients is accumulated over the weight gradients of the source domain samples. The weight adjustment process of the source domain samples is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model over a small sample set of the target domain versus the source domain sample weight.
Step 2.3: inspired by the model independent element learning MAML algorithm, the gradient descent direction is compared by adopting a second derivative, and the weight is updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{m} \max(\hat{w}_s^k, 0)}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch, m is the data batch size of the target-domain training set, and k indexes the k-th sample in the source-domain batch data.
Step 3: text matching BERT models are trained on weighted source domain samples.
Specifically, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching BERT model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}. As a result, within the natural language inference data, samples that are more similar to the question-answer matching data receive larger weights and play a larger part in determining how the BERT model's parameters are updated, which ultimately improves the performance of the BERT model on the question-answer matching data.
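As an illustration of this embodiment only, the base matcher could be instantiated with the Hugging Face transformers library as sketched below. The checkpoint name `bert-base-chinese`, the binary label setup, and the helper function are assumptions made here; the patent only specifies BERT as the base text matching model.

```python
# Minimal sketch of the BERT-based embodiment (illustrative assumptions, not the patent's code).
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
bert_s = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

def per_sample_matching_loss(sentences_a, sentences_b, labels):
    """Per-sample cross-entropy Cost_s(y_i, l_i) for sentence pairs; evaluating the same module
    through torch.func.functional_call with the provisional parameters gives the question-answer
    matching loss of step 2.1."""
    enc = tokenizer(list(sentences_a), list(sentences_b),
                    padding=True, truncation=True, return_tensors="pt")
    logits = bert_s(**enc).logits                   # y_i = BERT_s(a_i, b_i, theta)
    return F.cross_entropy(logits, labels, reduction="none")
```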
The foregoing is a preferred embodiment of the present invention, and the present invention should not be limited to the embodiment and the disclosure of the drawings. All equivalents and modifications that come within the spirit of the disclosure are desired to be protected.
Claims (3)
1. A small sample learning method for improving the adaptability of a deep text matching model, characterized by comprising the following steps:
Step 1: establishing a calculation graph relation between sample weights and model parameters, comprising the following steps:
step 1.1: forward-propagating the text matching model on a batch of source-domain training set data and calculating the corresponding loss values:
$$\mathrm{Cost}_s(y_i, l_i) = \mathrm{CE}_s(y_i, l_i) \tag{1}$$
where Cost_s denotes the loss value of the model on the source domain; CE_s denotes the cross-entropy loss function; l_i denotes the label value of the i-th sample; and y_i is the model's prediction for the i-th sample:
$$y_i = \mathrm{TMM}_s(a_i, b_i, \theta) \tag{2}$$
where TMM_s denotes a text matching model trained on the source-domain task or dataset; a_i and b_i denote the two sentences fed into the model for matching; and θ denotes the parameters of the deep text matching model;
step 1.2: assigning an initialization weight to each sample corresponding to the loss value, and setting the initial value of the sample weight to 0;
then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} w_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the source-domain loss value, y denotes the model's predictions on the source-domain samples, and l denotes the label values of the source-domain samples; w_s^i is the weight of the i-th source-domain sample, initialized to 0, with i ∈ {1, 2, …, N};
step 1.3: gradient descent updating is carried out on the model parameter theta by using the source domain Loss value Loss s:
Wherein, Representing model parameters after updating one step on the source domain samples; alpha represents a learning rate; /(I)Representing the partial derivative of the source domain loss value to the model parameter; w s denotes the weight of the source domain samples; /(I)Operators that are partial derivatives;
step 2: the weight of the sample is adjusted by meta-gradient descent, comprising the steps of:
Step 2.1: training a current model on a target small sample set, and calculating training loss:
Wherein Loss t represents a target domain Loss value; TMM t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples;
Step 2.2: the comparison information carried by the gradient is accumulated on the weight gradient of the source domain sample, and the weight adjustment process of the source domain sample is as follows:
Wherein, Representing updated source domain sample weights, alpha representing learning rate,/>Representing the second partial derivative of the loss value of the model on the small sample set of the target domain to the sample weight of the source domain;
Step 2.3: comparing the gradient descending direction by adopting the second derivative, and updating the weight according to the comparison result;
Meta-weight adjustment first removes the negative values of the adjusted weights and then normalizes them in batches:
$$\tilde{w}_s^i = \frac{\max(\hat{w}_s^i, 0)}{\sum_{k=1}^{n} \max(\hat{w}_s^k, 0)}$$
where w̃_s^i denotes the current source-domain sample weight to be normalized, ŵ_s^k denotes the weights of the other source-domain samples in the batch data, n is the data batch size of the target-domain training set, and k denotes the serial number of the k-th sample in the source-domain batch data;
step 3: a text matching model is trained on the weighted source domain samples.
2. The small sample learning method for improving the adaptability of a deep text matching model as claimed in claim 1, wherein in step 2 the weight of the target-domain samples is set to 1.
3. The small sample learning method for improving the adaptability of a deep text matching model as claimed in claim 1, wherein in step 3, the sample weights calculated by the meta-weight adjustment are assigned to the source-domain samples, and the weighted loss obtained when training the text matching model on the source-domain samples is:
$$\mathrm{Loss}_s(y, l) = \sum_{i=1}^{N} \tilde{w}_s^i \cdot \mathrm{Cost}_s(y_i, l_i)$$
where Loss_s denotes the final weighted loss value of the model on the source-domain samples, i ∈ {1, 2, …, N}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534340.9A CN114385805B (en) | 2021-12-15 | 2021-12-15 | Small sample learning method for improving adaptability of deep text matching model |
Publications (2)
Publication Number | Publication Date |
---|---
CN114385805A (en) | 2022-04-22
CN114385805B (en) | 2024-05-10
Family
ID=81197910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534340.9A Active CN114385805B (en) | 2021-12-15 | 2021-12-15 | Small sample learning method for improving adaptability of deep text matching model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385805B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015184335A1 (en) * | 2014-05-30 | 2015-12-03 | Tootitaki Holdings Pte Ltd | Real-time audience segment behavior prediction |
CN111401928A (en) * | 2020-04-01 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and device for determining semantic similarity of text based on graph data |
CN112699966A (en) * | 2021-01-14 | 2021-04-23 | 中国人民解放军海军航空大学 | Radar HRRP small sample target recognition pre-training and fine-tuning method based on deep migration learning |
CN112925888A (en) * | 2019-12-06 | 2021-06-08 | 上海大岂网络科技有限公司 | Method and device for training question-answer response and small sample text matching model |
CN112926547A (en) * | 2021-04-13 | 2021-06-08 | 北京航空航天大学 | Small sample transfer learning method for classifying and identifying aircraft electric signals |
CN113705215A (en) * | 2021-08-27 | 2021-11-26 | 南京大学 | Meta-learning-based large-scale multi-label text classification method |
Also Published As
Publication number | Publication date |
---|---|
CN114385805A (en) | 2022-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Luan et al. | Scientific information extraction with semi-supervised neural tagging | |
CN108334891B (en) | Task type intention classification method and device | |
CN108932342A (en) | A kind of method of semantic matches, the learning method of model and server | |
CN110737758A (en) | Method and apparatus for generating a model | |
CN106844349B (en) | Comment spam recognition methods based on coorinated training | |
CN113254667A (en) | Scientific and technological figure knowledge graph construction method and device based on deep learning model and terminal | |
CN110362814B (en) | Named entity identification method and device based on improved loss function | |
CN113887643B (en) | New dialogue intention recognition method based on pseudo tag self-training and source domain retraining | |
CN111127246A (en) | Intelligent prediction method for transmission line engineering cost | |
CN113010683A (en) | Entity relationship identification method and system based on improved graph attention network | |
CN115270797A (en) | Text entity extraction method and system based on self-training semi-supervised learning | |
CN112328748A (en) | Method for identifying insurance configuration intention | |
CN109741824A (en) | A kind of medical way of inquisition based on machine learning | |
CN114462409A (en) | Audit field named entity recognition method based on countermeasure training | |
CN116912624A (en) | Pseudo tag unsupervised data training method, device, equipment and medium | |
CN114722176A (en) | Intelligent question answering method, device, medium and electronic equipment | |
CN108694176A (en) | Method, apparatus, electronic equipment and the readable storage medium storing program for executing of document sentiment analysis | |
CN117151069B (en) | Security scheme generation system | |
Li et al. | Dual pseudo supervision for semi-supervised text classification with a reliable teacher | |
CN112905750A (en) | Generation method and device of optimization model | |
CN114385805B (en) | Small sample learning method for improving adaptability of deep text matching model | |
CN109189915B (en) | Information retrieval method based on depth correlation matching model | |
CN116402025A (en) | Sentence breaking method, sentence creating method, training device, sentence breaking equipment and sentence breaking medium | |
CN114357166B (en) | Text classification method based on deep learning | |
CN115600595A (en) | Entity relationship extraction method, system, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |