CN114385805B - Small sample learning method for improving adaptability of deep text matching model - Google Patents

Small sample learning method for improving adaptability of deep text matching model

Info

Publication number
CN114385805B
Authority
CN
China
Prior art keywords
source domain
sample
model
text matching
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111534340.9A
Other languages
Chinese (zh)
Other versions
CN114385805A (en)
Inventor
宋大为
张博
张辰
马放
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN202111534340.9A
Publication of CN114385805A
Application granted
Publication of CN114385805B
Legal status: Active (Current)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/335 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a small sample learning method for improving the adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing. The method integrates small sample learning with a cross-domain adaptation method for text matching models: the weights of the source domain data are updated by gradient descent in the direction that minimizes the loss on the small target domain sample set. This solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to a variety of deep-learning-based text matching models.

Description

Small sample learning method for improving adaptability of deep text matching model
Technical Field
The invention relates to a small sample learning method, in particular to a small sample learning method for improving adaptability of a deep text matching model, and belongs to the technical field of text matching in natural language processing.
Background
Text matching, which aims at identifying the relationship between two text fragments, has been a key research problem in natural language processing and information retrieval. Many specific tasks can be considered as text matching in a specific form, such as question-answering systems, natural language reasoning, and synonym recognition.
With the rapid development of deep learning, many neural network models have in recent years been applied to the field of text matching. Owing to their strong ability to learn text representations and to model interactions between text pairs, deep text matching methods achieve impressive performance on the benchmark tasks. However, prior work has shown that deep-learning-based methods typically require a large amount of labeled data for training, i.e., they depend strongly on the scale of the labeled data. When the available labeled data is limited, the model often performs poorly, which hinders the generalization and adaptability of deep text matching models. Effectively addressing this problem is therefore key to further improving the practical applicability of deep learning.
For the scenario of small sample learning in text matching, the classical solution at present is to invest substantial resources in acquiring or annotating relevant training data, so that the scale of available labeled data is sufficient for conventional deep learning model training. For example, the semantic matching function of a product search system needs to handle matching between common-sense text and product information text; if the labeled data in this area is insufficient, the product team must spend considerable manpower and time collecting and labeling data. By contrast, an approach considered more efficient is to train the model with other, similar data sets while improving the model's adaptability to data from different fields, thereby solving the small sample learning problem on the current data set. The small sample learning problem can thus be solved in combination with an approach for model adaptation.
Data whose domain differs from that of the training data is referred to as out-of-domain data. In practical applications, a deep text matching model often has to make predictions on out-of-domain data, where its performance degrades, so a model adaptation method is needed to mitigate the performance loss on out-of-domain data. Most existing model adaptation techniques rest on the premise that the target domain and the source domain are comparable in data scale. However, this precondition is impractical in many cases, because in real applications it is difficult to collect a correspondingly large labeled data set for every out-of-domain setting. Therefore, jointly addressing small sample learning and model adaptability for deep text matching models is of great importance.
Disclosure of Invention
To address the shortcomings of the prior art, and in particular the problem of improving the cross-domain adaptability of deep text matching models under small sample learning, the invention provides a small sample learning method for improving the adaptability of a deep text matching model.
The innovation of the method lies in integrating small sample learning with a cross-domain adaptation method for text matching models, and performing gradient descent on the weights of the source domain data in the direction that minimizes the loss on the small target domain sample set.
The invention is realized by adopting the following technical scheme.
A small sample learning method for improving adaptability of a deep text matching model comprises the following steps:
Step 1: Establish a computation graph relation between the sample weights and the model parameters.
Specifically, step 1 includes the steps of:
Step 1.1: Forward propagate the text matching model over a batch of source domain training set data and calculate the corresponding loss values:
Cost_s(y_i, l_i) = CE_s(y_i, l_i)    (1)
where Cost_s represents the loss value of the model on the source domain; CE_s represents the cross entropy loss function; l_i denotes the label value of the i-th sample; y_i is the model's predicted value for the i-th sample:
y_i = TMM_s(a_i, b_i, θ)    (2)
where TMM_s represents a text matching model trained on a task or data set of the source domain; a_i and b_i represent the two sentences input into the model for text matching; θ represents the parameters of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Loss_s(y, l) = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)    (3)
where Loss_s represents the source domain loss value, y represents the model's predicted values on the source domain samples, and l represents the label values of the source domain samples; w_s^i is the weight of the i-th sample in the source domain, initialized to 0, with i ∈ {1, 2, …, N} and N the number of source domain samples in the batch.
Step 1.3: to connect the computation graph between the sample weights and the source domain Loss values, the model parameters θ are gradient-descent updated with the source domain Loss values Loss s:
θ̂_s = θ - α · ∂Loss_s(y, l)/∂θ    (4)
where θ̂_s represents the model parameters after one update step on the source domain samples; α represents the learning rate; ∂Loss_s(y, l)/∂θ represents the partial derivative of the source domain loss value with respect to the model parameters; w_s denotes the weights of the source domain samples; ∂ is the partial derivative operator.
This establishes a computation graph relationship between the sample weights and the model parameters. At this point, the computation graph connection has been established without changing the values of the model parameters.
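Step 1 can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption rather than the implementation prescribed by the invention, and the toy matcher, tensor sizes, and hyper-parameter values below are stand-ins introduced only for demonstration. The essential point of the sketch is that the one-step update of formula (4) is taken with create_graph=True, so the updated parameters stay differentiable with respect to the sample weights while the original parameters are left unchanged.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def matcher(theta, emb_a, emb_b):
    # Toy stand-in for a deep text matching model: scores a sentence pair from
    # pre-computed sentence embeddings through a small interaction layer.
    w1, b1, w2, b2 = theta
    h = torch.tanh((emb_a * emb_b) @ w1 + b1)
    return h @ w2 + b2                      # logits over {no match, match}

dim, n_src, alpha = 16, 8, 0.1              # illustrative sizes and learning rate
theta = [torch.randn(dim, dim, requires_grad=True),
         torch.zeros(dim, requires_grad=True),
         torch.randn(dim, 2, requires_grad=True),
         torch.zeros(2, requires_grad=True)]

# One batch of source domain pairs (a_i, b_i) with labels l_i (random stand-ins).
src_a, src_b = torch.randn(n_src, dim), torch.randn(n_src, dim)
src_l = torch.randint(0, 2, (n_src,))

# Step 1.1: per-sample loss Cost_s(y_i, l_i), as in formula (1).
cost_s = F.cross_entropy(matcher(theta, src_a, src_b), src_l, reduction="none")

# Step 1.2: weights initialised to 0; their weighted sum gives Loss_s, formula (3).
w_s = torch.zeros(n_src, requires_grad=True)
loss_s = (w_s * cost_s).sum()

# Step 1.3: one differentiable update step, formula (4); create_graph=True keeps the
# computation graph, so theta_hat depends on w_s without theta itself being changed.
grads = torch.autograd.grad(loss_s, theta, create_graph=True)
theta_hat = [p - alpha * g for p, g in zip(theta, grads)]
```

With the weights initialized to zero, loss_s is numerically zero, but the retained graph still records how theta_hat would change as each weight changes; step 2 differentiates through exactly this dependence.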
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Specifically, step 2 includes the steps of:
Step 2.1: To compare the difference between the model's gradient descent directions on the source domain distribution and on the target domain distribution, the current model is trained on the small target domain sample set and the training loss is calculated:
Loss_t(y, l) = Σ_{j=1}^{m} CE_t(TMM_t(a_j, b_j, θ̂_s), l_j)    (5)
where Loss_t represents the target domain loss value; CE_t represents the cross entropy loss function on the target domain; TMM_t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples.
The weight of the target domain samples is set to a constant of 1. This is because, unlike the source domain samples, the target domain samples have no distribution difference with respect to the target domain.
Step 2.2: Because Loss_t(y, l) is computed with the updated parameters θ̂_s, when the second-order derivative of the target domain loss value Loss_t(y, l) with respect to the source domain sample weights w_s is calculated, the gradient naturally flows through θ̂_s. Thus, the comparison information carried by the gradients is accumulated on the weight gradients of the source domain samples. The weights of the source domain samples are adjusted as follows:
w̃_s = w_s - α · ∂Loss_t(y, l)/∂w_s    (6)
where w̃_s represents the updated source domain sample weights, α represents the learning rate, and ∂Loss_t(y, l)/∂w_s represents the second-order partial derivative of the model's loss on the small target domain sample set with respect to the source domain sample weights.
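For intuition, the meta-gradient in formula (6) can be expanded with the chain rule from formulas (3) and (4); this expansion is a sketch given for explanation and is not an additional step of the method. Since θ̂_s = θ - α · Σ_{i=1}^{N} w_s^i · ∂Cost_s(y_i, l_i)/∂θ, the derivative of the updated parameters with respect to a single weight is ∂θ̂_s/∂w_s^i = -α · ∂Cost_s(y_i, l_i)/∂θ, and therefore
∂Loss_t(y, l)/∂w_s^i = (∂Loss_t(y, l)/∂θ̂_s) · (∂θ̂_s/∂w_s^i) = -α · ⟨∂Loss_t(y, l)/∂θ̂_s, ∂Cost_s(y_i, l_i)/∂θ⟩.
Hence a source domain sample whose loss gradient points in the same direction as the target domain descent direction receives a negative meta-gradient and, after formula (6), a larger weight; this is the comparison of gradient descent directions referred to in step 2.3 below.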
Step 2.3: Inspired by the model-agnostic meta-learning algorithm, the gradient descent directions are compared by means of the second-order derivative, and the weights are updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
w_s^i = max(w̃_s^i, 0) / Σ_k max(w̃_s^k, 0)    (7)
where w̃_s^i represents the current source domain sample weight to be normalized, w̃_s^k represents the weights of the other source domain samples in the batch data, m is the data batch size of the target domain training set, and k represents the index of the k-th sample in the source domain batch data.
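Continuing the sketch given after step 1.3 (matcher, theta, theta_hat, w_s, dim, and alpha remain in scope; the target tensors below are again illustrative stand-ins), step 2 can be sketched as follows: the loss on the target small sample set is computed with the updated parameters, differentiated with respect to the source sample weights, and the adjusted weights are clamped at zero and normalized within the batch.

```python
import torch
import torch.nn.functional as F

m = 4  # size of the target domain small sample set (illustrative)
tgt_a, tgt_b = torch.randn(m, dim), torch.randn(m, dim)
tgt_l = torch.randint(0, 2, (m,))

# Step 2.1: target domain loss with the updated parameters; target weights are all 1.
loss_t = F.cross_entropy(matcher(theta_hat, tgt_a, tgt_b), tgt_l, reduction="sum")

# Step 2.2: meta-gradient of the target loss with respect to the source sample weights.
# This is a second-order gradient: it flows through theta_hat back to w_s.
grad_w = torch.autograd.grad(loss_t, w_s)[0]
w_tilde = w_s - alpha * grad_w

# Step 2.3: drop negative weights, then normalize within the source batch.
w_tilde = torch.clamp(w_tilde, min=0.0)
w_norm = w_tilde / w_tilde.sum().clamp(min=1e-8)
```

Clamping removes source samples whose gradients oppose the target domain descent direction, and normalizing the weights over the batch keeps the scale of the weighted source loss stable.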
Step 3: a text matching model is trained on the weighted source domain samples.
Specifically, the sample weights computed by the meta-weight adjustment are assigned to the source domain samples, and the weighted loss used to train the text matching model on the source domain samples is obtained:
Loss_s = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)    (8)
where Loss_s represents the final weighted loss value of the model on the source domain samples, and i ∈ {1, 2, …, N}.
In this way, the source domain data that is more similar to the target domain data receives a larger weight, determines the direction of the base model's parameter updates to a larger extent, and ultimately improves the performance of the base model on the target domain data.
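Step 3, continuing the same sketch: the normalized weights are treated as constants (detached from the graph) and an ordinary weighted training step is taken on the source batch; the variable names remain the illustrative ones introduced above.

```python
# Step 3: weighted loss over the source batch with the learned, now fixed, weights.
w_final = w_norm.detach()
cost_s = F.cross_entropy(matcher(theta, src_a, src_b), src_l, reduction="none")
weighted_loss_s = (w_final * cost_s).sum()

# Ordinary gradient step on the model parameters themselves.
param_grads = torch.autograd.grad(weighted_loss_s, theta)
with torch.no_grad():
    for p, g in zip(theta, param_grads):
        p -= alpha * g
```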
Advantageous effects
Compared with the prior art, the invention has the following advantages:
By means of meta-weight adjustment, the invention solves the problem that conventional cross-domain text matching methods perform poorly under the small sample learning setting, and enhances the adaptability of the text matching model in a small sample learning environment. The method is independent of the base model and can be applied to various deep-learning-based text matching models.
Comprehensive comparative experiments on a series of text matching data sets show that the method improves adaptability across different data sets and tasks under the small sample learning setting. The experimental results show that the method clearly outperforms existing methods and effectively improves the adaptability of the deep text matching model to a target task or data set with few samples.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
The process according to the invention is described in further detail below with reference to the accompanying drawings.
Examples
A small sample learning method for improving adaptability of a deep text matching model, as shown in fig. 1, comprises the following steps:
Step 1: Establish a computation graph relation between the weights of the natural language reasoning source domain data samples and the BERT model parameters.
Specifically, step 1 includes the steps of:
Step 1.1: Using a natural language reasoning training set as the source domain, forward propagate the text matching model BERT on one batch of source domain data to calculate the corresponding source domain loss values:
Cost_s(y_i, l_i) = CE_s(y_i, l_i)
where Cost_s represents the loss value of the model on the source domain; CE_s represents the cross entropy loss function; l_i denotes the label value of the i-th sample; y_i is the model's predicted value for the i-th sample:
y_i = BERT_s(a_i, b_i, θ)
where BERT_s represents the text matching model BERT trained on the natural language reasoning source domain task; a_i and b_i represent the two sentences input into the model for text matching; θ represents the parameters of the deep text matching model.
Step 1.2: an initialization weight is assigned to each sample corresponding to the loss value. In consideration of large data distribution difference between the source domain and the target domain, the present invention sets the initial value of the sample weight to 0. Then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Loss_s(y, l) = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)
where Loss_s represents the source domain loss value, y represents the model's predicted values on the source domain samples, and l represents the label values of the source domain samples; w_s^i is the weight of the i-th sample in the source domain, initialized to 0, i ∈ {1, 2, …, N}.
Step 1.3: To connect the computation graph between the sample weights and the source domain loss value, the model parameters θ are updated by gradient descent with the source domain loss value Loss_s:
θ̂_s = θ - α · ∂Loss_s(y, l)/∂θ
where θ̂_s represents the model parameters after one update step on the source domain samples; α represents the learning rate; ∂Loss_s(y, l)/∂θ represents the partial derivative of the source domain loss value with respect to the model parameters; w_s denotes the weights of the source domain samples.
Thus, a computation graph relation is established between the weights of the natural language reasoning sentence pairs and the model parameters. At this point, the computation graph connection has been established without changing the BERT model parameter values.
Step 2: the weight of the samples is adjusted by meta-gradient descent.
Step 2.1: To compare the difference between the BERT model's gradient descent directions on the natural language reasoning distribution and on the question-answer matching distribution, the current BERT model is trained on a small sample set of question-answer matching data and the training loss is calculated:
Loss_t(y, l) = Σ_{j=1}^{m} CE_t(BERT_t(a_j, b_j, θ̂_s), l_j)
where Loss_t represents the target domain loss value; CE_t represents the cross entropy loss function on the target domain; BERT_t represents the deep text matching model BERT when trained on the target domain; m represents the number of target domain samples.
The weight of the target domain samples is set to a constant of 1. This is because, unlike the source domain samples, the target domain samples have no distribution difference with respect to the target domain.
Step 2.2: Because Loss_t(y, l) is computed with the updated parameters θ̂_s, when the second-order derivative of the target domain loss value Loss_t(y, l) with respect to the source domain sample weights w_s is calculated, the gradient naturally flows through θ̂_s. Thus, the comparison information carried by the gradients is accumulated on the weight gradients of the source domain samples. The weights of the source domain samples are adjusted as follows:
w̃_s = w_s - α · ∂Loss_t(y, l)/∂w_s
where w̃_s represents the updated source domain sample weights, α represents the learning rate, and ∂Loss_t(y, l)/∂w_s represents the second-order partial derivative of the model's loss on the small target domain sample set with respect to the source domain sample weights.
Step 2.3: Inspired by the model-agnostic meta-learning (MAML) algorithm, the gradient descent directions are compared by means of the second-order derivative, and the weights are updated according to the comparison result.
The meta-weight adjustment first eliminates the negative values of the adjusted weights and then normalizes them in batches to make the performance more stable:
w_s^i = max(w̃_s^i, 0) / Σ_k max(w̃_s^k, 0)
where w̃_s^i represents the current source domain sample weight to be normalized, w̃_s^k represents the weights of the other source domain samples in the batch data, m is the data batch size of the target domain training set, and k represents the index of the k-th sample in the source domain batch data.
Step 3: text matching BERT models are trained on weighted source domain samples.
Specifically, the sample weights computed by the meta-weight adjustment are assigned to the source domain samples, and the weighted loss used to train the text matching BERT model on the source domain samples is obtained:
Loss_s = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)
where Loss_s represents the final weighted loss value of the model on the source domain samples, and i ∈ {1, 2, …, N}. In this way, the natural language reasoning data that is more similar to the question-answer matching data receives a larger weight, determines the direction of the BERT model's parameter updates to a larger extent, and ultimately improves the performance of the BERT model on the question-answer matching data.
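To show how the three steps of the embodiment fit together across training iterations, the following self-contained toy loop repeats steps 1 to 3 on successive source batches. It is only a sketch under simplifying assumptions: a small random-feature matcher and random tensors stand in for BERT, the natural language reasoning source data, and the question-answer matching small sample set, and every name, size, and hyper-parameter is illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, n_src, m, alpha, steps = 16, 8, 4, 0.1, 50   # illustrative sizes only

def matcher(theta, a, b):
    # Toy stand-in for the BERT-based matcher: scores a sentence pair from embeddings.
    w1, b1, w2, b2 = theta
    h = torch.tanh((a * b) @ w1 + b1)
    return h @ w2 + b2

theta = [torch.randn(dim, dim, requires_grad=True), torch.zeros(dim, requires_grad=True),
         torch.randn(dim, 2, requires_grad=True), torch.zeros(2, requires_grad=True)]

# Fixed target domain small sample set (stand-in for question-answer matching data).
tgt_a, tgt_b = torch.randn(m, dim), torch.randn(m, dim)
tgt_l = torch.randint(0, 2, (m,))

for _ in range(steps):
    # One batch of source domain data (stand-in for natural language reasoning pairs).
    src_a, src_b = torch.randn(n_src, dim), torch.randn(n_src, dim)
    src_l = torch.randint(0, 2, (n_src,))

    # Step 1: zero-initialized weights, weighted source loss, differentiable update.
    cost_s = F.cross_entropy(matcher(theta, src_a, src_b), src_l, reduction="none")
    w_s = torch.zeros(n_src, requires_grad=True)
    grads = torch.autograd.grad((w_s * cost_s).sum(), theta, create_graph=True)
    theta_hat = [p - alpha * g for p, g in zip(theta, grads)]

    # Step 2: meta-gradient of the target loss w.r.t. the weights, clamp, normalize.
    loss_t = F.cross_entropy(matcher(theta_hat, tgt_a, tgt_b), tgt_l, reduction="sum")
    w_tilde = torch.clamp(w_s - alpha * torch.autograd.grad(loss_t, w_s)[0], min=0.0)
    w_norm = (w_tilde / w_tilde.sum().clamp(min=1e-8)).detach()

    # Step 3: ordinary training step on the re-weighted source batch.
    cost_s = F.cross_entropy(matcher(theta, src_a, src_b), src_l, reduction="none")
    final_grads = torch.autograd.grad((w_norm * cost_s).sum(), theta)
    with torch.no_grad():
        for p, g in zip(theta, final_grads):
            p -= alpha * g
```

In a real instantiation, matcher would be a BERT-based text matching model and the random tensors would be replaced by the natural language reasoning and question-answer matching data sets; the control flow of the three steps is unchanged.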
The foregoing is a preferred embodiment of the present invention, and the present invention should not be limited to the embodiment and the disclosure of the drawings. All equivalents and modifications that come within the spirit of the disclosure are desired to be protected.

Claims (3)

1. The small sample learning method for improving the adaptability of the deep text matching model is characterized by comprising the following steps of:
Step 1: establishing a calculation graph relation between sample weights and model parameters, comprising the following steps:
step 1.1: forward propagating the text matching model over a batch of source domain training set data and calculating the corresponding loss values:
Cost_s(y_i, l_i) = CE_s(y_i, l_i)    (1)
where Cost_s represents the loss value of the model on the source domain; CE_s represents the cross entropy loss function; l_i denotes the label value of the i-th sample; y_i is the model's predicted value for the i-th sample:
y_i = TMM_s(a_i, b_i, θ)    (2)
where TMM_s represents a text matching model trained on a task or data set of the source domain; a_i and b_i represent the two sentences input into the model for text matching; θ represents the parameters of the deep text matching model;
step 1.2: assigning an initialization weight to each sample corresponding to the loss value, and setting the initial value of the sample weight to 0;
then, the sum of weighted loss values on the source domain data is calculated as the source domain loss value:
Loss_s(y, l) = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)    (3)
where Loss_s represents the source domain loss value, y represents the model's predicted values on the source domain samples, and l represents the label values of the source domain samples; w_s^i is the weight of the i-th sample in the source domain, initialized to 0, i ∈ {1, 2, …, N};
step 1.3: gradient descent updating is carried out on the model parameters θ by using the source domain loss value Loss_s:
θ̂_s = θ - α · ∂Loss_s(y, l)/∂θ    (4)
where θ̂_s represents the model parameters after one update step on the source domain samples; α represents the learning rate; ∂Loss_s(y, l)/∂θ represents the partial derivative of the source domain loss value with respect to the model parameters; w_s denotes the weights of the source domain samples; ∂ is the partial derivative operator;
step 2: the weight of the sample is adjusted by meta-gradient descent, comprising the steps of:
Step 2.1: training a current model on a target small sample set, and calculating training loss:
Loss_t(y, l) = Σ_{j=1}^{m} CE_t(TMM_t(a_j, b_j, θ̂_s), l_j)    (5)
where Loss_t represents the target domain loss value; CE_t represents the cross entropy loss function on the target domain; TMM_t represents the deep text matching model when trained on the target domain; m represents the number of target domain samples;
Step 2.2: the comparison information carried by the gradient is accumulated on the weight gradient of the source domain sample, and the weight adjustment process of the source domain sample is as follows:
w̃_s = w_s - α · ∂Loss_t(y, l)/∂w_s    (6)
where w̃_s represents the updated source domain sample weights, α represents the learning rate, and ∂Loss_t(y, l)/∂w_s represents the second-order partial derivative of the model's loss on the small target domain sample set with respect to the source domain sample weights;
Step 2.3: comparing the gradient descending direction by adopting the second derivative, and updating the weight according to the comparison result;
Meta-weight adjustment first removes the negative values of the adjusted weights and then normalizes them in batches:
w_s^i = max(w̃_s^i, 0) / Σ_k max(w̃_s^k, 0)    (7)
where w̃_s^i represents the current source domain sample weight to be normalized, w̃_s^k represents the weights of the other source domain samples in the batch data, n is the data batch size of the target domain training set, and k represents the index of the k-th sample in the source domain batch data;
step 3: a text matching model is trained on the weighted source domain samples.
2. The small sample learning method for improving adaptability of a deep text matching model as claimed in claim 1, wherein the weight of the target domain samples is set to 1 in step 2.
3. The small sample learning method for improving adaptability of a deep text matching model as claimed in claim 1, wherein in step 3, the calculated sample weights are assigned to the source domain samples through the meta-weight adjustment, and the weighted loss is obtained for training the text matching model on the source domain samples:
Loss_s = Σ_{i=1}^{N} w_s^i · Cost_s(y_i, l_i)    (8)
where Loss_s represents the final weighted loss value of the model on the source domain samples, and i ∈ {1, 2, …, N}.
CN202111534340.9A 2021-12-15 2021-12-15 Small sample learning method for improving adaptability of deep text matching model Active CN114385805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534340.9A CN114385805B (en) 2021-12-15 2021-12-15 Small sample learning method for improving adaptability of deep text matching model

Publications (2)

Publication Number Publication Date
CN114385805A CN114385805A (en) 2022-04-22
CN114385805B (en) 2024-05-10

Family

ID=81197910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534340.9A Active CN114385805B (en) 2021-12-15 2021-12-15 Small sample learning method for improving adaptability of deep text matching model

Country Status (1)

Country Link
CN (1) CN114385805B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015184335A1 (en) * 2014-05-30 2015-12-03 Tootitaki Holdings Pte Ltd Real-time audience segment behavior prediction
CN112925888A (en) * 2019-12-06 2021-06-08 上海大岂网络科技有限公司 Method and device for training question-answer response and small sample text matching model
CN111401928A (en) * 2020-04-01 2020-07-10 支付宝(杭州)信息技术有限公司 Method and device for determining semantic similarity of text based on graph data
CN112699966A (en) * 2021-01-14 2021-04-23 中国人民解放军海军航空大学 Radar HRRP small sample target recognition pre-training and fine-tuning method based on deep migration learning
CN112926547A (en) * 2021-04-13 2021-06-08 北京航空航天大学 Small sample transfer learning method for classifying and identifying aircraft electric signals
CN113705215A (en) * 2021-08-27 2021-11-26 南京大学 Meta-learning-based large-scale multi-label text classification method

Also Published As

Publication number Publication date
CN114385805A (en) 2022-04-22

Similar Documents

Publication Publication Date Title
Luan et al. Scientific information extraction with semi-supervised neural tagging
CN108334891B (en) Task type intention classification method and device
CN108932342A (en) A kind of method of semantic matches, the learning method of model and server
CN110737758A (en) Method and apparatus for generating a model
CN106844349B (en) Comment spam recognition methods based on coorinated training
CN113254667A (en) Scientific and technological figure knowledge graph construction method and device based on deep learning model and terminal
CN110362814B (en) Named entity identification method and device based on improved loss function
CN113887643B (en) New dialogue intention recognition method based on pseudo tag self-training and source domain retraining
CN111127246A (en) Intelligent prediction method for transmission line engineering cost
CN113010683A (en) Entity relationship identification method and system based on improved graph attention network
CN115270797A (en) Text entity extraction method and system based on self-training semi-supervised learning
CN112328748A (en) Method for identifying insurance configuration intention
CN109741824A (en) A kind of medical way of inquisition based on machine learning
CN114462409A (en) Audit field named entity recognition method based on countermeasure training
CN116912624A (en) Pseudo tag unsupervised data training method, device, equipment and medium
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
CN108694176A (en) Method, apparatus, electronic equipment and the readable storage medium storing program for executing of document sentiment analysis
CN117151069B (en) Security scheme generation system
Li et al. Dual pseudo supervision for semi-supervised text classification with a reliable teacher
CN112905750A (en) Generation method and device of optimization model
CN114385805B (en) Small sample learning method for improving adaptability of deep text matching model
CN109189915B (en) Information retrieval method based on depth correlation matching model
CN116402025A (en) Sentence breaking method, sentence creating method, training device, sentence breaking equipment and sentence breaking medium
CN114357166B (en) Text classification method based on deep learning
CN115600595A (en) Entity relationship extraction method, system, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant