CN113688621B - Text matching method and device for texts with different lengths under different granularities - Google Patents

Text matching method and device for texts with different lengths under different granularities

Info

Publication number
CN113688621B
CN113688621B (application CN202111023691.3A)
Authority
CN
China
Prior art keywords
task
text
model
training
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111023691.3A
Other languages
Chinese (zh)
Other versions
CN113688621A (en)
Inventor
魏骁勇
谢东霖
张栩禄
杨震群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202111023691.3A
Publication of CN113688621A
Application granted
Publication of CN113688621B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of natural language processing and provides a text matching method and device for texts of different lengths under different matching granularities. The invention aims to solve the problems of different matching granularities, multiple subtasks, class imbalance, and the need to handle very long texts. The main scheme comprises: 1) preparing a data set; 2) data enhancement; 3) continued pre-training on the task-specific data set; 4) long-text processing; 5) designing a multi-task framework; 6) optimizing the multi-task weights; 7) fine tuning and training of the neural network model structure. The method is used for matching texts of different lengths under different granularities.

Description

Text matching method and device for texts with different lengths under different granularities
Technical Field
The invention relates to text matching of texts with different lengths under different granularities; it can be used to match texts of different lengths under two matching granularities, coarse and fine, and belongs to the text matching field of natural language processing.
Background
Text matching is an important basic problem in natural language processing and underlies a large number of NLP tasks, such as information retrieval, question answering, dialogue systems and machine translation, which can to a large extent be abstracted as text matching problems. For example, web search can be abstracted as a relevance-matching problem between web pages and the user's search query, automatic question answering as a satisfaction-matching problem between candidate answers and questions, and text deduplication as a similarity-matching problem between texts.
Traditional text matching techniques include algorithms such as BoW, VSM, TF-IDF, BM25, Jaccard similarity and SimHash. For example, the BM25 algorithm computes a matching score between a document field and a query field from the degree to which the document covers the query terms; the higher the score, the better the match between the web page and the query. These methods mainly address matching or similarity at the lexical level. In practice, matching algorithms based on lexical overlap have great limitations, including semantic limitations, structural limitations and knowledge limitations. For example, a semantic limitation: two different Chinese words for "taxi" are literally dissimilar but refer to the same vehicle, and "apple" means different things in different contexts, either the fruit or the company. Therefore, the text matching task cannot stop at the literal level; matching at the semantic level is needed, and semantic matching first faces the problem of how to represent and compute semantics.
In recent years, deep neural network models have achieved remarkable results on natural language processing tasks and can represent word semantics well as word vectors. The early word2vec model learns word representations by using surrounding words to predict the central word and using the central word to predict surrounding words, but it produces static word vectors: a word has the same representation in every context, so it cannot resolve the ambiguity of polysemous words across different contexts. The BERT model proposed by Google in 2018 addresses polysemy well; its Transformer architecture lets word vectors incorporate contextual semantics, and when applied to downstream tasks the whole pre-trained network is used rather than only an embedding layer as with word2vec, so this dynamic structure can give the same word different representations according to different contexts.
Disclosure of Invention
In view of the above technical problems, the present invention aims to solve the problems of different matching granularities, multiple subtasks, category imbalance and the need to process very long texts.
The technical scheme adopted by the invention is as follows:
a text matching method of texts with different lengths under different granularities comprises the following steps,
step 1, preparing a data set, and labeling the text pairs under the different matching granularities as coarse or fine;
step 2, performing data enhancement on the data set in the step 1, and increasing the generalization capability of the model;
step 3, performing model pre-training on the data set subjected to data enhancement to obtain a pre-training model;
step 4, carrying out truncation processing on the long text in the data set subjected to data enhancement in the step 2 to obtain a text after the long text is truncated;
step 5, designing a multi-task framework in which information among the different model training tasks supplements each other;
step 6, optimizing the weight of the multi-task framework, and continuing to train the neural network model;
and 7, fine tuning and training of the neural network model structure based on the weight-optimized multitask framework to obtain the neural network model capable of judging whether the text pairs are similar under different granularities.
In the above technical solution, step 1 includes the following steps:
step 1.1, preparing Chinese text pair data sets of different lengths, comprising two matching granularities, coarse and fine; coarse-grained matching only requires that the two texts belong to the same topic, while fine-grained matching requires that the two texts describe the same event;
and step 1.2, each granularity comprises text pairs of three different length combinations, namely short-short, short-long and long-long text pairs; the data sets under the two granularities are not completely the same, and the text pairs under the different matching granularities are labeled as coarse or fine.
In the above technical solution, the data enhancement in step 2 includes the following steps:
2.1, enhancing according to a transitivity rule of the similarity;
2.2, enhancement between different granularities: fine-grained matching imposes stricter conditions than coarse-grained matching, so fine-grained pairs can be used as an enhancement of the coarse-grained data.
In the above technical solution, in step 3:
adopting an open-source RoBERTa-wwm Chinese pre-trained model and continuing pre-training on the text corpora of different lengths in this scenario; the texts from step 2 are segmented into Chinese words with the jieba library, then in each round 15% of the tokens are randomly selected, of which 80% are masked, 10% are randomly replaced with other words and the remaining 10% are left unchanged; the masking uses the WWM (whole word masking) strategy, i.e. whole words are masked, yielding the trained pre-training model;
the loss function for the pre-training phase is as follows:
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(p_{ic})
where N is the total number of samples, M is the number of classes, y_{ic} is an indicator function that takes 1 if sample i belongs to class c and 0 otherwise, and p_{ic} denotes the predicted probability that sample i belongs to class c.
In the above technical solution, in step 4:
truncating the long texts in the short-long text pairs and the long-long text pairs of the data set of step 2, using a truncation method that extracts key sentences:
4.1, firstly, carrying out clause division on the long text according to separators;
4.2, taking each sentence as a node, dividing the sentences into words, filtering stop words, and calculating the similarity between every two sentences;
4.3, constructing a node connection graph, taking the similarity between sentences as the weight values of edges between corresponding nodes, and setting a threshold value to filter out the edges with lower weight values;
4.4, calculating the weight value of each sentence according to the weight of the edge, and then iteratively propagating the weight value of each node according to the connection graph until convergence;
and 4.5, sorting the sentences in descending order of weight, and splicing the sentences that meet the length requirement in their original order in the text to form the truncated text.
In the above technical solution, step 5 comprises:
using the MTL multi-task model training framework, in which information among the different model training tasks supplements each other; coarse-grained text matching is set as task A and fine-grained text matching as task B, and the hard parameter sharing mode of MTL is used, i.e. the pre-trained model parameters are shared among the different model training tasks under the multi-task model framework.
In the above technical solution, in step 6:
optimizing the weights of different model training tasks, wherein in different training stages, each stage sets dynamic weight for each task, the F1 value of each task after each iteration is used as an evaluation standard of model difficulty, if the F1 value of a certain task is larger, the task is easy to learn, and the task has smaller weight, and the calculation formula of the weights is as follows:
w_i = -(1 - k_i)^{\gamma_i}\log(k_i)
where w_i represents the weight of task i at each iteration, k_i refers to the KPI, i.e. the evaluation metric of task i, here the F1 value, and γ_i is a hyper-parameter used to adjust the weight of task i.
In the above technical solution, step 7 comprises:
weighting the samples of task A and task B in the multi-task model training of step 6 so that the model can focus on hard samples; at each iteration the weight of the samples misclassified last time is increased and the weight of the correctly classified samples is decreased; finally, the loss of the whole model is the weighted sum of the losses of task A and task B, and the expression of the final loss function is as follows;
L = w_A\left(-\frac{1}{N_A}\sum_{i=1}^{N_A}\alpha_A(1-p_i)^{\gamma_A}\log(p_i)\right) + w_B\left(-\frac{1}{N_B}\sum_{i=1}^{N_B}\alpha_B(1-p_i)^{\gamma_B}\log(p_i)\right)
where w_A and w_B are the weights of task A and task B computed at each iteration with the weight calculation formula of step 6, N_A and N_B are the data amounts of task A and task B, α_A, α_B and γ_A, γ_B are the hyper-parameters that adjust the sample weights in tasks A and B respectively, and p_i is the probability predicted for the true class.
Step 8: the text pairs in the data set enhanced in step 2 are spliced and fed into the network model of step 7; using the labels as supervision information, the neural network is trained with a gradient descent strategy, and after several iterations a neural network that can judge whether text pairs are similar under different granularities is obtained.
The invention also provides a text matching device of texts with different lengths under different granularities, which comprises the following modules,
the data set preparation module is used for preparing a data set and labeling the text pairs under the different matching granularities as coarse or fine, for training the neural network;
the data set enhancement module is used for enhancing the data of the data set in the step 1 and increasing the generalization capability of the model;
the pre-training model module is used for performing model pre-training on the data set subjected to data enhancement to obtain a pre-training model;
the long text truncation module is used for truncating the long text in the data set subjected to data enhancement in the step 2 to obtain a text after the long text truncation;
the multi-task framework module is used for designing a multi-task framework, and information among different model training tasks is mutually supplemented;
the weight optimization module is used for optimizing the weight of the multi-task framework and continuing to train the neural network model;
and the neural network module is used for fine tuning and training the neural network model structure based on the weight-optimized multitask framework to obtain a neural network model that can judge whether text pairs are similar under different granularities.
The invention also provides a storage medium, wherein a program for text matching of texts with different lengths under different granularities is stored in the storage medium, and when the program is executed by a GPU, the text matching method for texts with different lengths under different granularities is implemented.
The technology adopted by the invention has the following beneficial effects:
1. In step 5, the invention adopts a multi-task model for the several subtasks, so that different tasks share one model: this reduces the amount of memory occupied, lets the multiple tasks obtain results with a single forward computation (increasing inference speed), and lets associated tasks share and mutually supplement information, improving the performance of the different tasks. In step 6 the weight of each subtask is adjusted dynamically, so that the multi-task model converges well on every subtask during training, and in step 7 the samples are weighted, so that the model learns hard samples better and model performance improves;
2. In step 4 the method extracts key sentences from long texts instead of crudely truncating the head and tail, which improves matching on long texts; in addition, the rule-based and cross-granularity data enhancement in step 2 improves the generalization ability of the model;
3. In step 8 the invention splices each text pair into a single input; compared with feeding each text into the Chinese pre-trained model separately to obtain its representation, this reduces training time, and because the spliced text passes through the attention modules of the pre-trained model, interaction between the two texts at the bottom layers increases, which also improves performance.
Drawings
FIG. 1 is a pre-training flow diagram.
FIG. 2 is a diagram of a multitasking model framework.
Detailed Description
The invention provides a text matching method for texts with different lengths under different granularities; matching is performed on text pairs of three length combinations, namely short-short, short-long and long-long, under two matching granularities.
The main process of the invention comprises: 1) preparing a data set; 2) data enhancement; 3) continued pre-training on the task-specific data set; 4) long-text processing; 5) designing a multi-task framework; 6) optimizing the multi-task weights; 7) fine tuning and training of the neural network model structure. The specific steps are as follows:
1. preparing a data set
Chinese text pair data sets of different lengths are prepared, comprising two matching granularities, coarse and fine; coarse-grained matching only requires that the two texts belong to the same topic, while fine-grained matching requires that the two texts describe the same event. Each granularity comprises text pairs of three length combinations, namely short-short, short-long and long-long pairs; the data sets under the two granularities are not completely the same, and the text pairs under the different matching granularities are labeled as coarse or fine and used to train the neural network. The three types of text pair generally correspond to different matching scenarios: short-short pairs can be used for automatic question answering, short-long pairs for retrieval, and long-long pairs for news article recommendation (recommending similar articles below a news item), and the different matching granularities also correspond to different application scenarios. Training a separate model for each scenario would make the models too large and redundant, since the scenarios are similar; the multi-task framework finally designed by the method therefore trains only one model, which greatly reduces the number of model parameters.
2. Data enhancement
Data enhancement is performed on the data set of step 1 to increase the generalization ability of the model. Two main ways of enhancement are adopted (a small code sketch follows the rules below):
1. Enhancement according to rules: let a short text be s and a long text be l; then, by the transitivity principle:
1.1, if s_i is similar to s_j and s_j is similar to s_k, then s_i is similar to s_k;
1.2, if s_i is similar to s_j and s_j is similar to l_k, then s_i is similar to l_k;
1.3, if s_i is similar to l_j and l_j is similar to l_k, then s_i is similar to l_k;
1.4, if l_i is similar to l_j and l_j is similar to l_k, then l_i is similar to l_k;
2. Enhancement between different granularities:
2.1, if the text pairs are similar under the fine granularity, the text pairs are also similar under the coarse granularity;
2.2, if the text pairs are dissimilar under the coarse granularity, the text pairs are certainly dissimilar under the fine granularity;
the text pairs meeting the conditions under the fine granularity can be used as data enhancement under the coarse granularity;
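A minimal sketch of the rule-based enhancement described above, assuming the labeled pairs are stored as (text_a, text_b, label) tuples with label 1 for similar; the function names and data layout are illustrative, not the patent's own code:

```python
from collections import defaultdict
from itertools import combinations


def augment_by_transitivity(pairs):
    """If (a, b) and (b, c) are both labeled similar, add (a, c) as a new
    similar pair. `pairs` is an iterable of (text_a, text_b, label),
    where label 1 means similar."""
    neighbors = defaultdict(set)
    for a, b, label in pairs:
        if label == 1:
            neighbors[a].add(b)
            neighbors[b].add(a)
    existing = {frozenset((a, b)) for a, b, _ in pairs}
    augmented = []
    for pivot in neighbors:
        for x, y in combinations(sorted(neighbors[pivot]), 2):
            key = frozenset((x, y))
            if key not in existing:
                existing.add(key)
                augmented.append((x, y, 1))
    return augmented


def augment_coarse_from_fine(fine_pairs):
    """Cross-granularity enhancement: pairs that are similar at the fine
    granularity are also similar at the coarse granularity."""
    return [(a, b, 1) for a, b, label in fine_pairs if label == 1]
```

The transitive rule can also be applied repeatedly until no new pairs appear, at the cost of somewhat noisier labels.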
3. continuing pre-training on a task-specific dataset
An open-source RoBERTa-wwm Chinese pre-trained model is adopted and further pre-trained on the text corpora of different lengths in this scenario. In the pre-training stage the texts from step 2 are used directly: they are segmented into Chinese words with the jieba library, and in each round 15% of the tokens are randomly selected for masking; of these, 80% are masked, 10% are randomly replaced with other words, and the remaining 10% are left unchanged. The masking uses the WWM (whole word masking) strategy, i.e. an entire word is masked rather than a single character within a word, which makes the task more challenging. The NSP task is removed, since research shows it is too simple and tends to hurt rather than improve model performance. The loss function of the pre-training stage is as follows:
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(p_{ic})
where N is the total number of samples, M is the number of classes (i.e. the number of all words and symbols in the vocabulary), y_{ic} is an indicator function that takes 1 if sample i belongs to class c and 0 otherwise, and p_{ic} denotes the predicted probability that sample i belongs to class c;
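A minimal sketch of the whole-word masking step described above, assuming jieba for word segmentation and a generic [MASK] token; vocabulary handling and special tokens are simplified, and the 15%/80%/10%/10% ratios follow the description:

```python
import random

import jieba


def whole_word_mask(text, vocab, mask_token="[MASK]",
                    select_ratio=0.15, mask_p=0.8, replace_p=0.1):
    """Randomly select ~15% of the words of `text`; of those, 80% are
    masked, 10% are replaced by a random word from `vocab`, and 10% are
    left unchanged. Whole words are masked (WWM), not single characters."""
    words = list(jieba.cut(text))
    n_select = max(1, int(len(words) * select_ratio))
    selected = set(random.sample(range(len(words)), n_select))
    corrupted, targets = [], []
    for i, word in enumerate(words):
        if i not in selected:
            corrupted.append(word)
            targets.append(None)
            continue
        targets.append(word)                          # prediction target for the MLM loss
        r = random.random()
        if r < mask_p:
            corrupted.append(mask_token * len(word))  # mask every character of the word
        elif r < mask_p + replace_p:
            corrupted.append(random.choice(vocab))    # random replacement
        else:
            corrupted.append(word)                    # keep unchanged
    return corrupted, targets
```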
4. processing long text
Since the RoBERTa-wwm model used in step 3 has a limit on the length of the input text (the maximum length is 512), the short-long text pair and the long-long text pair in the data set in step 2 need to be truncated, and here, a truncation method for extracting a key sentence is adopted: 1) Firstly, separating sentences of long texts according to delimiters (periods, exclamation marks, question marks and the like); 2) Taking each sentence as a node, segmenting the sentences, filtering stop words, and calculating the similarity between every two sentences; 3) Constructing a node connection graph, taking the similarity between sentences as the weight of edges between corresponding nodes, and setting a threshold value to filter the edges with lower weight; 4) Calculating the weight value of each sentence according to the edge weight, wherein the calculation formula is as follows, and then iteratively propagating the weight of each node according to the connection graph until convergence; 5) And carrying out reverse sequencing according to the weight of the sentences, and selecting the sentences meeting the length requirement to be spliced according to the sequence in the original text to be used as the text after the truncation of the long text.
similarity(S_i, S_j) = \frac{\left|\{w_k \mid w_k \in S_i \wedge w_k \in S_j\}\right|}{\log(|S_i|) + \log(|S_j|)}
W(S_i) = (1-d) + d\sum_{S_j \in In(S_i)} \frac{similarity(S_j, S_i)}{\sum_{S_k \in Out(S_j)} similarity(S_j, S_k)} W(S_j)
where S_i and S_j denote two sentences, similarity(S_i, S_j) denotes their similarity, w_k denotes a word in a sentence, the numerator is the number of words that appear in both sentences, and the denominator is the sum of the logarithms of the numbers of words in the two sentences; W(S_i) denotes the weight of sentence S_i, d is the damping coefficient, and Out(S_i) denotes the neighbor nodes of S_i;
for example, for long text "natural language processing" refers to a technique of interactive communication with a machine using natural language used for human communication. The natural language is processed by human, so that the computer can read and understand the natural language. Relevant research in natural language processing begins with human exploration of machine translation. \8230'
1) Result after sentence splitting: "Natural language processing refers to the technology of interactive communication with machines using the natural language that humans use to communicate.", "Natural language is processed so that a computer can read and understand it." and "Relevant research on natural language processing began with the human exploration of machine translation." ……; the three sentences above are denoted s1, s2 and s3. 2) Each sentence is taken as a node and segmented into words, and stop words are filtered out, giving token lists such as ['natural language', 'processing', 'refer', 'use', 'human', 'use', 'natural language', 'machine', 'perform', 'interact', 'communicate', 'technique'], ['human', 'natural language', 'processing', 'computer', 'can', 'read', 'understand'], ['natural language', 'processing', 'correlation', 'study', 'start with', 'human', 'machine translation', 'explore'] ……; then the similarity between sentences is calculated, for example the similarity between s1 and s2 is 0.443 and between s1 and s3 is 0.646. 3) A connection graph is constructed from the calculated edge weights. 4) The sentence weights are propagated iteratively until convergence; the weights of s1, s2 and s3 are 0.364, 0.309 and 0.326 respectively. 5) The sentences are sorted by weight from large to small and spliced in their original order in the text until the length requirement is met; for example the order is s1, s3, s2, and if s1+s3 already reaches the required length, the final processed text is the concatenation of s1 and s3: "Natural language processing refers to the technology of interactive communication with machines using the natural language that humans use to communicate. Relevant research on natural language processing began with the human exploration of machine translation."
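A minimal sketch of the key-sentence truncation in steps 4.1 to 4.5, assuming jieba for word segmentation, a user-supplied stop-word set, and networkx's PageRank for the iterative weight propagation; the threshold and the 512-character budget are illustrative assumptions:

```python
import math
import re

import jieba
import networkx as nx


def truncate_by_key_sentences(text, stopwords, max_len=512,
                              sim_threshold=0.01, damping=0.85):
    # 4.1 split the long text into sentences on terminal punctuation
    sentences = [s.strip() for s in re.split(r"[。！？!?]", text) if s.strip()]
    # 4.2 segment each sentence and drop stop words
    tokens = [[w for w in jieba.cut(s) if w not in stopwords] for s in sentences]

    def similarity(a, b):
        common = len(set(a) & set(b))              # words shared by both sentences
        denom = math.log(len(a) + 1) + math.log(len(b) + 1)  # +1 avoids log(0)
        return common / denom if denom > 0 else 0.0

    # 4.3 build the sentence graph, keeping only edges above the threshold
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            w = similarity(tokens[i], tokens[j])
            if w > sim_threshold:
                graph.add_edge(i, j, weight=w)
    # 4.4 propagate sentence weights iteratively until convergence
    scores = nx.pagerank(graph, alpha=damping, weight="weight")
    # 4.5 keep the highest-weighted sentences and splice them in original order
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen, used = [], 0
    for idx in ranked:
        if used + len(sentences[idx]) > max_len:
            break
        chosen.append(idx)
        used += len(sentences[idx])
    return "。".join(sentences[i] for i in sorted(chosen))
```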
5. Designing a multitasking framework
A Multi-Task Learning (MTL) framework is used, in which information between different tasks supplements each other and the parameter count of the model is greatly reduced. Coarse-grained text matching is set as task A and fine-grained text matching as task B, and the hard parameter sharing mode of MTL is used: the parameters of the bottom (shared) model are shared across tasks, two task-specific output layers on top correspond to tasks A and B, and the bottom model parameters are initialized with the Chinese language model trained in step 3.
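A minimal PyTorch sketch of the hard-parameter-sharing architecture, assuming the Hugging Face transformers library and the public hfl/chinese-roberta-wwm-ext checkpoint as a stand-in for the model continued-pre-trained in step 3; the layer-wise [CLS] mean pooling anticipates step 8, and all names are illustrative:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class MultiGranularityMatcher(nn.Module):
    """Shared RoBERTa-wwm encoder (hard parameter sharing) with two
    task-specific heads: task A = coarse-grained, task B = fine-grained."""

    def __init__(self, model_name="hfl/chinese-roberta-wwm-ext", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        hidden = self.encoder.config.hidden_size
        self.head_a = nn.Linear(hidden, num_labels)   # coarse-grained output layer
        self.head_b = nn.Linear(hidden, num_labels)   # fine-grained output layer

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # mean-pool the [CLS] vector of every transformer layer as the pair representation
        cls_per_layer = torch.stack([h[:, 0] for h in outputs.hidden_states[1:]], dim=0)
        pooled = cls_per_layer.mean(dim=0)
        return self.head_a(pooled), self.head_b(pooled)
```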
6. Multitask weight optimization
The weights of the different tasks are optimized. Because the loss functions of tasks A and B change during the different stages of training and the convergence behaviour of each task differs from stage to stage, a dynamic weight must be set for each task at each stage to ensure that the model is not dominated by or biased towards one task; when the model tends to over-fit one task, the performance of the other tasks is usually affected negatively.
After each iteration, the F1 value of each task is calculated and the weight of each task is computed with the formula below; when the total loss of the multi-task model is calculated, the loss of each task is weighted with the computed weight and back-propagated. The F1 value serves as the measure of task difficulty: if the F1 value of a task is larger, the task is easier to learn and should have a smaller weight. The calculation formula of the weight is as follows:
w_i = -(1 - k_i)^{\gamma_i}\log(k_i)
where w_i represents the weight of task i at each iteration, k_i refers to the KPI, i.e. the evaluation metric of task i, here the F1 value, and γ_i is a hyper-parameter used to adjust the weight of task i.
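A minimal sketch of the dynamic task weighting, assuming the per-task F1 from the previous iteration as the KPI; the focal-style form below is one common instantiation of the described rule (the weight shrinks as F1 grows, modulated by γ) and is an assumption rather than the patent's exact formula:

```python
import math


def task_weight(f1, gamma=1.0, eps=1e-6):
    """Dynamic weight of one task: the higher its F1 (the easier the task),
    the smaller the weight; gamma controls how strongly easy tasks are
    down-weighted. Assumed focal-style form, not the patent's verbatim formula."""
    f1 = min(max(f1, eps), 1.0 - eps)
    return -((1.0 - f1) ** gamma) * math.log(f1)


# example: after one epoch with F1_A = 0.85 and F1_B = 0.60,
# the easier coarse-grained task A receives the smaller weight
w_a = task_weight(0.85)
w_b = task_weight(0.60)
```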
7. Fine tuning and training of neural network model structures
For the subtasks A and B in the multi-task model of step 6, the learning difficulty of the individual samples differs, so the samples of each subtask are weighted so that the model can focus on hard samples: at each iteration the weight of the samples misclassified last time is increased and the weight of the correctly classified samples is decreased. After each iteration, the weight of each sample is computed from the probability predicted for its true class; the larger this probability, the smaller the weight, and vice versa. Finally, the loss of the whole model is the weighted sum of the losses of tasks A and B, and the expression of the final loss function is as follows;
L = w_A\left(-\frac{1}{N_A}\sum_{i=1}^{N_A}\alpha_A(1-p_i)^{\gamma_A}\log(p_i)\right) + w_B\left(-\frac{1}{N_B}\sum_{i=1}^{N_B}\alpha_B(1-p_i)^{\gamma_B}\log(p_i)\right)
where w_A and w_B are the weights of task A and task B computed at each iteration with the weight calculation formula of step 6, N_A and N_B are the data amounts of task A and task B, α_A, α_B and γ_A, γ_B are the hyper-parameters that adjust the sample weights in tasks A and B respectively, and p_i is the probability predicted for the true class.
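A minimal PyTorch sketch of the final loss, assuming a focal-style per-sample weighting (α, γ) inside each task and the dynamic task weights w_A and w_B from step 6; this is an illustrative reading of the description rather than the patent's verbatim implementation:

```python
import torch


def focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Per-task loss with sample weighting: hard (low-confidence) samples
    contribute more, easy samples less."""
    log_probs = torch.log_softmax(logits, dim=-1)
    log_p_true = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # log p_i of the true class
    p_true = log_p_true.exp()
    return (-alpha * (1.0 - p_true) ** gamma * log_p_true).mean()


def total_loss(logits_a, labels_a, logits_b, labels_b, w_a, w_b):
    """Overall model loss: task-weighted sum of the two per-task losses."""
    return w_a * focal_loss(logits_a, labels_a) + w_b * focal_loss(logits_b, labels_b)
```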
The text pairs processed in step 3 are spliced with the separator [SEP] and fed into the network model of step 7; the [CLS] outputs of every layer of the pre-trained model are mean-pooled as the representation of the text pair and fed into the two output layers respectively. Using the labels as supervision information, the neural network is trained with a gradient descent strategy, and after several iterations a neural network that can judge whether text pairs are similar under different granularities is obtained.
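A minimal sketch of the pair splicing and one training step, assuming the Hugging Face tokenizer (which inserts [CLS] and [SEP] automatically when given two texts) together with the MultiGranularityMatcher and total_loss sketches above; the example sentences, labels and optimizer settings are placeholders:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = MultiGranularityMatcher()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

text_a = "北京暴雨导致多个航班延误"            # placeholder text pair
text_b = "暴雨致首都机场航班大面积取消"
enc = tokenizer(text_a, text_b, truncation=True, max_length=512,
                return_tensors="pt")          # encodes "[CLS] text_a [SEP] text_b [SEP]"

labels_coarse = torch.tensor([1])             # same topic (task A)
labels_fine = torch.tensor([1])               # same event (task B), placeholder label

logits_a, logits_b = model(enc["input_ids"], enc["attention_mask"])
loss = total_loss(logits_a, labels_coarse, logits_b, labels_fine, w_a=1.0, w_b=1.0)
loss.backward()                               # gradient-descent training step
optimizer.step()
optimizer.zero_grad()
```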

Claims (9)

1. A text matching method of texts with different lengths under different granularities is characterized by comprising the following steps,
step 1, preparing a data set, and labeling the text pairs under the different matching granularities as coarse or fine;
step 2, performing data enhancement on the data set in the step 1, and increasing the generalization capability of the model;
step 3, performing model pre-training on the data set subjected to data enhancement to obtain a pre-training model;
step 4, performing truncation processing on the long text in the data set subjected to data enhancement in the step 2 to obtain a text after the long text is truncated;
step 5, designing a multi-task framework in which information among the different model training tasks supplements each other;
step 6, optimizing the weight of the multi-task framework, and continuing to train the neural network model;
step 7, fine tuning and training the neural network model structure based on the weight-optimized multitask framework to obtain a neural network model that can judge whether text pairs are similar under different granularities; the samples of task A and task B in the multi-task model training of step 6 are weighted so that the model can focus on hard samples, the weight of the samples misclassified last time is increased and the weight of the correctly classified samples is decreased at each iteration, and finally the loss of the whole model is the weighted sum of the losses of task A and task B, the expression of the loss function of the model being as follows;
L = w_A\left(-\frac{1}{N_A}\sum_{i=1}^{N_A}\alpha_A(1-p_i)^{\gamma_A}\log(p_i)\right) + w_B\left(-\frac{1}{N_B}\sum_{i=1}^{N_B}\alpha_B(1-p_i)^{\gamma_B}\log(p_i)\right)
wherein w_A and w_B are the weights of task A and task B computed at each iteration according to the weight calculation formula in step 6, N_A and N_B are the data amounts of task A and task B respectively, α_A, α_B and γ_A, γ_B are the hyper-parameters that adjust the sample weights in tasks A and B respectively, and p_i is the probability predicted for the true class;
and step 8: the text pairs in the data set enhanced in step 2 are then spliced and fed into the network model of step 7; using the labels as supervision information, the neural network is trained with a gradient descent strategy, and after several iterations a neural network that can judge whether text pairs are similar under different granularities is obtained.
2. The method for matching texts with different lengths at different granularities according to claim 1, wherein the step 1 comprises the following steps:
step 1.1, preparing Chinese text pair data sets of different lengths, comprising two matching granularities, coarse and fine; coarse-grained matching only requires that the two texts belong to the same topic, while fine-grained matching requires that the two texts describe the same event;
and step 1.2, each granularity comprises text pairs of three different length combinations, namely short-short, short-long and long-long text pairs; the data sets under the two granularities are not completely the same, and the text pairs under the different matching granularities are labeled as coarse or fine.
3. The method for matching texts with different lengths under different granularities according to claim 1, wherein the data enhancement in step 2 comprises the following steps:
2.1, enhancing according to the transitivity rule of the similarity;
2.2, enhancement between different granularities: fine-grained matching imposes stricter conditions than coarse-grained matching, so fine-grained pairs can be used as an enhancement of the coarse-grained data.
4. The method for matching texts with different lengths under different granularities according to claim 1, wherein in step 3:
adopting an open-source RoBERTa-wwm Chinese pre-trained model and continuing pre-training on the text corpora of different lengths in this scenario; the texts from step 2 are segmented into Chinese words with the jieba library, then in each round 15% of the tokens are randomly selected, of which 80% are masked, 10% are randomly replaced with other words and the remaining 10% are left unchanged; the masking uses the WWM (whole word masking) strategy, i.e. whole words are masked, yielding the trained pre-training model;
the loss function for the pre-training phase is as follows:
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log(p_{ic})
where N is the total number of samples, M is the number of classes, y_{ic} is an indicator function that takes 1 if sample i belongs to class c and 0 otherwise, and p_{ic} denotes the predicted probability that sample i belongs to class c.
5. The method for matching texts with different lengths under different granularities according to claim 1, wherein in step 4:
truncating the long texts in the short-long text pairs and the long-long text pairs of the data set of step 2, using a truncation method that extracts key sentences:
4.1, firstly, clauses are carried out on the long text according to separators;
4.2, taking each sentence as a node, dividing the sentences into words, filtering stop words, and calculating the similarity between every two sentences;
4.3, constructing a node connection graph, taking the similarity between sentences as the weight values of edges between corresponding nodes, and setting a threshold value to filter out the edges with lower weight values;
4.4, calculating the weight value of each sentence according to the weight of the edge, and then iteratively propagating the weight value of each node according to the connection graph until convergence;
and 4.5, sorting the sentences in descending order of weight, and splicing the sentences that meet the length requirement in their original order in the text to form the truncated text.
6. The method for matching texts with different lengths under different granularities according to claim 1, wherein in step 5:
step 5: using the MTL multi-task model training framework, in which information among the different model training tasks supplements each other; coarse-grained text matching is set as task A and fine-grained text matching as task B, and the hard parameter sharing mode of MTL is used, i.e. the pre-trained model parameters are shared among the different model training tasks under the multi-task model framework.
7. The method for matching texts with different lengths under different granularities according to claim 1, wherein in step 6:
optimizing the weights of different model training tasks, wherein in different training stages, each stage sets dynamic weight for each task, the F1 value of each task after each iteration is used as an evaluation standard of model difficulty, if the F1 value of a certain task is larger, the task is easy to learn, and the task has smaller weight, and the calculation formula of the weights is as follows:
w_i = -(1 - k_i)^{\gamma_i}\log(k_i)
wherein w_i represents the weight of task i at each iteration, k_i refers to the KPI, i.e. the evaluation metric of task i, here the F1 value, and γ_i is a hyper-parameter used to adjust the weight of task i.
8. A text matching device for texts with different lengths under different granularities is characterized by comprising the following modules,
the data set preparation module is used for preparing a data set and labeling the text pairs under the different matching granularities as coarse or fine, for training the neural network;
the data set enhancing module is used for enhancing the data set of the data set preparing module to increase the generalization capability of the model;
the pre-training model module is used for performing model pre-training on the data set subjected to data enhancement to obtain a pre-training model;
the long text truncation module is used for truncating the long text in the data set after the data enhancement of the data set enhancement module to obtain the text after the truncation of the long text;
the multi-task framework module is used for designing a multi-task framework, and information among different model training tasks is mutually supplemented;
the weight optimization module is used for optimizing the weight of the multi-task framework and continuing to train the neural network model;
the neural network module is used for carrying out fine adjustment and training on the neural network model structure based on the multitask framework after weight optimization to obtain a neural network model which can judge whether text pairs are similar under different granularities;
the samples of task A and task B in the multi-task model training of the weight optimization module are weighted so that the model can focus on hard samples; at each iteration the weight of the samples misclassified last time is increased and the weight of the correctly classified samples is decreased; finally, the loss of the whole model is the weighted sum of the losses of task A and task B, and the expression of the final loss function is as follows;
L = w_A\left(-\frac{1}{N_A}\sum_{i=1}^{N_A}\alpha_A(1-p_i)^{\gamma_A}\log(p_i)\right) + w_B\left(-\frac{1}{N_B}\sum_{i=1}^{N_B}\alpha_B(1-p_i)^{\gamma_B}\log(p_i)\right)
wherein w_A and w_B are the weights of task A and task B computed at each iteration according to the weight calculation formula in step 6, N_A and N_B are the data amounts of task A and task B respectively, α_A, α_B and γ_A, γ_B are the hyper-parameters that adjust the sample weights in tasks A and B respectively, and p_i is the probability predicted for the true class;
and then splicing the text pairs in the data set enhanced by the data set enhancement module and feeding them into the neural network model of the neural network module; using the labels as supervision information, the neural network is trained with a gradient descent strategy, and after several iterations a neural network that can judge whether text pairs are similar under different granularities is obtained.
9. A storage medium, wherein the storage medium stores a program for text matching of texts with different lengths under different granularities, and when a GPU executes the program, the text matching method for texts with different lengths under different granularities according to any one of claims 1 to 7 is implemented.
CN202111023691.3A 2021-09-01 2021-09-01 Text matching method and device for texts with different lengths under different granularities Active CN113688621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111023691.3A CN113688621B (en) 2021-09-01 2021-09-01 Text matching method and device for texts with different lengths under different granularities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111023691.3A CN113688621B (en) 2021-09-01 2021-09-01 Text matching method and device for texts with different lengths under different granularities

Publications (2)

Publication Number Publication Date
CN113688621A CN113688621A (en) 2021-11-23
CN113688621B true CN113688621B (en) 2023-04-07

Family

ID=78584919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111023691.3A Active CN113688621B (en) 2021-09-01 2021-09-01 Text matching method and device for texts with different lengths under different granularities

Country Status (1)

Country Link
CN (1) CN113688621B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988085B (en) * 2021-12-29 2022-04-01 深圳市北科瑞声科技股份有限公司 Text semantic similarity matching method and device, electronic equipment and storage medium
CN114942980B (en) * 2022-07-22 2022-12-27 北京搜狐新媒体信息技术有限公司 Method and device for determining text matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920648A (en) * 2018-07-03 2018-11-30 四川大学 A cross-modal matching method based on music-image semantic relationships
CN111460149A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Text classification method, related equipment and readable storage medium
CN112966103A (en) * 2021-02-05 2021-06-15 成都信息工程大学 Mixed attention mechanism text title matching method based on multi-task learning
CN113158665A (en) * 2021-04-02 2021-07-23 西安交通大学 Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210241147A1 (en) * 2020-11-02 2021-08-05 Beijing More Health Technology Group Co. Ltd. Method and device for predicting pair of similar questions and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920648A (en) * 2018-07-03 2018-11-30 四川大学 A cross-modal matching method based on music-image semantic relationships
CN111460149A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Text classification method, related equipment and readable storage medium
CN112966103A (en) * 2021-02-05 2021-06-15 成都信息工程大学 Mixed attention mechanism text title matching method based on multi-task learning
CN113158665A (en) * 2021-04-02 2021-07-23 西安交通大学 Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hanyan Duan et al. Multi-Task Semantic Matching Model for Small Noisy Data Set. The 16th International Conference on Computer Science & Education. 2021, pp. 1114-1119. *
刘奕洋; 余正涛; 高盛祥; 郭军军; 张亚飞; 聂冰鸽. Chinese Named Entity Recognition Method Based on Machine Reading Comprehension. Pattern Recognition and Artificial Intelligence. 2020, Vol. 33(07), pp. 653-659. *
李芳芳 et al. A Machine Reading Comprehension Model for Legal Texts Based on Multi-Task Joint Training. Journal of Chinese Information Processing. 2021, Vol. 35(35), pp. 109-117. *
高荣星 et al. A High-Level Semantic Concept Extraction Method Based on Adaboost-SVM. Computer Applications and Software. 2012, Vol. 29(29), pp. 24-26. *

Also Published As

Publication number Publication date
CN113688621A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN110866117B (en) Short text classification method based on semantic enhancement and multi-level label embedding
CN109753566B (en) Model training method for cross-domain emotion analysis based on convolutional neural network
CN106503192A (en) Name entity recognition method and device based on artificial intelligence
Ju et al. An efficient method for document categorization based on word2vec and latent semantic analysis
CN113688621B (en) Text matching method and device for texts with different lengths under different granularities
CN112699216A (en) End-to-end language model pre-training method, system, device and storage medium
CN108038106B (en) Fine-grained domain term self-learning method based on context semantics
CN112395393A (en) Remote supervision relation extraction method based on multitask and multiple examples
CN111460157A (en) Cyclic convolution multitask learning method for multi-field text classification
CN112199505B (en) Cross-domain emotion classification method and system based on feature representation learning
Banik et al. Gru based named entity recognition system for bangla online newspapers
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN114265937A (en) Intelligent classification analysis method and system of scientific and technological information, storage medium and server
Adeleke et al. Automating quranic verses labeling using machine learning approach
CN114428850A (en) Text retrieval matching method and system
Varghese et al. Bidirectional LSTM joint model for intent classification and named entity recognition in natural language understanding
Gourru et al. Document network projection in pretrained word embedding space
CN110728135A (en) Text theme indexing method and device, electronic equipment and computer storage medium
CN116304063B (en) Simple emotion knowledge enhancement prompt tuning aspect-level emotion classification method
CN112463982A (en) Relationship extraction method based on explicit and implicit entity constraint
CN112084312A (en) Intelligent customer service system constructed based on knowledge graph
Veisi Central Kurdish Sentiment Analysis Using Deep Learning.
Kim et al. CNN based sentence classification with semantic features using word clustering
Gao et al. Chinese short text classification method based on word embedding and Long Short-Term Memory Neural Network
CN113051892A (en) Chinese word sense disambiguation method based on transformer model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant