CN115687939A

CN115687939A - Mask text matching method and medium based on multi-task learning

Info

Publication number: CN115687939A
Application number: CN202211071421.4A
Authority: CN
Inventors: 张美伟; 崔秋实; 余娟; 吕洋; 余维华; 李文沅; 祝陈哲; 王香霖
Original assignee: Chongqing University; Chongqing Medical University
Current assignee: Chongqing University; Chongqing Medical University
Priority date: 2022-09-02
Filing date: 2022-09-02
Publication date: 2023-02-03
Anticipated expiration: 2042-09-02

Abstract

The invention discloses a Mask text matching method and medium based on multi-task learning. The method comprises the following steps: 1) Acquiring at least two texts to be matched; 2) Extracting the characteristics of the texts to be matched to obtain the text word characteristics of each text to be matched; 3) Establishing a text matching model based on BERT; 4) And inputting the text word characteristics of all the texts to be matched into the text matching model to obtain matching results of different texts to be matched. The medium includes a computer program. The invention provides an idea of constructing a Mask matrix simplified model by combining the characteristics of data, and the difference between texts to be matched can be amplified while the model is simplified, so that the generalization capability of the final model training is enhanced.

Description

Mask text matching method and medium based on multi-task learning

Technical Field

The invention relates to the field of natural language processing, in particular to a Mask text matching method and a Mask text matching medium based on multi-task learning.

Background

The text matching method aims to judge whether the semantics between two natural sentences are equivalent, and is an important research direction in the field of natural language processing. The text matching research has high commercial value and plays an important role in the fields of information retrieval, intelligent customer service and the like.

In recent years, although some standard problem matching scores of neural network models have achieved accuracy similar to or even exceeding that of human beings, when real application scene problems are handled, the models are poor in robustness, correct judgment cannot be made on very simple problems (which are easily judged by human beings), and extremely poor product experience and economic loss are caused.

Most of the current text matching tasks are tested on a test set which is distributed with a training set, the effect is good, but the model capability is actually exaggerated, and the real evaluation on the fine granularity advantage and disadvantage of the model is lacked. Therefore, the method focuses on the robustness of the text matching model in a real application scene, finds the defects of the current text matching algorithm model from multiple dimensions such as vocabulary, syntax and pragmatics, and promotes the development of the semantic matching technology in the industrial fields such as intelligent interaction.

The traditional text matching method comprises algorithms such as BoW, VSM, TF-IDF, BM25, jaccod, simHash and the like, for example, the BM25 algorithm calculates the matching score between a query field and a text with higher score through the coverage degree of the query field by a network field, the matching degree of the text with higher score and the query is better, the matching problem of a vocabulary level or the similarity problem of a caption vocabulary level is mainly solved, and in practice, the matching algorithm based on the vocabulary coincidence degree has great limitation because of the following reasons: the word meaning is limited, and the taxi is actually the same vehicle although the words are not similar; "apple" in different contexts means something different, either fruit or company; the structure is limited, although the words of the 'machine learning' and the 'learning machine' are completely overlapped, the expressions have different meanings; knowledge limitation, "the Qinshihuang Dota" means that the sentence has no problem in terms of morphology and syntax, but the sentence is not correct in combination with knowledge, which indicates that for a text matching task, the text matching task cannot only stay at a literal matching level, and needs matching at a semantic level.

With the successful application of deep learning in the fields of computer vision, speech recognition and recommendation systems, many researches have been made in recent years to apply a deep neural network model to a natural language processing task to reduce the cost of feature engineering. The Word Embedding trained based on the neural network performs text matching calculation, the training mode is simple, the semantic computability represented by the obtained Word vector is further enhanced, but the Word Embedding obtained by training only by using label-free data is not different from the theme model technology in the practical effect of text matching degree calculation, and the Word Embedding is essentially based on co-occurrence information training.

The current text matching algorithm is mainly based on a pretrained language model of BERT, and text vector semantic information is improved as much as possible. However, the text vector obtained by the pre-training model cannot well identify the difference of the text in some scenes, such as: the difference between the two sentences of 'Renminbi how to change the harbor coin' and 'Renminbi how to change the people coin' is small in content, but the meanings are quite different, so that if a text vector is obtained by relying on a pre-training model alone, the difference of texts is difficult to capture from dimensions such as vocabulary, syntax, pragmatics and the like.

It can be seen that the current text matching algorithm has the following defects:

1) The statistical-based language model cannot express rich semantic information, and the difference between texts is difficult to capture in some short text matching scenes with smaller difference.

2) The algorithm model based on word vectors, attention and the like needs more labeled data, the model structure is complex, and the structural characteristics of the text, such as syntactic structure, part of speech and the like, are not further mined and utilized.

3) The output result of the pre-training is relatively concerned based on the pre-training text matching model, a more complex network structure is designed for classification according to the output result of the pre-training model, the structural features of the text are not combined with the pre-training model, and therefore some useful prior information is actually lost.

Disclosure of Invention

The invention aims to provide a Mask text matching method based on multitask learning, which comprises the following steps of:

1) Acquiring at least two texts to be matched;

2) Extracting the characteristics of the texts to be matched to obtain the text word characteristics of each text to be matched;

3) Establishing a text matching model based on BERT;

4) And inputting the text word characteristics of all the texts to be matched into the text matching model to obtain matching results of different texts to be matched.

Further, the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity recognition, semantic role tagging and dependency syntactic analysis.

Further, the target text word feature includes one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntactic relationship feature.

Further, the BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.

Further, the step of obtaining matching results of different texts to be matched comprises:

a) Converting text word features by using an embedding input layer to obtain an embedding input X, and converting the embedding input X into a feature component Q = XW ^Q Characteristic component K = XW ^K Characteristic component V = XW ^V ；W ^Q 、W ^K 、W ^V Weights corresponding to different feature components;

b) Processing the feature component of the embedding input X by using a multi-head attention layer to obtain a multi-head attention layer processing result MultiHead (Q, K, V), namely:

MultiHead(Q，K，V)＝Concat(head ₁ ，...，head _h )W ^O (1)

in the formula, W ^O Is a weight;

wherein, the parameter head _i As follows:

head _i ＝Attention(QW _i ^Q ，KW _i ^k ，VW _i ^v )，i＝1，2，...，h (2)

Attention(QW _i ^Q ，KW _i ^k ，VW _i ^v )＝Mask*Attention(Q，K，V) (3)

in the formula, softmax is an activation function; d is a radical of _k Representing the dimensions of the word vector, in order to prevent the input value for softmax from being too large, resulting in a derivative close to 0.Mask denotes a Mask; attention (QW) _i ^Q ，KW _i ^k ，VW _i ^v ) Attention (Q, K, V) is an intermediate parameter; h is an integer greater than 0;

c) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using the forward propagation layer to obtain a forward propagation layer processing result x, that is:

x＝norm(X+MultiHead(Q，K，V)) (5)

d) Processing the processing result x of the forward propagation layer by using an output layer to obtain a text matching model output based on BERT, wherein the text matching model output is used as a matching result of different texts to be matched;

the BERT based text matching model output is as follows:

FFN(x)＝max(0，xW ₁ +b ₁ )W ₂ +b ₂ (6)

in the formula, W ₁ 、W ₂ Is a weight; b ₁ 、b ₂ Is an offset; FFN (x) is the output.

Further, the embedding input X = X ₁ +x ₂ ；

Wherein a component x is input ₁ And an input component x ₂ Respectively as follows:

X ₁ ＝E _tok +E _seg +E _pos (7)

x ₂ ＝embedding1(pos)+embedding2(ner)+embedding3(seg) (8)

in the formula, E _tok 、E _seg 、E _pos Token Embedding codes, positionembedding codes and segmentembedding codes which respectively represent the characteristics of text words; the embedding layers of parts of speech, named entities and semantic roles are represented by embedding1, embedding2 and embedding 3; pos, ner, seg represent part of speech, named body, semantic role coding of the input text.

Further, the Mask masks are variables of 0 to 1, mask masks at the same positions of words of different texts to be matched =1, and Mask masks at different positions of words of different texts to be matched =0.

Further, the output of the BERT based text matching model comprises a sequence output and a vector output; the vector output is a classification vector and the sequence output is a part-of-speech tagging vector. The classification vectors include semantically identical and semantically different.

Further, the BERT-based text matching model is pre-trained;

the standard of the pre-training completion is Loss function Loss convergence;

the Loss function Loss is as follows:

Loss＝Loss _nll +Loss _pos-tag (9)

in formula (Loss) _nll Is a classification vector loss function; loss _pos-tag A loss function labeled for part of speech;

wherein, the Loss function Loss of classification vector _nll As follows:

in the formula, n is the number of training samples; j represents the jth sample; z represents the number of classified categories; c represents the c-th category; h is _j，c Represents the probability that the jth sample belongs to the c-th class; y is _j，c Indicating whether the jth sample belongs to the c category; y is _j，c =1 indicates that the jth sample belongs to the c-th class; y is _j，c =0 indicates that the jth sample does not belong to the c-th class;

loss function Loss of part of speech tagging _pos-tag As follows:

in the formula, P ₁ 、P ₂ 、P ₃ 、P _n Marking the scores of the 1 st, 2 nd and nth possible parts of speech sequences corresponding to one sample; p _real-path And marking the score of the sequence for the real part of speech corresponding to one sample.

A computer readable storage medium storing a computer program;

the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.

The technical effect of the invention is undoubted, and the invention solves the problem that the difference between texts is difficult to capture in scenes such as intelligent interaction, natural language understanding, similar sentence extraction and the like.

Aiming at the problems of complex short text matching model and large parameter, the invention provides the idea of constructing a Mask matrix simplified model by combining the characteristics of data, and can amplify the difference between texts to be matched while simplifying the model, so that the generalization capability of the final model training is enhanced.

Considering that the difference before text matching is small, the difference can be reflected on the linguistic characteristics of the part of speech and the syntactic structure, and the semantic information input by the model is increased by embedding the characteristics of syntax, entities and the like at the input end.

General text matching mainly utilizes sentence vectors for matching, and the method introduces multi-task learning to learn the difference between the parts of speech of the text to be matched from word granularity, thereby enhancing the generalization capability of the model.

Drawings

FIG. 1 is a text matching flow diagram;

FIG. 2 is a flow diagram of text feature mining;

fig. 3 is a Mask schematic.

Detailed Description

The present invention is further illustrated by the following examples, but it should not be construed that the scope of the above-described subject matter is limited to the following examples. Various substitutions and alterations can be made without departing from the technical idea of the invention and the scope of the invention is covered by the present invention according to the common technical knowledge and the conventional means in the field.

Example 1:

referring to fig. 1 to 3, a Mask text matching method based on multitask learning includes the following steps:

1) Acquiring at least two texts to be matched;

2) Performing feature extraction on the texts to be matched to obtain text word features of each text to be matched;

3) Establishing a text matching model based on BERT;

4) Inputting the text word characteristics of all the texts to be matched into the text matching model, and obtaining matching results of different texts to be matched, wherein the matching results comprise semantic similarity and semantic dissimilarity.

The step of extracting the characteristics of the target text to be matched comprises the following steps: the method comprises a series of natural language processing operations such as word segmentation processing, part of speech tagging, named entity recognition, semantic role tagging, dependency syntactic analysis and the like, and the operations are performed by means of a natural language processing technology provided by a Language Technology Platform (LTP) developed by the research center of the Harbour society computing and information retrieval.

The method adopts the named entity recognition technology provided by the Harmony large language technology platform and an iterative heuristic method to carry out named entity recognition. The latter is to obtain the maximum noun phrase by combining the connected nouns, wherein the part of speech of nouns can only be { ni, nh, ns, nz, j }, and respectively represent the name of a mechanism, the name of a person, the name of a geography, other proper nouns and abbreviations.

The target text word features include one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntax relationship feature.

The BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.

The step of obtaining the matching results of different texts to be matched comprises the following steps:

1) Converting text word features by using an embedding input layer to obtain an embedding input X, and converting the embedding input X into a feature component Q = XW ^Q Characteristic component K = XW ^K Characteristic component V = XW ^V ；W ^Q 、W ^K 、W ^V Weights corresponding to different feature components;

2) Processing the feature component of the embedding input X by using a multi-head attention layer to obtain a multi-head attention layer processing result MultiHead (Q, K, V), that is:

MultiHead(Q，K，V)＝Concat(head ₁ ，...，head _h )W ^O (1)

in the formula, W ^O Is a weight;

wherein,parameter head _i As follows:

Attention(QW _i ^Q ，KW _i ^k ，VW _i ^v )＝Mask*Attention(Q，K，V) (3)

in the formula, softmax is an activation function; d _k Representing the dimensions of the word vector, to prevent the input value of softmax from being too large, resulting in a derivative close to 0.Mask denotes a Mask; attention (QW) _i ^Q ，KW _i ^k ，VW _i ^v ) Attention (Q, K, V) is an intermediate parameter; h is an integer greater than 0;

3) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using the forward propagation layer to obtain a forward propagation layer processing result x, that is:

x＝norm(X+MultiHead(Q，K，V)) (5)

4) Processing the processing result x of the forward propagation layer by using an output layer to obtain a text matching model output based on BERT, wherein the text matching model output is used as a matching result of different texts to be matched;

the BERT-based text matching model output is as follows:

FFN(x)＝max(0，xW ₁ +b ₁ )W ₂ +b ₂ (6)

in the formula, W ₁ 、W ₂ Is a weight; b is a mixture of ₁ 、b ₂ Is an offset. FFN (x) is the output.

The embedding input X = X ₁ +x ₂ ；

X ₁ ＝E _tok +E _seg +E _pos (7)

x ₂ ＝embedding1(pos)+embedding2(ner)+embedding3(seg) (8)

in the formula, E _tok 、E _seg 、E _pos Token Embedding codes, positionembedding codes and segmentembedding codes which respectively represent the characteristics of text words; the embedding layers of parts of speech, named entities and semantic roles are represented by embedding1, embedding2 and embedding 3; pos, ner, seg represent part of speech, named body, semantic role encoding of the input text.

The Mask is a variable of 0-1, mask masks at the same positions of words of different texts to be matched =1, and masks at different positions of words of different texts to be matched =0.

The output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is a classification vector and the sequence output is a part-of-speech tagged vector. The classification vectors include semantically identical and semantically different.

The BERT-based text matching model is pre-trained;

the standard of the pre-training end is Loss function Loss convergence;

the Loss function Loss is as follows:

Loss＝Loss _nll +Loss _pos-tag (9)

wherein, the Loss function Loss of classification vector _nll As follows:

in the formula, n is the number of training samples; j represents the jth sample; z represents the number of classified categories; c represents the c-th category; h is _j，c Representing the probability that the jth sample belongs to the c-th class; y is _j，c Indicating whether the jth sample belongs to the c category; y is _j，c =1 denotes that the jth sample belongs to the c-th class; y is _j，c =0 indicates that the jth sample does not belong to the thc categories;

loss function Loss of part of speech tagging _pos-tag The calculation principle is as follows:

for any sample, the sequence of part-of-speech tagged categories may be:

part of speech tagging sequence 1: START N-B N-I N-E O

Part of speech tagging sequence 2: START N-B N-E O O O

Part of speech tagging sequence 3: START O N-B N-E O

Part of speech tagging sequence 4: START V-B V-I V-E O

…

Part of speech tagging sequence n: START N-B N-E V-B V-E O

Each possible path has a score of P _i And N paths are total, the total score is:

P _total ＝P ₁ +P ₂ +P ₃ +...+P _n

in the training process, the parameter values of the model are continuously updated along with the iteration of the training process, so that the ratio of the real path is larger and larger.

A computer readable storage medium storing a computer program;

Example 2:

1) The text is processed by a language analysis tool, and characteristics such as part of speech, named entities, semantic roles, dependency syntactic relations and the like can be obtained.

2) And comparing the difference between the text pairs according to the input text to be matched, marking the positions with the same word position and the positions with different words, setting the positions with the same words in a Mask matrix as 0, and otherwise, setting the positions as 1.

The Embedding input of BERT can be expressed as Token Embedding, segmentation Embedding and Position Embedding synthesis x ₁ ：

X ₁ ＝E _word ＝E _tok +E _seg +E _pos

Other linguistic features may be represented as x by the embedding layer ₂ ：

x ₂ ＝embedding1(pos)+embedding2(ner)+embedding3(seg)

Final input x = x ₁ +x ₂

Converting input X to Q, K, V:

Q＝XW ^Q ，K＝XW ^K ，V＝XW ^V

attention calculation formula:

in order to focus on the inconsistent text to be matched by the model, a Mask matrix can be obtained by simple processing at the data processing stage, so that the model does not need to focus on attentions of the whole quantity, and only needs to focus on attentions of characters with inconsistent text:

Attention＝Mask*Attention(Q，K，V)

multi-head attention layer:

MultiHead(Q，K，V)＝Concat(head ₁ ，...，head _h )W ^O

wherein:

head _i ＝Attention(QW _i ^Q ，KW _i ^k ，VW _i ^v )，i＝1，2，...，h

a forward propagation layer:

FFN(x)＝max(0，xW ₁ +b ₁ )W ₂ +b ₂

the output is then:

x＝norm(X+MultiHead(Q，K，V))

output of final encoder:

Y＝FFN(x)

the output comprises sequence output and vector output, in the embodiment, the vector output is used as a classification vector, the sequence output is used as a part-of-speech tagging vector of a second task, and aiming at the classification vector, a first Loss function Loss can be obtained _KL Comprises the following steps:

the loss function for part-of-speech tagging is: loss _pos-tag The final loss function will be:

Loss＝Loss _nll +Loss _pos-tag

example 3:

a Mask text matching method based on multitask learning comprises the following steps:

1) Acquiring at least two texts to be matched;

3) Establishing a text matching model based on BERT;

Example 4:

a Mask text matching method based on multitask learning is disclosed in embodiment 3, wherein the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity identification, semantic role tagging and dependency syntactic analysis.

Example 5:

the main content of the Mask text matching method based on multitask learning is shown in an embodiment 3, wherein the target text word characteristics comprise one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.

Example 6:

a Mask text matching method based on multitask learning mainly comprises an embodiment 3, wherein the text matching model based on BERT comprises an embedding input layer, a multi-head attention layer, a forward propagation layer and an output layer.

Example 7:

a Mask text matching method based on multi-task learning mainly comprises the following steps of embodiment 3, wherein the step of obtaining matching results of different texts to be matched comprises the following steps:

MultiHead(Q，K，V)＝Concat(head ₁ ，...，head _h )W ^O (1)

in the formula, W ^O Is a weight;

wherein, the parameter head _i As follows:

Attention(QW _i ^Q ，KW _i ^k ，VW _i ^v )＝Mask*Attention(Q，K，V) (3)

wherein softmax is an activation function; d _k A dimension representing a word vector; mask denotes a Mask; attention (QW) _i ^Q ，KW _i ^k ，VW _i ^v )、Attention(Q，K，V) Is an intermediate parameter; h is an integer greater than 0;

x＝norm(X+MultiHead(Q，K，V)) (5)

the BERT-based text matching model output is as follows:

FFN(x)＝max(0，xW ₁ +b ₁ )W ₂ +b ₂ (6)

Example 8:

a Mask text matching method based on multi-task learning mainly comprises the following content of embodiment 3, wherein the embedding input X = X ₁ +x ₂ ；

X ₁ ＝E _tok +E _seg +E _pos (7)

x ₂ ＝embedding1(pos)+embedding2(ner)+embedding3(seg) (8)

Example 9:

a Mask text matching method based on multitask learning mainly comprises the following steps of embodiment 3, wherein Mask masks are 0-1 variables, mask masks at the same positions of words of different texts to be matched =1, and Mask masks at different positions of words of different texts to be matched =0.

Example 10:

a Mask text matching method based on multitask learning mainly comprises the steps of embodiment 3, wherein the output of a text matching model based on BERT comprises sequence output and vector output; the vector output is a classification vector, and the sequence output is a part-of-speech tagging vector; the classification vectors include semantically identical and semantically different.

Example 11:

a Mask text matching method based on multitask learning mainly comprises the following steps of (1) embodiment 3, wherein a text matching model based on BERT is pre-trained;

the standard of the pre-training end is Loss function Loss convergence;

the Loss function Loss is as follows:

Loss＝Loss _nll +Loss _pos-tag (9)

in the formula, loss _nll Is a classification vector loss function; loss _pos-tag A loss function labeled for part of speech;

wherein, the Loss function Loss of classification vector _nll As follows:

in the formula, n is the number of training samples; j represents the jth sample; z represents the number of classified categories; c represents the c-th category; h is _j，c Represents the probability that the jth sample belongs to the c-th class; y is _j，c Indicating whether the jth sample belongs to the c category; y is _j，c =1 denotes that the jth sample belongs to the c-th class; y is _j，c =0 indicates that the jth sample does not belong to the c-th class;

loss function Loss of part of speech tagging _pos-tag As follows:

in the formula, P ₁ 、P ₂ 、P ₃ 、P _n Marking the scores of the 1 st, 2 nd and nth possible parts of speech tagging sequences corresponding to one sample; p _real-path And marking the score of the sequence for the real part of speech corresponding to one sample.

Example 12:

a computer readable storage medium storing a computer program; the computer program, when executed by a processor, performs the steps of the method of embodiments 1-11.

Claims

1. A Mask text matching method based on multitask learning is characterized by comprising the following steps:

1) And acquiring at least two texts to be matched.

2) And extracting the characteristics of the texts to be matched to obtain the text word characteristics of each text to be matched.

3) Establishing a text matching model based on BERT;

2. The Mask text matching method based on multitask learning according to claim 1, wherein the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity recognition, semantic role tagging and dependency syntactic analysis.

3. The Mask text matching method based on multitask learning according to claim 1, characterized in that said target text word characteristics include one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.

4. The Mask text matching method based on multitask learning as claimed in claim 1, wherein said BERT based text matching model includes embedding input layer, multi-head attention layer, forward propagation layer and output layer.

5. The Mask text matching method based on multitask learning according to claim 1, wherein the step of obtaining matching results of different texts to be matched comprises:

2) Processing the feature component of the embedding input X by using a multi-head attention layer to obtain a multi-head attention layer processing result MultiHead (Q, K, V), namely:

MultiHead(Q,K,V)＝Concat(head ₁ ,...,head _h )W ^O (1)

in the formula, W ^O Is a weight;

wherein, the parameter head _i As follows:

head _i ＝Attention(QW _i ^Q ,KW _i ^k ,VW _i ^v ),i＝1,2,...,h (2)

Attention(QW _i ^Q ,KW _i ^k ,VW _i ^v )＝Mask*Attention(Q,K,V) (3)

in the formula, softmax is an activation function; d is a radical of _k A dimension representing a word vector; mask denotes a Mask; attention (QW) _i ^Q ,KW _i ^k ,VW _i ^v ) Attention (Q, K, V) is an intermediate parameter; h is an integer greater than 0;

3) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using a forward propagation layer to obtain a forward propagation layer processing result x, that is:

x＝norm(X+MultiHead(Q,K,V)) (5)

4) Processing the processing result x of the forward propagation layer by utilizing an output layer to obtain a text matching model output based on BERT, and taking the text matching model output as matching results of different texts to be matched;

the BERT-based text matching model output is as follows:

FFN(x)＝max(0,xW ₁ +b ₁ )W ₂ +b ₂ (6)

6. The Mask text matching method based on multitask learning as claimed in claim 5, characterized in that said embedding input X = X ₁ +x ₂ ；

X ₁ ＝E _tok +E _seg +E _pos (7)

x ₂ ＝embedding1(pos)+embedding2(ner)+embedding3(seg) (8)

7. The Mask text matching method based on multitask learning according to claim 5, wherein the Mask is a variable 0-1, mask masks =1 at the same positions of words of different texts to be matched, and Mask masks =0 at different positions of words of different texts to be matched.

8. The Mask text matching method based on multitask learning according to claim 1, characterized in that the output of said text matching model based on BERT includes sequence output and vector output; the vector output is a classification vector, and the sequence output is a part-of-speech tagging vector; the classification vectors include semantically identical and semantically different.

9. The Mask text matching method based on multitask learning according to claim 1, characterized in that said text matching model based on BERT is pre-trained;

the standard of the pre-training end is Loss function Loss convergence;

the Loss function Loss is as follows:

Loss＝Loss _nll +Loss _pos-tag (9)

wherein, the Loss function Loss of classification vector _nll As follows:

in the formula, n is the number of training samples; j represents the jth sample; z represents the number of categories classified; c represents the c-th category; h is a total of _j,c Represents the probability that the jth sample belongs to the c-th class; y is _j,c Indicating whether the jth sample belongs to the c category; y is _j,c =1 indicates that the jth sample belongs to the c-th class; y is _j,c =0 indicates that the jth sample does not belong to the c-th class;

loss function Loss of part of speech tagging _pos-tag As follows:

in the formula, P ₁ 、P ₂ 、P ₃ 、P _n The 1 st, 2 nd and n th possible part-of-speech tags corresponding to one sampleScoring of the annotation sequence; p is _real-path And marking the score of the sequence for the real part of speech corresponding to one sample.

10. A computer-readable storage medium, characterized in that the computer-readable medium stores a computer program;