CN115687939A - Mask text matching method and medium based on multi-task learning - Google Patents
- Publication number
- CN115687939A (application number CN202211071421.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- mask
- matched
- text matching
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a Mask text matching method and medium based on multi-task learning. The method comprises the following steps: 1) acquiring at least two texts to be matched; 2) extracting features of the texts to be matched to obtain the text word features of each text to be matched; 3) establishing a BERT-based text matching model; 4) inputting the text word features of all the texts to be matched into the text matching model to obtain matching results for the different texts to be matched. The medium stores a computer program. The invention proposes the idea of constructing a Mask matrix, combined with the characteristics of the data, to simplify the model; while simplifying the model, the differences between the texts to be matched are amplified, so that the generalization capability of the finally trained model is enhanced.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a Mask text matching method and medium based on multi-task learning.
Background
The text matching task aims to judge whether the semantics of two natural-language sentences are equivalent, and is an important research direction in natural language processing. Text matching research has high commercial value and plays an important role in fields such as information retrieval and intelligent customer service.
In recent years, neural network models have reached accuracy similar to, or even exceeding, human performance on some standard problem-matching benchmarks. However, when handling problems from real application scenarios, these models show poor robustness and fail to judge correctly even very simple problems (ones humans judge easily), causing extremely poor product experience and economic loss.
Most current text matching models are tested on a test set drawn from the same distribution as the training set. They perform well there, but this exaggerates the models' actual capability and lacks a real, fine-grained evaluation of their strengths and weaknesses. The invention therefore focuses on the robustness of text matching models in real application scenarios, exposes the defects of current text matching algorithm models along multiple dimensions such as vocabulary, syntax and pragmatics, and promotes the development of semantic matching technology in industrial fields such as intelligent interaction.
Traditional text matching methods include algorithms such as BoW, VSM, TF-IDF, BM25, Jaccard and SimHash. For example, the BM25 algorithm computes a matching score between a query and a text from the degree to which the text's terms cover the query terms; a higher score indicates a better match between the text and the query. These methods mainly solve matching at the vocabulary level. In practice, matching algorithms based on vocabulary overlap have great limitations, for the following reasons: word-meaning limitation — synonyms (e.g., "taxi" and "cab") denote the same vehicle although the words are dissimilar, while "apple" means different things (a fruit or a company) in different contexts; structure limitation — "machine learning" and "learning machine" share exactly the same words yet express different meanings; knowledge limitation — a sentence such as "Qin Shihuang plays Dota" is unproblematic in morphology and syntax, but incorrect given world knowledge. This indicates that a text matching task cannot stay at the literal matching level and needs matching at the semantic level.
With the successful application of deep learning in computer vision, speech recognition and recommendation systems, much recent research has applied deep neural network models to natural language processing tasks to reduce the cost of feature engineering. Word embeddings trained with neural networks can be used for text matching calculation; the training procedure is simple, and the resulting word vectors have better semantic computability. However, for text matching degree calculation, word embeddings trained only on unlabeled data are in practice not essentially different from topic-model techniques, since both are trained from co-occurrence information.
Current text matching algorithms are mainly based on BERT-style pre-trained language models, which improve the semantic information of text vectors as much as possible. However, in some scenarios the text vectors obtained from a pre-trained model cannot identify the differences between texts well. For example, the two sentences "how to exchange RMB for HKD" and "how to exchange HKD for RMB" differ little in surface content, but their meanings are quite different. Relying on a pre-trained model alone for text vectors therefore makes it hard to capture textual differences along dimensions such as vocabulary, syntax and pragmatics.
It can be seen that current text matching algorithms have the following defects:
1) Statistics-based language models cannot express rich semantic information, and in some short-text matching scenarios with small differences it is hard to capture the differences between texts.
2) Algorithm models based on word vectors, attention and the like need more labeled data and have complex model structures, and structural characteristics of the text, such as syntactic structure and part of speech, are not further mined and utilized.
3) Text matching models based on pre-training focus on the pre-trained output and design a more complex network structure for classification on top of it; the structural features of the text are not combined with the pre-trained model, so some useful prior information is actually lost.
Disclosure of Invention
The invention aims to provide a Mask text matching method based on multi-task learning, comprising the following steps:
1) Acquiring at least two texts to be matched;
2) Extracting the characteristics of the texts to be matched to obtain the text word characteristics of each text to be matched;
3) Establishing a text matching model based on BERT;
4) Inputting the text word features of all the texts to be matched into the text matching model to obtain matching results for the different texts to be matched.
Further, the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity recognition, semantic role tagging and dependency syntactic analysis.
Further, the target text word feature includes one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntactic relationship feature.
Further, the BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.
Further, the step of obtaining matching results of different texts to be matched comprises:
a) Converting text word features with the embedding input layer to obtain an embedding input X, and projecting X into feature components Q = XW_Q, K = XW_K, V = XW_V, where W_Q, W_K, W_V are the weights corresponding to the different feature components;
b) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention layer result MultiHead(Q, K, V), namely:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O  (1)
where W^O is a weight;
and each parameter head_i is given by:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1, 2, ..., h  (2)
Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask * Attention(Q, K, V)  (3)
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (4)
In the formulas, softmax is an activation function; d_k is the dimension of the word vector, and scaling by √d_k prevents the softmax input from being so large that the derivative approaches 0; Mask denotes a mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q, K, V) are intermediate parameters; h is an integer greater than 0;
c) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using the forward propagation layer to obtain a forward propagation layer processing result x, that is:
x=norm(X+MultiHead(Q,K,V)) (5)
d) Processing the processing result x of the forward propagation layer by using an output layer to obtain a text matching model output based on BERT, wherein the text matching model output is used as a matching result of different texts to be matched;
the BERT based text matching model output is as follows:
FFN(x) = max(0, xW_1 + b_1) W_2 + b_2  (6)
where W_1, W_2 are weights, b_1, b_2 are biases, and FFN(x) is the output.
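The steps a) to d) above can be illustrated end-to-end in plain numpy. This is a minimal sketch, not the patented implementation: it assumes the standard scaled dot-product form softmax(QK^T / √d_k) V for Attention(Q, K, V), applies the 0/1 Mask as a row filter per equation (3) following the Mask convention of the embodiments (0 at identical positions, 1 at differing ones), and the helper names (masked_head, encoder_layer) and weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def masked_head(X, Wq, Wk, Wv, mask):
    # eqs. (2)-(3): one head; the 0/1 mask zeroes the rows of identical positions
    return mask[:, None] * attention(X @ Wq, X @ Wk, X @ Wv)

def encoder_layer(X, heads_w, Wo, W1, b1, W2, b2, mask):
    # eq. (1): concatenate the h heads and project with W_O
    mh = np.concatenate([masked_head(X, *w, mask) for w in heads_w], axis=-1) @ Wo
    x = X + mh
    x = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)  # eq. (5): norm
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2                            # eq. (6): FFN
```

With a mask of all ones this reduces to ordinary full attention, which is one way to sanity-check the sketch.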
Further, the embedding input X = x_1 + x_2;
where the input components x_1 and x_2 are:
x_1 = E_tok + E_seg + E_pos  (7)
x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg)  (8)
In the formulas, E_tok, E_seg, E_pos denote the Token Embedding, Segment Embedding and Position Embedding codes of the text word features respectively; embedding1, embedding2, embedding3 denote the embedding layers for parts of speech, named entities and semantic roles; pos, ner, seg denote the part-of-speech, named-entity and semantic-role codings of the input text.
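A minimal sketch of how the combined input x_1 + x_2 of equations (7)-(8) could be assembled from lookup tables, assuming integer ids for tokens and for the part-of-speech / named-entity / semantic-role features; the table names and the single-segment handling are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def embed_input(tok_ids, pos_ids, ner_ids, srl_ids, tables):
    """x = x1 + x2: x1 is the usual BERT sum (token + segment + position),
    x2 adds part-of-speech, named-entity and semantic-role embeddings."""
    E_tok, E_seg, E_pos, emb1, emb2, emb3 = tables
    n = len(tok_ids)
    x1 = E_tok[tok_ids] + E_seg[np.zeros(n, int)] + E_pos[np.arange(n)]  # eq. (7)
    x2 = emb1[pos_ids] + emb2[ner_ids] + emb3[srl_ids]                   # eq. (8)
    return x1 + x2
```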
Further, the Mask is a 0-1 variable: at positions where the words of the different texts to be matched are the same, Mask = 0, and at positions where the words differ, Mask = 1.
Further, the output of the BERT based text matching model comprises a sequence output and a vector output; the vector output is a classification vector and the sequence output is a part-of-speech tagging vector. The classification vectors include semantically identical and semantically different.
Further, the BERT-based text matching model is pre-trained;
the standard of the pre-training completion is Loss function Loss convergence;
the Loss function Loss is as follows:
Loss = Loss_nll + Loss_pos-tag  (9)
where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;
The classification-vector loss function Loss_nll is:
Loss_nll = -(1/n) Σ_{j=1}^{n} Σ_{c=1}^{Z} y_{j,c} log(h_{j,c})  (10)
where n is the number of training samples; j indexes the j-th sample; Z is the number of classification categories; c indexes the c-th category; h_{j,c} is the probability that the j-th sample belongs to the c-th category; y_{j,c} indicates whether the j-th sample belongs to the c-th category (y_{j,c} = 1 if it does, y_{j,c} = 0 if it does not);
The part-of-speech tagging loss function Loss_pos-tag is:
Loss_pos-tag = -log(P_real-path / (P_1 + P_2 + ... + P_n))  (11)
where P_1, P_2, P_3, ..., P_n are the scores of the 1st, 2nd, ..., n-th possible part-of-speech tagging sequences for a sample, and P_real-path is the score of the true part-of-speech tagging sequence for that sample.
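Assuming the standard negative log-likelihood form for Loss_nll and the path-ratio form for Loss_pos-tag implied by the symbol definitions above (both are reconstructions, since the formula images are not reproduced here), the combined loss can be sketched as:

```python
import numpy as np

def loss_nll(h, y):
    # classification loss: mean negative log-likelihood over the n samples
    return -(y * np.log(h)).sum() / len(h)

def loss_pos_tag(path_scores, real_path_score):
    # part-of-speech tagging loss: negative log of the real path's share of the total
    return -np.log(real_path_score / np.sum(path_scores))

def total_loss(h, y, path_scores, real_path_score):
    # eq. (9): Loss = Loss_nll + Loss_pos-tag
    return loss_nll(h, y) + loss_pos_tag(path_scores, real_path_score)
```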
A computer readable storage medium storing a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
The technical effect of the invention is evident: it addresses the difficulty of capturing differences between texts in scenarios such as intelligent interaction, natural language understanding and similar-sentence extraction.
Aiming at the complexity and large parameter counts of short-text matching models, the invention proposes the idea of constructing a Mask matrix, combined with the characteristics of the data, to simplify the model; while simplifying the model, the differences between the texts to be matched are amplified, so that the generalization capability of the finally trained model is enhanced.
Considering that the differences between texts to be matched are small and show up in linguistic features such as part of speech and syntactic structure, the semantic information available to the model is increased by embedding syntactic, entity and related features at the input end.
General text matching mainly uses sentence vectors for matching; this method introduces multi-task learning to learn, at word granularity, the part-of-speech differences between the texts to be matched, thereby enhancing the generalization capability of the model.
Drawings
FIG. 1 is a text matching flow diagram;
FIG. 2 is a flow diagram of text feature mining;
fig. 3 is a Mask schematic.
Detailed Description
The present invention is further illustrated by the following examples, but the scope of the above-described subject matter should not be construed as limited to them. Various substitutions and alterations can be made without departing from the technical idea of the invention, and, according to common technical knowledge and conventional means in the field, all such changes are covered by the scope of the invention.
Example 1:
referring to fig. 1 to 3, a Mask text matching method based on multitask learning includes the following steps:
1) Acquiring at least two texts to be matched;
2) Performing feature extraction on the texts to be matched to obtain text word features of each text to be matched;
3) Establishing a text matching model based on BERT;
4) Inputting the text word characteristics of all the texts to be matched into the text matching model, and obtaining matching results of different texts to be matched, wherein the matching results comprise semantic similarity and semantic dissimilarity.
The step of extracting features of the texts to be matched comprises a series of natural language processing operations such as word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling and dependency syntactic analysis, performed with the natural language processing technology provided by the Language Technology Platform (LTP) developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology.
The method adopts the named entity recognition provided by the HIT Language Technology Platform together with an iterative heuristic method. The latter merges consecutive nouns to obtain maximal noun phrases, where the noun parts of speech can only be {ni, nh, ns, nz, j}, denoting organization names, person names, place names, other proper nouns and abbreviations respectively.
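A hedged sketch of the iterative heuristic just described: consecutive words whose part-of-speech tag is in {ni, nh, ns, nz, j} are merged into a maximal noun phrase. The function name and the sample tags in the test are illustrative assumptions, not LTP output.

```python
NOUN_TAGS = {"ni", "nh", "ns", "nz", "j"}  # organization, person, place, other proper noun, abbreviation

def merge_noun_phrases(words, tags):
    """Greedily merge runs of consecutive nouns into maximal noun phrases."""
    out, buf = [], []
    for w, t in zip(words, tags):
        if t in NOUN_TAGS:
            buf.append(w)          # extend the current noun run
            continue
        if buf:
            out.append("".join(buf))
            buf = []
        out.append(w)
    if buf:
        out.append("".join(buf))   # flush a trailing noun run
    return out
```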
The target text word features include one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntax relationship feature.
The BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.
The step of obtaining the matching results of different texts to be matched comprises the following steps:
1) Converting text word features with the embedding input layer to obtain an embedding input X, and projecting X into feature components Q = XW_Q, K = XW_K, V = XW_V, where W_Q, W_K, W_V are the weights corresponding to the different feature components;
2) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention layer result MultiHead(Q, K, V), namely:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O  (1)
where W^O is a weight;
and each parameter head_i is given by:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1, 2, ..., h  (2)
Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask * Attention(Q, K, V)  (3)
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (4)
In the formulas, softmax is an activation function; d_k is the dimension of the word vector, and scaling by √d_k prevents the softmax input from being so large that the derivative approaches 0; Mask denotes a mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q, K, V) are intermediate parameters; h is an integer greater than 0;
3) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using the forward propagation layer to obtain a forward propagation layer processing result x, that is:
x=norm(X+MultiHead(Q,K,V)) (5)
4) Processing the processing result x of the forward propagation layer by using an output layer to obtain a text matching model output based on BERT, wherein the text matching model output is used as a matching result of different texts to be matched;
the BERT-based text matching model output is as follows:
FFN(x)=max(0,xW 1 +b 1 )W 2 +b 2 (6)
in the formula, W 1 、W 2 Is a weight; b is a mixture of 1 、b 2 Is an offset. FFN (x) is the output.
The embedding input X = x_1 + x_2;
where the input components x_1 and x_2 are:
x_1 = E_tok + E_seg + E_pos  (7)
x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg)  (8)
In the formulas, E_tok, E_seg, E_pos denote the Token Embedding, Segment Embedding and Position Embedding codes of the text word features respectively; embedding1, embedding2, embedding3 denote the embedding layers for parts of speech, named entities and semantic roles; pos, ner, seg denote the part-of-speech, named-entity and semantic-role codings of the input text.
The Mask is a 0-1 variable: at positions where the words of the different texts to be matched are the same, Mask = 0, and at positions where the words differ, Mask = 1.
The output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is a classification vector and the sequence output is a part-of-speech tagged vector. The classification vectors include semantically identical and semantically different.
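The two outputs can be read off the encoder with two small linear heads: one on the first-position vector for the same/different classification, and one on the full sequence for per-token part-of-speech tagging. A minimal sketch, with the head names and weight shapes (W_cls, W_tag) as assumptions:

```python
import numpy as np

def multitask_heads(encoder_out, W_cls, W_tag):
    """Split the encoder output into the two task heads."""
    cls_logits = encoder_out[0] @ W_cls  # vector output -> 2-way classification (same / different)
    tag_logits = encoder_out @ W_tag     # sequence output -> part-of-speech scores per token
    return cls_logits, tag_logits
```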
The BERT-based text matching model is pre-trained;
the standard of the pre-training end is Loss function Loss convergence;
the Loss function Loss is as follows:
Loss = Loss_nll + Loss_pos-tag  (9)
where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;
The classification-vector loss function Loss_nll is:
Loss_nll = -(1/n) Σ_{j=1}^{n} Σ_{c=1}^{Z} y_{j,c} log(h_{j,c})  (10)
where n is the number of training samples; j indexes the j-th sample; Z is the number of classification categories; c indexes the c-th category; h_{j,c} is the probability that the j-th sample belongs to the c-th category; y_{j,c} indicates whether the j-th sample belongs to the c-th category (y_{j,c} = 1 if it does, y_{j,c} = 0 if it does not);
The calculation principle of the part-of-speech tagging loss function Loss_pos-tag is as follows:
for any sample, the sequence of part-of-speech tagged categories may be:
part of speech tagging sequence 1: START N-B N-I N-E O
Part of speech tagging sequence 2: START N-B N-E O O O
Part of speech tagging sequence 3: START O N-B N-E O
Part of speech tagging sequence 4: START V-B V-I V-E O
…
Part of speech tagging sequence n: START N-B N-E V-B V-E O
Each possible path i has a score P_i; with n paths in total, the total score is:
P_total = P_1 + P_2 + P_3 + ... + P_n
During training, the parameter values of the model are continuously updated through the training iterations, so that the share of the real path, P_real-path / P_total, becomes larger and larger.
A computer readable storage medium storing a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
Example 2:
referring to fig. 1 to 3, a Mask text matching method based on multitask learning includes the following steps:
1) The text is processed by a language analysis tool, and characteristics such as part of speech, named entities, semantic roles, dependency syntactic relations and the like can be obtained.
2) Comparing the differences between the input text pair to be matched, marking the positions where the words are the same and the positions where they differ: positions with the same words are set to 0 in the Mask matrix, and the others to 1.
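Following the construction described here (identical positions → 0, differing positions → 1), the Mask for a text pair can be built with a position-wise comparison; a minimal sketch, in which padding the shorter text with None is an illustrative assumption:

```python
def build_mask(tokens_a, tokens_b):
    """0 where the two texts share the same token at a position, 1 otherwise."""
    n = max(len(tokens_a), len(tokens_b))
    pad_a = list(tokens_a) + [None] * (n - len(tokens_a))
    pad_b = list(tokens_b) + [None] * (n - len(tokens_b))
    return [0 if a == b else 1 for a, b in zip(pad_a, pad_b)]
```

For the pair from the Background ("人民币怎么换港币" vs. "人民币怎么换人民币"), the mask is 0 over the shared prefix and 1 at the differing tail, which is exactly where attention is kept.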
The embedding input of BERT can be expressed as the sum of Token Embedding, Segment Embedding and Position Embedding, x_1:
x_1 = E_word = E_tok + E_seg + E_pos
Other linguistic features may be represented through embedding layers as x_2:
x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg)
Final input x = x_1 + x_2.
Converting input X to Q, K, V:
Q = XW_Q, K = XW_K, V = XW_V
The attention calculation formula is:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
To make the model focus on the inconsistent parts of the texts to be matched, a Mask matrix can be obtained by simple processing in the data processing stage, so that the model need not attend over the full text and only attends to the characters where the texts are inconsistent:
Attention = Mask * Attention(Q, K, V)
Multi-head attention layer:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
where:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1, 2, ..., h
The output of the attention sublayer is:
x = norm(X + MultiHead(Q, K, V))
The forward propagation layer then gives:
FFN(x) = max(0, xW_1 + b_1) W_2 + b_2
The output of the final encoder is:
Y = FFN(x)
The output comprises a sequence output and a vector output. In this embodiment, the vector output is used as the classification vector, and the sequence output as the part-of-speech tagging vector of the second task. For the classification vector, a first loss function Loss_nll can be obtained:
Loss_nll = -(1/n) Σ_{j=1}^{n} Σ_{c=1}^{Z} y_{j,c} log(h_{j,c})
The loss function for part-of-speech tagging is Loss_pos-tag; the final loss function is then:
Loss = Loss_nll + Loss_pos-tag
example 3:
a Mask text matching method based on multitask learning comprises the following steps:
1) Acquiring at least two texts to be matched;
2) Performing feature extraction on the texts to be matched to obtain text word features of each text to be matched;
3) Establishing a text matching model based on BERT;
4) Inputting the text word features of all the texts to be matched into the text matching model to obtain matching results for the different texts to be matched.
Example 4:
a Mask text matching method based on multitask learning is disclosed in embodiment 3, wherein the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity identification, semantic role tagging and dependency syntactic analysis.
Example 5:
the main content of the Mask text matching method based on multitask learning is shown in an embodiment 3, wherein the target text word characteristics comprise one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.
Example 6:
a Mask text matching method based on multitask learning mainly comprises an embodiment 3, wherein the text matching model based on BERT comprises an embedding input layer, a multi-head attention layer, a forward propagation layer and an output layer.
Example 7:
a Mask text matching method based on multi-task learning mainly comprises the following steps of embodiment 3, wherein the step of obtaining matching results of different texts to be matched comprises the following steps:
1) Converting text word features with the embedding input layer to obtain an embedding input X, and projecting X into feature components Q = XW_Q, K = XW_K, V = XW_V, where W_Q, W_K, W_V are the weights corresponding to the different feature components;
2) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention layer result MultiHead(Q, K, V), namely:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O  (1)
where W^O is a weight;
and each parameter head_i is given by:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1, 2, ..., h  (2)
Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask * Attention(Q, K, V)  (3)
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (4)
where softmax is an activation function; d_k is the dimension of the word vector; Mask denotes a mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q, K, V) are intermediate parameters; h is an integer greater than 0;
3) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using the forward propagation layer to obtain a forward propagation layer processing result x, that is:
x=norm(X+MultiHead(Q,K,V)) (5)
4) Processing the processing result x of the forward propagation layer by using an output layer to obtain a text matching model output based on BERT, wherein the text matching model output is used as a matching result of different texts to be matched;
the BERT-based text matching model output is as follows:
FFN(x)=max(0,xW 1 +b 1 )W 2 +b 2 (6)
in the formula, W 1 、W 2 Is a weight; b 1 、b 2 Is an offset; FFN (x) is the output.
Example 8:
a Mask text matching method based on multi-task learning mainly comprises the following content of embodiment 3, wherein the embedding input X = X 1 +x 2 ;
Wherein a component x is input 1 And an input component x 2 Respectively as follows:
X 1 =E tok +E seg +E pos (7)
x 2 =embedding1(pos)+embedding2(ner)+embedding3(seg) (8)
in the formula, E tok 、E seg 、E pos Token Embedding codes, positionembedding codes and segmentembedding codes which respectively represent the characteristics of text words; the embedding layers of parts of speech, named entities and semantic roles are represented by embedding1, embedding2 and embedding 3; pos, ner, seg represent part of speech, named body, semantic role coding of the input text.
Example 9:
a Mask text matching method based on multitask learning mainly comprises the following steps of embodiment 3, wherein Mask masks are 0-1 variables, mask masks at the same positions of words of different texts to be matched =1, and Mask masks at different positions of words of different texts to be matched =0.
Example 10:
a Mask text matching method based on multitask learning mainly comprises the steps of embodiment 3, wherein the output of a text matching model based on BERT comprises sequence output and vector output; the vector output is a classification vector, and the sequence output is a part-of-speech tagging vector; the classification vectors include semantically identical and semantically different.
Example 11:
a Mask text matching method based on multitask learning mainly comprises the following steps of (1) embodiment 3, wherein a text matching model based on BERT is pre-trained;
the standard of the pre-training end is Loss function Loss convergence;
the Loss function Loss is as follows:
Loss = Loss_nll + Loss_pos-tag  (9)
where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;
The classification-vector loss function Loss_nll is:
Loss_nll = -(1/n) Σ_{j=1}^{n} Σ_{c=1}^{Z} y_{j,c} log(h_{j,c})  (10)
where n is the number of training samples; j indexes the j-th sample; Z is the number of classification categories; c indexes the c-th category; h_{j,c} is the probability that the j-th sample belongs to the c-th category; y_{j,c} indicates whether the j-th sample belongs to the c-th category (y_{j,c} = 1 if it does, y_{j,c} = 0 if it does not);
The part-of-speech tagging loss function Loss_pos-tag is:
Loss_pos-tag = -log(P_real-path / (P_1 + P_2 + ... + P_n))  (11)
where P_1, P_2, P_3, ..., P_n are the scores of the 1st, 2nd, ..., n-th possible part-of-speech tagging sequences for a sample, and P_real-path is the score of the true part-of-speech tagging sequence for that sample.
Example 12:
a computer readable storage medium storing a computer program; the computer program, when executed by a processor, performs the steps of the method of embodiments 1-11.
Claims (10)
1. A Mask text matching method based on multi-task learning, characterized by comprising the following steps:
1) acquiring at least two texts to be matched;
2) extracting features of the texts to be matched to obtain the text word features of each text to be matched;
3) establishing a BERT-based text matching model;
4) inputting the text word features of all the texts to be matched into the text matching model to obtain matching results for the different texts to be matched.
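Read end to end, the four claimed steps form a pipeline. The sketch below shows that shape only; every function body is a hypothetical toy stand-in (hard-coded texts, whitespace tokenizer, position-wise word comparison), not the claimed BERT-based model:

```python
from typing import Dict, List

def acquire_texts() -> List[str]:
    # step 1): obtain at least two texts to be matched (hard-coded stand-in)
    return ["the cat sat here", "the cat slept here"]

def extract_features(text: str) -> List[Dict[str, str]]:
    # step 2): toy stand-in for the real feature extraction (word
    # segmentation, part-of-speech tagging, named entity recognition,
    # semantic role labeling, dependency analysis); here only tokens
    return [{"token": tok} for tok in text.split()]

def match(feats_a: List[Dict[str, str]], feats_b: List[Dict[str, str]]) -> str:
    # steps 3)-4): stand-in for the BERT-based matching model; declares the
    # texts identical only when every aligned position holds the same word
    if len(feats_a) != len(feats_b):
        return "semantically different"
    same = all(a["token"] == b["token"] for a, b in zip(feats_a, feats_b))
    return "semantically identical" if same else "semantically different"

texts = acquire_texts()
result = match(extract_features(texts[0]), extract_features(texts[1]))
```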
2. The Mask text matching method based on multi-task learning according to claim 1, wherein extracting the features of the target text to be matched comprises the following steps: word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling and dependency syntactic analysis.
3. The Mask text matching method based on multitask learning according to claim 1, characterized in that said target text word characteristics include one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.
4. The Mask text matching method based on multitask learning as claimed in claim 1, wherein said BERT based text matching model includes embedding input layer, multi-head attention layer, forward propagation layer and output layer.
5. The Mask text matching method based on multitask learning according to claim 1, wherein the step of obtaining matching results of different texts to be matched comprises:
1) converting the text word features by using the embedding input layer to obtain an embedding input X, and converting the embedding input X into feature components Q = X·W_Q, K = X·W_K and V = X·W_V, where W_Q, W_K and W_V are the weights corresponding to the different feature components;
2) Processing the feature component of the embedding input X by using a multi-head attention layer to obtain a multi-head attention layer processing result MultiHead (Q, K, V), namely:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h)·W_O (1)
where W_O is a weight;
each head head_i is computed as follows:
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V), i = 1, 2, ..., h (2)
Attention(Q·W_i^Q, K·W_i^K, V·W_i^V) = Mask * Attention(Q, K, V) (3)
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V (4)
where softmax is an activation function; d_k denotes the dimension of the word vector; Mask denotes a mask; Attention(Q·W_i^Q, K·W_i^K, V·W_i^V) and Attention(Q, K, V) are intermediate parameters; h is an integer greater than 0;
3) Processing the multi-head attention layer processing result MultiHead (Q, K, V) by using a forward propagation layer to obtain a forward propagation layer processing result x, that is:
x=norm(X+MultiHead(Q,K,V)) (5)
4) Processing the processing result x of the forward propagation layer by utilizing an output layer to obtain a text matching model output based on BERT, and taking the text matching model output as matching results of different texts to be matched;
the BERT-based text matching model output is as follows:
FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2 (6)
where W_1 and W_2 are weights; b_1 and b_2 are biases; FFN(x) is the output.
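The computation in equations (1)-(6) can be sketched with untrained random weights. This is an illustrative forward pass only: the weight matrices are random stand-ins, `norm` is assumed to be layer normalization, and multiplying the 0-1 Mask into each head's output position-wise is one reading of equation (3), not a detail the claims fix:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    # norm(...) in equation (5): per-position layer normalization (assumed)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def forward(X, mask, h=2):
    seq_len, d = X.shape
    d_k = d // h
    heads = []
    for _ in range(h):
        # per-head projections Q·W_i^Q, K·W_i^K, V·W_i^V (random, untrained)
        Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # equation (4): Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V
        att = softmax(Q @ K.T / np.sqrt(d_k)) @ V
        # equation (3): apply the 0-1 Mask (position-wise here, an assumption)
        heads.append(mask[:, None] * att)
    Wo = rng.standard_normal((h * d_k, d))
    multi = np.concatenate(heads, axis=-1) @ Wo        # equation (1)
    x = layer_norm(X + multi)                          # equation (5)
    W1 = rng.standard_normal((d, 4 * d))
    W2 = rng.standard_normal((4 * d, d))
    b1, b2 = np.zeros(4 * d), np.zeros(d)
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2      # equation (6): FFN(x)

X = rng.standard_normal((5, 8))      # 5 token positions, model width 8
mask = np.array([1, 1, 0, 1, 0])     # claim-7 style 0-1 Mask
out = forward(X, mask)
```

The output keeps the input's sequence length and width, as equation (6) requires for the output layer.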
6. The Mask text matching method based on multi-task learning according to claim 5, wherein the embedding input X = x_1 + x_2;
where the input components x_1 and x_2 are respectively:
x_1 = E_tok + E_seg + E_pos (7)
x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg) (8)
where E_tok, E_seg and E_pos denote the token embedding, segment embedding and position embedding codes of the text word features, respectively; embedding1, embedding2 and embedding3 denote the embedding layers for parts of speech, named entities and semantic roles; pos, ner and seg denote the part-of-speech, named-entity and semantic-role encodings of the input text.
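Equations (7) and (8) amount to summing lookup tables. A minimal sketch with random tables; all table sizes and id values are illustrative, and lookup-table row indexing stands in for the embedding layers embedding1-embedding3:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                  # embedding width (illustrative)

# BERT-side tables: token, segment, position (sizes are illustrative)
E_tok = rng.standard_normal((100, d))
E_seg = rng.standard_normal((2, d))
E_pos = rng.standard_normal((512, d))
# linguistic-feature tables standing in for embedding1/2/3
emb1 = rng.standard_normal((30, d))    # embedding1: part of speech
emb2 = rng.standard_normal((10, d))    # embedding2: named entity
emb3 = rng.standard_normal((20, d))    # embedding3: semantic role

tokens    = np.array([5, 17, 42])      # token ids of a 3-word input
segments  = np.array([0, 0, 1])
positions = np.arange(3)
pos_ids   = np.array([1, 2, 3])        # pos: part-of-speech codes
ner_ids   = np.array([0, 4, 0])        # ner: named-entity codes
role_ids  = np.array([7, 7, 2])        # seg: semantic-role codes

x1 = E_tok[tokens] + E_seg[segments] + E_pos[positions]   # equation (7)
x2 = emb1[pos_ids] + emb2[ner_ids] + emb3[role_ids]       # equation (8)
X = x1 + x2                                               # embedding input X
```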
7. The Mask text matching method based on multi-task learning according to claim 5, wherein the Mask is a 0-1 variable: Mask = 1 at positions where the words of the different texts to be matched are the same, and Mask = 0 at positions where the words of the different texts to be matched differ.
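A small sketch of how such a 0-1 Mask could be built from two token sequences; reading "same positions of words" as position-wise equality, with the shorter text padded, is an interpretation of claim 7, not something the claim spells out:

```python
def build_mask(tokens_a, tokens_b):
    # 0-1 Mask per claim 7: 1 where the two texts to be matched carry the
    # same word at the same position, 0 elsewhere; padding positions of the
    # shorter text are treated as non-matching
    length = max(len(tokens_a), len(tokens_b))
    pad = lambda toks: list(toks) + [None] * (length - len(toks))
    return [1 if a is not None and a == b else 0
            for a, b in zip(pad(tokens_a), pad(tokens_b))]

mask = build_mask(["I", "love", "apples"], ["I", "love", "bananas", "too"])
```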
8. The Mask text matching method based on multi-task learning according to claim 1, wherein the output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is a classification vector, and the sequence output is a part-of-speech tagging vector; the classification vector indicates whether the texts to be matched are semantically identical or semantically different.
9. The Mask text matching method based on multi-task learning according to claim 1, wherein the BERT-based text matching model is pre-trained;
pre-training ends when the Loss function Loss converges;
the Loss function Loss is as follows:
Loss = Loss_nll + Loss_pos-tag (9)
where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;
wherein the classification-vector loss function Loss_nll is as follows:
Loss_nll = -(1/n) Σ_{j=1}^{n} Σ_{c=1}^{Z} y_{j,c} log(h_{j,c})
where n is the number of training samples; j denotes the j-th sample; Z is the number of classification categories; c denotes the c-th category; h_{j,c} is the probability that the j-th sample belongs to the c-th category; y_{j,c} indicates whether the j-th sample belongs to the c-th category: y_{j,c} = 1 if it does, and y_{j,c} = 0 if it does not;
the part-of-speech tagging loss function Loss_pos-tag is as follows:
Loss_pos-tag = log(e^{P_1} + e^{P_2} + ... + e^{P_n}) - P_real-path
where P_1, P_2, P_3, ..., P_n are the scores of the 1st, 2nd, ..., n-th possible part-of-speech tagging sequences for a sample, and P_real-path is the score of the true part-of-speech tagging sequence for that sample.
10. A computer-readable storage medium, characterized in that the computer-readable medium stores a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211071421.4A CN115687939B (en) | 2022-09-02 | Mask text matching method and medium based on multitask learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211071421.4A CN115687939B (en) | 2022-09-02 | Mask text matching method and medium based on multitask learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115687939A true CN115687939A (en) | 2023-02-03 |
CN115687939B CN115687939B (en) | 2024-09-24 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116522165A (en) * | 2023-06-27 | 2023-08-01 | 武汉爱科软件技术股份有限公司 | Public opinion text matching system and method based on twin structure |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239700A (en) * | 2021-04-27 | 2021-08-10 | 哈尔滨理工大学 | Text semantic matching device, system, method and storage medium for improving BERT |
CN113642330A (en) * | 2021-07-19 | 2021-11-12 | 西安理工大学 | Rail transit standard entity identification method based on catalog topic classification |
WO2022121251A1 (en) * | 2020-12-11 | 2022-06-16 | 平安科技(深圳)有限公司 | Method and apparatus for training text processing model, computer device and storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022121251A1 (en) * | 2020-12-11 | 2022-06-16 | 平安科技(深圳)有限公司 | Method and apparatus for training text processing model, computer device and storage medium |
CN113239700A (en) * | 2021-04-27 | 2021-08-10 | 哈尔滨理工大学 | Text semantic matching device, system, method and storage medium for improving BERT |
CN113642330A (en) * | 2021-07-19 | 2021-11-12 | 西安理工大学 | Rail transit standard entity identification method based on catalog topic classification |
Non-Patent Citations (4)
Title |
---|
LIU, Xianglong et al.: "PaddlePaddle Deep Learning in Practice", 31 August 2020, Beijing: China Machine Press *
LYU, Yang et al.: "Binary semantic pattern rules for Chinese-English machine translation based on a data mining algorithm", Microcomputer Applications, vol. 37, no. 11, 20 November 2021 (2021-11-20) *
Yuelaikezhan: "This post is you need (Part I) — peeling back the Transformer layer by layer", pages 1 - 57, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/420820453> *
LI, Guang et al.: "A text matching model fusing multi-angle features", Computer Systems & Applications, 17 May 2022 (2022-05-17) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116522165A (en) * | 2023-06-27 | 2023-08-01 | 武汉爱科软件技术股份有限公司 | Public opinion text matching system and method based on twin structure |
CN116522165B (en) * | 2023-06-27 | 2024-04-02 | 武汉爱科软件技术股份有限公司 | Public opinion text matching system and method based on twin structure |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112163416B (en) | Event joint extraction method for merging syntactic and entity relation graph convolution network | |
Zhu et al. | Simple is not easy: A simple strong baseline for textvqa and textcaps | |
Trischler et al. | Natural language comprehension with the epireader | |
CN112100351A (en) | Method and equipment for constructing intelligent question-answering system through question generation data set | |
CN112733541A (en) | Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism | |
CN112231472B (en) | Judicial public opinion sensitive information identification method integrated with domain term dictionary | |
CN110287323B (en) | Target-oriented emotion classification method | |
CN112183094B (en) | Chinese grammar debugging method and system based on multiple text features | |
CN112733533A (en) | Multi-mode named entity recognition method based on BERT model and text-image relation propagation | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
Khare et al. | Multi-modal embeddings using multi-task learning for emotion recognition | |
CN115238697A (en) | Judicial named entity recognition method based on natural language processing | |
CN116029305A (en) | Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
Wu et al. | Image captioning with an intermediate attributes layer | |
Ahmad et al. | Multi-task learning for universal sentence embeddings: A thorough evaluation using transfer and auxiliary tasks | |
CN116127954A (en) | Dictionary-based new work specialized Chinese knowledge concept extraction method | |
CN115687939B (en) | Mask text matching method and medium based on multitask learning | |
CN115687939A (en) | Mask text matching method and medium based on multi-task learning | |
CN114547237A (en) | French recommendation method fusing French keywords | |
CN114357166A (en) | Text classification method based on deep learning | |
Baranwal et al. | Extracting primary objects and spatial relations from sentences | |
Sharif et al. | Subicap: towards subword-informed image captioning | |
Goyal et al. | Automatic Evaluation of Machine Generated Feedback For Text and Image Data | |
CN116245111B (en) | Multi-direction multi-angle sentence semantic similarity recognition method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |