CN115687939A - Mask text matching method and medium based on multi-task learning - Google Patents

Mask text matching method and medium based on multi-task learning Download PDF

Info

Publication number
CN115687939A
CN115687939A (application CN202211071421.4A)
Authority
CN
China
Prior art keywords
text
mask
matched
text matching
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211071421.4A
Other languages
Chinese (zh)
Other versions
CN115687939B (en)
Inventor
张美伟
崔秋实
余娟
吕洋
余维华
李文沅
祝陈哲
王香霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Chongqing Medical University
Original Assignee
Chongqing University
Chongqing Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University and Chongqing Medical University
Priority to CN202211071421.4A
Priority claimed from CN202211071421.4A
Publication of CN115687939A
Application granted
Publication of CN115687939B
Legal status: Active
Anticipated expiration

Links

Images

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a Mask text matching method and medium based on multi-task learning. The method comprises the following steps: 1) acquiring at least two texts to be matched; 2) extracting features from the texts to be matched to obtain the text word features of each text; 3) establishing a BERT-based text matching model; 4) inputting the text word features of all the texts to be matched into the text matching model to obtain the matching results of the different texts. The medium comprises a computer program. The invention proposes constructing a Mask matrix from the characteristics of the data to simplify the model; the Mask both simplifies the model and amplifies the differences between the texts to be matched, enhancing the generalization ability of the trained model.

Description

Mask text matching method and medium based on multi-task learning
Technical Field
The invention relates to the field of natural language processing, and in particular to a Mask text matching method and medium based on multi-task learning.
Background
The text matching task aims to judge whether two natural-language sentences are semantically equivalent, and is an important research direction in the field of natural language processing. Text matching research has high commercial value and plays an important role in fields such as information retrieval and intelligent customer service.
In recent years, although neural network models have reached, or even exceeded, human-level accuracy on some standard question matching benchmarks, they are not robust when handling real application scenarios: they fail on very simple cases that a human judges easily, causing extremely poor product experience and economic loss.
Most current text matching work is evaluated on test sets drawn from the same distribution as the training set. The results look good, but this exaggerates the models' actual ability and lacks a real, fine-grained assessment of their strengths and weaknesses. This work therefore focuses on the robustness of text matching models in real application scenarios, exposes the shortcomings of current text matching models along dimensions such as vocabulary, syntax, and pragmatics, and promotes the development of semantic matching technology in industrial fields such as intelligent interaction.
Traditional text matching methods include algorithms such as BoW, VSM, TF-IDF, BM25, Jaccard, and SimHash. For example, the BM25 algorithm computes a matching score between a query and a text from how well the text's terms cover the query's terms: the higher the score, the better the match. These methods mainly solve matching at the lexical level, i.e., literal word-overlap similarity. In practice, matching based on lexical overlap has serious limitations, for the following reasons: word-sense limitation — "taxi" and "cab" are dissimilar as words yet denote the same vehicle, and "apple" means different things (a fruit or a company) in different contexts; structural limitation — "machine learning" and "learning machine" share exactly the same words but express different meanings; knowledge limitation — a sentence like "Qin Shi Huang plays Dota" is unproblematic in morphology and syntax but wrong given world knowledge. This shows that a text matching task cannot stop at the literal matching level; it needs matching at the semantic level.
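As a point of reference, the following is a minimal Python sketch of the BM25 scoring idea described above (standard library only; the parameter values k1 = 1.5 and b = 0.75 are conventional defaults and not taken from this patent):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency of each distinct query term.
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Toy usage: lexical overlap drives the score, which is exactly the
# limitation described above ("machine learning" vs "learning machine").
docs = [["machine", "learning", "is", "fun"], ["the", "learning", "machine"]]
print(bm25_scores(["machine", "learning"], docs))
```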
With the successful application of deep learning in computer vision, speech recognition, and recommendation systems, much recent research applies deep neural network models to natural language processing tasks to reduce the cost of feature engineering. Text matching based on neural-network-trained Word Embeddings is simple to train, and the resulting word vectors further strengthen the computability of semantic representations. However, Word Embeddings trained only on unlabeled data differ little from topic-model techniques in their practical effect on matching-degree computation; both are essentially trained on co-occurrence information.
Current text matching algorithms are mainly built on BERT-style pre-trained language models, improving the semantic information of text vectors as much as possible. However, the text vectors obtained from a pre-trained model cannot distinguish texts well in some scenarios. For example, "How to exchange RMB for HKD" and "How to exchange HKD for RMB" differ little in surface content but mean quite different things; relying on a pre-trained model alone makes it hard to capture textual differences along dimensions such as vocabulary, syntax, and pragmatics.
It can be seen that the current text matching algorithm has the following defects:
1) Statistics-based language models cannot express rich semantic information, so in short-text matching scenarios where the differences are small, the differences between texts are hard to capture.
2) Models based on word vectors, attention, and the like need more labeled data and have complex structures, and the structural features of the text, such as syntactic structure and part of speech, are not further mined and used.
3) Pre-training-based text matching models focus on the output of the pre-trained model and design ever more complex network structures on top of it for classification; the structural features of the text are not combined with the pre-trained model, so some useful prior information is actually lost.
Disclosure of Invention
The invention aims to provide a Mask text matching method based on multi-task learning, comprising the following steps:
1) Acquiring at least two texts to be matched;
2) Extracting features from the texts to be matched to obtain the text word features of each text to be matched;
3) Establishing a BERT-based text matching model;
4) Inputting the text word features of all the texts to be matched into the text matching model to obtain the matching results of the different texts to be matched.
Further, extracting the features of the target text to be matched comprises the following steps: word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling, and dependency syntactic analysis.
Further, the target text word feature includes one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntactic relationship feature.
Further, the BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.
Further, the step of obtaining matching results of different texts to be matched comprises:
a) Converting the text word features with the embedding input layer to obtain an embedding input X, and mapping X into the feature components Q = XW_Q, K = XW_K, and V = XW_V; W_Q, W_K, W_V are the weights of the respective feature components;
b) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention result MultiHead(Q,K,V), namely:

MultiHead(Q,K,V) = Concat(head_1, ..., head_h)W_O (1)

where W_O is a weight;

the heads head_i are given by:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1,2,...,h (2)

Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask*Attention(Q,K,V) (3)

Attention(Q,K,V) = softmax(QK^T/√d_k)·V (4)

where softmax is the activation function; d_k is the dimension of the word vectors, and dividing by √d_k prevents the softmax input from becoming so large that its derivative approaches 0; Mask denotes the mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q,K,V) are intermediate quantities; h is an integer greater than 0;
c) Processing the multi-head attention result MultiHead(Q,K,V) with the forward propagation layer to obtain the forward-propagation result x, that is:

x = norm(X + MultiHead(Q,K,V)) (5)

d) Processing the forward-propagation result x with the output layer to obtain the output of the BERT-based text matching model, which serves as the matching result of the different texts to be matched;

the output of the BERT-based text matching model is as follows:

FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (6)

where W_1, W_2 are weights; b_1, b_2 are biases; FFN(x) is the output.
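For illustration, a minimal NumPy sketch of steps a) to d) follows. The head count, dimensions, and random weights are stand-ins for learned parameters, and reading equation (3) as a per-position 0/1 scaling of each head's attention output is one plausible interpretation, not the patent's stated implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def masked_encoder_block(X, mask, h=4):
    """Sketch of equations (1)-(6); all weights are random stand-ins."""
    rng = np.random.default_rng(0)
    d_model = X.shape[1]
    d_k = d_model // h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.normal(0.0, 0.02, (d_model, d_k)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        attn = softmax(Q @ K.T / np.sqrt(d_k)) @ V      # equation (4)
        heads.append(mask[:, None] * attn)              # equation (3), one reading
    Wo = rng.normal(0.0, 0.02, (d_model, d_model))
    multi = np.concatenate(heads, axis=-1) @ Wo         # equation (1)
    x = layer_norm(X + multi)                           # equation (5)
    W1 = rng.normal(0.0, 0.02, (d_model, 4 * d_model))
    W2 = rng.normal(0.0, 0.02, (4 * d_model, d_model))
    b1, b2 = np.zeros(4 * d_model), np.zeros(d_model)
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2       # equation (6)

seq_len, d_model = 8, 64
X = np.random.default_rng(1).normal(size=(seq_len, d_model))
mask = np.array([1, 1, 1, 1, 1, 1, 1, 0])   # 1 where the two texts differ
print(masked_encoder_block(X, mask).shape)  # (8, 64)
```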
Further, the embedding input X = x_1 + x_2;

where the input components x_1 and x_2 are respectively:

x_1 = E_tok + E_seg + E_pos (7)

x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg) (8)

where E_tok, E_seg, E_pos are the Token Embedding, Segment Embedding, and Position Embedding encodings of the text word features; embedding1, embedding2, embedding3 are the embedding layers for part of speech, named entities, and semantic roles; pos, ner, seg are the part-of-speech, named-entity, and semantic-role encodings of the input text.
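A sketch of how the embedding input X = x_1 + x_2 of equations (7) and (8) can be assembled; all table sizes and id values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 64, 10

def table(n_rows):
    """Random stand-in for a learned embedding lookup table."""
    return rng.normal(0.0, 0.02, (n_rows, d_model))

E_tok, E_seg, E_pos = table(21128), table(2), table(512)         # BERT-style tables (assumed sizes)
emb_pos_tag, emb_ner, emb_srl = table(30), table(10), table(20)  # linguistic-feature tables

tok_ids  = rng.integers(0, 21128, seq_len)  # token ids of the text pair
seg_ids  = np.zeros(seq_len, dtype=int)     # segment ids
pos_ids  = np.arange(seq_len)               # position indices
pos_tags = rng.integers(0, 30, seq_len)     # part-of-speech ids (pos)
ner_tags = rng.integers(0, 10, seq_len)     # named-entity ids (ner)
srl_tags = rng.integers(0, 20, seq_len)     # semantic-role ids (seg in eq. (8))

x1 = E_tok[tok_ids] + E_seg[seg_ids] + E_pos[pos_ids]               # equation (7)
x2 = emb_pos_tag[pos_tags] + emb_ner[ner_tags] + emb_srl[srl_tags]  # equation (8)
X = x1 + x2   # embedding input fed to the attention layers
```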
Further, the Mask is a 0-1 variable: Mask = 0 at positions where the words of the different texts to be matched are the same, and Mask = 1 at positions where the words differ.
Further, the output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is the classification vector, and the sequence output is the part-of-speech tagging vector. The classification vector's classes are "semantically the same" and "semantically different".
Further, the BERT-based text matching model is pre-trained;

pre-training is complete when the loss function Loss converges;

the loss function Loss is as follows:

Loss = Loss_nll + Loss_pos-tag (9)

where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;

the classification-vector loss function Loss_nll is as follows:

Loss_nll = -(1/n) Σ_{j=1..n} Σ_{c=1..Z} y_j,c · log(h_j,c) (10)

where n is the number of training samples; j indexes the samples; Z is the number of classes; c indexes the classes; h_j,c is the probability that the jth sample belongs to the cth class; y_j,c indicates whether the jth sample belongs to the cth class (y_j,c = 1 if it does, y_j,c = 0 otherwise);

the part-of-speech tagging loss function Loss_pos-tag is as follows:

Loss_pos-tag = -log(P_real-path / (P_1 + P_2 + P_3 + ... + P_n)) (11)

where P_1, P_2, P_3, ..., P_n are the scores of the 1st, 2nd, ..., nth possible part-of-speech tagging sequences for a sample, and P_real-path is the score of the true part-of-speech tagging sequence for that sample.
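A NumPy sketch of the combined loss (9), with the classification term following equation (10) and the part-of-speech term following the path-ratio form of equation (11); the batch contents are made up for illustration:

```python
import numpy as np

def loss_nll(h, y):
    """Equation (10): cross-entropy over n samples and Z classes.
    h : (n, Z) predicted probabilities; y : (n, Z) one-hot labels."""
    n = h.shape[0]
    return -(y * np.log(h + 1e-12)).sum() / n

def loss_pos_tag(path_scores, real_path_score):
    """Equation (11): -log of the real path's share of the total path score.
    Scores are assumed positive (e.g. exponentiated CRF potentials)."""
    return -np.log(real_path_score / np.sum(path_scores))

h = np.array([[0.9, 0.1], [0.2, 0.8]])   # semantically same / different
y = np.array([[1, 0], [0, 1]])
paths = np.array([2.0, 1.0, 0.5])        # scores P_1..P_n of candidate tag paths
loss = loss_nll(h, y) + loss_pos_tag(paths, real_path_score=2.0)  # equation (9)
print(loss)
```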
A computer readable storage medium storing a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
The technical effects of the invention are clear: it solves the problem that differences between texts are hard to capture in scenarios such as intelligent interaction, natural language understanding, and similar-sentence extraction.
Aiming at the complexity and large parameter count of short-text matching models, the invention proposes constructing a Mask matrix from the characteristics of the data to simplify the model; this amplifies the differences between the texts to be matched while simplifying the model, so the generalization ability of the finally trained model is enhanced.
Considering that the differences between the texts to be matched are small, and that such differences show up in linguistic features such as part of speech and syntactic structure, the semantic information available to the model is increased by embedding syntactic, entity, and other features at the input end.
Ordinary text matching mainly relies on sentence vectors; this method introduces multi-task learning to learn the part-of-speech differences between the texts to be matched at word granularity, thereby enhancing the generalization ability of the model.
Drawings
FIG. 1 is a text matching flow diagram;
FIG. 2 is a flow diagram of text feature mining;
fig. 3 is a Mask schematic.
Detailed Description
The present invention is further illustrated by the following examples, but the scope of the claimed subject matter is not limited to these examples. Various substitutions and alterations made on the basis of common technical knowledge and customary means in the field, without departing from the technical idea of the invention, all fall within the scope of the invention.
Example 1:
referring to fig. 1 to 3, a Mask text matching method based on multitask learning includes the following steps:
1) Acquiring at least two texts to be matched;
2) Performing feature extraction on the texts to be matched to obtain text word features of each text to be matched;
3) Establishing a text matching model based on BERT;
4) Inputting the text word features of all the texts to be matched into the text matching model to obtain the matching results of the different texts to be matched, the matching results comprising "semantically the same" and "semantically different".
Extracting the features of the target text to be matched comprises a series of natural language processing operations: word segmentation, part-of-speech tagging, named entity recognition, semantic role labeling, and dependency syntactic analysis. These operations rely on the natural language processing technology provided by the Language Technology Platform (LTP) developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology.
Named entity recognition uses the named entity recognition technology provided by the HIT Language Technology Platform together with an iterative heuristic. The latter merges adjacent nouns to obtain maximal noun phrases, where the noun parts of speech may only be {ni, nh, ns, nz, j}, denoting organization names, person names, geographic names, other proper nouns, and abbreviations, respectively.
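A sketch of the iterative heuristic just described, merging adjacent nouns into maximal noun phrases; the token/tag pairs would come from an LTP-style tagger, and the example input is hypothetical:

```python
NOUN_TAGS = {"ni", "nh", "ns", "nz", "j"}  # org, person, place, other proper noun, abbreviation

def merge_noun_phrases(tokens, pos_tags):
    """Greedily merge runs of adjacent noun tokens into maximal noun phrases."""
    phrases, i = [], 0
    while i < len(tokens):
        if pos_tags[i] in NOUN_TAGS:
            j = i
            while j + 1 < len(tokens) and pos_tags[j + 1] in NOUN_TAGS:
                j += 1
            phrases.append("".join(tokens[i:j + 1]))  # maximal noun phrase
            i = j + 1
        else:
            i += 1
    return phrases

# Hypothetical tagger output: adjacent nouns are merged into one phrase.
print(merge_noun_phrases(["重庆", "大学", "的", "张三"], ["ns", "ni", "u", "nh"]))
```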
The target text word features include one or more of a part-of-speech feature, a named entity feature, a semantic role feature, and a dependency syntax relationship feature.
The BERT-based text matching model includes an embedding input layer, a multi-head attention layer, a forward propagation layer, and an output layer.
The step of obtaining the matching results of different texts to be matched comprises the following steps:
1) Converting the text word features with the embedding input layer to obtain an embedding input X, and mapping X into the feature components Q = XW_Q, K = XW_K, and V = XW_V; W_Q, W_K, W_V are the weights of the respective feature components;
2) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention result MultiHead(Q,K,V), that is:

MultiHead(Q,K,V) = Concat(head_1, ..., head_h)W_O (1)

where W_O is a weight;

the heads head_i are given by:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1,2,...,h (2)

Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask*Attention(Q,K,V) (3)

Attention(Q,K,V) = softmax(QK^T/√d_k)·V (4)

where softmax is the activation function; d_k is the dimension of the word vectors, and dividing by √d_k prevents the softmax input from becoming so large that its derivative approaches 0; Mask denotes the mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q,K,V) are intermediate quantities; h is an integer greater than 0;
3) Processing the multi-head attention result MultiHead(Q,K,V) with the forward propagation layer to obtain the forward-propagation result x, that is:

x = norm(X + MultiHead(Q,K,V)) (5)

4) Processing the forward-propagation result x with the output layer to obtain the output of the BERT-based text matching model, which serves as the matching result of the different texts to be matched;

the output of the BERT-based text matching model is as follows:

FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (6)

where W_1, W_2 are weights; b_1, b_2 are biases; FFN(x) is the output.
The embedding input X = x_1 + x_2;

where the input components x_1 and x_2 are respectively:

x_1 = E_tok + E_seg + E_pos (7)

x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg) (8)

where E_tok, E_seg, E_pos are the Token Embedding, Segment Embedding, and Position Embedding encodings of the text word features; embedding1, embedding2, embedding3 are the embedding layers for part of speech, named entities, and semantic roles; pos, ner, seg are the part-of-speech, named-entity, and semantic-role encodings of the input text.
The Mask is a 0-1 variable: Mask = 0 at positions where the words of the different texts to be matched are the same, and Mask = 1 at positions where the words differ.
The output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is the classification vector, and the sequence output is the part-of-speech tagging vector. The classification vector's classes are "semantically the same" and "semantically different".
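A sketch of the two heads over the encoder output: the pooled vector feeds the binary classifier (semantically same/different) and the per-token sequence output feeds the part-of-speech tagger. The head shapes and the [CLS]-style pooling are assumptions of this sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, n_classes, n_tags = 10, 64, 2, 30

Y = rng.normal(size=(seq_len, d_model))    # encoder sequence output

# Vector output -> classification head (semantically same vs. different).
W_cls = rng.normal(0.0, 0.02, (d_model, n_classes))
cls_probs = softmax(Y[0] @ W_cls)          # [CLS]-style pooled vector

# Sequence output -> part-of-speech tagging head (the second task).
W_tag = rng.normal(0.0, 0.02, (d_model, n_tags))
tag_probs = softmax(Y @ W_tag)             # one tag distribution per token

print(cls_probs.shape, tag_probs.shape)    # (2,) (10, 30)
```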
The BERT-based text matching model is pre-trained;

pre-training is complete when the loss function Loss converges;

the loss function Loss is as follows:

Loss = Loss_nll + Loss_pos-tag (9)

where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;

the classification-vector loss function Loss_nll is as follows:

Loss_nll = -(1/n) Σ_{j=1..n} Σ_{c=1..Z} y_j,c · log(h_j,c) (10)

where n is the number of training samples; j indexes the samples; Z is the number of classes; c indexes the classes; h_j,c is the probability that the jth sample belongs to the cth class; y_j,c indicates whether the jth sample belongs to the cth class (y_j,c = 1 if it does, y_j,c = 0 otherwise);
The part-of-speech tagging loss function Loss_pos-tag is computed as follows:
for any sample, the sequence of part-of-speech tagged categories may be:
part of speech tagging sequence 1: START N-B N-I N-E O
Part of speech tagging sequence 2: START N-B N-E O O O
Part of speech tagging sequence 3: START O N-B N-E O
Part of speech tagging sequence 4: START V-B V-I V-E O
Part of speech tagging sequence n: START N-B N-E V-B V-E O
Each possible path has a score P_i, and with n paths in total the total score is:

P_total = P_1 + P_2 + P_3 + ... + P_n

The part-of-speech tagging loss is then the negative log of the real path's share of the total score:

Loss_pos-tag = -log(P_real-path / P_total) (11)
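In practice the ratio in equation (11) is usually computed in log space with log-sum-exp for numerical stability; a sketch under that assumption:

```python
import numpy as np

def pos_tag_loss_logspace(log_path_scores, log_real_path_score):
    """-log(P_real-path / P_total) computed from log-scores via log-sum-exp."""
    m = np.max(log_path_scores)
    log_total = m + np.log(np.sum(np.exp(log_path_scores - m)))
    return log_total - log_real_path_score

# As training pushes log_real_path_score toward log_total, the loss -> 0,
# i.e. the real path's share of the total score grows.
log_scores = np.array([2.0, 0.5, -1.0])
print(pos_tag_loss_logspace(log_scores, log_real_path_score=2.0))
```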
During training, the model's parameter values are updated continually with each iteration, so that the real path's share of the total score grows larger and larger.
A computer readable storage medium storing a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
Example 2:
referring to fig. 1 to 3, a Mask text matching method based on multitask learning includes the following steps:
1) The text is processed with a language analysis tool to obtain features such as part of speech, named entities, semantic roles, and dependency syntactic relations.
2) The differences between the input text pair to be matched are compared, and the positions where the words are the same and where they differ are marked: positions with the same words are set to 0 in the Mask matrix, and all other positions are set to 1.
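A sketch of the Mask construction in step 2): positions where the two texts share the same character get 0, all other positions get 1, so attention concentrates on the differing characters. Character-by-position alignment of the two texts is an assumption of this sketch:

```python
import numpy as np

def build_mask(text_a, text_b):
    """0 where the texts share the same character at a position, else 1."""
    n = max(len(text_a), len(text_b))
    mask = np.ones(n, dtype=int)
    for i in range(min(len(text_a), len(text_b))):
        if text_a[i] == text_b[i]:
            mask[i] = 0
    return mask

# Character-level example from the Background section:
# "How to exchange RMB for HKD" vs "How to exchange HKD for RMB".
print(build_mask("人民币怎么换港币", "港币怎么换人民币"))
```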
The Embedding input of BERT can be expressed as the sum x_1 of Token Embedding, Segment Embedding, and Position Embedding:

x_1 = E_word = E_tok + E_seg + E_pos

Other linguistic features can be represented through embedding layers as x_2:

x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg)

Final input: X = x_1 + x_2
Converting input X to Q, K, V:
Q = XW_Q, K = XW_K, V = XW_V
attention calculation formula:
Attention(Q,K,V) = softmax(QK^T/√d_k)·V
To make the model focus on the inconsistent parts of the texts to be matched, the Mask matrix can be obtained by simple processing in the data-processing stage, so that the model does not need to compute attention over the whole sequence and only attends to the characters where the texts are inconsistent:

Attention = Mask*Attention(Q,K,V)
multi-head attention layer:
MultiHead(Q,K,V) = Concat(head_1, ..., head_h)W_O
wherein:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1,2,...,h
a forward propagation layer:
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
where x is obtained from the residual connection and layer normalization:

x = norm(X + MultiHead(Q,K,V))
output of final encoder:
Y=FFN(x)
The output comprises a sequence output and a vector output. In this embodiment, the vector output is used as the classification vector and the sequence output is used as the part-of-speech tagging vector of the second task. For the classification vector, the first loss function Loss_nll can be obtained as:

Loss_nll = -(1/n) Σ_{j=1..n} Σ_{c=1..Z} y_j,c · log(h_j,c)
The loss function for part-of-speech tagging is Loss_pos-tag; the final loss function is then:

Loss = Loss_nll + Loss_pos-tag
example 3:
a Mask text matching method based on multitask learning comprises the following steps:
1) Acquiring at least two texts to be matched;
2) Performing feature extraction on the texts to be matched to obtain text word features of each text to be matched;
3) Establishing a text matching model based on BERT;
4) Inputting the text word features of all the texts to be matched into the text matching model to obtain the matching results of the different texts to be matched.
Example 4:
a Mask text matching method based on multitask learning is disclosed in embodiment 3, wherein the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity identification, semantic role tagging and dependency syntactic analysis.
Example 5:
the main content of the Mask text matching method based on multitask learning is shown in an embodiment 3, wherein the target text word characteristics comprise one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.
Example 6:
a Mask text matching method based on multitask learning mainly comprises an embodiment 3, wherein the text matching model based on BERT comprises an embedding input layer, a multi-head attention layer, a forward propagation layer and an output layer.
Example 7:
a Mask text matching method based on multi-task learning mainly comprises the following steps of embodiment 3, wherein the step of obtaining matching results of different texts to be matched comprises the following steps:
1) Converting the text word features with the embedding input layer to obtain an embedding input X, and mapping X into the feature components Q = XW_Q, K = XW_K, and V = XW_V; W_Q, W_K, W_V are the weights of the respective feature components;
2) Processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention result MultiHead(Q,K,V), that is:

MultiHead(Q,K,V) = Concat(head_1, ..., head_h)W_O (1)

where W_O is a weight;

the heads head_i are given by:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1,2,...,h (2)

Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask*Attention(Q,K,V) (3)

Attention(Q,K,V) = softmax(QK^T/√d_k)·V (4)

where softmax is the activation function; d_k is the dimension of the word vectors; Mask denotes the mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q,K,V) are intermediate quantities; h is an integer greater than 0;
3) Processing the multi-head attention result MultiHead(Q,K,V) with the forward propagation layer to obtain the forward-propagation result x, that is:

x = norm(X + MultiHead(Q,K,V)) (5)

4) Processing the forward-propagation result x with the output layer to obtain the output of the BERT-based text matching model, which serves as the matching result of the different texts to be matched;

the output of the BERT-based text matching model is as follows:

FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (6)

where W_1, W_2 are weights; b_1, b_2 are biases; FFN(x) is the output.
Example 8:
a Mask text matching method based on multi-task learning mainly comprises the following content of embodiment 3, wherein the embedding input X = X 1 +x 2
Wherein a component x is input 1 And an input component x 2 Respectively as follows:
X 1 =E tok +E seg +E pos (7)
x 2 =embedding1(pos)+embedding2(ner)+embedding3(seg) (8)
in the formula, E tok 、E seg 、E pos Token Embedding codes, positionembedding codes and segmentembedding codes which respectively represent the characteristics of text words; the embedding layers of parts of speech, named entities and semantic roles are represented by embedding1, embedding2 and embedding 3; pos, ner, seg represent part of speech, named body, semantic role coding of the input text.
Example 9:
a Mask text matching method based on multitask learning mainly comprises the following steps of embodiment 3, wherein Mask masks are 0-1 variables, mask masks at the same positions of words of different texts to be matched =1, and Mask masks at different positions of words of different texts to be matched =0.
Example 10:
a Mask text matching method based on multitask learning mainly comprises the steps of embodiment 3, wherein the output of a text matching model based on BERT comprises sequence output and vector output; the vector output is a classification vector, and the sequence output is a part-of-speech tagging vector; the classification vectors include semantically identical and semantically different.
Example 11:
a Mask text matching method based on multitask learning mainly comprises the following steps of (1) embodiment 3, wherein a text matching model based on BERT is pre-trained;
the standard of the pre-training end is Loss function Loss convergence;
the Loss function Loss is as follows:
Loss=Loss nll +Loss pos-tag (9)
in the formula, loss nll Is a classification vector loss function; loss pos-tag A loss function labeled for part of speech;
wherein, the Loss function Loss of classification vector nll As follows:
Figure BDA0003830480640000111
in the formula, n is the number of training samples; j represents the jth sample; z represents the number of classified categories; c represents the c-th category; h is j,c Represents the probability that the jth sample belongs to the c-th class; y is j,c Indicating whether the jth sample belongs to the c category; y is j,c =1 denotes that the jth sample belongs to the c-th class; y is j,c =0 indicates that the jth sample does not belong to the c-th class;
loss function Loss of part of speech tagging pos-tag As follows:
Figure BDA0003830480640000112
in the formula, P 1 、P 2 、P 3 、P n Marking the scores of the 1 st, 2 nd and nth possible parts of speech tagging sequences corresponding to one sample; p real-path And marking the score of the sequence for the real part of speech corresponding to one sample.
Example 12:
a computer readable storage medium storing a computer program; the computer program, when executed by a processor, performs the steps of the method of embodiments 1-11.

Claims (10)

1. A Mask text matching method based on multi-task learning, characterized by comprising the following steps:
1) acquiring at least two texts to be matched;
2) extracting features from the texts to be matched to obtain the text word features of each text to be matched;
3) establishing a BERT-based text matching model;
4) inputting the text word features of all the texts to be matched into the text matching model to obtain the matching results of the different texts to be matched.
2. The Mask text matching method based on multitask learning according to claim 1, wherein the step of extracting the features of the target text to be matched comprises the following steps: word segmentation processing, part of speech tagging, named entity recognition, semantic role tagging and dependency syntactic analysis.
3. The Mask text matching method based on multitask learning according to claim 1, characterized in that said target text word characteristics include one or more of part-of-speech characteristics, named entity characteristics, semantic role characteristics and dependency syntactic relation characteristics.
4. The Mask text matching method based on multitask learning as claimed in claim 1, wherein said BERT based text matching model includes embedding input layer, multi-head attention layer, forward propagation layer and output layer.
5. The Mask text matching method based on multitask learning according to claim 1, wherein the step of obtaining matching results of different texts to be matched comprises:
1) converting the text word features with the embedding input layer to obtain an embedding input X, and mapping X into the feature components Q = XW_Q, K = XW_K, and V = XW_V; W_Q, W_K, W_V being the weights of the respective feature components;
2) processing the feature components of the embedding input X with the multi-head attention layer to obtain the multi-head attention result MultiHead(Q,K,V), namely:

MultiHead(Q,K,V) = Concat(head_1, ..., head_h)W_O (1)

where W_O is a weight;

the heads head_i are given by:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), i = 1,2,...,h (2)

Attention(QW_i^Q, KW_i^K, VW_i^V) = Mask*Attention(Q,K,V) (3)

Attention(Q,K,V) = softmax(QK^T/√d_k)·V (4)

where softmax is the activation function; d_k is the dimension of the word vectors; Mask denotes the mask; Attention(QW_i^Q, KW_i^K, VW_i^V) and Attention(Q,K,V) are intermediate quantities; h is an integer greater than 0;
3) processing the multi-head attention result MultiHead(Q,K,V) with the forward propagation layer to obtain the forward-propagation result x, that is:

x = norm(X + MultiHead(Q,K,V)) (5)

4) processing the forward-propagation result x with the output layer to obtain the output of the BERT-based text matching model, the output serving as the matching result of the different texts to be matched;

the BERT-based text matching model output is as follows:

FFN(x) = max(0, xW_1 + b_1)W_2 + b_2 (6)

where W_1, W_2 are weights; b_1, b_2 are biases; FFN(x) is the output.
6. The Mask text matching method based on multi-task learning according to claim 5, characterized in that the embedding input X = x_1 + x_2;

where the input components x_1 and x_2 are respectively:

x_1 = E_tok + E_seg + E_pos (7)

x_2 = embedding1(pos) + embedding2(ner) + embedding3(seg) (8)

where E_tok, E_seg, E_pos are the Token Embedding, Segment Embedding, and Position Embedding encodings of the text word features; embedding1, embedding2, embedding3 are the embedding layers for part of speech, named entities, and semantic roles; pos, ner, seg are the part-of-speech, named-entity, and semantic-role encodings of the input text.
7. The Mask text matching method based on multi-task learning according to claim 5, characterized in that the Mask is a 0-1 variable: Mask = 0 at positions where the words of the different texts to be matched are the same, and Mask = 1 at positions where the words differ.
8. The Mask text matching method based on multi-task learning according to claim 1, characterized in that the output of the BERT-based text matching model comprises a sequence output and a vector output; the vector output is the classification vector, and the sequence output is the part-of-speech tagging vector; the classification vector's classes are "semantically the same" and "semantically different".
9. The Mask text matching method based on multi-task learning according to claim 1, characterized in that the BERT-based text matching model is pre-trained;

pre-training is complete when the loss function Loss converges;

the loss function Loss is as follows:

Loss = Loss_nll + Loss_pos-tag (9)

where Loss_nll is the classification-vector loss function and Loss_pos-tag is the part-of-speech tagging loss function;

the classification-vector loss function Loss_nll is as follows:

Loss_nll = -(1/n) Σ_{j=1..n} Σ_{c=1..Z} y_j,c · log(h_j,c) (10)

where n is the number of training samples; j indexes the samples; Z is the number of classes; c indexes the classes; h_j,c is the probability that the jth sample belongs to the cth class; y_j,c indicates whether the jth sample belongs to the cth class (y_j,c = 1 if it does, y_j,c = 0 otherwise);

the part-of-speech tagging loss function Loss_pos-tag is as follows:

Loss_pos-tag = -log(P_real-path / (P_1 + P_2 + P_3 + ... + P_n)) (11)

where P_1, P_2, P_3, ..., P_n are the scores of the 1st, 2nd, ..., nth possible part-of-speech tagging sequences for a sample, and P_real-path is the score of the true part-of-speech tagging sequence for that sample.
10. A computer-readable storage medium, characterized in that the computer-readable medium stores a computer program;
the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 9.
CN202211071421.4A 2022-09-02 Mask text matching method and medium based on multitask learning Active CN115687939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211071421.4A CN115687939B (en) 2022-09-02 Mask text matching method and medium based on multitask learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211071421.4A CN115687939B (en) 2022-09-02 Mask text matching method and medium based on multitask learning

Publications (2)

Publication Number Publication Date
CN115687939A true CN115687939A (en) 2023-02-03
CN115687939B CN115687939B (en) 2024-09-24

Family



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022121251A1 (en) * 2020-12-11 2022-06-16 平安科技(深圳)有限公司 Method and apparatus for training text processing model, computer device and storage medium
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT
CN113642330A (en) * 2021-07-19 2021-11-12 西安理工大学 Rail transit standard entity identification method based on catalog topic classification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘祥龙 et al.: "PaddlePaddle Deep Learning in Practice" (《飞桨PaddlePaddle深度学习实战》), Beijing: China Machine Press, 31 August 2020 *
吕洋 et al.: "Binary Semantic Pattern Rules for Chinese-English Machine Translation Based on Data Mining Algorithms", Microcomputer Applications (《微型电脑应用》), vol. 37, no. 11, 20 November 2021
月来客栈: "This post is all you need (Part I) — peeling back the Transformer layer by layer", pages 1-57, retrieved from the Internet: https://zhuanlan.zhihu.com/p/420820453 *
李广 et al.: "A Text Matching Model Fusing Multi-Angle Features", Computer Systems & Applications (《计算机系统应用》), 17 May 2022

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522165A (en) * 2023-06-27 2023-08-01 武汉爱科软件技术股份有限公司 Public opinion text matching system and method based on twin structure
CN116522165B (en) * 2023-06-27 2024-04-02 武汉爱科软件技术股份有限公司 Public opinion text matching system and method based on twin structure

Similar Documents

Publication Publication Date Title
CN112163416B (en) Event joint extraction method for merging syntactic and entity relation graph convolution network
Zhu et al. Simple is not easy: A simple strong baseline for textvqa and textcaps
Trischler et al. Natural language comprehension with the epireader
CN112100351A (en) Method and equipment for constructing intelligent question-answering system through question generation data set
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN112231472B (en) Judicial public opinion sensitive information identification method integrated with domain term dictionary
CN110287323B (en) Target-oriented emotion classification method
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN112733533A (en) Multi-mode named entity recognition method based on BERT model and text-image relation propagation
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
Khare et al. Multi-modal embeddings using multi-task learning for emotion recognition
CN115238697A (en) Judicial named entity recognition method based on natural language processing
CN116029305A (en) Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning
CN114254645A (en) Artificial intelligence auxiliary writing system
Wu et al. Image captioning with an intermediate attributes layer
Ahmad et al. Multi-task learning for universal sentence embeddings: A thorough evaluation using transfer and auxiliary tasks
CN116127954A (en) Dictionary-based new work specialized Chinese knowledge concept extraction method
CN115687939B (en) Mask text matching method and medium based on multitask learning
CN115687939A (en) Mask text matching method and medium based on multi-task learning
CN114547237A (en) French recommendation method fusing French keywords
CN114357166A (en) Text classification method based on deep learning
Baranwal et al. Extracting primary objects and spatial relations from sentences
Sharif et al. Subicap: towards subword-informed image captioning
Goyal et al. Automatic Evaluation of Machine Generated Feedback For Text and Image Data
CN116245111B (en) Multi-direction multi-angle sentence semantic similarity recognition method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant