CN116644751A

CN116644751A - Cross-domain named entity recognition method, device, storage medium and product based on span contrastive learning

Info

Publication number: CN116644751A
Application number: CN202310621806.1A
Authority: CN
Inventors: 王也; 史宸枭; 韩启龙; 宋洪涛; 刘鹏
Original assignee: Harbin Longming Technology Co ltd; Harbin Engineering University
Current assignee: Harbin Longming Technology Co ltd; Harbin Engineering University
Priority date: 2023-05-30
Filing date: 2023-05-30
Publication date: 2023-08-25

Abstract

A cross-domain named entity recognition method, device, storage medium and product based on span contrastive learning, belonging to the technical field of named entity recognition, which addresses the poor learning performance caused by domain shift and by insufficient use of span boundary information. The method builds a cross-domain named entity recognition model based on span contrastive learning from a pre-trained language model, adversarial training, contrastive learning, KL divergence and related techniques. Entity boundary information is used to convert sequence labels into a global boundary matrix, on which span-level contrastive learning and the KL divergence are computed, and the model parameters are updated continuously during training. Because entity boundary information is fully taken into account and more domain-invariant information is learned, cross-domain named entity recognition performance is greatly improved. The method is suitable for cross-domain named entity recognition.

Description

Cross-domain named entity recognition method, device, storage medium and product based on span contrastive learning

Technical Field

The application relates to the technical field of named entity recognition, in particular to cross-domain named entity recognition.

Background

Named entity recognition (NER) is the task of automatically identifying named entities in text, such as person names, place names and organization names, and classifying them into types. Deep learning models surpass traditional machine learning methods in depth of feature extraction and in overall performance, but they require large amounts of labeled data, so NER is hard to solve in domains or languages where data resources are scarce. Domain adaptation is an important branch of transfer learning, and domain shift is a common problem in domain adaptation: when the training set and the test set do not follow the same underlying distribution, a model's performance degrades when it is transferred from one domain to another.

Many algorithms have been proposed to alleviate domain shift, such as domain-adversarial neural networks (DANN) and distribution matching, but each has drawbacks. For DANN, the jointly optimized training process is unstable and requires extensive hyper-parameter tuning. Distribution matching, when attempting instance-level alignment, struggles to preserve the model's discriminative ability on the target task. A stable and efficient solution is therefore needed that learns both domain invariance and instance-level matching for unsupervised domain adaptation.

In recent self-supervised learning (SSL) research, contrastive learning (CL) has proven effective for learning instance-level representations through pretext tasks defined on the raw data. From the domain adaptation perspective, however, constructing cross-domain positive samples and aligning domain-aware pairs have received little attention in the literature; previous work focused on designing label-preserving text transformations, such as back-translation, synonym substitution, dropout and their combinations.

Disclosure of Invention

The invention aims to solve, for cross-domain named entity recognition, the problem of poor learning performance caused by domain shift and by insufficient use of span boundary information, and provides a cross-domain named entity recognition method, device, storage medium and product based on span contrastive learning.

The invention is realized by the following technical solution. In a first aspect, the invention provides a cross-domain named entity recognition method based on span contrastive learning, comprising the following steps:

step 1, acquiring a source domain dataset and a target domain dataset, preprocessing them, and dividing them into training sets and test sets;

step 2, constructing a cross-domain named entity recognition model based on span contrastive learning, specifically:

step 2.1, obtaining embedded representations of the source domain data and target domain data, and assigning the corresponding domain labels to the source domain and the target domain;

step 2.2, constructing domain-confusion augmented samples: inputting the source and target domain embeddings obtained in step 2.1 into the pre-trained language model BERT, and generating adversarial samples with projected gradient descent (PGD), where the adversarial attack is directed at the domain classifier;

step 2.3, generating global boundary prediction matrices, specifically:

inputting the source domain embeddings into BERT and, from the output, constructing a global boundary prediction matrix with GlobalPointer; inputting the source domain embeddings together with the domain-confusion augmented samples generated in step 2.2 into BERT and, from the output, constructing a global boundary prediction matrix with adversarial perturbation using GlobalPointer;

step 3, training the cross-domain named entity recognition model based on span contrastive learning constructed in step 2, specifically:

step 3.1, computing the source domain named entity recognition loss with a cross-entropy loss function, using the global boundary prediction matrix obtained from the source domain embeddings in step 2.3;

step 3.2, computing the contrastive learning loss from the similarity and dissimilarity of the vectors of all entity spans contained in the two global boundary prediction matrices obtained in step 2.3;

step 3.3, computing the KL divergence loss over all entity spans contained in the two global boundary prediction matrices obtained in step 2.3, so that the generated adversarial samples are more consistent with the distribution predicted by the model;

step 3.4, combining the losses of steps 3.1, 3.2 and 3.3 into a joint loss function, updating the model parameters to optimize it, and thereby training the optimal cross-domain named entity recognition model based on span contrastive learning;

step 4, inputting the target domain test set into the cross-domain named entity recognition model trained and optimized in step 3, and computing the scores of target domain entities.

Further, step 2.2 specifically includes:

assume a source dataset $D^S=\{(x_i,y_i)\}_{i=1}^{n}$ with n labeled examples, where $x_i$ is a token sequence and $y_i$ is the label of $x_i$; the source data are sampled independently and identically distributed (i.i.d.) from the source domain;

a target dataset $D^T=\{x_j\}_{j=1}^{m}$ with m unlabeled examples, where $x_j$ is a token sequence; the target data are sampled i.i.d. from the target domain;

the model aims to learn a function $f(x;\theta_f,\theta_y): x\to C$ whose input is a token sequence and whose output is the corresponding label, where $\theta_f$ are the parameters of the pre-trained language model, $\theta_y$ are the parameters of the label classifier, and C is the label set;

$\mathcal{L}$ is the loss of the model on the classification task, and the goal of learning is to minimize it; the specific formula is as follows:

where $(x,y)\sim D^S$ indicates that both the sequence and the label come from the source domain. Within a single domain, adversarial training is a min-max problem that maximizes the inner loss and minimizes the outer loss;

where δ is the generated adversarial perturbation;

where $\alpha_{adv}$ controls the trade-off between the two losses and is usually set to 1;
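The objectives referred to above can be written as follows, assuming the standard min-max adversarial-training formulation that matches the symbols just defined; the notation of the original formulas may differ. The first line is the plain source-domain classification loss, the second the min-max adversarial problem, and the third their combination weighted by $\alpha_{adv}$:

$$\min_{\theta_f,\theta_y}\ \mathbb{E}_{(x,y)\sim D^S}\big[\mathcal{L}(f(x;\theta_f,\theta_y),y)\big]$$

$$\min_{\theta_f,\theta_y}\ \mathbb{E}_{(x,y)\sim D^S}\Big[\max_{\|\delta\|_F\le\epsilon}\mathcal{L}(f(x+\delta;\theta_f,\theta_y),y)\Big]$$

$$\min_{\theta_f,\theta_y}\ \mathbb{E}_{(x,y)\sim D^S}\Big[\mathcal{L}(f(x;\theta_f,\theta_y),y)+\alpha_{adv}\max_{\|\delta\|_F\le\epsilon}\mathcal{L}(f(x+\delta;\theta_f,\theta_y),y)\Big]$$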

the adversarial perturbation can be generated by the following iterative steps;

where ε is the upper bound of the adversarial perturbation, η is the adversarial step size, $\delta_t$ is the perturbation generated at the current iteration step, the gradient term is the gradient of the classification-task loss at step t with respect to the input at step t, $\Pi_{\|\delta\|_F\le\epsilon}$ maps the perturbation back into the allowed range if it exceeds ε, and $\|\cdot\|_F$ denotes the Frobenius norm;
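A sketch of one such iteration, assuming the common normalized-gradient PGD ascent step; the symbol $g(\delta_t)$ for the gradient term is introduced here for illustration only:

$$\delta_{t+1}=\Pi_{\|\delta\|_F\le\epsilon}\!\left(\delta_t+\eta\,\frac{g(\delta_t)}{\|g(\delta_t)\|_F}\right),\qquad g(\delta_t)=\nabla_{\delta}\,\mathcal{L}\big(f(x+\delta_t;\theta_f,\theta_y),y\big)$$

For the domain-confusion samples, the same update would apply with the classification loss replaced by the domain classification loss on the domain label d, starting from $\delta_0$.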

generating adversarial samples with domain confusion:

where the domain-specific loss is the loss of the domain classifier targeted by the adversarial attack; $\delta_0$ is the initialized perturbation; $\theta_d$ are the parameters of the domain classifier; d is the domain label. The perturbation δ is synthesized by searching the embedding space for the extreme direction that most confuses the domain classifier, and $f(x+\delta;\theta_f)$ is the resulting domain puzzle produced by the pre-trained language model; the gradient term is the gradient of the domain classification loss at step t with respect to the input at step t.
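The following is a minimal, self-contained sketch of domain-puzzle generation, not the patent's implementation. It assumes a toy linear (logistic) domain classifier in place of BERT plus a domain head, a zero initialisation for $\delta_0$, a normalized-gradient step and an L2 (Frobenius) projection; all function names are hypothetical. PGD ascends the domain classification loss so that the perturbed embedding becomes harder to assign to its true domain.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_and_grad(x, delta, w, b, d):
    """Binary cross-entropy of a toy linear domain classifier evaluated on
    x + delta, plus its gradient with respect to delta."""
    e = [xi + di for xi, di in zip(x, delta)]
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)) + b)  # P(domain = 1)
    loss = -(d * math.log(p) + (1 - d) * math.log(1 - p))
    grad = [(p - d) * wi for wi in w]  # dL/d(delta) for logistic BCE
    return loss, grad


def project(delta, eps):
    """Map delta back onto the L2 (Frobenius) ball of radius eps."""
    norm = math.sqrt(sum(v * v for v in delta))
    return delta if norm <= eps else [v * eps / norm for v in delta]


def domain_puzzle(x, w, b, d, eps=1.0, eta=0.3, steps=5):
    """PGD ascent on the domain loss; returns (x + delta, delta)."""
    delta = [0.0] * len(x)  # delta_0: zero initialisation (assumption)
    for _ in range(steps):
        _, g = domain_loss_and_grad(x, delta, w, b, d)
        g_norm = math.sqrt(sum(v * v for v in g)) or 1.0  # avoid division by zero
        step = [di + eta * gi / g_norm for di, gi in zip(delta, g)]
        delta = project(step, eps)
    return [xi + di for xi, di in zip(x, delta)], delta


# Usage: a source-domain embedding (domain label 0, as numbered in step 2.1)
x, w, b = [1.0, 2.0], [0.5, -0.25], 0.0
x_puzzle, delta = domain_puzzle(x, w, b, d=0)
```

Because the loss is maximized, the perturbation moves against the classifier's evidence for the true domain, while the projection keeps it within radius ε.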

Further, step 2.3 specifically includes:

let $S=[s_1,s_2,\dots,s_m]$ be the possible spans in a sentence, where a span s is written $s[i:j]$ with head index i and tail index j; the goal of named entity recognition is to identify all spans s that are entities together with their types in E, where E is the set of entity types. Given a sentence $X=[x_1,x_2,\dots,x_n]$ with n tokens, each token in X is first associated with its corresponding representation in the pre-trained language model, yielding the sentence representation matrix $H\in\mathbb{R}^{n\times v}$, where v is the dimension:

$h_1,h_2,\dots,h_n=\mathrm{BERT}(x_1,x_2,\dots,x_n)$

after the sentence representation H is obtained, the span representation can be computed with two feed-forward layers that depend on the start and end indices of the span:

$q_{i,\alpha}=W_{q,\alpha}h_i+b_{q,\alpha}$

$k_{j,\alpha}=W_{k,\alpha}h_j+b_{k,\alpha}$

where $q_{i,\alpha}$ and $k_{j,\alpha}$ are the vector representations used to identify entities of type α, corresponding to the start and end positions of span $s[i:j]$ of type α; $W_{q,\alpha}$ and $W_{k,\alpha}$ are the weights applied to $h_i$ and $h_j$; $b_{q,\alpha}$ and $b_{k,\alpha}$ are the biases. The score that span $s[i:j]$ belongs to type α is computed as follows:

a score is computed for each span with this scoring function, and the global boundary prediction matrix is generated from these scores;

where the two position-encoding matrices applied to $q_{i,\alpha}$ and $k_{j,\alpha}$ are both orthogonal matrices.
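A sketch of the GlobalPointer span score for one entity type, assuming the usual form $s_\alpha(i,j)=(R_i q_{i,\alpha})^\top(R_j k_{j,\alpha})$ with rotary position encoding (RoPE) supplying the orthogonal matrices $R_i$ and $R_j$; this form and all names below are assumptions, since the scoring formula is not given in the text. Plain Python lists stand in for tensors and random weights stand in for trained ones.

```python
import math
import random


def rope(vec, pos, base=10000.0):
    """Rotary position encoding R_pos: rotate each pair (vec[k], vec[k+1])
    by the angle pos * base**(-k / d). R_pos is orthogonal."""
    d = len(vec)  # must be even
    out = list(vec)
    for k in range(0, d, 2):
        theta = pos * base ** (-k / d)
        c, s = math.cos(theta), math.sin(theta)
        out[k] = vec[k] * c - vec[k + 1] * s
        out[k + 1] = vec[k] * s + vec[k + 1] * c
    return out


def linear(W, b, h):
    """Feed-forward layer W h + b (the q and k projections above)."""
    return [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]


def global_pointer_scores(H, Wq, bq, Wk, bk, mask_value=-1e12):
    """n x n matrix of s_alpha(i, j) for one entity type; cells with i > j
    are not valid spans and receive mask_value."""
    n = len(H)
    q = [rope(linear(Wq, bq, H[i]), i) for i in range(n)]
    k = [rope(linear(Wk, bk, H[j]), j) for j in range(n)]
    return [[sum(a * c for a, c in zip(q[i], k[j])) if i <= j else mask_value
             for j in range(n)] for i in range(n)]


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Usage: 3 tokens with hidden size 4 and random toy weights
random.seed(0)
H = random_matrix(3, 4)
S = global_pointer_scores(H, random_matrix(4, 4), [0.0] * 4,
                          random_matrix(4, 4), [0.0] * 4)
```

Masking the lower triangle reflects that a span's head index cannot come after its tail index; one such matrix per entity type makes up the global boundary prediction matrix.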

Further, step 3.1 specifically includes:

the score of each entity is computed with the scoring function obtained in step 2.3;

the cross-entropy loss function is set as follows:

where q and k denote the start index and end index of a span, $P_\alpha$ is the set of spans of entity type α, $Q_\alpha$ is the set of spans that are not entities or whose entity type is not α, and $s_\alpha(q,k)$ is the score of the span as an entity of type α; spans satisfying $s_\alpha(q,k)>0$ are output as entities of type α.
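Assuming the multi-label cross-entropy commonly paired with GlobalPointer, which is consistent with the definitions of $P_\alpha$, $Q_\alpha$ and the decision threshold $s_\alpha(q,k)>0$, the loss for one type α is $\log\big(1+\sum_{(q,k)\in P_\alpha}e^{-s_\alpha(q,k)}\big)+\log\big(1+\sum_{(q,k)\in Q_\alpha}e^{s_\alpha(q,k)}\big)$. The sketch below implements that assumed form; the function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def span_multilabel_ce(scores, gold_spans):
    """scores: {(q, k): s_alpha(q, k)} for every candidate span of type alpha.
    gold_spans: the set P_alpha; every other candidate span belongs to Q_alpha."""
    pos = [-s for span, s in scores.items() if span in gold_spans]
    neg = [s for span, s in scores.items() if span not in gold_spans]
    # log(1 + sum(exp(x))) == logsumexp([0] + x): the 0 acts as the threshold
    return logsumexp([0.0] + pos) + logsumexp([0.0] + neg)


def decode(scores):
    """Spans with s_alpha(q, k) > 0 are output as entities of type alpha."""
    return {span for span, s in scores.items() if s > 0}


scores = {(0, 0): 5.0, (0, 1): -5.0, (1, 1): -5.0}
```

Scoring gold spans above zero and all other spans below zero drives both terms toward zero, so the same threshold serves training and decoding, and the many non-entity spans do not swamp the few entity spans.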

Further, in step 3.2, for an input sentence, each entity span is represented as a vector, and the contrastive loss is computed from the similarity and dissimilarity of the vectors of all entity spans in the sentence;

the contrastive learning loss is computed as follows:

where N is the maximum sentence length, M is the number of negative examples, span(i,j) is the span representation, $span(i,j)^+$ is the positive example of the current sentence, i.e., the augmented source domain data produced by adversarial training, $span(i,j)^-$ is a negative example of the current sentence, i.e., a span whose label differs from that of the current span, and the cosine similarity cos measures the distance between the original sample and the positive and negative samples.
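A sketch of the span-level contrastive loss, assuming an InfoNCE form over cosine similarities, $-\log\frac{e^{\cos(span,span^+)/\tau}}{e^{\cos(span,span^+)/\tau}+\sum_{m=1}^{M}e^{\cos(span,span^-_m)/\tau}}$, averaged over spans. The temperature τ is an assumption not mentioned in the text (τ = 1 uses the plain cosine), and the function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def span_contrastive_loss(anchors, positives, negatives, tau=1.0):
    """InfoNCE over spans, averaged over anchors.
    anchors[i]   : span(i, j) from the original source sentence
    positives[i] : span(i, j)+, the same span in the adversarial view
    negatives[i] : M spans span(i, j)- whose label differs from the anchor's
    tau          : temperature (assumption; tau = 1 uses plain cosine)"""
    total = 0.0
    for a, p, negs in zip(anchors, positives, negatives):
        pos = math.exp(cosine(a, p) / tau)
        neg = sum(math.exp(cosine(a, n) / tau) for n in negs)
        total -= math.log(pos / (pos + neg))
    return total / len(anchors)


a = [1.0, 0.0]
good = span_contrastive_loss([a], [[0.9, 0.1]], [[[0.0, 1.0]]])
bad = span_contrastive_loss([a], [[0.0, 1.0]], [[[0.9, 0.1]]])
```

The loss is lower when the positive view lies closer to the anchor than the negatives do, which is the pull-together and push-apart behavior described for step 3.2.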

Further, in step 3.3, the KL divergence loss is computed as follows:
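As a sketch, assuming the span scores of the clean and the adversarial global boundary prediction matrices are each normalised with a softmax over the same list of valid spans, the divergence can be computed as $\mathrm{KL}(P_{clean}\,\|\,P_{adv})=\sum_s P_{clean}(s)\log\frac{P_{clean}(s)}{P_{adv}(s)}$. The direction of the divergence, the softmax normalisation and the function names are illustrative choices, not taken from the text.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def span_kl(clean_scores, adv_scores):
    """KL(P_clean || P_adv) over the scores of the same valid spans.
    Masked (invalid) spans should be dropped beforehand, otherwise a zero
    probability in P_adv would make the divergence undefined."""
    p, q = softmax(clean_scores), softmax(adv_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


clean = [2.0, 0.0, -1.0]
```

Identical predictions give zero divergence; the further the adversarial predictions drift from the clean ones, the larger the penalty.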

Further, step 3.4 specifically comprises:

the whole model is trained end to end as a neural network with four loss functions: the named entity recognition loss on the source domain, the domain classifier loss, the contrastive learning loss and the KL divergence loss;

the loss functions are summed to obtain the loss of the cross-domain named entity recognition model based on span contrastive learning, and the model is trained jointly on them;

where α, λ and β are hyper-parameters that control the weights of the individual losses.
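Written out, the joint objective has the following form; the text does not say which of α, λ and β weights which term, so the assignment below (α for the domain classifier loss, λ for the contrastive loss, β for the KL loss) is an assumption:

$$\mathcal{L}=\mathcal{L}_{NER}+\alpha\,\mathcal{L}_{d}+\lambda\,\mathcal{L}_{CL}+\beta\,\mathcal{L}_{KL}$$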

In a second aspect, the invention provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, performs the steps of the cross-domain named entity recognition method based on span contrastive learning described above.

In a third aspect, the invention provides a computer-readable storage medium storing a plurality of computer instructions that cause a computer to perform the cross-domain named entity recognition method based on span contrastive learning described above.

In a fourth aspect, the invention provides a computer program product which, when executed by a processor, implements the cross-domain named entity recognition method based on span contrastive learning described above.

The beneficial effects of the application are as follows:

For cross-domain named entity recognition, the application overcomes the shortcomings of the prior art by using a pre-trained language model, adversarial training, contrastive learning and related techniques, fully considering and mining domain-invariant features, and provides a cross-domain named entity recognition method based on span contrastive learning.

1. Entity boundary information is introduced by converting the sequence labels into a global boundary matrix, which represents sentence-level target labels, so that the model can learn explicit span boundary information. In cross-domain learning, both distribution matching and instance-based matching have limitations, whereas contrastive learning can learn domain invariance without labels in the target domain.

2. Contrastive learning is used to reduce domain shift by adversarially learning knowledge that confuses the model about the domain.

3. At the same time, the KL divergence is used to bring the predictions on adversarial samples closer to the model's own predicted distribution, further improving performance.
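Benefit 1 converts sequence labels into a global boundary matrix. A minimal sketch of that conversion, assuming BIO tagging (the tagging scheme and the function name are hypothetical): for each entity type α, cell (i, j) is 1 exactly when tokens i..j form an entity of type α.

```python
def bio_to_global_matrix(tags, types):
    """Convert a BIO tag sequence into one n x n 0/1 matrix per entity type,
    where M[alpha][i][j] == 1 iff tokens i..j form an entity of type alpha."""
    n = len(tags)
    M = {t: [[0] * n for _ in range(n)] for t in types}
    start, cur = None, None
    for idx, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        opens_or_closes = (tag == "O" or tag.startswith("B-")
                           or (tag.startswith("I-") and tag[2:] != cur))
        if opens_or_closes:
            if start is not None:
                M[cur][start][idx - 1] = 1  # close the running span
            start, cur = (idx, tag[2:]) if tag != "O" else (None, None)
        # an I- tag matching the running type simply extends the span
    return M


M = bio_to_global_matrix(["B-PER", "I-PER", "O", "B-LOC"], ["PER", "LOC"])
```

Unlike a flat tag sequence, the matrix can mark overlapping (nested) spans, because each span owns its own cell.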

The method is suitable for cross-domain named entity identification.

Drawings

In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a flow chart of the cross-domain named entity recognition method based on span contrastive learning;

FIG. 2 is a model diagram of the cross-domain named entity recognition method based on span contrastive learning;

FIG. 3 is a diagram of domain puzzle samples;

FIG. 4 is a diagram of multi-head recognition of nested entities;

FIG. 5 is a schematic diagram of positive sampling;

FIG. 6 is a schematic diagram of negative sampling.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended to illustrate the present invention and should not be construed as limiting the invention.

In a first embodiment, this embodiment is a cross-domain named entity recognition method based on span contrastive learning, comprising the following steps:

inputting the source domain embeddings into BERT and, from the output, constructing a global boundary prediction matrix with GlobalPointer; inputting the source domain embeddings together with the domain-confusion augmented samples generated in step 2.2 into BERT and, from the output, constructing a global boundary prediction matrix with adversarial perturbation using GlobalPointer;

In this embodiment, for cross-domain named entity recognition, the shortcomings of the prior art are overcome by using a pre-trained language model, adversarial training, contrastive learning and related techniques, fully considering and mining domain-invariant features, yielding a cross-domain named entity recognition method based on span contrastive learning.

First, entity boundary information is introduced by converting the sequence labels into a global boundary matrix, which represents sentence-level target labels, so that the model can learn explicit span boundary information. In cross-domain learning, both distribution matching and instance-based matching have limitations, whereas contrastive learning can learn domain invariance without labels in the target domain.

Second, contrastive learning is used to reduce domain shift by adversarially learning knowledge that confuses the model about the domain.

Third, the KL divergence is used to bring the predictions on adversarial samples closer to the model's own predicted distribution, further improving performance.

In a second embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the first embodiment; here step 2.2 is further defined and specifically comprises:

step 2.2, specifically comprising:

assume a source dataset $D^S=\{(x_i,y_i)\}_{i=1}^{n}$ with n labeled examples, where $x_i$ is a token sequence and $y_i$ is the label of $x_i$; the source data are sampled i.i.d. from the source domain;

a target dataset $D^T=\{x_j\}_{j=1}^{m}$ with m unlabeled examples, where $x_j$ is a token sequence; the target data are sampled i.i.d. from the target domain;

where δ is the generated adversarial perturbation;

where $\alpha_{adv}$ controls the trade-off between the two losses and is usually set to 1;

the adversarial perturbation can be generated by the following iterative steps;

generating adversarial samples with domain confusion:

where the domain-specific loss is the loss of the domain classifier targeted by the adversarial attack; $\delta_0$ is the initialized perturbation; $\theta_d$ are the parameters of the domain classifier; d is the domain label. The perturbation δ is synthesized by searching the embedding space for the extreme direction that most confuses the domain classifier, and $f(x+\delta;\theta_f)$ is the resulting domain puzzle produced by the pre-trained language model; the gradient term is the gradient of the domain classification loss at step t with respect to the input at step t.

In this embodiment, the inner maximization can be solved with projected gradient descent (PGD), assuming the loss function is locally linear. PGD relies only on the model itself and can generate samples of varying adversarial strength, which improves the model's generalization to unseen data. PGD attacks in several small steps: at each step a forward and backward pass is performed, a perturbation is computed from $g^{adv}$ and accumulated onto the embedding layer, and if the accumulated perturbation exceeds the allowed range it is mapped back into that range. Finally, the $g^{adv}$ obtained at the last step is accumulated onto the original gradient; that is, the original gradient is updated with the $g^{adv}$ corresponding to the perturbation accumulated over t steps.
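The PGD procedure described above can be sketched on a toy model: a linear regressor with squared error stands in for the NER model, and a small input vector stands in for the embedding layer (both assumptions; the names are hypothetical). The gradient from the clean pass is kept, K projected ascent steps build the perturbation, and only the parameter gradient at the final perturbation ($g^{adv}$) is accumulated onto the original gradient before the update.

```python
import math


def loss_and_grads(w, x, y, delta):
    """Squared error of a toy linear model on the perturbed input x + delta.
    Returns (loss, gradient w.r.t. w, gradient w.r.t. delta)."""
    e = [xi + di for xi, di in zip(x, delta)]
    r = sum(wi * ei for wi, ei in zip(w, e)) - y  # residual
    return r * r, [2 * r * ei for ei in e], [2 * r * wi for wi in w]


def pgd_train_step(w, x, y, eps=0.5, eta=0.2, K=3, lr=0.1):
    # 1. ordinary forward/backward pass on the clean input
    _, g_clean, _ = loss_and_grads(w, x, y, [0.0] * len(x))
    delta = [0.0] * len(x)
    # 2. K small attack steps on the input (embedding) perturbation
    for _ in range(K):
        _, _, g_delta = loss_and_grads(w, x, y, delta)
        norm = math.sqrt(sum(g * g for g in g_delta)) or 1.0
        delta = [d + eta * g / norm for d, g in zip(delta, g_delta)]
        n = math.sqrt(sum(d * d for d in delta))
        if n > eps:  # outside the eps-ball: map back into range
            delta = [d * eps / n for d in delta]
    # 3. parameter gradient at the final perturbation only (g_adv)
    _, g_adv, _ = loss_and_grads(w, x, y, delta)
    # 4. accumulate g_adv onto the original gradient, then update
    g_total = [gc + ga for gc, ga in zip(g_clean, g_adv)]
    return [wi - lr * gi for wi, gi in zip(w, g_total)], delta


w, x, y = [1.0, 1.0], [1.0, 2.0], 2.0
new_w, delta = pgd_train_step(w, x, y)
```

Discarding the intermediate parameter gradients and keeping only the final one mirrors the description above; real implementations do the same on the embedding layer of BERT.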

As shown in FIG. 3, the domain puzzle enhances the domain invariance of the model so that it adapts better to unseen data and domains: during training it confuses the model by discarding domain-related information, making data from different domains hard to distinguish, and pulls the source (target) data and their corresponding domain puzzles closer together to reduce the domain discrepancy.

In a third embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the first embodiment; here step 2.3 is further defined and specifically comprises:

step 2.3, specifically comprising:

let $S=[s_1,s_2,\dots,s_m]$ be the possible spans in a sentence, where a span s is written $s[i:j]$ with head index i and tail index j; the goal of named entity recognition is to identify all spans s that are entities together with their types in E, where E is the set of entity types. Given a sentence $X=[x_1,x_2,\dots,x_n]$ with n tokens, each token in X is first associated with its corresponding representation in the pre-trained language model, yielding the sentence representation matrix $H\in\mathbb{R}^{n\times v}$, where v is the dimension:

$h_1,h_2,\dots,h_n=\mathrm{BERT}(x_1,x_2,\dots,x_n)$

$q_{i,\alpha}=W_{q,\alpha}h_i+b_{q,\alpha}$

$k_{j,\alpha}=W_{k,\alpha}h_j+b_{k,\alpha}$

where the two position-encoding matrices applied to $q_{i,\alpha}$ and $k_{j,\alpha}$ are both orthogonal matrices.

In this embodiment, as shown in FIG. 4, multi-head recognition of nested entities generates all possible entity spans.

In the attention mechanism, position encoding takes two forms: absolute position encoding and relative position encoding. Although absolute position encoding adds position information to a word vector, that information is tied to a fixed position and cannot express the context relative to it. To exploit boundary information, a position encoding whose orthogonal matrices satisfy the relative-position property is applied to the entity representations, making the model more sensitive to the relative positions within entities and thereby improving entity recognition. In this way, a score is computed for each span, and the global boundary prediction matrix is generated from these scores.
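Assuming rotary position encoding, where $R_i$ is block-diagonal with 2×2 rotation blocks of angle $i\theta_m$, orthogonality gives $R_i^\top=R_{-i}$ and hence $R_i^\top R_j=R_{j-i}$, so the span score depends only on the relative offset j - i:

$$s_\alpha(i,j)=(R_i q_{i,\alpha})^\top(R_j k_{j,\alpha})=q_{i,\alpha}^\top R_i^\top R_j\,k_{j,\alpha}=q_{i,\alpha}^\top R_{j-i}\,k_{j,\alpha}$$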

In a fourth embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the first embodiment; here step 3.1 is further defined and specifically comprises:

step 3.1, specifically comprising:

the cross-entropy loss function is set as follows:

In this embodiment, the score of each entity is computed with the scoring function obtained in step 2.3. To address the class imbalance in this classification problem, the cross-entropy loss function is designed to help the model learn the boundary information of each training instance.

In a fifth embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the fourth embodiment; here step 3.2 is further defined and specifically comprises:

in step 3.2, for an input sentence, each entity span is represented as a vector, and the contrastive loss is computed from the similarity and dissimilarity of the vectors of all entity spans in the sentence;

the contrastive learning loss is computed as follows:

In this embodiment, contrastive learning is applied at the span level: similar spans are pulled together and dissimilar spans are pushed apart, so that the model learns more invariant information about entity spans. As shown in FIG. 5, for positive sampling, the model encodes a source domain span and the corresponding domain puzzle span closer together in the representation space, gradually pulling the examples toward the domain decision boundary as training progresses. For cross-domain negative samples, as shown in the left half of FIG. 6, the contrastive loss would push the negative samples of the source and target domains away from each other, including same-class negatives from different domains, which conflicts with the goal of pulling the domains together. To avoid such cross-domain repulsion, samples from other domains are excluded from the negative sample set.
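The exclusion rule can be sketched as a filter over candidate spans, assuming each span carries a label and a domain id (0 = source, 1 = target, as numbered in step 2.1); the dictionary keys and the function name are hypothetical. Only spans from the anchor's own domain with a different label are kept as negatives.

```python
def negative_set(anchor, candidates):
    """anchor/candidates are dicts with hypothetical keys 'label' and 'domain'.
    Negatives: spans whose label differs from the anchor's, restricted to the
    anchor's own domain so that spans of the other domain are never pushed
    away (no cross-domain repulsion)."""
    return [c for c in candidates
            if c["label"] != anchor["label"] and c["domain"] == anchor["domain"]]


anchor = {"text": "Obama", "label": "PER", "domain": 0}
candidates = [
    {"text": "Paris", "label": "LOC", "domain": 0},
    {"text": "Merkel", "label": "PER", "domain": 0},
    {"text": "Berlin", "label": None, "domain": 1},  # unlabeled target span
]
negs = negative_set(anchor, candidates)
```

Target-domain spans are unlabeled, so filtering on the domain id alone is what guarantees they never appear among the negatives.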

The span-level contrastive objective therefore lets the model learn more invariant information about entity spans.

In a sixth embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the fifth embodiment; here step 3.3 is further defined and specifically comprises:

in step 3.3, the KL divergence loss is computed as follows:

in this embodiment, the KL divergence loss is computed over all entity spans contained in the two global boundary prediction matrices obtained in step 2.3, so that the generated adversarial samples are more consistent with the distribution predicted by the model itself;

in adversarial training, the training data are perturbed to make the model more robust, producing adversarial samples. Unlike the original samples, these adversarial samples may introduce noise or perturbation. To ensure that the generated adversarial samples retain some similarity and continuity, a hidden variable is usually introduced to control the distance between samples. The samples generated during adversarial sample generation are expected to approach the model's own predicted distribution, which makes the model more robust. The KL divergence is a common measure of the difference between two distributions: the smaller the KL divergence, the closer the two distributions. Accordingly, during adversarial training the quality of the generated adversarial samples is evaluated by computing the KL divergence between the distribution predicted on the adversarial samples and the distribution the model predicts on the original samples, so that the adversarial samples agree better with the model's own predictions. Adversarial samples generated in this way are better suited to training the model and improve its robustness. This embodiment therefore adopts the KL divergence loss described above.

In a seventh embodiment, this embodiment further defines the cross-domain named entity recognition method based on span contrastive learning of the sixth embodiment; here step 3.4 is further defined and specifically comprises:

step 3.4, specifically comprising:

In this embodiment, the loss functions are summed to obtain the loss of the cross-domain named entity recognition model based on span contrastive learning; jointly training on these losses optimizes several aspects of the model at once, improving its performance and robustness.

By optimizing this loss end to end, a robust model is obtained that classifies the original samples better and generates higher-quality adversarial samples.

In an eighth embodiment, this embodiment is Example 1 of the cross-domain named entity recognition method based on span contrastive learning, specifically:

the cross-domain named entity recognition method based on span contrastive learning designed in this embodiment is implemented through the following steps:

step 2.2, constructing domain-confusion augmented samples: inputting the source and target domain embeddings obtained in step 2.1 into the pre-trained language model BERT, and generating adversarial samples with projected gradient descent (PGD), where the adversarial attack is directed at the domain classifier;

step 2.3, generating the global boundary prediction matrices: inputting the source domain embeddings into BERT and, from the output, constructing a global boundary prediction matrix with GlobalPointer; concatenating the source domain embeddings with the domain-confusion augmented samples generated in step 2.2 using concat(), inputting the result into BERT and, from the output, constructing a global boundary prediction matrix with adversarial perturbation using GlobalPointer;

step 3, training the cross-domain named entity recognition model based on span contrastive learning constructed in step 2, specifically:

step 4, inputting the target domain test set into the cross-domain named entity recognition model trained and optimized in step 3, and computing the scores of target domain entities.

In a ninth embodiment, this embodiment is Example 2 of the cross-domain named entity recognition method based on span contrastive learning, specifically:

As shown in FIG. 1 and FIG. 2, the invention provides a cross-domain named entity recognition method based on span contrastive learning, specifically comprising the following steps:

step 1 specifically comprises:

step 1.1: extracting text sequences from the source domain and target domain datasets;

step 1.2: dividing the preprocessed datasets into training sets and test sets;

step 2, constructing a cross-domain named entity recognition model based on span contrastive learning.

Step 2 specifically comprises:

step 2.1, obtaining embedded representations of the source domain and target domain data: first, each label is encoded with one-hot encoding; the pre-trained language model BERT then generates the source domain and target domain token embeddings; and the two domains are numbered, with the source domain as 0 and the target domain as 1.

In step 2.2, it is assumed that there are n source data sets D of marker data ^S ＝{x _i ,y _i } _1,…,n These data are sampled from the source domain independently and in a distributed fashion. At the same time, there is also a target data set D with m unlabeled data ^T ＝{x _j } _1,…,m These data are independently sampled from the target domain in a uniform distribution, where x _i ，x _j Is a token sequence, y _i Is x _i Is a label of (a). In intra-domain training, models are aimed atA function is learned whose input is a token sequence and whose output is the corresponding label. The model aims to learn the function f (x; θ) _f ,θ _y ) x-C, where θ _f Is a parameter of a pre-training language model, θ _y Is a parameter of class label prediction, and C is a label set. In the general task of classification,is the loss of the model in the classification task where the purpose of model learning is to minimize this loss, the specific formula is as follows:

wherein:both the presentation sequence and the tag come from the source domain. In a single field, challenge training is a challenge problem that aims at maximizing internal losses and minimizing external losses.

where δ is the generated adversarial perturbation.

where $\alpha_{adv}$ controls the trade-off between the two losses and is usually set to 1. The inner maximization can be solved with projected gradient descent (PGD), which assumes that the loss function is locally linear. An advantage of PGD is that it relies only on the model itself and can generate diverse adversarial samples. This improves the model's generalization to unseen data. PGD builds the attack step by step. At each step, a forward and a backward pass are performed and a perturbation is computed from the gradient $g^{adv}$. The new perturbation δ is then accumulated onto the embedding layer. If the accumulated perturbation leaves the allowed range, it is projected back into that range. Finally, the gradient $g^{adv}$ from the last step is accumulated onto the original gradient; that is, the original gradient is updated with the $g^{adv}$ that corresponds to the perturbation accumulated over t steps. The adversarial perturbation can be generated by the following iterative steps:
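The objectives described in this step can be written in the following standard form. It is offered as an assumption consistent with the symbols defined here, not as the patent's exact equations. The per-example loss ℓ, the expectation notation, and the normalized gradient step in the PGD update are assumptions.

```latex
\mathcal{L}_{task} = \mathbb{E}_{(x,y)\sim D^{S}}\big[\ell\big(f(x;\theta_f,\theta_y),\,y\big)\big]

\min_{\theta_f,\theta_y}\ \mathbb{E}_{(x,y)\sim D^{S}}\Big[\ell\big(f(x;\theta_f,\theta_y),\,y\big)
  + \alpha_{adv}\max_{\|\delta\|_F\le\epsilon}\ \ell\big(f(x+\delta;\theta_f,\theta_y),\,y\big)\Big]

g_t^{adv} = \nabla_{\delta}\,\ell\big(f(x+\delta_t;\theta_f,\theta_y),\,y\big),\qquad
\delta_{t+1} = \Pi_{\|\delta\|_F\le\epsilon}\Big(\delta_t + \eta\,\frac{g_t^{adv}}{\|g_t^{adv}\|_F}\Big)
```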

where ε is the upper bound of the adversarial perturbation and η is the adversarial step size. $\delta_t$ is the perturbation generated at the current iteration step, and $g_t^{adv}$ is the gradient of the classification-task loss at step t with respect to the input at step t. ∇ denotes the gradient operator. $\Pi_{\|\delta\|_F \le \epsilon}$ maps the perturbation back into the allowed range whenever it exceeds ε, and $\|\cdot\|_F$ denotes the Frobenius norm.
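The PGD loop above can be illustrated with the following minimal sketch. A toy linear softmax classifier stands in for BERT plus the classification head, so the input gradient can be written in closed form. The normalized step, the zero initialization, and all function names are illustrative assumptions. The sketch covers only perturbation generation; the final accumulation of the adversarial gradient onto the model's parameter gradients is not shown.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def ce_loss_and_grad(x, y, W):
    """Cross-entropy of a toy linear classifier softmax(W @ x) and its gradient w.r.t. the input x."""
    p = softmax(W @ x)
    loss = -np.log(p[y])
    grad_x = W.T @ (p - np.eye(len(p))[y])  # dL/dx = W^T (p - onehot(y))
    return loss, grad_x


def project(delta, eps):
    """Pi_{||delta||_F <= eps}: rescale delta onto the eps-ball when it falls outside."""
    norm = np.linalg.norm(delta)
    return delta if norm <= eps else delta * (eps / norm)


def pgd_perturbation(x, y, W, eps=1.0, eta=0.3, steps=3):
    """Multi-step projected gradient ascent on the task loss, starting from delta_0 = 0."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        _, g = ce_loss_and_grad(x + delta, y, W)  # g_t^adv
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            break
        delta = project(delta + eta * g / g_norm, eps)
    return delta
```

In a real model, x would be the embedding-layer output and delta would be added to it before the forward pass. The gradient norm is then taken over the whole embedding tensor, which gives the Frobenius norm.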

As shown in FIG. 3, domain puzzles enhance the domain invariance of the model, so that it adapts better to unseen data and domains. During training, domain-related information is discarded to confuse the model, which makes it hard to distinguish data from different domains. At the same time, the source (target) data are pulled closer to their corresponding domain puzzles to reduce the domain discrepancy. To generate domain-confused augmentations, an adversarial attack with perturbations is applied to the task of classifying samples as source-domain or target-domain. Using the adversarial-sample generation process described above, adversarial samples with domain confusion can be generated:

where the adversarial attack is applied to the domain-specific loss learned by the domain classifier. $\delta_0$ is the initial perturbation, $\theta_d$ denotes the parameters used to compute the domain class, and d is the domain label. The perturbation δ is synthesized by searching the embedding space for the extreme direction that most confuses the domain classifier, and $f(x+\delta; \theta_f)$ is the domain puzzle produced by the pre-trained language model. The gradient term is the gradient of the domain-classification loss at step t with respect to the input at step t.
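The domain-puzzle construction can be sketched in the same way, assuming a toy logistic domain classifier in place of the real one. Each embedding is moved, within an ε-ball, in the direction that most increases the domain-classification loss for its true domain label d. The classifier form, the per-example projection, and the function names are illustrative assumptions.

```python
import numpy as np


def domain_loss_and_grad(x, d, w, b):
    """Mean binary cross-entropy of a toy logistic domain classifier and its gradient w.r.t. x.

    x: (batch, dim) embeddings; d: (batch,) domain labels, 0 = source, 1 = target.
    """
    z = x @ w + b
    loss = np.mean(np.logaddexp(0.0, z) - d * z)  # BCE written with logits
    p = 1.0 / (1.0 + np.exp(-z))                   # predicted P(target | x)
    grad_x = ((p - d) / len(d))[:, None] * w[None, :]
    return loss, grad_x


def domain_puzzle(x, d, w, b, eps=1.0, eta=0.3, steps=3):
    """Push each embedding toward the direction that most increases the domain loss."""
    delta = np.zeros_like(x)  # delta_0
    for _ in range(steps):
        _, g = domain_loss_and_grad(x + delta, d, w, b)
        g_norm = np.linalg.norm(g, axis=1, keepdims=True)
        delta = delta + eta * g / np.maximum(g_norm, 1e-12)
        # per-example projection back into the eps-ball
        d_norm = np.linalg.norm(delta, axis=1, keepdims=True)
        delta = delta * np.minimum(1.0, eps / np.maximum(d_norm, 1e-12))
    return x + delta  # perturbed embeddings that would be fed to the encoder
```

Because the attack maximizes the domain loss rather than the task loss, the resulting samples sit closer to the domain decision boundary while keeping their task labels.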

Step 2.3: generate the global boundary prediction matrices. The source-domain embeddings are fed into BERT, and Global Pointer builds a global boundary prediction matrix from the output. Separately, the source-domain embeddings and the domain-confused augmented samples generated in step 2.2 are concatenated with the function concat() and fed into BERT. Global Pointer then builds, from this output, a global boundary prediction matrix that includes the adversarial perturbation.

As shown in FIG. 4, multi-head recognition of nested entities works by enumerating all possible entity spans. In step 2.3, assume $S = [s_1, s_2, \ldots, s_m]$ are the possible spans in a sentence. A span s is written s[i:j], where i and j are its head and tail indices, respectively. The goal of named entity recognition is to identify every span s whose type belongs to E, where E is the set of entity types. Given a sentence $X = [x_1, x_2, \ldots, x_n]$ with n tokens, each token in X is first mapped to its representation in the pre-trained language model. This yields a new hidden-vector output matrix $H \in \mathbb{R}^{n \times v}$, where v is the dimension:

$$h_1, h_2, \ldots, h_n = \mathrm{BERT}(x_1, x_2, \ldots, x_n)$$

$$q_{i,\alpha} = W_{q,\alpha} h_i + b_{q,\alpha}$$

$$k_{j,\alpha} = W_{k,\alpha} h_j + b_{k,\alpha}$$

where $q_{i,\alpha}$ and $k_{j,\alpha}$ are the vector representations used to identify entities of type α; they represent the start and end positions of span s[i:j] of type α. $W_{q,\alpha}$ and $W_{k,\alpha}$ are the weights applied to $h_i$ and $h_j$, and $b_{q,\alpha}$ and $b_{k,\alpha}$ are bias terms. The score of span s[i:j] for type α is computed as follows:
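The per-type span scoring can be sketched as follows. The sketch assumes the usual Global Pointer form $s_\alpha(i,j) = q_{i,\alpha}^\top k_{j,\alpha}$, because the patent's own score formula is not reproduced here. The function name, tensor layout, and masking of invalid spans are illustrative choices.

```python
import numpy as np


def span_scores(H, Wq, bq, Wk, bk):
    """Per-type span score matrix, assuming s_alpha(i, j) = q_{i,alpha}^T k_{j,alpha}.

    H:      (n, v) hidden vectors h_1..h_n from the encoder
    Wq, Wk: (num_types, v, head); stored transposed, so h @ W equals W_{q,alpha} h in the text
    bq, bk: (num_types, head) biases b_{q,alpha}, b_{k,alpha}
    Returns (num_types, n, n); entry [a, i, j] scores span s[i:j] as type a.
    """
    Q = np.einsum("nv,avh->anh", H, Wq) + bq[:, None, :]  # q_{i,alpha}
    K = np.einsum("nv,avh->anh", H, Wk) + bk[:, None, :]  # k_{j,alpha}
    scores = np.einsum("aih,ajh->aij", Q, K)
    # spans whose tail precedes their head (j < i) are invalid; mask them out
    n = H.shape[0]
    invalid = np.tril(np.ones((n, n), dtype=bool), k=-1)
    return np.where(invalid[None], -np.inf, scores)
```

Each (type, head, tail) cell of the returned tensor corresponds to one entry of the global boundary prediction matrix. In the Global Pointer setting, predicted entities are then typically the cells with a positive score.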

In the attention mechanism, position encoding takes two forms: absolute and relative. Absolute position encoding adds position information to the word vectors, but that information is tied to fixed positions and cannot represent the context around them. To exploit boundary information, a position encoding that satisfies the relative-position property is applied to the entity representations. This makes the model more sensitive to the relative positions between entities and thereby improves entity recognition. In this way, the scoring function of each span can be computed and the global boundary prediction matrix generated:

where the position matrices for positions i and j are both orthogonal.
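The relative-position property can be illustrated as follows, under the assumption that the encoding is rotary position embedding (RoPE), which is commonly paired with Global Pointer. RoPE uses orthogonal block-rotation matrices with $R_i^\top R_j = R_{j-i}$, so the rotated score $(R_i q)^\top (R_j k)$ depends only on the offset j − i. The base 10000 and the function names are assumptions.

```python
import numpy as np


def rotary_matrix(pos: int, dim: int, base: float = 10000.0) -> np.ndarray:
    """Block-diagonal 2x2 rotation R_pos used by RoPE; orthogonal by construction."""
    assert dim % 2 == 0
    R = np.zeros((dim, dim))
    for m in range(dim // 2):
        theta = pos * base ** (-2 * m / dim)  # rotation angle of the m-th 2-D block
        c, s = np.cos(theta), np.sin(theta)
        R[2 * m:2 * m + 2, 2 * m:2 * m + 2] = [[c, -s], [s, c]]
    return R


def rope_score(q: np.ndarray, k: np.ndarray, i: int, j: int) -> float:
    """(R_i q)^T (R_j k) = q^T R_i^T R_j k = q^T R_{j-i} k, so it depends only on j - i."""
    dim = q.shape[0]
    return float((rotary_matrix(i, dim) @ q) @ (rotary_matrix(j, dim) @ k))
```

Applied to $q_{i,\alpha}$ and $k_{j,\alpha}$, this makes each span score sensitive to span length, that is, to the distance between the head and tail boundaries.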

Step 3.1: using the global boundary prediction matrix obtained from the source-domain embeddings in step 2.3, compute the source-domain named entity recognition loss with a cross-entropy loss function;

The score of each entity span is computed with the scoring function obtained in step 2.3. To address class imbalance in the classification problem, a cross-entropy loss function is designed that helps the model learn the boundary information of each training instance:
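The exact loss is not reproduced here. One cross-entropy variant commonly paired with Global Pointer for this class-imbalance problem is the multi-label categorical cross-entropy, sketched below under that assumption. It penalizes non-entity spans with positive scores and entity spans with negative scores. The loss does not depend on the number of negative spans per positive, which is what mitigates the imbalance. The function name is hypothetical.

```python
import numpy as np


def multilabel_categorical_ce(scores, labels) -> float:
    """Loss for one entity type over flattened span scores.

    loss = log(1 + sum_{neg} exp(s)) + log(1 + sum_{pos} exp(-s))
    scores: span scores; labels: 1 for gold entity spans, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    neg = np.concatenate([[0.0], scores[~labels]])  # leading 0 supplies the "1 +" term
    pos = np.concatenate([[0.0], -scores[labels]])
    return float(np.logaddexp.reduce(neg) + np.logaddexp.reduce(pos))
```

Masked spans with score −∞ contribute exp(−∞) = 0 to the negative term, so the matrix from the scoring step can be flattened and passed in directly.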

Step 3.2: compute the contrastive learning loss from the similarity and dissimilarity of the vectors of all entity spans contained in the two global boundary prediction matrices obtained in step 2.3;

In step 3.2, contrastive learning is applied at the span level. Similar spans are pulled together and dissimilar spans are pushed apart, so that the model learns more domain-invariant information about entity spans. As shown in FIG. 5, for positive samples the model encodes a source-domain span and its domain-puzzle counterpart closer together in the representation space, and gradually pulls the examples toward the domain decision boundary as training progresses. Cross-domain negative samples are a problem, as shown in the left half of FIG. 6. There, the contrastive loss would push source-domain and target-domain negatives away from each other, including negatives of the same class from different domains, which contradicts the goal of bringing the domains closer. To avoid this cross-domain repulsion, samples from a different domain are excluded from the negative sampling set.

For an input sentence, each entity span is represented as a vector. The contrastive loss is computed from the similarity and dissimilarity of the vectors of all entity spans the sentence contains. The contrastive learning loss in step 3.2 is therefore computed as follows:
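The span-level contrastive loss can be sketched as an InfoNCE objective, assuming cosine similarity with a temperature tau. Here the two views of each span (for example, the source span and its domain-puzzle counterpart) form the positive pair. In line with the rule above, spans from a different domain are masked out of the negative set. The temperature value, the pairing scheme, and the function name are assumptions.

```python
import numpy as np


def span_contrastive_loss(anchors, positives, domains, tau=0.1) -> float:
    """InfoNCE over entity-span vectors with cross-domain negatives removed.

    anchors[i], positives[i]: two views of span i, shape (num_spans, dim)
    domains[i]: domain label of span i (0 = source, 1 = target)
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T / tau                               # cosine similarity / temperature
    same_domain = domains[:, None] == domains[None, :]
    logits = np.where(same_domain, sim, -np.inf)      # exclude cross-domain negatives
    log_denom = np.logaddexp.reduce(logits, axis=1)   # log sum over positive + in-domain negatives
    return float(np.mean(log_denom - np.diag(sim)))
```

Because the mask only removes off-diagonal entries from other domains, every span keeps its own positive pair, and same-domain spans remain available as negatives.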

In adversarial training, the training data are perturbed to generate adversarial samples that make the model more robust. Unlike the original samples, these adversarial samples may introduce noise or disturbance. To keep the generated adversarial samples similar to and continuous with the originals, a hidden variable is typically introduced to control the distance between samples. The generated samples are also expected to stay close to the model's own predicted distribution, which makes the model more robust. KL divergence is a common measure of the difference between two distributions, and a smaller KL divergence means the two distributions are closer. Therefore, during adversarial training the quality of the generated adversarial samples is assessed by computing the KL divergence between the distribution for the adversarial samples and the model's predicted distribution on the original samples. This keeps the generated adversarial samples consistent with the model's own predictions. Adversarial samples generated this way are better suited to training the model and improve its robustness. The KL divergence loss in step 3.3 is computed as follows:
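A minimal sketch of the KL consistency term follows. It assumes the divergence is taken as KL(P_clean ‖ P_adv) between softmax predictions on the original and the adversarial inputs, averaged over rows such as spans or tokens. The KL direction, the averaging, and the function names are assumptions; symmetric variants are also common.

```python
import numpy as np


def log_softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    return z - np.logaddexp.reduce(z, axis=axis, keepdims=True)


def kl_consistency(logits_clean: np.ndarray, logits_adv: np.ndarray) -> float:
    """Mean KL(P_clean || P_adv) over rows; logits have shape (..., num_classes)."""
    lp = log_softmax(logits_clean)
    lq = log_softmax(logits_adv)
    return float(np.mean(np.sum(np.exp(lp) * (lp - lq), axis=-1)))
```

For example, uniform clean predictions [0.5, 0.5] against adversarial predictions [0.75, 0.25] give 0.5·ln(4/3) ≈ 0.144. Identical predictions give 0.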

Step 3.4: the whole model is trained end to end as a neural network with four loss functions: the named entity recognition loss on the source domain, the domain classifier loss, the contrastive learning loss, and the KL divergence loss. These losses are summed to obtain the loss of the cross-domain named entity recognition model based on span comparison learning. Joint training on these losses optimizes several aspects of the model at once, which improves its performance and robustness.
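Written as a formula, the unweighted sum described above is shown below. The subscript names are illustrative labels for the four losses of steps 3.1 to 3.3 and the domain classifier, not the patent's notation.

```latex
\mathcal{L}_{total} = \mathcal{L}_{ner} + \mathcal{L}_{d} + \mathcal{L}_{cl} + \mathcal{L}_{KL}
```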

The invention is not limited to the embodiments described above. The above description of specific embodiments is intended to illustrate, not limit, the technical solutions of the present invention. Those skilled in the art may make numerous specific modifications without departing from the spirit of the invention and the scope of the claims, and all such modifications fall within the scope of protection of the invention.

Claims

1. A cross-domain named entity recognition method based on span comparison learning is characterized by comprising the following steps:

2. The span contrast learning-based cross-domain named entity recognition method according to claim 1, wherein step 2.2 specifically comprises:

wherein δ is the generated adversarial perturbation;

wherein $\alpha_{adv}$ controls the trade-off between the two losses and is typically set to 1;

the adversarial perturbation is generated by the following iterative steps;

generating adversarial samples with domain confusion:

3. The span contrast learning-based cross-domain named entity recognition method according to claim 1, wherein step 2.3 specifically comprises:

$$h_1, h_2, \ldots, h_n = \mathrm{BERT}(x_1, x_2, \ldots, x_n)$$

$$q_{i,\alpha} = W_{q,\alpha} h_i + b_{q,\alpha}$$

$$k_{j,\alpha} = W_{k,\alpha} h_j + b_{k,\alpha}$$

wherein $q_{i,\alpha}$ and $k_{j,\alpha}$ are the vector representations used to identify entities of type α and represent the start and end positions of span s[i:j] of type α; $W_{q,\alpha}$ and $W_{k,\alpha}$ are the weights applied to $h_i$ and $h_j$; $b_{q,\alpha}$ and $b_{k,\alpha}$ are bias values; and the score of span s[i:j] belonging to type α is calculated as follows:

wherein the position matrices for positions i and j are both orthogonal matrices.

4. The span contrast learning-based cross-domain named entity recognition method according to claim 1, wherein the step 3.1 specifically comprises:

Setting a cross entropy loss function as follows:

5. The method for identifying cross-domain named entities based on span comparison learning according to claim 4, wherein in step 3.2, for an input sentence, each entity span is represented as a vector, and the contrastive loss is calculated from the similarity and dissimilarity of the vectors of all entity spans contained in the input sentence;

the contrastive learning loss is calculated as follows:

6. The span contrast learning-based cross-domain named entity recognition method according to claim 5, wherein in step 3.3, the loss function calculation process of the KL divergence is as follows:

7. The span contrast learning-based cross-domain named entity recognition method according to claim 6, wherein the step 3.4 specifically comprises:

8. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when running the computer program stored in the memory, performs the steps of the method of any one of claims 1 to 7.

9. A computer-readable storage medium having stored therein a plurality of computer instructions for causing a computer to perform the method of any one of claims 1 to 7.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.