CN114417839A - Entity relation joint extraction method based on global pointer network - Google Patents

Entity relation joint extraction method based on global pointer network

Info

Publication number
CN114417839A
Authority
CN
China
Prior art keywords
entity
head
tail
model
relation
Prior art date
Legal status
Pending
Application number
CN202210060118.8A
Other languages
Chinese (zh)
Inventor
Shi Hongwei (史宏纬)
Wang Jie (王洁)
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202210060118.8A
Publication of CN114417839A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks

Abstract

The invention discloses an entity relationship joint extraction method based on a global pointer network, which resolves the inconsistency between training and prediction targets by judging the head and tail positions of an entity as a whole. The method introduces conditional layer normalization to fuse head entity information, guiding the model to capture the directional features of triples, and adds a relation classification task that extracts the latent semantic relations of a sentence so as to filter out some incorrectly predicted triples. Experimental results on the public CMeIE dataset show that the constructed model can effectively identify the relation triples in a sentence. A global pointer network is designed as the decoder that extracts the relations and tail entities of the input sentence; because it judges the head and tail positions of an entity as a whole, it keeps the training and prediction targets consistent, strengthens the performance of the entity relation extraction model, and improves the accuracy of the extracted triples.

Description

Entity relation joint extraction method based on global pointer network
Technical Field
The invention belongs to the fields of natural language processing and information extraction, and provides an entity relationship joint extraction model based on a global pointer network. The model can be used for entity relation extraction from Chinese medical text and similar corpora, and can provide technical support for knowledge graph construction and its downstream applications.
Background
With the advent of the big data era, massive amounts of unstructured data are stored on the internet, and how to mine valuable information from them with information extraction technology has become a research focus in the field of natural language processing. Entity relation extraction, a key subtask of information extraction, is widely applied in knowledge graph construction and its downstream fields such as information retrieval, recommendation systems and intelligent question answering. The task aims to extract entity pairs with specific semantic relations from unstructured text, represented as relation triples (head entity, relation, tail entity).
Current entity relation extraction methods fall into two categories: pipeline methods and joint methods. The pipeline method solves the two subtasks of entity recognition and relation extraction independently: an entity model first identifies all entities in a sentence, and a relation model then classifies the entity pairs. Because it ignores the internal connection and dependency between the subtasks, the pipeline method suffers from a severe error propagation problem. The joint method models and trains the two subtasks as a whole so that they promote each other. For example, the LSTM-based joint model proposed by Miwa et al. encodes sentences with a BiLSTM and models the grammatical dependencies between words with a Tree-LSTM; Zheng et al. transform entity relation extraction into a sequence labeling task, in which the model jointly decodes the entity-relation label of each token and then matches tokens into relation triples by the nearest-match principle. Compared with pipeline methods, joint methods strengthen the interaction between entities and relations and alleviate error propagation, but because they model relations as discrete labels on entity pairs, the model can hardly learn correct classification features during training and therefore cannot effectively extract overlapping triples.
In recent years, researchers have proposed labeling-strategy-based methods to solve the problem of overlapping triples. For example, ETL-SPAN proposed by Yu et al. decomposes the entity relation extraction task into a head entity extraction subtask and a tail entity plus relation extraction subtask, and designs a span-distance-fused multi-sequence labeling scheme to extract triples; Wei et al. designed CasRel, a cascading binary pointer labeling framework that learns a mapping function from head entities to tail entities under a given relation, thereby modeling triples as a whole. Labeling-strategy-based methods reduce the noise interference of redundant entities through joint decoding of entity boundaries and relation categories, but they use a conventional pointer network to train the head and tail positions of an entity separately, without considering the entity-level integrity required at prediction time, where head and tail positions must both be predicted correctly at the same time; this causes an inconsistency between the training target and the prediction target of the model.
Disclosure of Invention
Aiming at this problem, a joint extraction method based on a global pointer network is proposed. The global pointer network treats the head and tail positions as a whole and discriminates tail entities over the subsequences spanned by head-tail position pairs, which means the model is trained and evaluated with entity sequences as the basic unit, guaranteeing the consistency of training and prediction targets. In addition, to further improve triple extraction, a conditional layer normalization method is introduced to fuse the head entity information; compared with the traditional feature fusion by summation or concatenation, it better guides the model to capture the directional features of triples. A relation classification auxiliary task is also added to extract the latent semantic relations in the sentence from the sentence vector, and from its result the model can filter out some incorrectly predicted triples.
The method comprises the following steps:
step 1: and performing feature extraction on the input sentence. And (3) carrying out global feature extraction on the input sentences by using a NEZHA pre-training language model, and mining deep semantic features to obtain coding vectors with rich context information.
Step 2: all of the head entities in the sentence are identified. And (3) respectively marking the coding vector obtained in the step (2) by using a pointer network to judge whether each word in the sentence is the head and tail positions of the entity, adopting a nearest matching principle, namely, backwards matching each head position mark with the nearest tail position mark, and identifying the subsequence corresponding to the head position mark to the tail position mark as the head entity.
Step 3: introduce conditional layer normalization to fuse the encoding vectors with the head entity features. The corresponding bias and weight in the layer normalization structure are set as functions of the head entity features, and the resulting fusion vectors serve as the input of relation and tail entity extraction.
Step 4: extract the tail entities of each head entity under each specific relation. A global pointer network is designed which, under each predefined relation, divides the sentence into a number of continuous subsequences, scores them on the basis of the fusion vectors output in step 3, and judges from the scores which subsequences are correct tail entities.
Step 5: after the above steps the model may still extract triples containing wrong semantic relations. To alleviate this problem, the [CLS] vector carrying global semantic information in the encoding vectors is used as the sentence vector and classified to identify the latent semantic relations in the sentence, thereby filtering out some unreasonable triples from the extraction result.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention designs a global pointer network that serves as the decoder to extract the relations and tail entities of the input sentence. The global pointer network judges the head and tail positions of an entity as a whole. This means the model is trained and evaluated with entity sequences as the basic unit, giving the model a global view, keeping the training and prediction targets consistent, and strengthening the performance of the entity relation extraction model.
2. The invention introduces a conditional layer normalization method to fuse the head entity features with the encoding vectors. The method not only effectively alleviates the vanishing and exploding gradient problems, but also fully captures the directional information of triples in the input sentence and strengthens the representation of entity relation features.
3. The invention adds a relation classification auxiliary task to judge the latent semantic relations in the input sentence. The task guides the model to learn sentence vector features better, and the model can filter out some incorrectly extracted triples according to the classification result, improving the accuracy of the triples extracted by the model.
Drawings
Fig. 1 is a schematic diagram of an overall architecture of an entity relationship extraction model based on a global pointer network.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The object of the invention is to extract all possible triples from a sentence. To extract overlapping triples more effectively and avoid the noise interference of redundant entities, the entity relation extraction task is modeled and solved with a labeling-strategy-based method, as shown in formula (1):
p((h,r,t)|X) = p(h|X) · p((r,t)|h,X), r ∈ Ω  (1)
where h, r and t respectively represent the head entity, relation and tail entity of a triple, X represents the input sentence, and Ω represents the set of all relations of the dataset.
On this basis, the entity relation extraction task is decomposed into three subtasks: 1) identifying all head entities in the sentence; 2) extracting the tail entities of each head entity under each specific relation; 3) filtering out some unreasonable triples according to the sentence semantics.
To accomplish these tasks, the overall architecture of the proposed joint extraction method based on the global pointer network is shown in Fig. 1 and divided into three modules: 1) the encoding and head entity identification module, which extracts features of the input sentence with NEZHA to obtain the encoding vectors and marks all head entities in the sentence with a pointer network; 2) the relation and tail entity extraction module, which introduces conditional layer normalization to fuse the head entity features with the encoding vectors and designs a global pointer network to extract the tail entity sequences under each specific relation; 3) the relation classification module, which extracts the latent semantic relations of the sentence and, according to the result, eliminates some incorrectly predicted triples.
First, coding and head entity identification module
Step 1: to fully capture the deep semantic features of a sentence, this module adopts a NEZHA pre-trained language model to extract features from the input sentence and obtain the corresponding encoding vectors.
NEZHA is developed from the BERT pre-trained language model and additionally uses optimizations such as functional relative position encoding, whole word masking and mixed precision training, which strengthen the model's representation of the contextual information of the input sentence:
H=NEZHA(X) (2)
where X = (x₁, x₂, …, xₙ) represents the input sentence, n is the length of the input sentence, and H = [h₁, h₂, …, hₙ] are the encoding vectors of each position of the sentence.
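For illustration only, a minimal encoding sketch follows. It is not part of the patent: it assumes the HuggingFace transformers library, which ships a NezhaModel, and the public checkpoint name sijunhe/nezha-cn-base is an assumption rather than the model used in the experiments.

```python
# Minimal encoding sketch (illustrative; checkpoint name is an assumption).
import torch
from transformers import AutoTokenizer, NezhaModel

tokenizer = AutoTokenizer.from_pretrained("sijunhe/nezha-cn-base")
encoder = NezhaModel.from_pretrained("sijunhe/nezha-cn-base")

sentence = "慢性肾脏病多见于老年人"  # example medical sentence
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    # H: [1, n, hidden], the encoding vector of each position, as in formula (2)
    H = encoder(**inputs).last_hidden_state
```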
Step 2: the encoding vector H is directly decoded to identify all possible head entities in the input sentence, and the invention trains two binary pointer markers to respectively mark the head and tail positions of the head entities by assigning binary labels (0/1) to each word, wherein 1 represents that the current word is the head (tail) position of a certain head entity, and 0 represents that the current word is not the head (tail) position of a head entity. The labeling process is as follows:
p_i^start = σ(W_start · h_i + b_start)  (3)
p_i^end = σ(W_end · h_i + b_end)  (4)

where W_(·) represents a training parameter matrix, b_(·) represents a bias vector, and σ represents the sigmoid activation function; p_i^start and p_i^end respectively represent the probability that the word at position i of the sentence is the start or the end of a head entity.
The model determines the binary label of each word by comparing its probability with a threshold. When a sentence contains several head entities, the module adopts the nearest-match principle: each start-position tag is matched backwards to the nearest end-position tag, and the subsequence between them is identified as a head entity.
This step trains the model parameters by minimizing a binary cross-entropy loss:

Loss_subject = −(1/L) Σ_{i=1}^{L} Σ_{t∈{start,end}} [ŷ_i^t · log p_i^t + (1 − ŷ_i^t) · log(1 − p_i^t)]  (5)

where L represents the length of the sequence and ŷ_i^t represents the true label of the i-th character for the start or end position.
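As a sketch of how this module can be realized, the following PyTorch code implements the two pointer taggers of formulas (3)-(4) and the nearest-match decoding; it is an illustrative reconstruction, not the patent's code, and all names are ours.

```python
# Head entity tagger and nearest-match decoding (illustrative sketch).
import torch
import torch.nn as nn

class HeadEntityTagger(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.start_fc = nn.Linear(hidden_size, 1)  # W_start, b_start of formula (3)
        self.end_fc = nn.Linear(hidden_size, 1)    # W_end, b_end of formula (4)

    def forward(self, H):
        # H: [n, hidden] for one sentence; sigmoid gives per-position probabilities
        p_start = torch.sigmoid(self.start_fc(H)).squeeze(-1)  # [n]
        p_end = torch.sigmoid(self.end_fc(H)).squeeze(-1)      # [n]
        return p_start, p_end

def decode_heads(p_start, p_end, threshold=0.5):
    """Nearest-match principle: each start tag pairs with the nearest end tag."""
    starts = (p_start >= threshold).nonzero().flatten().tolist()
    ends = (p_end >= threshold).nonzero().flatten().tolist()
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, min(later)))  # (start, end) span of one head entity
    return spans
```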
Second, relation and tail entity extraction module
Step 3: unlike head entity identification in step 2, the extraction of relations and tail entities must additionally consider the head entity features besides the encoding vectors of the input sentence. Current joint models generally fuse features by simple addition or concatenation, which limits the expressiveness of the fused features.
Take the correct triple (chronic kidney disease, susceptible population, the elderly) of Fig. 1 and the incorrect triple (the elderly, susceptible population, chronic kidney disease) obtained by swapping its head and tail entities as an example: with such a fusion method the two triples would obtain the same fusion vector. To avoid this unreasonable phenomenon, the directional information of the triple must additionally be considered during feature fusion. To this end, the invention introduces the Conditional Layer Normalization (CLN) method, following the idea of Conditional Batch Normalization, and sets the corresponding bias and weight in the layer normalization structure as functions of the conditions to be fused. CLN is computed as follows:
CLN(S, c_γ, c_β) = (γ + W₁c_γ) ⊙ (S − μ) / √(σ² + ε) + (β + W₂c_β)  (6)

where S represents the feature information input to the CLN method, μ represents the mean of the feature information, σ² represents its variance, ε is a positive number tending to 0, γ and β are unconditional training parameters, c_γ and c_β respectively represent the two pieces of condition information to be fused, and W₁ and W₂ are the training matrices of the condition information to be fused.

As the formula shows, the CLN structure maps c_γ and c_β into different vector spaces through the training matrices W₁ and W₂ and aligns them with γ and β, so that the directional information of the condition to be fused can be learned.
As shown in Fig. 1, the invention uses the CLN method to fuse the head entity features with the encoding vectors output in step 1, setting both c_γ and c_β of the CLN method to the head entity encoding. The feature fusion process is as follows:
H' = CLN(H, h_head, h_head)  (7)
h_head = Concat(h_start, h_end)  (8)

where H' represents the fused encoding vector used for relation and tail entity extraction, and h_head, the head entity encoding, is obtained by concatenating the encoding vectors h_start and h_end at the start and end positions of the head entity extracted in steps 1 and 2.
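A minimal PyTorch sketch of the CLN fusion of formulas (6)-(8) follows; it is illustrative, and zero-initializing W1 and W2 so that CLN starts out as plain layer normalization is our choice, not stated in the patent.

```python
# Conditional layer normalization (CLN) fusion, formulas (6)-(8) (sketch).
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, hidden_size: int, cond_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(hidden_size))  # unconditional weight
        self.beta = nn.Parameter(torch.zeros(hidden_size))  # unconditional bias
        # W1, W2 map the conditions into the weight / bias spaces; zero init
        # makes the module equal to plain LayerNorm before training (our choice).
        self.W1 = nn.Linear(cond_size, hidden_size, bias=False)
        self.W2 = nn.Linear(cond_size, hidden_size, bias=False)
        nn.init.zeros_(self.W1.weight)
        nn.init.zeros_(self.W2.weight)

    def forward(self, S, c_gamma, c_beta):
        mu = S.mean(dim=-1, keepdim=True)
        var = S.var(dim=-1, unbiased=False, keepdim=True)
        S_hat = (S - mu) / torch.sqrt(var + self.eps)   # normalize, formula (6)
        weight = self.gamma + self.W1(c_gamma)          # gamma + W1 * c_gamma
        bias = self.beta + self.W2(c_beta)              # beta + W2 * c_beta
        return weight * S_hat + bias

# Fusion as in formulas (7)-(8): both conditions are the head entity encoding.
# H: [n, hidden]; h_start, h_end: [hidden] vectors at the head entity span ends.
#   cln = ConditionalLayerNorm(hidden_size, 2 * hidden_size)
#   h_head = torch.cat([h_start, h_end], dim=-1)   # [2 * hidden]
#   H_fused = cln(H, h_head, h_head)               # broadcast over positions
```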
Step 4: under each predefined relation, the tail entities that may exist in the sentence are extracted from the fused features. For this purpose, the invention designs a global pointer network that judges the head and tail positions of an entity as a whole rather than marking them separately, so that the model has a global view and the training and prediction targets of the model stay consistent.
As shown in Fig. 1, the global pointer network treats an input sentence of length n as n(n+1)/2 continuous subsequences of different lengths, each represented as (i, j), where i is the start position and j the end position. The goal of this module is to score every subsequence and identify the correct tail entities from the scores. Assuming the dataset contains m relation categories, the model discriminates the subsequences under m relation subspaces respectively, which means the relation and tail entity extraction task is converted here into m "choose k out of n(n+1)/2" multi-label classification tasks, where k is the number of tail entities.
First, the global pointer network applies two linear layers to the fused encoding vectors H' to obtain the vector sequences q_α = [q_{1,α}, …, q_{n,α}] and k_α = [k_{1,α}, …, k_{n,α}]. Second, to strengthen the sensitivity of the pointer network to the length and span of tail entities, the relative position encoding RoPE is introduced; its transformation matrix R satisfies R_i^T R_j = R_{j−i} and is applied to the q_α and k_α vectors. Finally, the inner product of R_i q_{i,α} and R_j k_{j,α} gives the score s_α(i, j) of the subsequence from i to j being a complete tail entity, and every subsequence whose score is greater than the threshold is regarded as a tail entity of the current head entity under relation α. The global pointer network tagging process is as follows:

q_{i,α} = W_{q,α} h'_i + b_{q,α}  (9)
k_{j,α} = W_{k,α} h'_j + b_{k,α}  (10)
s_α(i, j) = (R_i q_{i,α})^T (R_j k_{j,α}) = q_{i,α}^T R_{j−i} k_{j,α}  (11)

where W_{q,α} and W_{k,α} represent training parameter matrices and b_{q,α} and b_{k,α} represent bias vectors. The relation and tail entity extraction operations are repeated for every extracted head entity so that all possible triples in the sentence are extracted.
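The following sketch shows the global pointer scoring of formulas (9)-(11), with RoPE supplying the relative position term; it is an illustrative reconstruction with our own names and dimensions.

```python
# Global pointer scoring with RoPE, formulas (9)-(11) (illustrative sketch).
import torch
import torch.nn as nn

def rope(x):
    """Rotary position embedding on x of shape [n, dim] (dim even); rotating
    q_i and k_j this way makes their inner product depend on j - i."""
    n, dim = x.shape
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(-1)             # [n, 1]
    freq = 10000 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angle = pos * freq                                                   # [n, dim/2]
    sin, cos = angle.sin(), angle.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GlobalPointer(nn.Module):
    def __init__(self, hidden_size: int, num_relations: int, head_dim: int = 64):
        super().__init__()
        self.m, self.d = num_relations, head_dim
        # One q/k projection per relation subspace, formulas (9)-(10).
        self.qk = nn.Linear(hidden_size, num_relations * head_dim * 2)

    def forward(self, H_fused):                        # H_fused: [n, hidden]
        n = H_fused.size(0)
        qk = self.qk(H_fused).view(n, self.m, 2, self.d)
        q, k = qk[:, :, 0], qk[:, :, 1]                # each [n, m, d]
        scores = torch.empty(self.m, n, n)
        for a in range(self.m):
            qa, ka = rope(q[:, a]), rope(k[:, a])
            scores[a] = qa @ ka.T                      # s_alpha(i, j), formula (11)
        # Only spans with i <= j are valid subsequences; mask out the rest.
        valid = torch.triu(torch.ones(n, n, dtype=torch.bool))
        return scores.masked_fill(~valid, float("-inf"))
```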
For this multi-label classification task the conventional approach is to convert it into n(n+1)/2 binary classification tasks, which causes a serious class imbalance problem. This module therefore introduces Circle Loss, which pushes the score of every entity subsequence above the scores of the non-entity subsequences and finally outputs the k highest-scoring subsequences. Since the number of tail entities in the current task is not fixed, a threshold score s₀ is added on this basis to determine the number of tail entities finally output by the module, so that the scores of entity subsequences are all greater than s₀, the scores of all non-entity subsequences are smaller than s₀, and finally all subsequences whose scores are greater than the threshold s₀ are output. This step trains the model parameters by minimizing the following loss:

Loss_object = log(1 + Σ_{(i,j)∈P_α} e^{s_α(i,j)}) + log(1 + Σ_{(i,j)∈Q_α} e^{−s_α(i,j)})  (12)

where P_α denotes all non-tail-entity subsequences and Q_α denotes all tail-entity subsequences of the current head entity under the α-th relation.
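A sketch of this loss follows, with the threshold s₀ folded in at 0; the log-sum-exp formulation is our reading of formula (12), not code from the patent.

```python
# Span loss of formula (12) with threshold s0 = 0 (illustrative sketch).
import torch

def global_pointer_loss(scores, labels):
    """
    scores: [m, n, n] span scores s_alpha(i, j) from the global pointer network.
    labels: [m, n, n] binary tensor, 1 where (i, j) is a gold tail entity span.
    """
    scores = scores.reshape(scores.size(0), -1)
    labels = labels.reshape(labels.size(0), -1).bool()
    neg = scores.masked_fill(labels, float("-inf"))      # P_alpha: non-entity spans
    pos = (-scores).masked_fill(~labels, float("-inf"))  # Q_alpha: entity spans
    # log(1 + sum e^x) == logsumexp over x plus one extra zero entry,
    # and that zero entry plays the role of the s0 = 0 threshold score.
    zero = scores.new_zeros(scores.size(0), 1)
    loss = (torch.logsumexp(torch.cat([neg, zero], dim=-1), dim=-1)
            + torch.logsumexp(torch.cat([pos, zero], dim=-1), dim=-1))
    return loss.mean()
```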
Third, relation classification module
Step 5: after steps 1 through 4, the model may still extract triples containing incorrect semantic relations. To alleviate this problem, a relation classification auxiliary task is added: the [CLS] vector carrying the global semantic information of the encoding vectors is input into this module as the sentence vector to identify the latent semantic relations in the sentence and to filter out some unreasonable triples from the model extraction result, specifically as follows:
p_k^rel = σ(W_k · h_[CLS] + b_k)  (13)

where p_k^rel represents the probability that the current sentence carries the k-th relation, W_k represents a parameter matrix to be trained, b_k represents a bias vector, and σ represents the sigmoid activation function.
In the prediction phase of the model, if the probability p_k^rel of some relation k is smaller than the given threshold, the sentence is considered to carry no semantic relation k, and all triples with relation k are filtered out of the model extraction result.
In this step the model trains its parameters with a binary cross-entropy loss:

Loss_rel = −(1/M) Σ_{k=1}^{M} [ŷ_k^rel · log p_k^rel + (1 − ŷ_k^rel) · log(1 − p_k^rel)]  (14)

where ŷ_k^rel represents the true label of the k-th relation of the current sentence and M represents the number of predefined relations.
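A minimal sketch of this filter follows; the classifier is formula (13) applied to the [CLS] vector, and the helper for dropping triples is our illustration.

```python
# Relation classification filter of step 5 (illustrative sketch).
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_relations)  # W_k, b_k of formula (13)

    def forward(self, h_cls):                  # h_cls: [hidden], the [CLS] vector
        return torch.sigmoid(self.fc(h_cls))  # [m] per-relation probabilities

def filter_triples(triples, rel_probs, threshold=0.5):
    """Drop every (head, relation_id, tail) whose relation the sentence is
    predicted not to carry."""
    return [(h, r, t) for (h, r, t) in triples if rel_probs[r] >= threshold]
```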
Step 6: the joint loss function of the invention is obtained by adding the losses of steps 2, 4 and 5, and the parameters of the overall model are learned by minimizing this joint loss:
Loss = Loss_subject + Loss_object + Loss_rel  (15)
the method uses an Adam algorithm training model, and adopts Exponential Moving Average (EMA) to ensure the stability of the training process.
Introduction to the Experimental data set
The invention uses the public dataset CMeIE, which originates from the CHIP 2020 academic evaluation competition and is jointly provided by Zhengzhou University, Peking University, Peng Cheng Laboratory and Harbin Institute of Technology (Shenzhen). The data come from pediatric corpora and corpora of common diseases (the pediatric corpus covers 518 pediatric diseases and the common-disease corpus covers 109 common diseases); about 28,000 disease-related sentences are annotated, with nearly 75,000 triples and 53 relation types. The dataset contains 14,339 training sentences, 3,585 validation sentences and 4,482 test sentences, under 53 schema constraints.
Zeng et al. classified the sentences of a dataset into three categories according to the degree of triple overlap: Normal, EntityPairOverlap (EPO) and SingleEntityOverlap (SEO). The Normal type means that neither the head nor the tail entity of any triple in the sentence overlaps with the head or tail entities of other triples. The EPO type means that some triples in the sentence share overlapping entity pairs, i.e., several relations hold between the same two entities simultaneously. The SEO type means that there are no overlapping entity pairs but there are overlapping entities. CMeIE dataset samples are shown in Table 1:
TABLE 1 CMeIE data set sample
Introduction to Experimental parameters
The experiments were run on a GTX 2080Ti GPU with 11 GB of video memory, on a Linux CentOS platform, using Python 3.6 / Keras 2.2.4 / TensorFlow 1.14. The hyper-parameter settings are shown in Table 2:
TABLE 2 hyper-parameters of the model
Evaluation index of experiment
An extracted triple counts as correct only when its head and tail entity boundaries and its corresponding relation category are all correct at the same time. Three evaluation indexes are used for the task: precision (Precision), recall (Recall) and the F1 value (F1-Score), defined as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)

where TP represents the number of correctly extracted entity relation triples, TP + FP represents the total number of extracted triples, and TP + FN represents the total number of gold triples.
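These indexes are computed at the triple level; the following small helper illustrates the computation (names are ours).

```python
# Triple-level precision / recall / F1: a predicted triple counts as correct
# only when head, relation and tail all match a gold triple exactly.
def prf1(pred_triples, gold_triples):
    pred, gold = set(pred_triples), set(gold_triples)
    tp = len(pred & gold)                                  # TP
    precision = tp / len(pred) if pred else 0.0            # TP / (TP + FP)
    recall = tp / len(gold) if gold else 0.0               # TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```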
Analysis of Experimental results
To verify that the proposed method improves triple extraction, it is compared with other joint extraction models on the CMeIE Chinese medical dataset, as shown in Table 3:
1. InfoExtractor2.0 is the baseline model of the 2020 Baidu information extraction competition. It designs a structured labeling strategy to fine-tune the pre-trained language model, through which multiple overlapping SPO triples can be extracted in one pass.
2. The position-assisted step-by-step labeling model first determines the subject entity by marking its start and end positions, and then labels the corresponding object entities under the preset relation attributes one by one. To improve extraction, triple position auxiliary information is introduced into the labeling process, and the feature representation is strengthened with a self-attention mechanism.
3. MultiHead Selection replaces the LSTM encoding layer with a BERT pre-trained language model and, to handle the nested NER problem, replaces the CRF sequence labeler with pointer labeling. The model treats the relation classifier as a linear classifier over entity pairs, and each entity pair selects only the last character of the current entity segment for relation prediction.
4. Biaffine Attention uses BERT as the shared encoding layer, combines the encoding vectors with entity label embeddings, and uses a biaffine classifier to predict the relation between tokens.
5. CasRel uses BERT as the shared encoding layer; its cascading binary tagging framework extracts relations and entities with pointer networks, first extracting the head entities and then, conditioned on them, extracting the tail entities and judging the relations.
Table 3 comparative experiment with existing extraction model
As can be seen from Table 3, the performance of the method on the CMeIE dataset is clearly better than that of the other joint extraction models. Analysis of the results shows:
(1) InfoExtractor2.0 labels entities and relations jointly with multiple sequences and can effectively recognize the entity pair overlap problem in triples, but redundant matching between head and tail entities occurs, which lowers the extraction precision.
(2) Comparing the results of MultiHead Selection and Biaffine Attention shows that a biaffine attention matrix captures the mapping from head entity sequences to tail entity sequences better than a simple linear transformation of the two, improving the recall of the model.
(3) Comparing the results of Biaffine Attention and CasRel shows that treating relations as functions mapping head entities to tail entities, rather than as discrete labels between entity pairs, guides model training better and improves the triple extraction performance of the model.
Generally, the more triples a sentence contains, the more complex its structure. To explore the extraction performance on sentences of different complexity, the proposed method is tested on validation-set sentences grouped by the number of triples they contain. The experimental results are shown in Table 4. On all five complexity levels the extraction performance of the method is superior to the baseline model CasRel, evidence that the method models sentences containing several triples and extracts their triples more effectively.
To further explore the extraction performance on different overlap types, the CMeIE validation set is divided into the Normal, EPO and SEO types, and the extraction performance on the three patterns is compared with the baseline model CasRel. The experimental results are shown in Table 5. On all three overlap types the extraction performance of the model is superior to the baseline CasRel, evidence that the model can effectively solve the overlapping triple problem.
Table 4 experimental results of sentences containing different numbers of triples
TABLE 5 Experimental results on different overlapping types of sentences
To explore how much each component contributes to the overall performance of the model, components are added, deleted or replaced while the rest of the model structure and all hyper-parameters are kept the same, and the contribution of each component on the CMeIE dataset is evaluated in 5 comparison settings:

ours: all components of the model are retained;
-NEZHA: NEZHA is replaced with the BERT pre-trained language model;
-global pointer network: the global pointer network is replaced with a multi-layer binary pointer network;
-CLN(add): the head entity vectors and encoding vectors are fused by the conventional addition method;
-relation classification: the relation classification auxiliary task is removed.
The results are shown in Table 6:
table 6 model ablation experimental results
The experimental results are analyzed as follows:
Replacing NEZHA with the BERT pre-trained language model clearly lowers the recall of the model, showing that the NEZHA pre-trained language model captures richer contextual information.
Replacing the global pointer network decoder with a multi-layer pointer network decoder lowers the overall performance of the model, showing that the global pointer decoder lets the model better capture the overall information of tail entities and the mapping between relations and entities.
Replacing conditional layer normalization with the vector addition method lowers the overall performance of the model, showing that the conditional layer normalization method learns the directional information of triples and guides the model to fuse the head entity features more effectively.
Removing the relation classification module lowers the overall performance of the model, showing that the relation auxiliary task not only strengthens the model's learning of sentence-level features but also effectively removes incorrect triples.

Claims (7)

1. An entity relationship joint extraction method based on a global pointer network, characterized in that the method comprises the following steps:
step 1: extracting features of the input sentence; a NEZHA pre-trained language model performs global feature extraction on the input sentence and mines deep semantic features to obtain encoding vectors rich in contextual information;
step 2: identifying all head entities in the sentence; a pointer network marks the encoding vectors obtained in step 1 to judge whether each word in the sentence is the start or end position of an entity, adopting the nearest-match principle, i.e., each start-position tag is matched backwards to the nearest end-position tag, and the subsequence from the start tag to the end tag is identified as a head entity;
step 3: introducing conditional layer normalization to fuse the encoding vectors with the head entity features; the corresponding bias and weight in the layer normalization structure are set as functions of the head entity features, and the resulting fusion vectors serve as the input of relation and tail entity extraction;
step 4: extracting the tail entities of each head entity under each specific relation; a global pointer network is designed which, under each predefined relation, divides the sentence into a number of continuous subsequences, scores them on the basis of the fusion vectors output in step 3, and judges from the scores which subsequences are correct tail entities;
step 5: since the model may extract triples containing wrong semantic relations, the [CLS] vector carrying global semantic information in the encoding vectors is used as the sentence vector and classified to identify the latent semantic relations in the sentence, thereby filtering out some unreasonable triples from the extraction result;
the entity relation extraction task is modeled and solved with a labeling-strategy-based method, as shown in formula (1):

p((h,r,t)|X) = p(h|X) · p((r,t)|h,X), r ∈ Ω  (1)

wherein h, r and t respectively represent the head entity, relation and tail entity of a triple, X represents the input sentence, and Ω represents the set of all relations of the dataset.
2. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 1: a NEZHA pre-trained language model extracts features from the input sentence to obtain the corresponding encoding vectors;
NEZHA is developed from the BERT pre-trained language model and additionally uses optimizations such as functional relative position encoding, whole word masking and mixed precision training, which strengthen the model's representation of the contextual information of the input sentence:

H = NEZHA(X)  (2)

wherein X = (x₁, x₂, …, xₙ) represents the input sentence, n is the length of the input sentence, and H = [h₁, h₂, …, hₙ] are the encoding vectors of each position of the sentence.
3. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 2: the encoding vectors H are decoded directly to identify all possible head entities in the input sentence; two binary pointer taggers are trained to mark the start/end positions of head entities by assigning a binary label (0/1) to each word, where 1 means the current word is the start/end of some head entity and 0 means it is not; the tagging process is as follows:

p_i^start = σ(W_start · h_i + b_start)  (3)
p_i^end = σ(W_end · h_i + b_end)  (4)

wherein W_(·) represents a training parameter matrix, b_(·) represents a bias vector, and σ represents the sigmoid activation function; p_i^start and p_i^end respectively represent the probability that the character at position i of the sentence is the start or the end of a head entity;
the model determines the binary label of each word by comparing its probability with a threshold; when a sentence contains several head entities, the nearest-match principle is adopted, i.e., each start-position tag is matched backwards to the nearest end-position tag, and the subsequence between them is identified as a head entity;
the model parameters are trained by minimizing a binary cross-entropy loss:

Loss_subject = −(1/L) Σ_{i=1}^{L} Σ_{t∈{start,end}} [ŷ_i^t · log p_i^t + (1 − ŷ_i^t) · log(1 − p_i^t)]  (5)

wherein L represents the length of the sequence and ŷ_i^t represents the true label of the i-th character for the start or end position.
4. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 3: unlike head entity identification in step 2, the extraction of relations and tail entities additionally considers the head entity features besides the encoding vectors of the input sentence;
a conditional layer normalization method CLN is introduced, setting the corresponding bias and weight in the layer normalization structure as functions of the conditions to be fused; CLN is computed as follows:

CLN(S, c_γ, c_β) = (γ + W₁c_γ) ⊙ (S − μ) / √(σ² + ε) + (β + W₂c_β)  (6)

wherein S represents the feature information input to the CLN method, μ represents the mean of the feature information, σ² represents the variance of the feature information, ε is a positive number tending to 0, γ and β are unconditional training parameters, c_γ and c_β respectively represent the two pieces of condition information to be fused, and W₁ and W₂ are the training matrices of the condition information to be fused;
the CLN structure maps c_γ and c_β into different vector spaces through the training matrices W₁ and W₂ and aligns them with γ and β, so as to learn the directional information of the condition to be fused;
the CLN method fuses the head entity features with the encoding vectors output in step 1, with both c_γ and c_β of the CLN method set to the head entity encoding; the feature fusion process is as follows:

H' = CLN(H, h_head, h_head)  (7)
h_head = Concat(h_start, h_end)  (8)

wherein H' represents the fused encoding vector used for relation and tail entity extraction, and h_head, the head entity encoding, is obtained by concatenating the encoding vectors h_start and h_end at the start and end positions of the head entity extracted in steps 1 and 2.
5. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 4: under each predefined relation, the tail entities that may exist in the sentence are extracted from the fused features; a global pointer network is designed to judge the head and tail positions of an entity as a whole rather than marking them separately, so that the model has a global view and the training and prediction targets of the model stay consistent;
the global pointer network regards an input sentence of length n as n(n+1)/2 continuous subsequences of different lengths, each expressed as (i, j), where i represents the start position and j represents the end position; each subsequence is scored, and the correct tail entities are judged from the scores; assuming the dataset contains m relation categories, the model discriminates the subsequences under m relation subspaces respectively, converting the relation and tail entity extraction task into m "choose k out of n(n+1)/2" multi-label classification tasks, where k represents the number of tail entities;
first, the global pointer network applies two linear layers to the fused encoding vectors H' to obtain the vector sequences q_α and k_α; second, to strengthen the sensitivity of the pointer network to the length and span of tail entities, the relative position encoding RoPE is introduced, whose transformation matrix R satisfies R_i^T R_j = R_{j−i} and is applied to the q_α and k_α vectors; finally, the inner product of R_i q_{i,α} and R_j k_{j,α} gives the score s_α(i, j) of the subsequence from i to j being a complete tail entity, and every subsequence whose score is greater than the threshold is regarded as a tail entity of the current head entity under relation α; the global pointer network tagging process is as follows:

q_{i,α} = W_{q,α} h'_i + b_{q,α}  (9)
k_{j,α} = W_{k,α} h'_j + b_{k,α}  (10)
s_α(i, j) = (R_i q_{i,α})^T (R_j k_{j,α}) = q_{i,α}^T R_{j−i} k_{j,α}  (11)

wherein W_{q,α} and W_{k,α} represent training parameter matrices and b_{q,α} and b_{k,α} represent bias vectors; the relation and tail entity extraction operations are repeated for every extracted head entity to extract all possible triples in the sentence;
Circle Loss is introduced so that the score of every entity subsequence is not less than that of the non-entity subsequences, and the k highest-scoring subsequences are finally output; a threshold score s₀ is added to determine the number of tail entities finally output by the module, so that the scores of entity subsequences are all greater than s₀, those of all non-entity subsequences are smaller than s₀, and finally all subsequences whose scores are greater than the threshold s₀ are output; the model parameters are trained by minimizing this loss:

Loss_object = log(1 + Σ_{(i,j)∈P_α} e^{s_α(i,j)}) + log(1 + Σ_{(i,j)∈Q_α} e^{−s_α(i,j)})  (12)

wherein P_α denotes all non-tail-entity subsequences and Q_α denotes all tail-entity subsequences of the current head entity under the α-th relation.
6. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 5: after steps 1 to 4, the model may extract triples containing incorrect semantic relations; a relation classification auxiliary task is added, in which the [CLS] vector carrying global semantic information in the encoding vectors is input into this module as the sentence vector to identify the latent semantic relations in the sentence and filter out some unreasonable triples from the model extraction result, specifically as follows:

p_k^rel = σ(W_k · h_[CLS] + b_k)  (13)

wherein p_k^rel represents the probability that the current sentence carries the k-th relation, W_k represents a parameter matrix to be trained, b_k represents a bias vector, and σ represents the sigmoid activation function;
in the prediction phase of the model, if the probability p_k^rel of some relation k is smaller than the given threshold, the sentence is considered to carry no semantic relation k, and all triples with relation k are filtered out of the model extraction result;
the model trains its parameters with a binary cross-entropy loss:

Loss_rel = −(1/M) Σ_{k=1}^{M} [ŷ_k^rel · log p_k^rel + (1 − ŷ_k^rel) · log(1 − p_k^rel)]  (14)

wherein ŷ_k^rel represents the true label of the k-th relation of the current sentence and M represents the number of predefined relations.
7. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 6: the joint loss function is obtained by adding the losses of steps 2, 4 and 5, and the parameters of the overall model are learned by minimizing the joint loss:

Loss = Loss_subject + Loss_object + Loss_rel  (15)

the model is trained with the Adam algorithm, and an exponential moving average is adopted to keep the training process stable.
CN202210060118.8A 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network Pending CN114417839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210060118.8A CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210060118.8A CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Publications (1)

Publication Number Publication Date
CN114417839A true CN114417839A (en) 2022-04-29

Family

ID=81275354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210060118.8A Pending CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Country Status (1)

Country Link
CN (1) CN114417839A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691895A (en) * 2022-05-31 2022-07-01 南京航天数智科技有限公司 Criminal case entity relationship joint extraction method based on pointer network
CN115759098A (en) * 2022-11-14 2023-03-07 中国科学院空间应用工程与技术中心 Chinese entity and relation combined extraction method and system for space text data
CN116579426A (en) * 2023-07-11 2023-08-11 航天宏康智能科技(北京)有限公司 Training method and device for network security threat knowledge extraction model
CN117151084A (en) * 2023-10-31 2023-12-01 山东齐鲁壹点传媒有限公司 Chinese spelling and grammar error correction method, storage medium and equipment
CN117151084B (en) * 2023-10-31 2024-02-23 山东齐鲁壹点传媒有限公司 Chinese spelling and grammar error correction method, storage medium and equipment
CN117408247A (en) * 2023-12-15 2024-01-16 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network
CN117408247B (en) * 2023-12-15 2024-03-29 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network

Similar Documents

Publication Publication Date Title
CN110135457B (en) Event trigger word extraction method and system based on self-encoder fusion document information
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN114417839A (en) Entity relation joint extraction method based on global pointer network
CN110781683A (en) Entity relation joint extraction method
CN108829722A (en) A kind of Dual-Attention relationship classification method and system of remote supervisory
CN112528034B (en) Knowledge distillation-based entity relationship extraction method
CN112183094A (en) Chinese grammar debugging method and system based on multivariate text features
CN110309511B (en) Shared representation-based multitask language analysis system and method
CN113282713B (en) Event trigger detection method based on difference neural representation model
US20240013000A1 (en) Method and apparatus of ner-oriented chinese clinical text data augmentation
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN113761893B (en) Relation extraction method based on mode pre-training
CN116151256A (en) Small sample named entity recognition method based on multitasking and prompt learning
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN117076653A (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN114610866A (en) Sequence-to-sequence combined event extraction method and system based on global event type
CN114611520A (en) Text abstract generating method
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN114564950A (en) Electric Chinese named entity recognition method combining word sequence
CN112784601B (en) Key information extraction method, device, electronic equipment and storage medium
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN116975161A (en) Entity relation joint extraction method, equipment and medium of power equipment partial discharge text
CN116127097A (en) Structured text relation extraction method, device and equipment
CN115017356A (en) Image text pair judgment method and device
CN114282537A (en) Social text-oriented cascade linear entity relationship extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination