CN114417839A - Entity relation joint extraction method based on global pointer network - Google Patents

Entity relation joint extraction method based on global pointer network

Info

Publication number
CN114417839A
Authority
CN
China
Prior art keywords
entity
head
tail
model
relation
Prior art date
Legal status
Pending
Application number
CN202210060118.8A
Other languages
Chinese (zh)
Inventor
Shi Hongwei (史宏纬)
Wang Jie (王洁)
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202210060118.8A
Publication of CN114417839A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks

Abstract

The invention discloses an entity relationship joint extraction method based on a global pointer network, which resolves the inconsistency between training and prediction targets by judging the head and tail positions of an entity as a whole. The method introduces conditional layer normalization to fuse head entity information, guiding the model to capture the directional features of triples, and adds a relation classification task that extracts the latent semantic relations of a sentence so as to filter out some incorrectly predicted triples. Experimental results on the public CMeIE dataset show that the constructed model can effectively identify the relation triples in a sentence. A global pointer network is designed as the decoder that extracts the relations and tail entities of the input sentence; because it judges the head and tail positions of an entity as a whole, it keeps the training and prediction targets consistent, strengthens the performance of the entity relation extraction model, and improves the accuracy of the extracted triples.

Description

Entity relation joint extraction method based on global pointer network
Technical Field
The invention belongs to the fields of natural language processing and information extraction, and provides an entity relationship joint extraction model based on a global pointer network. The model can be used for entity relation extraction from Chinese medical text and similar corpora, and can provide technical support for knowledge graph construction and its downstream applications.
Background
With the advent of the big data era, massive amounts of unstructured data are stored on the internet, and how to mine valuable information from them with information extraction technology has become a research focus in the field of natural language processing. Entity relation extraction, a key subtask of information extraction, is widely applied in knowledge graph construction and its downstream fields such as information retrieval, recommendation systems and intelligent question answering. The task aims to extract entity pairs with specific semantic relations from unstructured text, represented as relation triples (head entity, relation, tail entity).
Current entity relation extraction methods fall into two categories: pipeline methods and joint methods. The pipeline method solves the two subtasks of entity recognition and relation extraction independently: an entity model first identifies all entities in a sentence, and a relation model then classifies the entity pairs. Because it ignores the internal connection and dependency between the subtasks, the pipeline method suffers from a severe error propagation problem. The joint method models and trains the two subtasks as a whole so that they promote each other. For example, the LSTM-based joint model proposed by Miwa et al. encodes sentences with a BiLSTM and models the grammatical dependencies between words with a Tree-LSTM; Zheng et al. transform entity relation extraction into a sequence labeling task, in which the model jointly decodes the entity-relation label of each token and then matches tokens into relation triples by the nearest-match principle. Compared with pipeline methods, joint methods strengthen the interaction between entities and relations and alleviate error propagation, but because they model relations as discrete labels on entity pairs, the model can hardly learn correct classification features during training and therefore cannot effectively extract overlapping triples.
In recent years, researchers have proposed labeling-strategy-based methods to solve the problem of overlapping triples. For example, ETL-SPAN proposed by Yu et al. decomposes the entity relation extraction task into a head entity extraction subtask and a tail entity plus relation extraction subtask, and designs a span-distance-fused multi-sequence labeling scheme to extract triples; Wei et al. designed CasRel, a cascading binary pointer labeling framework that learns a mapping function from head entities to tail entities under a given relation, thereby modeling triples as a whole. Labeling-strategy-based methods reduce the noise interference of redundant entities through joint decoding of entity boundaries and relation categories, but they use a conventional pointer network to train the head and tail positions of an entity separately, without considering the entity-level integrity required at prediction time, where head and tail positions must both be predicted correctly at the same time; this causes an inconsistency between the training target and the prediction target of the model.
Disclosure of Invention
Aiming at this problem, a joint extraction method based on a global pointer network is proposed. The global pointer network treats the head and tail positions as a whole and discriminates tail entities over the subsequences spanned by head-tail position pairs, which means the model is trained and evaluated with entity sequences as the basic unit, guaranteeing the consistency of training and prediction targets. In addition, to further improve triple extraction, a conditional layer normalization method is introduced to fuse the head entity information; compared with the traditional feature fusion by summation or concatenation, it better guides the model to capture the directional features of triples. A relation classification auxiliary task is also added to extract the latent semantic relations in the sentence from the sentence vector, and from its result the model can filter out some incorrectly predicted triples.
The method comprises the following steps:
step 1: and performing feature extraction on the input sentence. And (3) carrying out global feature extraction on the input sentences by using a NEZHA pre-training language model, and mining deep semantic features to obtain coding vectors with rich context information.
Step 2: all of the head entities in the sentence are identified. And (3) respectively marking the coding vector obtained in the step (2) by using a pointer network to judge whether each word in the sentence is the head and tail positions of the entity, adopting a nearest matching principle, namely, backwards matching each head position mark with the nearest tail position mark, and identifying the subsequence corresponding to the head position mark to the tail position mark as the head entity.
Step 3: introduce conditional layer normalization to fuse the encoding vectors with the head entity features. The corresponding bias and weight in the layer normalization structure are set as functions of the head entity features, and the resulting fusion vectors serve as the input of relation and tail entity extraction.
Step 4: extract the tail entities of each head entity under each specific relation. A global pointer network is designed which, under each predefined relation, divides the sentence into a number of continuous subsequences, scores them on the basis of the fusion vectors output in step 3, and judges from the scores which subsequences are correct tail entities.
Step 5: after the above steps the model may still extract triples containing wrong semantic relations. To alleviate this problem, the [CLS] vector carrying global semantic information in the encoding vectors is used as the sentence vector and classified to identify the latent semantic relations in the sentence, thereby filtering out some unreasonable triples from the extraction result.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention designs a global pointer network that serves as the decoder to extract the relations and tail entities of the input sentence. The global pointer network judges the head and tail positions of an entity as a whole. This means the model is trained and evaluated with entity sequences as the basic unit, giving the model a global view, keeping the training and prediction targets consistent, and strengthening the performance of the entity relation extraction model.
2. The invention introduces a conditional layer normalization method to fuse the head entity features with the encoding vectors. The method not only effectively alleviates the vanishing and exploding gradient problems, but also fully captures the directional information of triples in the input sentence and strengthens the representation of entity relation features.
3. The invention adds a relation classification auxiliary task to judge the latent semantic relations in the input sentence. The task guides the model to learn sentence vector features better, and the model can filter out some incorrectly extracted triples according to the classification result, improving the accuracy of the triples extracted by the model.
Drawings
Fig. 1 is a schematic diagram of an overall architecture of an entity relationship extraction model based on a global pointer network.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The object of the invention is to extract all possible triples from a sentence. To extract overlapping triples more effectively and avoid the noise interference of redundant entities, the entity relation extraction task is modeled and solved with a labeling-strategy-based method, as shown in formula (1):
p((h,r,t)|X) = p(h|X) · p((r,t)|h,X), r ∈ Ω  (1)
where h, r and t respectively represent the head entity, relation and tail entity of a triple, X represents the input sentence, and Ω represents the set of all relations of the dataset.
On this basis, the entity relation extraction task is decomposed into three subtasks: 1) identifying all head entities in the sentence; 2) extracting the tail entities of each head entity under each specific relation; 3) filtering out some unreasonable triples according to the sentence semantics.
To accomplish these tasks, the overall architecture of the proposed joint extraction method based on the global pointer network is shown in Fig. 1 and divided into three modules: 1) the encoding and head entity identification module, which extracts features of the input sentence with NEZHA to obtain the encoding vectors and marks all head entities in the sentence with a pointer network; 2) the relation and tail entity extraction module, which introduces conditional layer normalization to fuse the head entity features with the encoding vectors and designs a global pointer network to extract the tail entity sequences under each specific relation; 3) the relation classification module, which extracts the latent semantic relations of the sentence and, according to the result, eliminates some incorrectly predicted triples.
First, coding and head entity identification module
Step 1: to fully capture the deep semantic features of a sentence, this module adopts a NEZHA pre-trained language model to extract features from the input sentence and obtain the corresponding encoding vectors.
NEZHA is developed from the BERT pre-trained language model and additionally uses optimizations such as functional relative position encoding, whole word masking and mixed precision training, which strengthen the model's representation of the contextual information of the input sentence:
H=NEZHA(X) (2)
where X = (x₁, x₂, …, xₙ) represents the input sentence, n is the length of the input sentence, and H = [h₁, h₂, …, hₙ] are the encoding vectors of each position of the sentence.
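For illustration only, a minimal encoding sketch follows. It is not part of the patent: it assumes the HuggingFace transformers library, which ships a NezhaModel, and the public checkpoint name sijunhe/nezha-cn-base is an assumption rather than the model used in the experiments.

```python
# Minimal encoding sketch (illustrative; checkpoint name is an assumption).
import torch
from transformers import AutoTokenizer, NezhaModel

tokenizer = AutoTokenizer.from_pretrained("sijunhe/nezha-cn-base")
encoder = NezhaModel.from_pretrained("sijunhe/nezha-cn-base")

sentence = "慢性肾脏病多见于老年人"  # example medical sentence
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    # H: [1, n, hidden], the encoding vector of each position, as in formula (2)
    H = encoder(**inputs).last_hidden_state
```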
Step 2: the encoding vector H is directly decoded to identify all possible head entities in the input sentence, and the invention trains two binary pointer markers to respectively mark the head and tail positions of the head entities by assigning binary labels (0/1) to each word, wherein 1 represents that the current word is the head (tail) position of a certain head entity, and 0 represents that the current word is not the head (tail) position of a head entity. The labeling process is as follows:
p_i^start = σ(W_start · h_i + b_start)  (3)
p_i^end = σ(W_end · h_i + b_end)  (4)

where W_(·) represents a training parameter matrix, b_(·) represents a bias vector, and σ represents the sigmoid activation function; p_i^start and p_i^end respectively represent the probability that the word at position i of the sentence is the start or the end of a head entity.
The model determines the binary label of each word by comparing its probability with a threshold. When a sentence contains several head entities, the module adopts the nearest-match principle: each start-position tag is matched backwards to the nearest end-position tag, and the subsequence between them is identified as a head entity.
This step trains the model parameters by minimizing a binary cross-entropy loss:

Loss_subject = −(1/L) Σ_{i=1}^{L} Σ_{t∈{start,end}} [ŷ_i^t · log p_i^t + (1 − ŷ_i^t) · log(1 − p_i^t)]  (5)

where L represents the length of the sequence and ŷ_i^t represents the true label of the i-th character for the start or end position.
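As a sketch of how this module can be realized, the following PyTorch code implements the two pointer taggers of formulas (3)-(4) and the nearest-match decoding; it is an illustrative reconstruction, not the patent's code, and all names are ours.

```python
# Head entity tagger and nearest-match decoding (illustrative sketch).
import torch
import torch.nn as nn

class HeadEntityTagger(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.start_fc = nn.Linear(hidden_size, 1)  # W_start, b_start of formula (3)
        self.end_fc = nn.Linear(hidden_size, 1)    # W_end, b_end of formula (4)

    def forward(self, H):
        # H: [n, hidden] for one sentence; sigmoid gives per-position probabilities
        p_start = torch.sigmoid(self.start_fc(H)).squeeze(-1)  # [n]
        p_end = torch.sigmoid(self.end_fc(H)).squeeze(-1)      # [n]
        return p_start, p_end

def decode_heads(p_start, p_end, threshold=0.5):
    """Nearest-match principle: each start tag pairs with the nearest end tag."""
    starts = (p_start >= threshold).nonzero().flatten().tolist()
    ends = (p_end >= threshold).nonzero().flatten().tolist()
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, min(later)))  # (start, end) span of one head entity
    return spans
```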
Second, relation and tail entity extraction module
Step 3: unlike head entity identification in step 2, the extraction of relations and tail entities must additionally consider the head entity features besides the encoding vectors of the input sentence. Current joint models generally fuse features by simple addition or concatenation, which limits the expressiveness of the fused features.
Take the correct triple (chronic kidney disease, susceptible population, the elderly) of Fig. 1 and the incorrect triple (the elderly, susceptible population, chronic kidney disease) obtained by swapping its head and tail entities as an example: with such a fusion method the two triples would obtain the same fusion vector. To avoid this unreasonable phenomenon, the directional information of the triple must additionally be considered during feature fusion. To this end, the invention introduces the Conditional Layer Normalization (CLN) method, following the idea of Conditional Batch Normalization, and sets the corresponding bias and weight in the layer normalization structure as functions of the conditions to be fused. CLN is computed as follows:
CLN(S, c_γ, c_β) = (γ + W₁c_γ) ⊙ (S − μ) / √(σ² + ε) + (β + W₂c_β)  (6)

where S represents the feature information input to the CLN method, μ represents the mean of the feature information, σ² represents its variance, ε is a positive number tending to 0, γ and β are unconditional training parameters, c_γ and c_β respectively represent the two pieces of condition information to be fused, and W₁ and W₂ are the training matrices of the condition information to be fused.

As the formula shows, the CLN structure maps c_γ and c_β into different vector spaces through the training matrices W₁ and W₂ and aligns them with γ and β, so that the directional information of the condition to be fused can be learned.
As shown in Fig. 1, the invention uses the CLN method to fuse the head entity features with the encoding vectors output in step 1, setting both c_γ and c_β of the CLN method to the head entity encoding. The feature fusion process is as follows:
H' = CLN(H, h_head, h_head)  (7)
h_head = Concat(h_start, h_end)  (8)

where H' represents the fused encoding vector used for relation and tail entity extraction, and h_head, the head entity encoding, is obtained by concatenating the encoding vectors h_start and h_end at the start and end positions of the head entity extracted in steps 1 and 2.
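A minimal PyTorch sketch of the CLN fusion of formulas (6)-(8) follows; it is illustrative, and zero-initializing W1 and W2 so that CLN starts out as plain layer normalization is our choice, not stated in the patent.

```python
# Conditional layer normalization (CLN) fusion, formulas (6)-(8) (sketch).
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, hidden_size: int, cond_size: int, eps: float = 1e-12):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(hidden_size))  # unconditional weight
        self.beta = nn.Parameter(torch.zeros(hidden_size))  # unconditional bias
        # W1, W2 map the conditions into the weight / bias spaces; zero init
        # makes the module equal to plain LayerNorm before training (our choice).
        self.W1 = nn.Linear(cond_size, hidden_size, bias=False)
        self.W2 = nn.Linear(cond_size, hidden_size, bias=False)
        nn.init.zeros_(self.W1.weight)
        nn.init.zeros_(self.W2.weight)

    def forward(self, S, c_gamma, c_beta):
        mu = S.mean(dim=-1, keepdim=True)
        var = S.var(dim=-1, unbiased=False, keepdim=True)
        S_hat = (S - mu) / torch.sqrt(var + self.eps)   # normalize, formula (6)
        weight = self.gamma + self.W1(c_gamma)          # gamma + W1 * c_gamma
        bias = self.beta + self.W2(c_beta)              # beta + W2 * c_beta
        return weight * S_hat + bias

# Fusion as in formulas (7)-(8): both conditions are the head entity encoding.
# H: [n, hidden]; h_start, h_end: [hidden] vectors at the head entity span ends.
#   cln = ConditionalLayerNorm(hidden_size, 2 * hidden_size)
#   h_head = torch.cat([h_start, h_end], dim=-1)   # [2 * hidden]
#   H_fused = cln(H, h_head, h_head)               # broadcast over positions
```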
Step 4: under each predefined relation, the tail entities that may exist in the sentence are extracted from the fused features. For this purpose, the invention designs a global pointer network that judges the head and tail positions of an entity as a whole rather than marking them separately, so that the model has a global view and the training and prediction targets of the model stay consistent.
As shown in Fig. 1, the global pointer network treats an input sentence of length n as n(n+1)/2 continuous subsequences of different lengths, each represented as (i, j), where i is the start position and j the end position. The goal of this module is to score every subsequence and identify the correct tail entities from the scores. Assuming the dataset contains m relation categories, the model discriminates the subsequences under m relation subspaces respectively, which means the relation and tail entity extraction task is converted here into m "choose k out of n(n+1)/2" multi-label classification tasks, where k is the number of tail entities.
First, the global pointer network applies two linear layers to the fused encoding vectors H' to obtain the vector sequences q_α = [q_{1,α}, …, q_{n,α}] and k_α = [k_{1,α}, …, k_{n,α}]. Second, to strengthen the sensitivity of the pointer network to the length and span of tail entities, the relative position encoding RoPE is introduced; its transformation matrix R satisfies R_i^T R_j = R_{j−i} and is applied to the q_α and k_α vectors. Finally, the inner product of R_i q_{i,α} and R_j k_{j,α} gives the score s_α(i, j) of the subsequence from i to j being a complete tail entity, and every subsequence whose score is greater than the threshold is regarded as a tail entity of the current head entity under relation α. The global pointer network tagging process is as follows:

q_{i,α} = W_{q,α} h'_i + b_{q,α}  (9)
k_{j,α} = W_{k,α} h'_j + b_{k,α}  (10)
s_α(i, j) = (R_i q_{i,α})^T (R_j k_{j,α}) = q_{i,α}^T R_{j−i} k_{j,α}  (11)

where W_{q,α} and W_{k,α} represent training parameter matrices and b_{q,α} and b_{k,α} represent bias vectors. The relation and tail entity extraction operations are repeated for every extracted head entity so that all possible triples in the sentence are extracted.
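The following sketch shows the global pointer scoring of formulas (9)-(11), with RoPE supplying the relative position term; it is an illustrative reconstruction with our own names and dimensions.

```python
# Global pointer scoring with RoPE, formulas (9)-(11) (illustrative sketch).
import torch
import torch.nn as nn

def rope(x):
    """Rotary position embedding on x of shape [n, dim] (dim even); rotating
    q_i and k_j this way makes their inner product depend on j - i."""
    n, dim = x.shape
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(-1)             # [n, 1]
    freq = 10000 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angle = pos * freq                                                   # [n, dim/2]
    sin, cos = angle.sin(), angle.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class GlobalPointer(nn.Module):
    def __init__(self, hidden_size: int, num_relations: int, head_dim: int = 64):
        super().__init__()
        self.m, self.d = num_relations, head_dim
        # One q/k projection per relation subspace, formulas (9)-(10).
        self.qk = nn.Linear(hidden_size, num_relations * head_dim * 2)

    def forward(self, H_fused):                        # H_fused: [n, hidden]
        n = H_fused.size(0)
        qk = self.qk(H_fused).view(n, self.m, 2, self.d)
        q, k = qk[:, :, 0], qk[:, :, 1]                # each [n, m, d]
        scores = torch.empty(self.m, n, n)
        for a in range(self.m):
            qa, ka = rope(q[:, a]), rope(k[:, a])
            scores[a] = qa @ ka.T                      # s_alpha(i, j), formula (11)
        # Only spans with i <= j are valid subsequences; mask out the rest.
        valid = torch.triu(torch.ones(n, n, dtype=torch.bool))
        return scores.masked_fill(~valid, float("-inf"))
```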
For this multi-label classification task the conventional approach is to convert it into n(n+1)/2 binary classification tasks, which causes a serious class imbalance problem. This module therefore introduces Circle Loss, which pushes the score of every entity subsequence above the scores of the non-entity subsequences and finally outputs the k highest-scoring subsequences. Since the number of tail entities in the current task is not fixed, a threshold score s₀ is added on this basis to determine the number of tail entities finally output by the module, so that the scores of entity subsequences are all greater than s₀, the scores of all non-entity subsequences are smaller than s₀, and finally all subsequences whose scores are greater than the threshold s₀ are output. This step trains the model parameters by minimizing the following loss:

Loss_object = log(1 + Σ_{(i,j)∈P_α} e^{s_α(i,j)}) + log(1 + Σ_{(i,j)∈Q_α} e^{−s_α(i,j)})  (12)

where P_α denotes all non-tail-entity subsequences and Q_α denotes all tail-entity subsequences of the current head entity under the α-th relation.
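A sketch of this loss follows, with the threshold s₀ folded in at 0; the log-sum-exp formulation is our reading of formula (12), not code from the patent.

```python
# Span loss of formula (12) with threshold s0 = 0 (illustrative sketch).
import torch

def global_pointer_loss(scores, labels):
    """
    scores: [m, n, n] span scores s_alpha(i, j) from the global pointer network.
    labels: [m, n, n] binary tensor, 1 where (i, j) is a gold tail entity span.
    """
    scores = scores.reshape(scores.size(0), -1)
    labels = labels.reshape(labels.size(0), -1).bool()
    neg = scores.masked_fill(labels, float("-inf"))      # P_alpha: non-entity spans
    pos = (-scores).masked_fill(~labels, float("-inf"))  # Q_alpha: entity spans
    # log(1 + sum e^x) == logsumexp over x plus one extra zero entry,
    # and that zero entry plays the role of the s0 = 0 threshold score.
    zero = scores.new_zeros(scores.size(0), 1)
    loss = (torch.logsumexp(torch.cat([neg, zero], dim=-1), dim=-1)
            + torch.logsumexp(torch.cat([pos, zero], dim=-1), dim=-1))
    return loss.mean()
```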
Third, relation classification module
Step 5: after steps 1 through 4, the model may still extract triples containing incorrect semantic relations. To alleviate this problem, a relation classification auxiliary task is added: the [CLS] vector carrying the global semantic information of the encoding vectors is input into this module as the sentence vector to identify the latent semantic relations in the sentence and to filter out some unreasonable triples from the model extraction result, specifically as follows:
p_k^rel = σ(W_k · h_[CLS] + b_k)  (13)

where p_k^rel represents the probability that the current sentence carries the k-th relation, W_k represents a parameter matrix to be trained, b_k represents a bias vector, and σ represents the sigmoid activation function.
In the prediction phase of the model, if the probability p_k^rel of some relation k is smaller than the given threshold, the sentence is considered to carry no semantic relation k, and all triples with relation k are filtered out of the model extraction result.
In this step the model trains its parameters with a binary cross-entropy loss:

Loss_rel = −(1/M) Σ_{k=1}^{M} [ŷ_k^rel · log p_k^rel + (1 − ŷ_k^rel) · log(1 − p_k^rel)]  (14)

where ŷ_k^rel represents the true label of the k-th relation of the current sentence and M represents the number of predefined relations.
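A minimal sketch of this filter follows; the classifier is formula (13) applied to the [CLS] vector, and the helper for dropping triples is our illustration.

```python
# Relation classification filter of step 5 (illustrative sketch).
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_relations)  # W_k, b_k of formula (13)

    def forward(self, h_cls):                  # h_cls: [hidden], the [CLS] vector
        return torch.sigmoid(self.fc(h_cls))  # [m] per-relation probabilities

def filter_triples(triples, rel_probs, threshold=0.5):
    """Drop every (head, relation_id, tail) whose relation the sentence is
    predicted not to carry."""
    return [(h, r, t) for (h, r, t) in triples if rel_probs[r] >= threshold]
```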
Step 6: the joint loss function of the invention is obtained by adding the losses of steps 2, 4 and 5, and the parameters of the overall model are learned by minimizing this joint loss:
Loss = Loss_subject + Loss_object + Loss_rel  (15)
the method uses an Adam algorithm training model, and adopts Exponential Moving Average (EMA) to ensure the stability of the training process.
Introduction to the Experimental data set
The invention uses the public dataset CMeIE, which originates from the CHIP 2020 academic evaluation competition and is jointly provided by Zhengzhou University, Peking University, Peng Cheng Laboratory and Harbin Institute of Technology (Shenzhen). The data come from pediatric corpora and corpora of common diseases (the pediatric corpus covers 518 pediatric diseases and the common-disease corpus covers 109 common diseases); about 28,000 disease-related sentences are annotated, with nearly 75,000 triples and 53 relation types. The dataset contains 14,339 training sentences, 3,585 validation sentences and 4,482 test sentences, under 53 schema constraints.
Zeng et al. classified the sentences of a dataset into three categories according to the degree of triple overlap: Normal, EntityPairOverlap (EPO) and SingleEntityOverlap (SEO). The Normal type means that neither the head nor the tail entity of any triple in the sentence overlaps with the head or tail entities of other triples. The EPO type means that some triples in the sentence share overlapping entity pairs, i.e., several relations hold between the same two entities simultaneously. The SEO type means that there are no overlapping entity pairs but there are overlapping entities. CMeIE dataset samples are shown in Table 1:
TABLE 1 CMeIE data set sample
Introduction to Experimental parameters
The experiments were run on a GTX 2080Ti GPU with 11 GB of video memory, on a Linux CentOS platform, using Python 3.6 / Keras 2.2.4 / TensorFlow 1.14. The hyper-parameter settings are shown in Table 2:
TABLE 2 hyper-parameters of the model
Evaluation index of experiment
An extracted triple counts as correct only when its head and tail entity boundaries and its corresponding relation category are all correct at the same time. Three evaluation indexes are used for the task: precision (Precision), recall (Recall) and the F1 value (F1-Score), defined as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)

where TP represents the number of correctly extracted entity relation triples, TP + FP represents the total number of extracted triples, and TP + FN represents the total number of gold triples.
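These indexes are computed at the triple level; the following small helper illustrates the computation (names are ours).

```python
# Triple-level precision / recall / F1: a predicted triple counts as correct
# only when head, relation and tail all match a gold triple exactly.
def prf1(pred_triples, gold_triples):
    pred, gold = set(pred_triples), set(gold_triples)
    tp = len(pred & gold)                                  # TP
    precision = tp / len(pred) if pred else 0.0            # TP / (TP + FP)
    recall = tp / len(gold) if gold else 0.0               # TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```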
Analysis of Experimental results
To verify that the proposed method improves triple extraction, it is compared with other joint extraction models on the CMeIE Chinese medical dataset, as shown in Table 3:
1. InfoExtractor2.0 is the baseline model of the 2020 Baidu information extraction competition. It designs a structured labeling strategy to fine-tune the pre-trained language model, through which multiple overlapping SPO triples can be extracted in one pass.
2. The position-assisted step-by-step labeling model first determines the subject entity by marking its start and end positions, and then labels the corresponding object entities under the preset relation attributes one by one. To improve extraction, triple position auxiliary information is introduced into the labeling process, and the feature representation is strengthened with a self-attention mechanism.
3. MultiHead Selection replaces the LSTM encoding layer with a BERT pre-trained language model and, to handle the nested NER problem, replaces the CRF sequence labeler with pointer labeling. The model treats the relation classifier as a linear classifier over entity pairs, and each entity pair selects only the last character of the current entity segment for relation prediction.
4. Biaffine Attention uses BERT as the shared encoding layer, combines the encoding vectors with entity label embeddings, and uses a biaffine classifier to predict the relation between tokens.
5. CasRel uses BERT as the shared encoding layer; its cascading binary tagging framework extracts relations and entities with pointer networks, first extracting the head entities and then, conditioned on them, extracting the tail entities and judging the relations.
Table 3 comparative experiment with existing extraction model
As can be seen from Table 3, the performance of the method on the CMeIE dataset is clearly better than that of the other joint extraction models. Analysis of the results shows:
(1) InfoExtractor2.0 labels entities and relations jointly with multiple sequences and can effectively recognize the entity pair overlap problem in triples, but redundant matching between head and tail entities occurs, which lowers the extraction precision.
(2) Comparing the results of MultiHead Selection and Biaffine Attention shows that a biaffine attention matrix captures the mapping from head entity sequences to tail entity sequences better than a simple linear transformation of the two, improving the recall of the model.
(3) Comparing the results of Biaffine Attention and CasRel shows that treating relations as functions mapping head entities to tail entities, rather than as discrete labels between entity pairs, guides model training better and improves the triple extraction performance of the model.
Generally, the more triples a sentence contains, the more complex its structure. To explore the extraction performance on sentences of different complexity, the proposed method is tested on validation-set sentences grouped by the number of triples they contain. The experimental results are shown in Table 4. On all five complexity levels the extraction performance of the method is superior to the baseline model CasRel, evidence that the method models sentences containing several triples and extracts their triples more effectively.
To further explore the extraction performance on different overlap types, the CMeIE validation set is divided into the Normal, EPO and SEO types, and the extraction performance on the three patterns is compared with the baseline model CasRel. The experimental results are shown in Table 5. On all three overlap types the extraction performance of the model is superior to the baseline CasRel, evidence that the model can effectively solve the overlapping triple problem.
Table 4 experimental results of sentences containing different numbers of triples
TABLE 5 Experimental results on different overlapping types of sentences
To explore how much each component contributes to the overall performance of the model, components are added, deleted or replaced while the rest of the model structure and all hyper-parameters are kept the same, and the contribution of each component on the CMeIE dataset is evaluated in 5 comparison settings:

ours: all components of the model are retained;
-NEZHA: NEZHA is replaced with the BERT pre-trained language model;
-global pointer network: the global pointer network is replaced with a multi-layer binary pointer network;
-CLN(add): the head entity vectors and encoding vectors are fused by the conventional addition method;
-relation classification: the relation classification auxiliary task is removed.
The results are shown in Table 6:
table 6 model ablation experimental results
The experimental results are analyzed as follows:
Replacing NEZHA with the BERT pre-trained language model clearly lowers the recall of the model, showing that the NEZHA pre-trained language model captures richer contextual information.
Replacing the global pointer network decoder with a multi-layer pointer network decoder lowers the overall performance of the model, showing that the global pointer decoder lets the model better capture the overall information of tail entities and the mapping between relations and entities.
Replacing conditional layer normalization with the vector addition method lowers the overall performance of the model, showing that the conditional layer normalization method learns the directional information of triples and guides the model to fuse the head entity features more effectively.
Removing the relation classification module lowers the overall performance of the model, showing that the relation auxiliary task not only strengthens the model's learning of sentence-level features but also effectively removes incorrect triples.

Claims (7)

1. An entity relationship joint extraction method based on a global pointer network, characterized in that the method comprises the following steps:
step 1: extracting features of the input sentence; a NEZHA pre-trained language model performs global feature extraction on the input sentence and mines deep semantic features to obtain encoding vectors rich in contextual information;
step 2: identifying all head entities in the sentence; a pointer network marks the encoding vectors obtained in step 1 to judge whether each word in the sentence is the start or end position of an entity, adopting the nearest-match principle, i.e., each start-position tag is matched backwards to the nearest end-position tag, and the subsequence from the start tag to the end tag is identified as a head entity;
step 3: introducing conditional layer normalization to fuse the encoding vectors with the head entity features; the corresponding bias and weight in the layer normalization structure are set as functions of the head entity features, and the resulting fusion vectors serve as the input of relation and tail entity extraction;
step 4: extracting the tail entities of each head entity under each specific relation; a global pointer network is designed which, under each predefined relation, divides the sentence into a number of continuous subsequences, scores them on the basis of the fusion vectors output in step 3, and judges from the scores which subsequences are correct tail entities;
step 5: since the model may extract triples containing wrong semantic relations, the [CLS] vector carrying global semantic information in the encoding vectors is used as the sentence vector and classified to identify the latent semantic relations in the sentence, thereby filtering out some unreasonable triples from the extraction result;
the entity relation extraction task is modeled and solved with a labeling-strategy-based method, as shown in formula (1):

p((h,r,t)|X) = p(h|X) · p((r,t)|h,X), r ∈ Ω  (1)

wherein h, r and t respectively represent the head entity, relation and tail entity of a triple, X represents the input sentence, and Ω represents the set of all relations of the dataset.
2. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 1: a NEZHA pre-trained language model extracts features from the input sentence to obtain the corresponding encoding vectors;
NEZHA is developed from the BERT pre-trained language model and additionally uses optimizations such as functional relative position encoding, whole word masking and mixed precision training, which strengthen the model's representation of the contextual information of the input sentence:

H = NEZHA(X)  (2)

wherein X = (x₁, x₂, …, xₙ) represents the input sentence, n is the length of the input sentence, and H = [h₁, h₂, …, hₙ] are the encoding vectors of each position of the sentence.
3. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 2: the encoding vectors H are decoded directly to identify all possible head entities in the input sentence; two binary pointer taggers are trained to mark the start/end positions of head entities by assigning a binary label (0/1) to each word, where 1 means the current word is the start/end of some head entity and 0 means it is not; the tagging process is as follows:

p_i^start = σ(W_start · h_i + b_start)  (3)
p_i^end = σ(W_end · h_i + b_end)  (4)

wherein W_(·) represents a training parameter matrix, b_(·) represents a bias vector, and σ represents the sigmoid activation function; p_i^start and p_i^end respectively represent the probability that the character at position i of the sentence is the start or the end of a head entity;
the model determines the binary label of each word by comparing its probability with a threshold; when a sentence contains several head entities, the nearest-match principle is adopted, i.e., each start-position tag is matched backwards to the nearest end-position tag, and the subsequence between them is identified as a head entity;
the model parameters are trained by minimizing a binary cross-entropy loss:

Loss_subject = −(1/L) Σ_{i=1}^{L} Σ_{t∈{start,end}} [ŷ_i^t · log p_i^t + (1 − ŷ_i^t) · log(1 − p_i^t)]  (5)

wherein L represents the length of the sequence and ŷ_i^t represents the true label of the i-th character for the start or end position.
4. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 3: unlike head entity identification in step 2, the extraction of relations and tail entities additionally considers the head entity features besides the encoding vectors of the input sentence;
a conditional layer normalization method CLN is introduced, setting the corresponding bias and weight in the layer normalization structure as functions of the conditions to be fused; CLN is computed as follows:

CLN(S, c_γ, c_β) = (γ + W₁c_γ) ⊙ (S − μ) / √(σ² + ε) + (β + W₂c_β)  (6)

wherein S represents the feature information input to the CLN method, μ represents the mean of the feature information, σ² represents the variance of the feature information, ε is a positive number tending to 0, γ and β are unconditional training parameters, c_γ and c_β respectively represent the two pieces of condition information to be fused, and W₁ and W₂ are the training matrices of the condition information to be fused;
the CLN structure maps c_γ and c_β into different vector spaces through the training matrices W₁ and W₂ and aligns them with γ and β, so as to learn the directional information of the condition to be fused;
the CLN method fuses the head entity features with the encoding vectors output in step 1, with both c_γ and c_β of the CLN method set to the head entity encoding; the feature fusion process is as follows:

H' = CLN(H, h_head, h_head)  (7)
h_head = Concat(h_start, h_end)  (8)

wherein H' represents the fused encoding vector used for relation and tail entity extraction, and h_head, the head entity encoding, is obtained by concatenating the encoding vectors h_start and h_end at the start and end positions of the head entity extracted in steps 1 and 2.
5. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 4: under each predefined relation, the tail entities that may exist in the sentence are extracted from the fused features; a global pointer network is designed to judge the head and tail positions of an entity as a whole rather than marking them separately, so that the model has a global view and the training and prediction targets of the model stay consistent;
the global pointer network regards an input sentence of length n as n(n+1)/2 continuous subsequences of different lengths, each expressed as (i, j), where i represents the start position and j represents the end position; each subsequence is scored, and the correct tail entities are judged from the scores; assuming the dataset contains m relation categories, the model discriminates the subsequences under m relation subspaces respectively, converting the relation and tail entity extraction task into m "choose k out of n(n+1)/2" multi-label classification tasks, where k represents the number of tail entities;
first, the global pointer network applies two linear layers to the fused encoding vectors H' to obtain the vector sequences q_α and k_α; second, to strengthen the sensitivity of the pointer network to the length and span of tail entities, the relative position encoding RoPE is introduced, whose transformation matrix R satisfies R_i^T R_j = R_{j−i} and is applied to the q_α and k_α vectors; finally, the inner product of R_i q_{i,α} and R_j k_{j,α} gives the score s_α(i, j) of the subsequence from i to j being a complete tail entity, and every subsequence whose score is greater than the threshold is regarded as a tail entity of the current head entity under relation α; the global pointer network tagging process is as follows:

q_{i,α} = W_{q,α} h'_i + b_{q,α}  (9)
k_{j,α} = W_{k,α} h'_j + b_{k,α}  (10)
s_α(i, j) = (R_i q_{i,α})^T (R_j k_{j,α}) = q_{i,α}^T R_{j−i} k_{j,α}  (11)

wherein W_{q,α} and W_{k,α} represent training parameter matrices and b_{q,α} and b_{k,α} represent bias vectors; the relation and tail entity extraction operations are repeated for every extracted head entity to extract all possible triples in the sentence;
Circle Loss is introduced so that the score of every entity subsequence is not less than that of the non-entity subsequences, and the k highest-scoring subsequences are finally output; a threshold score s₀ is added to determine the number of tail entities finally output by the module, so that the scores of entity subsequences are all greater than s₀, those of all non-entity subsequences are smaller than s₀, and finally all subsequences whose scores are greater than the threshold s₀ are output; the model parameters are trained by minimizing this loss:

Loss_object = log(1 + Σ_{(i,j)∈P_α} e^{s_α(i,j)}) + log(1 + Σ_{(i,j)∈Q_α} e^{−s_α(i,j)})  (12)

wherein P_α denotes all non-tail-entity subsequences and Q_α denotes all tail-entity subsequences of the current head entity under the α-th relation.
6. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 5: after steps 1 to 4, the model may extract triples containing incorrect semantic relations; a relation classification auxiliary task is added, in which the [CLS] vector carrying global semantic information in the encoding vectors is input into this module as the sentence vector to identify the latent semantic relations in the sentence and filter out some unreasonable triples from the model extraction result, specifically as follows:

p_k^rel = σ(W_k · h_[CLS] + b_k)  (13)

wherein p_k^rel represents the probability that the current sentence carries the k-th relation, W_k represents a parameter matrix to be trained, b_k represents a bias vector, and σ represents the sigmoid activation function;
in the prediction phase of the model, if the probability p_k^rel of some relation k is smaller than the given threshold, the sentence is considered to carry no semantic relation k, and all triples with relation k are filtered out of the model extraction result;
the model trains its parameters with a binary cross-entropy loss:

Loss_rel = −(1/M) Σ_{k=1}^{M} [ŷ_k^rel · log p_k^rel + (1 − ŷ_k^rel) · log(1 − p_k^rel)]  (14)

wherein ŷ_k^rel represents the true label of the k-th relation of the current sentence and M represents the number of predefined relations.
7. The entity relationship joint extraction method based on the global pointer network as claimed in claim 1, wherein: step 6: the joint loss function is obtained by adding the losses of steps 2, 4 and 5, and the parameters of the overall model are learned by minimizing the joint loss:

Loss = Loss_subject + Loss_object + Loss_rel  (15)

the model is trained with the Adam algorithm, and an exponential moving average is adopted to keep the training process stable.
CN202210060118.8A 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network Pending CN114417839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210060118.8A CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210060118.8A CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Publications (1)

Publication Number Publication Date
CN114417839A true CN114417839A (en) 2022-04-29

Family

ID=81275354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210060118.8A Pending CN114417839A (en) 2022-01-19 2022-01-19 Entity relation joint extraction method based on global pointer network

Country Status (1)

Country Link
CN (1) CN114417839A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691895A (en) * 2022-05-31 2022-07-01 南京航天数智科技有限公司 Criminal case entity relationship joint extraction method based on pointer network
CN115759098A (en) * 2022-11-14 2023-03-07 中国科学院空间应用工程与技术中心 Chinese entity and relation combined extraction method and system for space text data
CN116579426A (en) * 2023-07-11 2023-08-11 航天宏康智能科技(北京)有限公司 Training method and device for network security threat knowledge extraction model
CN117151084A (en) * 2023-10-31 2023-12-01 山东齐鲁壹点传媒有限公司 Chinese spelling and grammar error correction method, storage medium and equipment
CN117151084B (en) * 2023-10-31 2024-02-23 山东齐鲁壹点传媒有限公司 Chinese spelling and grammar error correction method, storage medium and equipment
CN117408247A (en) * 2023-12-15 2024-01-16 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network
CN117408247B (en) * 2023-12-15 2024-03-29 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network

Similar Documents

Publication Publication Date Title
CN110135457B (en) Event trigger word extraction method and system based on self-encoder fusion document information
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN114417839A (en) Entity relation joint extraction method based on global pointer network
CN110781683A (en) Entity relation joint extraction method
CN108829722A (en) A kind of Dual-Attention relationship classification method and system of remote supervisory
CN112528034B (en) Knowledge distillation-based entity relationship extraction method
CN112183094A (en) Chinese grammar debugging method and system based on multivariate text features
CN110309511B (en) Shared representation-based multitask language analysis system and method
CN113282713B (en) Event trigger detection method based on difference neural representation model
US20240013000A1 (en) Method and apparatus of ner-oriented chinese clinical text data augmentation
CN114548101B (en) Event detection method and system based on backtracking sequence generation method
CN113761893B (en) Relation extraction method based on mode pre-training
CN116151256A (en) Small sample named entity recognition method based on multitasking and prompt learning
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN117076653A (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN114610866A (en) Sequence-to-sequence combined event extraction method and system based on global event type
CN114611520A (en) Text abstract generating method
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN114564950A (en) Electric Chinese named entity recognition method combining word sequence
CN112784601B (en) Key information extraction method, device, electronic equipment and storage medium
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN116975161A (en) Entity relation joint extraction method, equipment and medium of power equipment partial discharge text
CN116127097A (en) Structured text relation extraction method, device and equipment
CN115017356A (en) Image text pair judgment method and device
CN114282537A (en) Social text-oriented cascade linear entity relationship extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination