CN114417846B - Entity relation extraction method based on attention contribution degree - Google Patents

Entity relation extraction method based on attention contribution degree Download PDF

Info

Publication number
CN114417846B
CN114417846B (application CN202111410469.9A)
Authority
CN
China
Prior art keywords
entity
relation
attention
sentence
contribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111410469.9A
Other languages
Chinese (zh)
Other versions
CN114417846A (en)
Inventor
欧阳建权
张晶
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Hailong International Intelligent Technology Co ltd
Xiangtan University
Original Assignee
Hunan Hailong International Intelligent Technology Co ltd
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Hailong International Intelligent Technology Co ltd, Xiangtan University filed Critical Hunan Hailong International Intelligent Technology Co ltd
Priority to CN202111410469.9A priority Critical patent/CN114417846B/en
Publication of CN114417846A publication Critical patent/CN114417846A/en
Application granted granted Critical
Publication of CN114417846B publication Critical patent/CN114417846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity relation extraction method based on attention contribution degree. The original sentences in a data set are segmented with spaCy to obtain a word list, and the word list and the tags are stored in dictionary form in an input data set D. The input data set D is then sampled to obtain an entity sample set and a relation sample set for each sentence in D. A BERT model pre-trained on a large-scale biomedical corpus, a judicial database and a travel database is selected, interaction information between entities is calculated with an attention contribution degree algorithm, and this interaction information is passed to the downstream entity extraction and relation extraction tasks to form a span-based entity relation extraction model. Finally, the entity sample set and the relation sample set are fed into the span-based entity relation extraction model for training, greatly improving the F1 values of both the entity extraction task and the relation extraction task.

Description

Entity relation extraction method based on attention contribution degree
Technical Field
The invention relates to the field of knowledge extraction, and in particular to an entity relation extraction method based on attention contribution degree and a medical report analysis method based on the attention contribution degree.
Background
In the field of natural language processing, information extraction has long attracted attention. Information extraction mainly comprises three subtasks: entity extraction, relation extraction and event extraction, among which relation extraction is a core task and an important link in the information extraction field. The main objective of entity relation extraction is to identify and judge the specific relations existing between entity pairs in natural language text. This provides basic support for intelligent retrieval, semantic analysis and the like, helps improve search efficiency, and promotes the automatic construction of knowledge bases.
The relation types involved in entity relation extraction at the early MUC and ACE evaluation conferences were limited to a few types of entity relations between named entities (including person names, place names, organization names, etc.), such as employment relations, geographic location relations and person-to-social-organization relations. The SemEval-2007 evaluation task defined 7 entity relations between common nouns or noun phrases, but the English corpus it provided was relatively small. The SemEval-2010 evaluation task enriched and refined this, expanding the entity relation types to 9.
The entity relation corpora released by the MUC, ACE and SemEval evaluation conferences are obtained by manual labeling: domain experts first formulate relation type systems and labeling rules, and then judge and screen instances one by one from large-scale text. This approach consumes a great deal of manpower, is costly, and makes it difficult to expand the corpus. In addition, the entity relation corpora obtained in this way have narrow coverage and a single form of sentence instance.
The entity relations in domain-specific text are complex and place certain demands on the professional expertise of annotators, so automatic entity relation extraction technology is of great importance. Entity relation extraction research mainly follows either a sequence labeling scheme or a span-based scheme, but current research still suffers from problems such as overlapping relations, nested entities, large computational cost and insufficient mining of domain-specific information.
The present invention uses an attention contribution degree algorithm and draws on the span-based, strong-negative-sample joint extraction method proposed by the SpERT model, making full use of the word-to-word interaction information carried by the attention heads of a BERT model trained on a domain-specific data set; on the ADE medical report data set, the F1 value reaches 82.76%.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide an entity relation extraction method based on attention contribution degree, which adopts a BERT model pre-trained on a large-scale data set, calculates interaction information between entities with an attention contribution degree algorithm, and passes this interaction information to the downstream tasks, thereby greatly improving the F1 values of the entity extraction task and the relation extraction task. The method is particularly suited for analyzing domain-specific data sets such as medical reports.
According to a first embodiment of the present invention, there is provided a method for extracting an entity relationship based on attention contribution.
An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the word list and the tags contained in the data set D' into the input data set D in dictionary form;
s2: sampling the input data set D to obtain the entity sample set and the relation sample set of each sentence d i (d i ∈ D) in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from a set of entity samples for each sentenceCalculate each sentence d i Each entity of->Feature vector +.>And attention contribution->Feature vector +.>Attention contribution degree->Combining and inputting the entity extraction modules to obtain the predicted entity type entity of the entity extraction modules ij
S5: according to each relation sample setCalculate each sentence d i Is>Corresponding head entity->And tail entity->Feature vector +.>Context information->Attention contribution degreeInputting the relation classification module to obtain the predicted relation type relation of the relation classification module ij
S6: training a span-based entity relation extraction model to obtain an attention contribution-based entity relation extraction method.
In the present invention, the dataset D' includes original sentences and tags.
In the present invention, the data set D' is a medical report data set, a judicial data set, or a travel database.
In the present invention, the data set D' is preferably a drug adverse effect (ADE) data set from the benchmark corpus created by Gurulingappa.
Alternatively, the judicial dataset is from a China judicial archive database or a CourData judicial field base database. The travel database is from a Rui Si data-travel database.
In the present invention, step S1 specifically includes:
s101: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the segmented word list in a dictionary dic under the key tokens;
s102: the tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type of each entity and the start and end index positions of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type of each entity relation and the index positions head and tail of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
s103: one dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
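For illustration, the preprocessing of step S1 can be sketched as follows. This is a minimal sketch assuming spaCy's English pipeline; the helper name build_sample and the file name input_dataset.json are illustrative rather than taken from the patent.
import json
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English spaCy pipeline

def build_sample(sentence, entities, relations, orig_id):
    # segment the original sentence into a word list and store it with the tags
    tokens = [t.text for t in nlp(sentence)]
    return {"tokens": tokens,          # key "tokens": segmented word list
            "entities": entities,      # key "entities": [{"type", "start", "end"}, ...]
            "relations": relations,    # key "relations": [{"type", "head", "tail"}, ...]
            "orig_id": orig_id}

samples = [build_sample("Intravenous azithromycin - induced ototoxicity .",
                        [{"type": "Adverse-Effect", "start": 4, "end": 5},
                         {"type": "Drug", "start": 1, "end": 2}],
                        [{"type": "Adverse-Effect", "head": 0, "tail": 1}], 0)]
with open("input_dataset.json", "w") as f:
    json.dump(samples, f)              # input data set D stored as a json list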
In the present invention, step S2 specifically includes:
s201: entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample (including entity negative samples and entity positive samples) in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples; a sketch of this sampling is given after step S202;
s202: relation sampling: the entities in the entity sample set are paired two by two, 100 relation negative samples are randomly selected and combined with the relation positive samples, and the position information of the head and tail entities of each relation sample (including relation negative samples and relation positive samples) in the original sentence together with the relation type constitute the relation sample set; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
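A minimal sketch of the sampling in steps S201 and S202, assuming the dictionaries produced in step S1 supply the gold entities and relations; the span enumeration and helper names are assumptions.
import random

def sample_sentence(tokens, gold_entities, gold_relations, neg_count=100, max_len=10):
    # S201: entity sampling pool of all spans of 1-10 words, minus the positive entities
    gold_spans = {(e["start"], e["end"]) for e in gold_entities}
    pool = [(s, s + l) for l in range(1, max_len + 1)
            for s in range(len(tokens) - l + 1) if (s, s + l) not in gold_spans]
    neg_entities = [{"type": "None", "start": s, "end": e}
                    for s, e in random.sample(pool, min(neg_count, len(pool)))]
    entity_samples = gold_entities + neg_entities
    # S202: pair entities two by two; unrelated pairs become relation negatives
    gold_pairs = {(r["head"], r["tail"]) for r in gold_relations}
    pair_pool = [(h, t) for h in range(len(gold_entities)) for t in range(len(gold_entities))
                 if h != t and (h, t) not in gold_pairs]
    neg_relations = [{"type": "None", "head": h, "tail": t}
                     for h, t in random.sample(pair_pool, min(neg_count, len(pair_pool)))]
    relation_samples = gold_relations + neg_relations
    return entity_samples, relation_samples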
In the present invention, step S3 specifically includes:
s301: BERT pre-training module: a BERT-base based pre-trained model, BioBERT, is used; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
s302: size embedding: an Embedding layer used to learn the features of entities e ij composed of different numbers of words;
s303: entity extraction module entity classification: consisting, in order, of a dropout layer, a fully connected layer and a softmax layer, yielding the predicted entity type entity ij of the entity extraction module;
s304: relation classification module relation classification: consisting, in order, of a dropout layer, a fully connected layer and a sigmoid layer, yielding the predicted relation type relation ij of the relation classification module.
In the present invention, step S4 specifically includes:
s401: calculate the feature vector of each entity e ij of each sentence d i ; the formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
s402: calculate the entity attention contribution degree of each entity e ij of each sentence d i ; the formula is as follows:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
s403: for each entity sample set, the calculated entity feature vector and attention contribution degree are combined and input into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module; the formula is as follows:
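The formulas of steps S401-S403 are given as figures in the original document. The following is a hedged sketch of how the entity representation could be assembled, assuming (as in span-based models such as SpERT) that it concatenates the cls embedding, the max-pooled hidden states of the entity tokens and a learned length embedding, and that the attention contribution degree is appended as an extra feature; tensor names and dimensions are illustrative.
import torch
import torch.nn as nn

def entity_classifier_input(hidden, cls, width_emb, span, contribution):
    # hidden: (seq_len, 768) last hidden states; cls: (768,); width_emb: nn.Embedding
    start, end = span
    span_feat = hidden[start:end].max(dim=0).values      # intermediate feature (maxpool over entity tokens)
    length_feat = width_emb(torch.tensor(end - start))   # length feature vector
    feat = torch.cat([cls, span_feat, length_feat])      # entity feature vector
    return torch.cat([feat, contribution.view(1)])       # combined with the attention contribution degree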
in the present invention, step S5 specifically includes:
s501: calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
s502: calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of token m1 within them, and the max-pooled attention scalar of token m1 are used; token m1 is the word of the head entity at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of token m2 within them, and the max-pooled attention scalar of token m2 are used; token m2 is the word of the tail entity at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree:
s503: the context information is calculated as follows:
where the inputs are the last hidden vectors of the words located between the head entity and the tail entity in sentence d i ;
s504: for each relation sample set, the calculated feature vector, attention contribution degree and context information are combined and input into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module; the formula is as follows:
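The formulas of steps S501-S504 are likewise given as figures. A sketch of how the relation classifier input could be assembled is shown below, assuming that "combining" means concatenation and that the context is the max pooling of the hidden states of the words between the two entities; names and dimensions are illustrative.
import torch

def relation_classifier_input(hidden, head_feat, tail_feat, head_span, tail_span,
                              head_contrib, tail_contrib):
    # context: maxpool over the words located between the head entity and the tail entity
    left, right = min(head_span[1], tail_span[1]), max(head_span[0], tail_span[0])
    context = (hidden[left:right].max(dim=0).values if right > left
               else torch.zeros(hidden.size(-1)))
    feats = torch.cat([head_feat, tail_feat])             # relation feature vector
    contribs = torch.stack([head_contrib, tail_contrib])  # relation attention contribution degree
    return torch.cat([feats, context, contribs])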
in the present invention, a joint loss function l=wl is defined in step S6 e +(1-w)L r Training a physical relationship extraction model; wherein: w is a weight, L e Cross entropy loss function for entity extraction module, L r And a binary cross entropy loss function of the relation classification module.
In the present invention, step S6 specifically includes:
s601: setting the loss function to L = wL e + (1-w)L r using an Adam optimizer;
S602: setting an evaluation standard as a micro F1 value, and training an entity relation extraction model;
if the relation type of the model prediction, the types of the two related entities and the span are consistent with the labels, the prediction is considered to be correct;
a model prediction is considered erroneous if its relationship type, the type of two related entities, and the span are inconsistent with the label.
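As a reference, step S6 can be sketched as the following training loop; the model and data loader are placeholders, and the losses operate on logits (the softmax and sigmoid of steps S303/S304 are folded into the loss functions), which is an implementation assumption.
import torch
import torch.nn as nn

def train(model, loader, w=0.5, lr=5e-5, epochs=30):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()        # L_e: entity extraction loss
    bce = nn.BCEWithLogitsLoss()      # L_r: relation classification loss
    for _ in range(epochs):
        for batch in loader:
            ent_logits, rel_logits = model(batch)
            loss = w * ce(ent_logits, batch["entity_labels"]) \
                   + (1 - w) * bce(rel_logits, batch["relation_labels"])
            opt.zero_grad()
            loss.backward()           # joint loss L = w*L_e + (1-w)*L_r
            opt.step()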
According to a second embodiment of the present invention, there is provided a use of an entity relationship extraction method based on attention contribution.
The attention contribution based entity relationship extraction method of the first embodiment is used to analyze medical reports, judicial decisions or travel data reports.
Preferably, the method is used for analyzing medical reports of adverse drug reactions.
The invention provides an entity relation extraction method based on attention contribution degree. The original sentences in a data set are segmented with spaCy to obtain a word list, and the word list and the tags are stored in dictionary form in an input data set D. The input data set D is then sampled to obtain an entity sample set and a relation sample set for each sentence in D. A BERT model pre-trained on a large-scale biomedical corpus is selected, interaction information between entities is calculated with an attention contribution degree algorithm, and this interaction information is passed to the downstream entity extraction and relation extraction tasks to form a span-based entity relation extraction model. Finally, the entity sample set and the relation sample set are fed into the span-based entity relation extraction model for training, greatly improving the F1 values of both the entity extraction task and the relation extraction task.
The method is aimed at a data set (especially the data set in a specific field), and the data set is preprocessed (sentence word segmentation, and a label and word sequences after word segmentation are stored in a dictionary); then positive sampling and negative sampling are carried out; constructing a span-based entity relation extraction model (comprising a BERT pre-training module, an entity extraction module and a relation classification module); and training to obtain the entity relation extraction method based on the attention contribution degree.
In the present invention, maxpool is derived from the torch.nn.MaxPool2d function of the PyTorch library, i.e., it performs max pooling. The vector or matrix to be processed is input into this function to obtain the corresponding vector.
In the present invention, the implicit feature vector Output[last_hidden] is derived from the hidden_states output by the transformers.BertModel function of the transformers library. The entity to be processed is input into this function to obtain the corresponding vector.
In the present invention, Output[attentions] is derived from the attentions output by the transformers.BertModel function of the transformers library. The vector or matrix to be processed is input into this function to obtain the corresponding vector.
In the present invention, average is derived from the torch.mean function of the PyTorch library, i.e., it computes the arithmetic mean. The vector or matrix is input into this function to obtain the corresponding arithmetic mean.
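The functions referred to above can be exercised as follows; this is a sketch, and the exact BioBERT checkpoint identifier is an assumption.
import torch
from transformers import BertModel, BertTokenizerFast

name = "dmis-lab/biobert-base-cased-v1.1"   # assumed BioBERT checkpoint
tokenizer = BertTokenizerFast.from_pretrained(name)
model = BertModel.from_pretrained(name, output_attentions=True)

enc = tokenizer("Intravenous azithromycin - induced ototoxicity .", return_tensors="pt")
out = model(**enc)
last_hidden = out.last_hidden_state[0]         # Output[last_hidden]: implicit feature vectors
attentions = torch.stack(out.attentions)       # Output[attentions]: (layers, 1, heads, seq, seq)
pooled = last_hidden[1:-1].max(dim=0).values   # maxpool over the word vectors
mean_attn = attentions.mean()                  # average via torch.mean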
In the present invention, entity classification employs entity extraction module functions in a span-based entity relationship extraction model.
In the present invention, relation classification employs a relationship classification module function in a span-based entity relationship extraction model.
The invention has the following beneficial technical effects:
1. the attention-based contribution algorithm adopted by the invention fully utilizes the interaction information between words carried by the BERT attention head pre-trained by the data set in the specific field, thereby greatly enhancing the mining of the model on the information in the specific field;
2. by adopting the entity relation extraction method, the entity relation can be better migrated to various data sets in specific fields, such as judicial, travel, medical science, news, natural science and the like, through fine tuning;
3. when the method provided by the invention is used to analyze medical reports of adverse drug reactions, the micro F1 value is improved by at least 1.31% compared with advanced methods such as SpERT, CMAN and Table-Sequence.
Drawings
FIG. 1 is a flow chart showing the steps of a method for extracting entity relationships based on attention contribution degree according to the present invention;
FIG. 2 is a span-based entity relationship extraction model according to the present invention.
Detailed Description
The following examples illustrate the technical aspects of the invention, and the scope of the invention claimed includes but is not limited to the following examples.
Example 1
An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: word segmentation is carried out on the original sentences with spaCy, and the word list and the labels contained in the data set D' are stored in the input data set D in dictionary form;
S2: the data set D is sampled to obtain the entity sample set and the relation sample set of each sentence d i (d i ∈ D) in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from each entity sample set, calculate the feature vector and the entity attention contribution degree of each entity in each sentence d i ; combine the entity feature vector with the attention contribution degree and pass them into the entity extraction module to obtain the predicted entity type entity ij of the entity extraction module;
S5: according to each relation sample set, calculate for each relation r ij of each sentence d i the feature vectors of the corresponding head entity and tail entity, the context information, and the head and tail entity attention contribution degrees; combine them and pass them into the relation classification module to obtain the predicted relation type relation ij of the relation classification module;
S6: define the joint loss function L = wL e + (1-w)L r and train the span-based entity relation extraction model to obtain the attention contribution based entity relation extraction method.
Application examples
The method of example 1 was employed, wherein: the data set D' is a set of medical reports on adverse reactions caused by the use of a certain antibacterial drug; the data set is from the benchmark corpus created by Gurulingappa.
The specific operation of converting the original sentence and the tag into the input data set D described in step S1 is as follows:
(1) Segment the original sentences in the data set D' with spaCy to obtain a word list; store the segmented word list in a dictionary dic under the key tokens;
(2) The tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type (type) of each entity and the start and end index positions (start, end) of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type (type) of each relation and the index positions (head, tail) of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
(3) One dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
The specific form of the preprocessed data set D is as follows:
{"tokens":
["Intravenous","azithromycin","-","induced","ototoxicity","."],
"entities":
[{"type":"Adverse-Effect","start":4,"end":5},
{"type":"Drug","start":1,"end":2}],
"relations":
[{"type":"Adverse-Effect","head":0,"tail":1}],
"orig_id":0},
{"tokens":
["Immobilization",",","while","Paget","'s","bone","disease","was","present",",","and","perhaps","enhanced","activation","of","dihydrotachysterol","by","rifampicin",",","could","have","led","to","increased","calcium","-","release","into","the","circulation","."],
"entities":
[{"type":"Adverse-Effect","start":23,"end":27},
{"type":"Drug","start":15,"end":16}],
"relations":
[{"type":"Adverse-Effect","head":0,"tail":1}],
"orig_id":1}
the specific operation of sampling the data set D in step S2 is as follows:
(1) Entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples (entity type None) are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples;
(2) Relation sampling: the entities in the entity sample set are paired two by two, and 100 relation negative samples (relation type None) are combined with the relation positive samples to form the relation sample set; the relation sample set contains, for each relation, the basic information of its head and tail entities in the original sentence, such as the relation type; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
The specific structure of the span-based entity relationship extraction model (as shown in fig. 2) in step S3 is as follows:
(1) The BERT pre-training module adopts a BERT-base based pre-trained model, BioBERT; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
(2) Size embedding, an Embedding layer with num_embeddings of 100 and embedding_dim of 25, used to learn the features of entities e ij composed of different numbers of words;
(3) The entity extraction module entity classification consists, in order, of a dropout layer (prop_drop of 0.1), a fully connected layer and a softmax layer; the classifier weights are initialized with normally distributed random numbers with mean 0 and variance 0.02, yielding the predicted entity type entity ij of the entity extraction module;
(4) The relation classification module relation classification consists, in order, of a dropout layer (prop_drop of 0.1), a fully connected layer and a sigmoid layer; the classifier weights are initialized with normally distributed random numbers with mean 0 and variance 0.02, yielding the predicted relation type relation ij of the relation classification module.
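With the hyper-parameters listed above, the size embedding and the two classification heads can be sketched as follows; the input dimensions are illustrative assumptions, and the weight initialization follows the stated mean 0 and variance 0.02.
import torch
import torch.nn as nn

class Heads(nn.Module):
    def __init__(self, ent_in, rel_in, n_entity_types, n_relation_types):
        super().__init__()
        self.size_embedding = nn.Embedding(100, 25)   # num_embeddings 100, embedding_dim 25
        self.dropout = nn.Dropout(0.1)                # prop_drop of 0.1
        self.entity_clf = nn.Linear(ent_in, n_entity_types)
        self.relation_clf = nn.Linear(rel_in, n_relation_types)
        for clf in (self.entity_clf, self.relation_clf):
            nn.init.normal_(clf.weight, mean=0.0, std=0.02)  # normal init (text: mean 0, variance 0.02)
            nn.init.zeros_(clf.bias)

    def classify_entity(self, x):      # dropout -> fully connected -> softmax
        return torch.softmax(self.entity_clf(self.dropout(x)), dim=-1)

    def classify_relation(self, x):    # dropout -> fully connected -> sigmoid
        return torch.sigmoid(self.relation_clf(self.dropout(x)))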
The specific operation of inputting data by the entity extraction module in step S4 is as follows:
(1) Calculate the feature vector of each entity e ij of each sentence d i ; the formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
(2) Calculate the entity attention contribution degree of each entity e ij of each sentence d i ; the steps are as follows:
a. extract the attention matrix of every token m0 of entity e ij within sentence d i ; the formula is as follows:
b. convert the attention matrix of each token m0 into an attention scalar by max pooling; the formula is as follows:
c. average all the attention scalars of the tokens m0 that are greater than the threshold θ 1 , obtaining the attention contribution degree of each entity e ij ; the formula is as follows:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.5;
(3) For each entity sample set, the calculated entity feature vector and attention contribution degree are spliced in the last dimension and passed into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module; the formula is as follows:
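The formulas of steps a-c are given as figures in the original document; the following sketch shows one plausible reading, assuming the attention matrices come from the attentions output of the BERT model and that max pooling is taken over layers, heads and keys. The pooling axes are an assumption.
import torch

def entity_attention_contribution(attentions, span, theta1=0.5):
    # attentions: (layers, heads, seq_len, seq_len) for one sentence; span: (start, end)
    start, end = span
    scalars = torch.stack([attentions[:, :, m, :].max() for m in range(start, end)])
    kept = scalars[scalars > theta1]       # keep the attention scalars above the threshold
    return kept.mean() if kept.numel() > 0 else scalars.mean()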
the specific operation of inputting data by the relationship classification module in step S5 is as follows:
(1) From the relation sample set, calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
(2) Calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of token m1 within them, and the max-pooled attention scalar of token m1 are used; token m1 is the word of the head entity at position m; θ 1 is the contribution degree threshold, specifically 0.5;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of token m2 within them, and the max-pooled attention scalar of token m2 are used; token m2 is the word of the tail entity at position m; θ 1 is the contribution degree threshold, specifically 0.5;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by splicing the head entity attention contribution degree and the tail entity attention contribution degree:
(3) From the position information of the head and tail entities of the relation sample set in the original sentence, the last hidden vectors of the words located between the head entity and the tail entity are max-pooled to obtain the context information; the formula is as follows:
where the inputs are, for each relation r ij , the last hidden vectors of the words located in sentence d i between the corresponding head entity and tail entity;
(4) For each relation sample set, the calculated feature vector, attention contribution degree and context information are spliced in the last dimension and passed into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module; the formula is as follows:
the specific operation of the training model in step S6 is as follows:
(1) An Adam optimizer is adopted with a learning rate of 5×10^-5; the loss function is set to L = wL e + (1-w)L r , where L e is the cross-entropy loss function of the entity extraction module, L r is the binary cross-entropy loss function of the relation classification module, and the weight w is 0.5;
(2) The evaluation criterion is the micro F1 value; a prediction is considered correct if the predicted relation type, the types of the two related entities and their spans are consistent with the labels;
(3) The batch_size of the training set is 6 and the number of epochs is 30.
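For reference, the hyper-parameters of this embodiment can be collected in one configuration object; the field names below are illustrative.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    lr: float = 5e-5             # Adam learning rate
    loss_weight_w: float = 0.5   # w in L = w*L_e + (1-w)*L_r
    batch_size: int = 6
    epochs: int = 30
    prop_drop: float = 0.1       # dropout probability of both heads
    theta1: float = 0.5          # attention contribution degree threshold
    neg_entity_count: int = 100
    neg_relation_count: int = 100
    max_span_size: int = 10      # entities of 1-10 words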
Using the method provided in the above embodiment, ADE is taken as the data set: a data set extracted from medical reports describing adverse effects caused by the use of a certain antibacterial drug, containing 4272 sentences. The micro F1 value of the attention contribution based entity relation extraction method of the present invention is calculated with the following test method.
Where: a is the number of correctly classified relation instances in the medical reports of adverse effects caused by the use of the antibacterial drug, b is the total number of relation instances in the medical reports predicted to be adverse effects caused by the use of the antibacterial drug, and c is the total number of relation instances in the input data set D.
Through calculation, the micro F1 value obtained when the attention contribution based entity relation extraction method provided by the invention is used to analyze the medical reports of adverse drug reactions is 82.76%, an improvement of 1.31% over advanced methods such as SpERT, CMAN and Table-Sequence. The entity relation extraction model trained with the attention contribution method provided by this embodiment can effectively capture the word-to-word interaction information in a BERT model pre-trained on a domain-specific data set, and this interaction information is incorporated into the span-based sample encoding, helping the model understand the context of the entities described by a sentence; the effect is better than that of a conventional BERT-based entity relation extraction model.
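The test formula itself is given as a figure in the original document; assuming the standard micro-averaged definition with precision a/b and recall a/c, it can be written as follows.
def micro_f1(a, b, c):
    precision = a / b            # correctly classified / predicted relation instances
    recall = a / c               # correctly classified / gold relation instances
    return 2 * precision * recall / (precision + recall)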

Claims (19)

1. An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the word list and the tags contained in the data set D' into the input data set D in dictionary form;
s2: sampling the input data set D to obtain the entity sample set and the relation sample set of each sentence d i in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from the entity sample set of each sentence, calculate the feature vector and the attention contribution degree of each entity e ij in each sentence d i ; combine the feature vector with the attention contribution degree and input them into the entity extraction module to obtain the predicted entity type entity ij of the entity extraction module; wherein: the feature vector of each entity e ij of each sentence d i is calculated by the following formula:
where token n is a word of sentence d i ; cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
the entity attention contribution degree of each entity e ij of each sentence d i is calculated by the following formula:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
s5: according to each relation sample set, calculate for each relation r ij of each sentence d i the feature vectors, the context information and the attention contribution degrees of the corresponding head entity and tail entity, and input them into the relation classification module to obtain the predicted relation type relation ij of the relation classification module; wherein: the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation; the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree;
S6: training a span-based entity relation extraction model to obtain an attention contribution-based entity relation extraction method.
2. The attention contribution based entity relationship extraction method of claim 1, wherein: the dataset D' includes original sentences and tags.
3. The attention contribution based entity relationship extraction method of claim 1, wherein: the data set D' is a medical report data set, a judicial data set or a travel database.
4. The attention contribution based entity relationship extraction method of claim 3, wherein: the data set D' is an adverse drug reaction data set, and the adverse drug reaction data set is from the benchmark corpus created by Gurulingappa.
5. The attention contribution based entity relationship extraction method of claim 3, wherein: the judicial data set is from a Chinese judicial archive database or a CourData judicial field basic database; the travel database is from a Rui Si data-travel database.
6. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S1 specifically comprises the following steps:
s101: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the segmented word list in a dictionary dic under the key tokens;
s102: the tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type of each entity and the start and end index positions of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type of each entity relation and the index positions head and tail of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
s103: one dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
7. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S2 specifically comprises the following steps:
s201: entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples;
s202: relation sampling: the entities in the entity sample set are paired two by two, 100 relation negative samples are randomly selected and combined with the relation positive samples, and the position information of the head and tail entities of each relation sample in the original sentence together with the relation type constitute the relation sample set; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
8. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S3 specifically comprises the following steps:
s301: BERT pre-training module: a BERT-base based pre-trained model, BioBERT, is used; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
s302: size embedding: an Embedding layer used to learn the length feature vectors of entities e ij composed of different numbers of words;
s303: entity extraction module entity classification: consisting, in order, of a dropout layer, a fully connected layer and a softmax layer, yielding the predicted entity type entity ij of the entity extraction module;
s304: relation classification module relation classification: consisting, in order, of a dropout layer, a fully connected layer and a sigmoid layer, yielding the predicted relation type relation ij of the relation classification module.
9. The attention contribution based entity-relationship extraction method of claim 8, wherein: for each entity sample set, the calculated entity feature vector and attention contribution degree are combined and input into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module, according to the following formula:
10. the attention contribution based entity-relationship extraction method of claim 8, wherein: the step S5 specifically comprises the following steps:
s501: calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i ; cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
s502: calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of the head-entity token at position m, and its max-pooled attention scalar are used; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of the tail-entity token at position m, and its max-pooled attention scalar are used; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree:
s503: the context information is calculated as follows:
where the inputs are the last hidden vectors of the words located between the head entity and the tail entity in sentence d i ;
s504: for each relation sample set, the calculated feature vector, attention contribution degree and context information are combined and input into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module, according to the following formula:
11. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: a joint loss function L = wL e + (1-w)L r is defined in step S6 to train the entity relation extraction model; wherein: w is a weight, L e is the cross-entropy loss function of the entity extraction module, and L r is the binary cross-entropy loss function of the relation classification module.
12. The attention contribution based entity-relationship extraction method of claim 11, wherein: the step S6 specifically comprises the following steps:
s601: setting the loss function to L = wL e + (1-w)L r using an Adam optimizer;
S602: setting an evaluation standard as a micro F1 value, and training an entity relation extraction model;
if the relation type of the model prediction, the types of the two related entities and the span are consistent with the labels, the prediction is considered to be correct;
a model prediction is considered erroneous if its relationship type, the type of two related entities, and the span are inconsistent with the label.
13. The attention contribution based entity relationship extraction method of any of claims 1-5, 9-10, 12, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
14. The attention contribution based entity-relationship extraction method of claim 6, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
15. The attention contribution based entity-relationship extraction method of claim 7, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
16. The attention contribution based entity-relationship extraction method of claim 8, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
17. The attention contribution based entity-relationship extraction method of claim 11, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
18. The attention contribution based entity-relationship extraction method of claim 13, wherein: the entity relationship extraction method is used for analyzing medical reports of adverse drug reactions.
19. The attention contribution based entity-relationship extraction method of any of claims 14-17, wherein: the entity relationship extraction method is used for analyzing medical reports of adverse drug reactions.
CN202111410469.9A 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree Active CN114417846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111410469.9A CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111410469.9A CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Publications (2)

Publication Number Publication Date
CN114417846A CN114417846A (en) 2022-04-29
CN114417846B true CN114417846B (en) 2023-12-19

Family

ID=81266023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111410469.9A Active CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Country Status (1)

Country Link
CN (1) CN114417846B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400455A (en) * 2020-03-18 2020-07-10 北京工业大学 Relation detection method of question-answering system based on knowledge graph
CN112136145A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Attention filtering for multi-instance learning
CN112148832A (en) * 2019-06-26 2020-12-29 天津大学 Event detection method of dual self-attention network based on label perception
CN112632986A (en) * 2020-12-22 2021-04-09 安徽淘云科技有限公司 Entity characterization model training and characterization method, electronic device and storage medium
CN112818676A (en) * 2021-02-02 2021-05-18 东北大学 Medical entity relationship joint extraction method
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112136145A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Attention filtering for multi-instance learning
CN112148832A (en) * 2019-06-26 2020-12-29 天津大学 Event detection method of dual self-attention network based on label perception
CN111400455A (en) * 2020-03-18 2020-07-10 北京工业大学 Relation detection method of question-answering system based on knowledge graph
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium
CN112632986A (en) * 2020-12-22 2021-04-09 安徽淘云科技有限公司 Entity characterization model training and characterization method, electronic device and storage medium
CN112818676A (en) * 2021-02-02 2021-05-18 东北大学 Medical entity relationship joint extraction method
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Novel Document-Level Relation Extraction Method Based on BERT and Entity Information; Xiaoyu Han et al.; IEEE Access; 96912-96919 *
ENT-BERT: an entity relation classification model combining BERT and entity information; Zhang Dongdong et al.; Journal of Chinese Computer Systems; Vol. 41, No. 12; 2557-2562 *
Span-based Joint Entity and Relation Extraction with Transformer Pre-training; Markus Eberts et al.; arXiv:1909.07755v1; 1-8 *
Chinese entity relation extraction with a multi-feature BERT model; Xie Teng et al.; Computer Systems & Applications; Vol. 30, No. 5; 253-261 *

Also Published As

Publication number Publication date
CN114417846A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Du et al. Explicit interaction model towards text classification
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN109902145B (en) Attention mechanism-based entity relationship joint extraction method and system
CN113761936B (en) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
Yu et al. Topic-oriented image captioning based on order-embedding
Habernal et al. Exploiting debate portals for semi-supervised argumentation mining in user-generated web discourse
Belinkov et al. Arabic diacritization with recurrent neural networks
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
US20160350288A1 (en) Multilingual embeddings for natural language processing
Fu et al. Nested named entity recognition with partially-observed treecrfs
Gao et al. Named entity recognition method of Chinese EMR based on BERT-BiLSTM-CRF
Li et al. UD_BBC: Named entity recognition in social network combined BERT-BiLSTM-CRF with active learning
CN110765240A (en) Semantic matching evaluation method for multiple related sentence pairs
Nasim et al. Sentiment analysis on Urdu tweets using Markov chains
Othman et al. Learning english and arabic question similarity with siamese neural networks in community question answering services
CN111710428B (en) Biomedical text representation method for modeling global and local context interaction
CN111091002B (en) Chinese named entity recognition method
Yang et al. Bidirectional LSTM-CRF for biomedical named entity recognition
Hong et al. BioPREP: deep learning-based predicate classification with SemMedDB
CN113361259B (en) Service flow extraction method
Shirghasemi et al. The impact of active learning algorithm on a cross-lingual model in a Persian sentiment task
CN114417846B (en) Entity relation extraction method based on attention contribution degree
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN114970557B (en) Knowledge enhancement-based cross-language structured emotion analysis method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant