CN114417846B - Entity relation extraction method based on attention contribution degree - Google Patents

Entity relation extraction method based on attention contribution degree Download PDF

Info

Publication number
CN114417846B
CN114417846B (application CN202111410469.9A)
Authority
CN
China
Prior art keywords
entity
relation
attention
sentence
contribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111410469.9A
Other languages
Chinese (zh)
Other versions
CN114417846A (en)
Inventor
欧阳建权
张晶
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Hailong International Intelligent Technology Co ltd
Xiangtan University
Original Assignee
Hunan Hailong International Intelligent Technology Co ltd
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Hailong International Intelligent Technology Co ltd, Xiangtan University filed Critical Hunan Hailong International Intelligent Technology Co ltd
Priority to CN202111410469.9A priority Critical patent/CN114417846B/en
Publication of CN114417846A publication Critical patent/CN114417846A/en
Application granted granted Critical
Publication of CN114417846B publication Critical patent/CN114417846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity relation extraction method based on attention contribution degree. The original sentences in a data set are segmented with spaCy to obtain a word list, and the word list and the tags are stored in dictionary form in an input data set D. The input data set D is then sampled to obtain an entity sample set and a relation sample set for each sentence in D. A BERT model pre-trained on a large-scale biomedical corpus, a judicial database and a travel database is selected, interaction information between entities is calculated with an attention contribution degree algorithm, and this interaction information is passed to the downstream entity extraction and relation extraction tasks to form a span-based entity relation extraction model. Finally, the entity sample set and the relation sample set are fed into the span-based entity relation extraction model for training, greatly improving the F1 values of both the entity extraction task and the relation extraction task.

Description

Entity relation extraction method based on attention contribution degree
Technical Field
The invention relates to the field of knowledge extraction, and in particular to an entity relation extraction method based on attention contribution degree and a medical report analysis method based on the attention contribution degree.
Background
In the field of natural language processing, information extraction has long attracted attention. Information extraction mainly comprises three subtasks: entity extraction, relation extraction and event extraction, among which relation extraction is a core task and an important link in the information extraction field. The main objective of entity relation extraction is to identify and judge the specific relations existing between entity pairs in natural language text. This provides basic support for intelligent retrieval, semantic analysis and the like, helps improve search efficiency, and promotes the automatic construction of knowledge bases.
The relation types involved in entity relation extraction at the early MUC and ACE evaluation conferences were limited to a few types of entity relations between named entities (including person names, place names, organization names, etc.), such as employment relations, geographic location relations and person-to-social-organization relations. The SemEval-2007 evaluation task defined 7 entity relations between common nouns or noun phrases, but the English corpus it provided was relatively small. The SemEval-2010 evaluation task enriched and refined this, expanding the entity relation types to 9.
The entity relation corpora released by the MUC, ACE and SemEval evaluation conferences are obtained by manual labeling: domain experts first formulate relation type systems and labeling rules, and then judge and screen instances one by one from large-scale text. This approach consumes a great deal of manpower, is costly, and makes it difficult to expand the corpus. In addition, the entity relation corpora obtained in this way have narrow coverage and a single form of sentence instance.
The entity relations in domain-specific text are complex and place certain demands on the professional expertise of annotators, so automatic entity relation extraction technology is of great importance. Entity relation extraction research mainly follows either a sequence labeling scheme or a span-based scheme, but current research still suffers from problems such as overlapping relations, nested entities, large computational cost and insufficient mining of domain-specific information.
The present invention uses an attention contribution degree algorithm and draws on the span-based, strong-negative-sample joint extraction method proposed by the SpERT model, making full use of the word-to-word interaction information carried by the attention heads of a BERT model trained on a domain-specific data set; on the ADE medical report data set, the F1 value reaches 82.76%.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide an entity relation extraction method based on attention contribution degree, which adopts a BERT model pre-trained on a large-scale data set, calculates interaction information between entities with an attention contribution degree algorithm, and passes this interaction information to the downstream tasks, thereby greatly improving the F1 values of the entity extraction task and the relation extraction task. The method is particularly suited for analyzing domain-specific data sets such as medical reports.
According to a first embodiment of the present invention, there is provided a method for extracting an entity relationship based on attention contribution.
An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the word list and the tags contained in the data set D' into the input data set D in dictionary form;
s2: sampling the input data set D to obtain the entity sample set and the relation sample set of each sentence d i (d i ∈ D) in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from a set of entity samples for each sentenceCalculate each sentence d i Each entity of->Feature vector +.>And attention contribution->Feature vector +.>Attention contribution degree->Combining and inputting the entity extraction modules to obtain the predicted entity type entity of the entity extraction modules ij
S5: according to each relation sample setCalculate each sentence d i Is>Corresponding head entity->And tail entity->Feature vector +.>Context information->Attention contribution degreeInputting the relation classification module to obtain the predicted relation type relation of the relation classification module ij
S6: training a span-based entity relation extraction model to obtain an attention contribution-based entity relation extraction method.
In the present invention, the dataset D' includes original sentences and tags.
In the present invention, the data set D' is a medical report data set, a judicial data set, or a travel database.
In the present invention, the data set D' is preferably a drug adverse effect (ADE) data set from the benchmark corpus created by Gurulingappa.
Alternatively, the judicial dataset is from a China judicial archive database or a CourData judicial field base database. The travel database is from a Rui Si data-travel database.
In the present invention, step S1 specifically includes:
s101: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the segmented word list in a dictionary dic under the key tokens;
s102: the tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type of each entity and the start and end index positions of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type of each entity relation and the index positions head and tail of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
s103: one dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
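For illustration, the preprocessing of step S1 can be sketched as follows. This is a minimal sketch assuming spaCy's English pipeline; the helper name build_sample and the file name input_dataset.json are illustrative rather than taken from the patent.
import json
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English spaCy pipeline

def build_sample(sentence, entities, relations, orig_id):
    # segment the original sentence into a word list and store it with the tags
    tokens = [t.text for t in nlp(sentence)]
    return {"tokens": tokens,          # key "tokens": segmented word list
            "entities": entities,      # key "entities": [{"type", "start", "end"}, ...]
            "relations": relations,    # key "relations": [{"type", "head", "tail"}, ...]
            "orig_id": orig_id}

samples = [build_sample("Intravenous azithromycin - induced ototoxicity .",
                        [{"type": "Adverse-Effect", "start": 4, "end": 5},
                         {"type": "Drug", "start": 1, "end": 2}],
                        [{"type": "Adverse-Effect", "head": 0, "tail": 1}], 0)]
with open("input_dataset.json", "w") as f:
    json.dump(samples, f)              # input data set D stored as a json list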
In the present invention, step S2 specifically includes:
s201: entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample (including entity negative samples and entity positive samples) in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples; a sketch of this sampling is given after step S202;
s202: relation sampling: the entities in the entity sample set are paired two by two, 100 relation negative samples are randomly selected and combined with the relation positive samples, and the position information of the head and tail entities of each relation sample (including relation negative samples and relation positive samples) in the original sentence together with the relation type constitute the relation sample set; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
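A minimal sketch of the sampling in steps S201 and S202, assuming the dictionaries produced in step S1 supply the gold entities and relations; the span enumeration and helper names are assumptions.
import random

def sample_sentence(tokens, gold_entities, gold_relations, neg_count=100, max_len=10):
    # S201: entity sampling pool of all spans of 1-10 words, minus the positive entities
    gold_spans = {(e["start"], e["end"]) for e in gold_entities}
    pool = [(s, s + l) for l in range(1, max_len + 1)
            for s in range(len(tokens) - l + 1) if (s, s + l) not in gold_spans]
    neg_entities = [{"type": "None", "start": s, "end": e}
                    for s, e in random.sample(pool, min(neg_count, len(pool)))]
    entity_samples = gold_entities + neg_entities
    # S202: pair entities two by two; unrelated pairs become relation negatives
    gold_pairs = {(r["head"], r["tail"]) for r in gold_relations}
    pair_pool = [(h, t) for h in range(len(gold_entities)) for t in range(len(gold_entities))
                 if h != t and (h, t) not in gold_pairs]
    neg_relations = [{"type": "None", "head": h, "tail": t}
                     for h, t in random.sample(pair_pool, min(neg_count, len(pair_pool)))]
    relation_samples = gold_relations + neg_relations
    return entity_samples, relation_samples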
In the present invention, step S3 specifically includes:
s301: BERT pre-training module: a BERT-base based pre-trained model, BioBERT, is used; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
s302: size embedding: an Embedding layer used to learn the features of entities e ij composed of different numbers of words;
s303: entity extraction module entity classification: consisting, in order, of a dropout layer, a fully connected layer and a softmax layer, yielding the predicted entity type entity ij of the entity extraction module;
s304: relation classification module relation classification: consisting, in order, of a dropout layer, a fully connected layer and a sigmoid layer, yielding the predicted relation type relation ij of the relation classification module.
In the present invention, step S4 specifically includes:
s401: calculate the feature vector of each entity e ij of each sentence d i ; the formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
s402: calculate the entity attention contribution degree of each entity e ij of each sentence d i ; the formula is as follows:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
s403: for each entity sample set, the calculated entity feature vector and attention contribution degree are combined and input into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module; the formula is as follows:
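The formulas of steps S401-S403 are given as figures in the original document. The following is a hedged sketch of how the entity representation could be assembled, assuming (as in span-based models such as SpERT) that it concatenates the cls embedding, the max-pooled hidden states of the entity tokens and a learned length embedding, and that the attention contribution degree is appended as an extra feature; tensor names and dimensions are illustrative.
import torch
import torch.nn as nn

def entity_classifier_input(hidden, cls, width_emb, span, contribution):
    # hidden: (seq_len, 768) last hidden states; cls: (768,); width_emb: nn.Embedding
    start, end = span
    span_feat = hidden[start:end].max(dim=0).values      # intermediate feature (maxpool over entity tokens)
    length_feat = width_emb(torch.tensor(end - start))   # length feature vector
    feat = torch.cat([cls, span_feat, length_feat])      # entity feature vector
    return torch.cat([feat, contribution.view(1)])       # combined with the attention contribution degree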
in the present invention, step S5 specifically includes:
s501: calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
s502: calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of token m1 within them, and the max-pooled attention scalar of token m1 are used; token m1 is the word of the head entity at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of token m2 within them, and the max-pooled attention scalar of token m2 are used; token m2 is the word of the tail entity at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree:
s503: the context information is calculated as follows:
where the inputs are the last hidden vectors of the words located between the head entity and the tail entity in sentence d i ;
s504: for each relation sample set, the calculated feature vector, attention contribution degree and context information are combined and input into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module; the formula is as follows:
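The formulas of steps S501-S504 are likewise given as figures. A sketch of how the relation classifier input could be assembled is shown below, assuming that "combining" means concatenation and that the context is the max pooling of the hidden states of the words between the two entities; names and dimensions are illustrative.
import torch

def relation_classifier_input(hidden, head_feat, tail_feat, head_span, tail_span,
                              head_contrib, tail_contrib):
    # context: maxpool over the words located between the head entity and the tail entity
    left, right = min(head_span[1], tail_span[1]), max(head_span[0], tail_span[0])
    context = (hidden[left:right].max(dim=0).values if right > left
               else torch.zeros(hidden.size(-1)))
    feats = torch.cat([head_feat, tail_feat])             # relation feature vector
    contribs = torch.stack([head_contrib, tail_contrib])  # relation attention contribution degree
    return torch.cat([feats, context, contribs])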
in the present invention, a joint loss function l=wl is defined in step S6 e +(1-w)L r Training a physical relationship extraction model; wherein: w is a weight, L e Cross entropy loss function for entity extraction module, L r And a binary cross entropy loss function of the relation classification module.
In the present invention, step S6 specifically includes:
s601: setting the loss function to L = wL e + (1-w)L r using an Adam optimizer;
S602: setting an evaluation standard as a micro F1 value, and training an entity relation extraction model;
if the relation type of the model prediction, the types of the two related entities and the span are consistent with the labels, the prediction is considered to be correct;
a model prediction is considered erroneous if its relationship type, the type of two related entities, and the span are inconsistent with the label.
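As a reference, step S6 can be sketched as the following training loop; the model and data loader are placeholders, and the losses operate on logits (the softmax and sigmoid of steps S303/S304 are folded into the loss functions), which is an implementation assumption.
import torch
import torch.nn as nn

def train(model, loader, w=0.5, lr=5e-5, epochs=30):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()        # L_e: entity extraction loss
    bce = nn.BCEWithLogitsLoss()      # L_r: relation classification loss
    for _ in range(epochs):
        for batch in loader:
            ent_logits, rel_logits = model(batch)
            loss = w * ce(ent_logits, batch["entity_labels"]) \
                   + (1 - w) * bce(rel_logits, batch["relation_labels"])
            opt.zero_grad()
            loss.backward()           # joint loss L = w*L_e + (1-w)*L_r
            opt.step()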
According to a second embodiment of the present invention, there is provided a use of an entity relationship extraction method based on attention contribution.
The attention contribution based entity relationship extraction method of the first embodiment is used to analyze medical reports, judicial decisions or travel data reports.
Preferably, the method is used for analyzing medical reports of adverse drug reactions.
The invention provides an entity relation extraction method based on attention contribution degree. The original sentences in a data set are segmented with spaCy to obtain a word list, and the word list and the tags are stored in dictionary form in an input data set D. The input data set D is then sampled to obtain an entity sample set and a relation sample set for each sentence in D. A BERT model pre-trained on a large-scale biomedical corpus is selected, interaction information between entities is calculated with an attention contribution degree algorithm, and this interaction information is passed to the downstream entity extraction and relation extraction tasks to form a span-based entity relation extraction model. Finally, the entity sample set and the relation sample set are fed into the span-based entity relation extraction model for training, greatly improving the F1 values of both the entity extraction task and the relation extraction task.
The method is aimed at a data set (especially the data set in a specific field), and the data set is preprocessed (sentence word segmentation, and a label and word sequences after word segmentation are stored in a dictionary); then positive sampling and negative sampling are carried out; constructing a span-based entity relation extraction model (comprising a BERT pre-training module, an entity extraction module and a relation classification module); and training to obtain the entity relation extraction method based on the attention contribution degree.
In the present invention, maxpool is derived from the torch.nn.MaxPool2d function of the PyTorch library, i.e., it performs max pooling. The vector or matrix to be processed is input into this function to obtain the corresponding vector.
In the present invention, the implicit feature vector Output[last_hidden] is derived from the hidden_states output by the transformers.BertModel function of the transformers library. The entity to be processed is input into this function to obtain the corresponding vector.
In the present invention, Output[attentions] is derived from the attentions output by the transformers.BertModel function of the transformers library. The vector or matrix to be processed is input into this function to obtain the corresponding vector.
In the present invention, average is derived from the torch.mean function of the PyTorch library, i.e., it computes the arithmetic mean. The vector or matrix is input into this function to obtain the corresponding arithmetic mean.
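The functions referred to above can be exercised as follows; this is a sketch, and the exact BioBERT checkpoint identifier is an assumption.
import torch
from transformers import BertModel, BertTokenizerFast

name = "dmis-lab/biobert-base-cased-v1.1"   # assumed BioBERT checkpoint
tokenizer = BertTokenizerFast.from_pretrained(name)
model = BertModel.from_pretrained(name, output_attentions=True)

enc = tokenizer("Intravenous azithromycin - induced ototoxicity .", return_tensors="pt")
out = model(**enc)
last_hidden = out.last_hidden_state[0]         # Output[last_hidden]: implicit feature vectors
attentions = torch.stack(out.attentions)       # Output[attentions]: (layers, 1, heads, seq, seq)
pooled = last_hidden[1:-1].max(dim=0).values   # maxpool over the word vectors
mean_attn = attentions.mean()                  # average via torch.mean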
In the present invention, entity classification employs entity extraction module functions in a span-based entity relationship extraction model.
In the present invention, relation classification employs a relationship classification module function in a span-based entity relationship extraction model.
The invention has the following beneficial technical effects:
1. the attention-based contribution algorithm adopted by the invention fully utilizes the interaction information between words carried by the BERT attention head pre-trained by the data set in the specific field, thereby greatly enhancing the mining of the model on the information in the specific field;
2. by adopting the entity relation extraction method, the entity relation can be better migrated to various data sets in specific fields, such as judicial, travel, medical science, news, natural science and the like, through fine tuning;
3. when the method provided by the invention is used to analyze medical reports of adverse drug reactions, the micro F1 value is improved by at least 1.31% compared with advanced methods such as SpERT, CMAN and Table-Sequence.
Drawings
FIG. 1 is a flow chart showing the steps of a method for extracting entity relationships based on attention contribution degree according to the present invention;
FIG. 2 is a span-based entity relationship extraction model according to the present invention.
Detailed Description
The following examples illustrate the technical aspects of the invention, and the scope of the invention claimed includes but is not limited to the following examples.
Example 1
An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: word segmentation is carried out on the original sentences with spaCy, and the word list and the labels contained in the data set D' are stored in the input data set D in dictionary form;
S2: the data set D is sampled to obtain the entity sample set and the relation sample set of each sentence d i (d i ∈ D) in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from each entity sample set, calculate the feature vector and the entity attention contribution degree of each entity in each sentence d i ; combine the entity feature vector with the attention contribution degree and pass them into the entity extraction module to obtain the predicted entity type entity ij of the entity extraction module;
S5: according to each relation sample set, calculate for each relation r ij of each sentence d i the feature vectors of the corresponding head entity and tail entity, the context information, and the head and tail entity attention contribution degrees; combine them and pass them into the relation classification module to obtain the predicted relation type relation ij of the relation classification module;
S6: define the joint loss function L = wL e + (1-w)L r and train the span-based entity relation extraction model to obtain the attention contribution based entity relation extraction method.
Application examples
The method of example 1 was employed, wherein: the data set D' is a set of medical reports on adverse reactions caused by the use of a certain antibacterial drug; the data set is from the benchmark corpus created by Gurulingappa.
The specific operation of converting the original sentence and the tag into the input data set D described in step S1 is as follows:
(1) Segment the original sentences in the data set D' with spaCy to obtain a word list; store the segmented word list in a dictionary dic under the key tokens;
(2) The tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type (type) of each entity and the start and end index positions (start, end) of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type (type) of each relation and the index positions (head, tail) of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
(3) One dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
The specific form of the preprocessed data set D is as follows:
{"tokens":
["Intravenous","azithromycin","-","induced","ototoxicity","."],
"entities":
[{"type":"Adverse-Effect","start":4,"end":5},
{"type":"Drug","start":1,"end":2}],
"relations":
[{"type":"Adverse-Effect","head":0,"tail":1}],
"orig_id":0},
{"tokens":
["Immobilization",",","while","Paget","'s","bone","disease","was","present",",","and","perhaps","enhanced","activation","of","dihydrotachysterol","by","rifampicin",",","could","have","led","to","increased","calcium","-","release","into","the","circulation","."],
"entities":
[{"type":"Adverse-Effect","start":23,"end":27},
{"type":"Drug","start":15,"end":16}],
"relations":
[{"type":"Adverse-Effect","head":0,"tail":1}],
"orig_id":1}
the specific operation of sampling the data set D in step S2 is as follows:
(1) Entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples (entity type None) are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples;
(2) Relation sampling: the entities in the entity sample set are paired two by two, and 100 relation negative samples (relation type None) are combined with the relation positive samples to form the relation sample set; the relation sample set contains, for each relation, the basic information of its head and tail entities in the original sentence, such as the relation type; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
The specific structure of the span-based entity relationship extraction model (as shown in fig. 2) in step S3 is as follows:
(1) The BERT pre-training module adopts a BERT-base based pre-trained model, BioBERT; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
(2) Size embedding, an Embedding layer with num_embeddings of 100 and embedding_dim of 25, used to learn the features of entities e ij composed of different numbers of words;
(3) The entity extraction module entity classification consists, in order, of a dropout layer (prop_drop of 0.1), a fully connected layer and a softmax layer; the classifier weights are initialized with normally distributed random numbers with mean 0 and variance 0.02, yielding the predicted entity type entity ij of the entity extraction module;
(4) The relation classification module relation classification consists, in order, of a dropout layer (prop_drop of 0.1), a fully connected layer and a sigmoid layer; the classifier weights are initialized with normally distributed random numbers with mean 0 and variance 0.02, yielding the predicted relation type relation ij of the relation classification module.
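With the hyper-parameters listed above, the size embedding and the two classification heads can be sketched as follows; the input dimensions are illustrative assumptions, and the weight initialization follows the stated mean 0 and variance 0.02.
import torch
import torch.nn as nn

class Heads(nn.Module):
    def __init__(self, ent_in, rel_in, n_entity_types, n_relation_types):
        super().__init__()
        self.size_embedding = nn.Embedding(100, 25)   # num_embeddings 100, embedding_dim 25
        self.dropout = nn.Dropout(0.1)                # prop_drop of 0.1
        self.entity_clf = nn.Linear(ent_in, n_entity_types)
        self.relation_clf = nn.Linear(rel_in, n_relation_types)
        for clf in (self.entity_clf, self.relation_clf):
            nn.init.normal_(clf.weight, mean=0.0, std=0.02)  # normal init (text: mean 0, variance 0.02)
            nn.init.zeros_(clf.bias)

    def classify_entity(self, x):      # dropout -> fully connected -> softmax
        return torch.softmax(self.entity_clf(self.dropout(x)), dim=-1)

    def classify_relation(self, x):    # dropout -> fully connected -> sigmoid
        return torch.sigmoid(self.relation_clf(self.dropout(x)))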
The specific operation of inputting data by the entity extraction module in step S4 is as follows:
(1) Calculate the feature vector of each entity e ij of each sentence d i ; the formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
(2) Calculate the entity attention contribution degree of each entity e ij of each sentence d i ; the steps are as follows:
a. extract the attention matrix of every token m0 of entity e ij within sentence d i ; the formula is as follows:
b. convert the attention matrix of each token m0 into an attention scalar by max pooling; the formula is as follows:
c. average all the attention scalars of the tokens m0 that are greater than the threshold θ 1 , obtaining the attention contribution degree of each entity e ij ; the formula is as follows:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.5;
(3) For each entity sample set, the calculated entity feature vector and attention contribution degree are spliced in the last dimension and passed into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module; the formula is as follows:
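The formulas of steps a-c are given as figures in the original document; the following sketch shows one plausible reading, assuming the attention matrices come from the attentions output of the BERT model and that max pooling is taken over layers, heads and keys. The pooling axes are an assumption.
import torch

def entity_attention_contribution(attentions, span, theta1=0.5):
    # attentions: (layers, heads, seq_len, seq_len) for one sentence; span: (start, end)
    start, end = span
    scalars = torch.stack([attentions[:, :, m, :].max() for m in range(start, end)])
    kept = scalars[scalars > theta1]       # keep the attention scalars above the threshold
    return kept.mean() if kept.numel() > 0 else scalars.mean()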
the specific operation of inputting data by the relationship classification module in step S5 is as follows:
(1) From the relation sample set, calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i (token n ∈ d i ); cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
(2) Calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of token m1 within them, and the max-pooled attention scalar of token m1 are used; token m1 is the word of the head entity at position m; θ 1 is the contribution degree threshold, specifically 0.5;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of token m2 within them, and the max-pooled attention scalar of token m2 are used; token m2 is the word of the tail entity at position m; θ 1 is the contribution degree threshold, specifically 0.5;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by splicing the head entity attention contribution degree and the tail entity attention contribution degree:
(3) From the position information of the head and tail entities of the relation sample set in the original sentence, the last hidden vectors of the words located between the head entity and the tail entity are max-pooled to obtain the context information; the formula is as follows:
where the inputs are, for each relation r ij , the last hidden vectors of the words located in sentence d i between the corresponding head entity and tail entity;
(4) For each relation sample set, the calculated feature vector, attention contribution degree and context information are spliced in the last dimension and passed into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module; the formula is as follows:
the specific operation of the training model in step S6 is as follows:
(1) An Adam optimizer is adopted with a learning rate of 5×10^-5; the loss function is set to L = wL e + (1-w)L r , where L e is the cross-entropy loss function of the entity extraction module, L r is the binary cross-entropy loss function of the relation classification module, and the weight w is 0.5;
(2) The evaluation criterion is the micro F1 value; a prediction is considered correct if the predicted relation type, the types of the two related entities and their spans are consistent with the labels;
(3) The batch_size of the training set is 6 and the number of epochs is 30.
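For reference, the hyper-parameters of this embodiment can be collected in one configuration object; the field names below are illustrative.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    lr: float = 5e-5             # Adam learning rate
    loss_weight_w: float = 0.5   # w in L = w*L_e + (1-w)*L_r
    batch_size: int = 6
    epochs: int = 30
    prop_drop: float = 0.1       # dropout probability of both heads
    theta1: float = 0.5          # attention contribution degree threshold
    neg_entity_count: int = 100
    neg_relation_count: int = 100
    max_span_size: int = 10      # entities of 1-10 words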
Using the method provided in the above embodiment, ADE is taken as the data set: a data set extracted from medical reports describing adverse effects caused by the use of a certain antibacterial drug, containing 4272 sentences. The micro F1 value of the attention contribution based entity relation extraction method of the present invention is calculated with the following test method.
Where: a is the number of correctly classified relation instances in the medical reports of adverse effects caused by the use of the antibacterial drug, b is the total number of relation instances in the medical reports predicted to be adverse effects caused by the use of the antibacterial drug, and c is the total number of relation instances in the input data set D.
Through calculation, the micro F1 value obtained when the attention contribution based entity relation extraction method provided by the invention is used to analyze the medical reports of adverse drug reactions is 82.76%, an improvement of 1.31% over advanced methods such as SpERT, CMAN and Table-Sequence. The entity relation extraction model trained with the attention contribution method provided by this embodiment can effectively capture the word-to-word interaction information in a BERT model pre-trained on a domain-specific data set, and this interaction information is incorporated into the span-based sample encoding, helping the model understand the context of the entities described by a sentence; the effect is better than that of a conventional BERT-based entity relation extraction model.
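The test formula itself is given as a figure in the original document; assuming the standard micro-averaged definition with precision a/b and recall a/c, it can be written as follows.
def micro_f1(a, b, c):
    precision = a / b            # correctly classified / predicted relation instances
    recall = a / c               # correctly classified / gold relation instances
    return 2 * precision * recall / (precision + recall)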

Claims (19)

1. An attention contribution degree-based entity relationship extraction method, comprising the following steps:
s0: selecting a data set D';
s1: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the word list and the tags contained in the data set D' into the input data set D in dictionary form;
s2: sampling the input data set D to obtain the entity sample set and the relation sample set of each sentence d i in the input data set D;
S3: constructing a span-based entity relation extraction model, wherein the entity relation extraction model specifically comprises a BERT pre-training module, an entity extraction module and a relation classification module;
s4: from the entity sample set of each sentence, calculate the feature vector and the attention contribution degree of each entity e ij in each sentence d i ; combine the feature vector with the attention contribution degree and input them into the entity extraction module to obtain the predicted entity type entity ij of the entity extraction module; wherein: the feature vector of each entity e ij of each sentence d i is calculated by the following formula:
where token n is a word of sentence d i ; cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the length feature vector of entity e ij , the intermediate feature vector of entity e ij , and the mask matrix applied to the implicit feature vectors;
the entity attention contribution degree of each entity e ij of each sentence d i is calculated by the following formula:
where the attention matrices of all the words constituting entity e ij , the attention matrix of token m0 within them, and the max-pooled attention scalar of token m0 are used; token m0 is the word of entity e ij at position m; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
s5: according to each relation sample set, calculate for each relation r ij of each sentence d i the feature vectors, the context information and the attention contribution degrees of the corresponding head entity and tail entity, and input them into the relation classification module to obtain the predicted relation type relation ij of the relation classification module; wherein: the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation; the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree;
S6: training a span-based entity relation extraction model to obtain an attention contribution-based entity relation extraction method.
2. The attention contribution based entity relationship extraction method of claim 1, wherein: the dataset D' includes original sentences and tags.
3. The attention contribution based entity relationship extraction method of claim 1, wherein: the data set D' is a medical report data set, a judicial data set or a travel database.
4. The attention contribution based entity relationship extraction method of claim 3, wherein: the data set D' is an adverse drug reaction data set, and the adverse drug reaction data set is from the benchmark corpus created by Gurulingappa.
5. The attention contribution based entity relationship extraction method of claim 3, wherein: the judicial data set is from a Chinese judicial archive database or a CourData judicial field basic database; the travel database is from a Rui Si data-travel database.
6. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S1 specifically comprises the following steps:
s101: segmenting the original sentences in the data set D' with spaCy to obtain a word list; storing the segmented word list in a dictionary dic under the key tokens;
s102: the tags in the data set include entities and entity relations, an entity being composed of one or more words; the entity type of each entity and the start and end index positions of the entity are stored as one entity element, the entity elements are collected into an entity list, and the entity list is stored in the dictionary dic under the key entities; the relation type of each entity relation and the index positions head and tail of its head and tail entities are stored as one relation element, the relation elements are collected into a relation list, and the relation list is stored in the dictionary dic under the key relations;
s103: one dictionary dic constitutes one sample, a plurality of dictionaries constitute the input data set D, and the input data set D is assembled in list form and stored in a json file.
7. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S2 specifically comprises the following steps:
s201: entity sampling: all possible entities consisting of 1 to 10 words are selected to form an entity sampling pool; 100 entity negative samples are randomly selected from the entity sampling pool and combined with the entity positive samples; the position information of each entity sample in sentence d i , the entity type and the word count constitute the entity sample set; the entity positive samples are the entities contained in the data set D, and the entity negative samples are entities randomly generated in the entity sampling pool that do not belong to the entity positive samples;
s202: relation sampling: the entities in the entity sample set are paired two by two, 100 relation negative samples are randomly selected and combined with the relation positive samples, and the position information of the head and tail entities of each relation sample in the original sentence together with the relation type constitute the relation sample set; the relation positive samples are head and tail entity pairs having a relation in the data set D, and the relation negative samples are head and tail entity pairs having no relation.
8. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: the step S3 specifically comprises the following steps:
s301: BERT pre-training module: a BERT-base based pre-trained model, BioBERT, is used; according to the input sentence d i , the last_hidden states and attentions captured by the BERT pre-training model provide the downstream tasks with the semantic features of each entity e ij ;
s302: size embedding: an Embedding layer used to learn the length feature vectors of entities e ij composed of different numbers of words;
s303: entity extraction module entity classification: consisting, in order, of a dropout layer, a fully connected layer and a softmax layer, yielding the predicted entity type entity ij of the entity extraction module;
s304: relation classification module relation classification: consisting, in order, of a dropout layer, a fully connected layer and a sigmoid layer, yielding the predicted relation type relation ij of the relation classification module.
9. The attention contribution based entity-relationship extraction method of claim 8, wherein: for each entity sample set, the calculated entity feature vector and attention contribution degree are combined and input into the entity extraction module entity classification to obtain the predicted entity type entity ij of the entity extraction module, according to the following formula:
10. the attention contribution based entity-relationship extraction method of claim 8, wherein: the step S5 specifically comprises the following steps:
s501: calculate the feature vector of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where token n is a word of sentence d i ; cls is the non-max-pooled special classification token embedding of sentence d i ; the remaining symbols denote, respectively, the intermediate feature vector of the head entity, the length feature vector of the head entity, and the mask matrix applied to the implicit feature vectors;
calculate the feature vector of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the symbols are defined as in the head-entity case, with the tail entity in place of the head entity;
the feature vector corresponding to each relation r ij of each sentence d i is obtained by combining the head entity feature vector and the tail entity feature vector corresponding to that relation:
s502: calculate the attention contribution degree of the head entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the head entity, the attention matrix of the head-entity token at position m, and its max-pooled attention scalar are used; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
calculate the attention contribution degree of the tail entity corresponding to each relation r ij of each sentence d i ; the calculation formula is as follows:
where the attention matrices of all the words constituting the tail entity, the attention matrix of the tail-entity token at position m, and its max-pooled attention scalar are used; θ 1 is the contribution degree threshold, specifically 0.4-0.8;
the attention contribution degree corresponding to each relation r ij of each sentence d i is obtained by combining the head entity attention contribution degree and the tail entity attention contribution degree:
s503: the context information is calculated as follows:
where the inputs are the last hidden vectors of the words located between the head entity and the tail entity in sentence d i ;
s504: for each relation sample set, the calculated feature vector, attention contribution degree and context information are combined and input into the relation classification module relation classification to obtain the predicted relation type relation ij of the relation classification module, according to the following formula:
11. The attention contribution based entity relationship extraction method of any of claims 1-5, wherein: a joint loss function L = wL e + (1-w)L r is defined in step S6 to train the entity relation extraction model; wherein: w is a weight, L e is the cross-entropy loss function of the entity extraction module, and L r is the binary cross-entropy loss function of the relation classification module.
12. The attention contribution based entity-relationship extraction method of claim 11, wherein: the step S6 specifically comprises the following steps:
s601: setting the loss function to L = wL e + (1-w)L r using an Adam optimizer;
S602: setting an evaluation standard as a micro F1 value, and training an entity relation extraction model;
if the relation type of the model prediction, the types of the two related entities and the span are consistent with the labels, the prediction is considered to be correct;
a model prediction is considered erroneous if its relationship type, the type of two related entities, and the span are inconsistent with the label.
13. The attention contribution based entity relationship extraction method of any of claims 1-5, 9-10, 12, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
14. The attention contribution based entity-relationship extraction method of claim 6, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
15. The attention contribution based entity-relationship extraction method of claim 7, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
16. The attention contribution based entity-relationship extraction method of claim 8, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
17. The attention contribution based entity-relationship extraction method of claim 11, wherein: the entity relationship extraction method is used for analyzing medical reports, judicial judgement books or travel data analysis reports.
18. The attention contribution based entity-relationship extraction method of claim 13, wherein: the entity relationship extraction method is used for analyzing medical reports of adverse drug reactions.
19. The attention contribution based entity-relationship extraction method of any of claims 14-17, wherein: the entity relationship extraction method is used for analyzing medical reports of adverse drug reactions.
CN202111410469.9A 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree Active CN114417846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111410469.9A CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111410469.9A CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Publications (2)

Publication Number Publication Date
CN114417846A CN114417846A (en) 2022-04-29
CN114417846B true CN114417846B (en) 2023-12-19

Family

ID=81266023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111410469.9A Active CN114417846B (en) 2021-11-25 2021-11-25 Entity relation extraction method based on attention contribution degree

Country Status (1)

Country Link
CN (1) CN114417846B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400455A (en) * 2020-03-18 2020-07-10 北京工业大学 Relation detection method of question-answering system based on knowledge graph
CN112136145A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Attention filtering for multi-instance learning
CN112148832A (en) * 2019-06-26 2020-12-29 天津大学 Event detection method of dual self-attention network based on label perception
CN112632986A (en) * 2020-12-22 2021-04-09 安徽淘云科技有限公司 Entity characterization model training and characterization method, electronic device and storage medium
CN112818676A (en) * 2021-02-02 2021-05-18 东北大学 Medical entity relationship joint extraction method
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112136145A (en) * 2018-03-29 2020-12-25 伯耐沃伦人工智能科技有限公司 Attention filtering for multi-instance learning
CN112148832A (en) * 2019-06-26 2020-12-29 天津大学 Event detection method of dual self-attention network based on label perception
CN111400455A (en) * 2020-03-18 2020-07-10 北京工业大学 Relation detection method of question-answering system based on knowledge graph
WO2021174774A1 (en) * 2020-07-30 2021-09-10 平安科技(深圳)有限公司 Neural network relationship extraction method, computer device, and readable storage medium
CN112632986A (en) * 2020-12-22 2021-04-09 安徽淘云科技有限公司 Entity characterization model training and characterization method, electronic device and storage medium
CN112818676A (en) * 2021-02-02 2021-05-18 东北大学 Medical entity relationship joint extraction method
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Novel Document-Level Relation Extraction Method Based on BERT and Entity Information; Xiaoyu Han et al.; IEEE Access; 96912-96919 *
ENT-BERT: an entity relation classification model combining BERT and entity information; Zhang Dongdong et al.; Journal of Chinese Computer Systems; Vol. 41, No. 12; 2557-2562 *
Span-based Joint Entity and Relation Extraction with Transformer Pre-training; Markus Eberts et al.; arXiv:1909.07755v1; 1-8 *
Chinese entity relation extraction with a multi-feature BERT model; Xie Teng et al.; Computer Systems & Applications; Vol. 30, No. 5; 253-261 *

Also Published As

Publication number Publication date
CN114417846A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Du et al. Explicit interaction model towards text classification
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN109902145B (en) Attention mechanism-based entity relationship joint extraction method and system
CN113761936B (en) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
Yu et al. Topic-oriented image captioning based on order-embedding
Habernal et al. Exploiting debate portals for semi-supervised argumentation mining in user-generated web discourse
Belinkov et al. Arabic diacritization with recurrent neural networks
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
US20160350288A1 (en) Multilingual embeddings for natural language processing
Fu et al. Nested named entity recognition with partially-observed treecrfs
Gao et al. Named entity recognition method of Chinese EMR based on BERT-BiLSTM-CRF
Li et al. UD_BBC: Named entity recognition in social network combined BERT-BiLSTM-CRF with active learning
CN110765240A (en) Semantic matching evaluation method for multiple related sentence pairs
Nasim et al. Sentiment analysis on Urdu tweets using Markov chains
Othman et al. Learning english and arabic question similarity with siamese neural networks in community question answering services
CN111710428B (en) Biomedical text representation method for modeling global and local context interaction
CN111091002B (en) Chinese named entity recognition method
Yang et al. Bidirectional LSTM-CRF for biomedical named entity recognition
Hong et al. BioPREP: deep learning-based predicate classification with SemMedDB
CN113361259B (en) Service flow extraction method
Shirghasemi et al. The impact of active learning algorithm on a cross-lingual model in a Persian sentiment task
CN114417846B (en) Entity relation extraction method based on attention contribution degree
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN114970557B (en) Knowledge enhancement-based cross-language structured emotion analysis method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant