CN113553850A - Entity relation extraction method based on ordered structure encoding pointer network decoding - Google Patents
- Publication number
- CN113553850A (application number CN202110338079.9A)
- Authority
- CN
- China
- Prior art keywords
- entity
- layer
- sentence
- decoding
- head
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/295 — Named entity recognition
- G06F40/211 — Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/253 — Grammatical analysis; Style critique
- G06F40/30 — Semantic analysis
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The invention provides an entity relation extraction method based on ordered-structure encoding and pointer-network decoding, which comprises the following steps: at the input layer, perform word embedding using word vectors trained by a BERT pre-training model, add negative examples of the sentence vector representation generated by adversarial training, and construct the initial sentence vector; at the encoding layer, capture the global semantic information of the text with Bi-OnLSTM; at the decoding layer, separately extract the head entity, the tail entity, and the relation using the decoding idea of a pointer network, replacing Softmax with Sigmoid for prediction, to complete the entity-relation triple extraction task. Because the decoding layer adopts pointer-network decoding, the method handles overlapping entity relations well, extracts the multiple triples a sentence may contain, and improves the accuracy of entity and relation extraction.
Description
Technical Field
The invention belongs to the field of natural language processing.
Background
The birth of the computer, continuous technological innovation, and the worldwide spread of the internet have improved people's daily lives to an unprecedented degree. At the same time, a large amount of text data is generated every day in the form of news articles, blogs, question-and-answer forums, social media, and so on. Much important information is hidden in this text data, and obtaining it requires a great deal of tedious screening and reading. Information extraction technology was therefore developed to remove redundant data and reduce the amount of human reading while still capturing the effective information. The extracted information helps us acquire and manage the implicit knowledge in large text corpora and can be used to build question-answering, retrieval, and recommendation systems. Unlike manual data filtering, which returns a series of documents, information extraction can extract the factual event information contained in a given sentence, speech, document, or even a batch of data; this information consists of entities and relations and is generally called triple data. Entity types such as persons and organizations are the most basic units of information, and entities appearing in a sentence can be connected by explicit relations such as "born in" or "located in". The entity and relation extraction task (RE) is to automatically identify these entities and the relations between them. With information extraction, people can obtain the effective content of information without reading the data word by word. Research on information extraction, especially entity relation extraction, remains one of the major hotspots in the field of artificial intelligence.
Information Extraction (IE) is a sub-field of natural language processing that has been developing for about twenty years; its predecessor, text understanding, has been studied for decades. In the 1980s, the Message Understanding Conference (MUC), established with U.S. government support, worked to drive the development of information extraction technology. By holding information extraction competitions, MUC attracted company laboratories and academic research institutions around the world: each team builds a model on an officially released dataset, and the organizers then evaluate the models on a test set against three major metrics, so that information extraction technology is continuously developed and improved.
At present, a top-level task of natural language processing is to construct a Knowledge Graph (KG), a large-scale information representation that can be used in many fields. The most common way to represent a KG follows the Resource Description Framework (RDF): entities are represented by nodes, and the relation between two entities by an edge between their nodes. Each edge together with its two endpoints forms a set of factual information, a triple (head entity, relation, tail entity), for example: (Zhou Jielun, born in, New Taipei City, Taiwan), meaning that the birthplace of Zhou Jielun is New Taipei City, Taiwan. A KG is a heterogeneous graph network containing a large number of entity nodes and relations of different types, and may even contain sentence nodes. With this representation, we can discover various attributes of entities, high-level relations between entities, and associations between relations. The importance of entity relation extraction as the foundation for constructing a knowledge graph is therefore self-evident.
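The RDF-style representation described above can be sketched as a tiny in-memory triple store. This is purely illustrative (the class, names, and sample facts are assumptions, not part of the patent), but it shows how edges plus endpoints yield (head, relation, tail) facts that can be queried per entity:

```python
from collections import defaultdict

class TripleStore:
    """Tiny in-memory knowledge graph: nodes are entities, edges are relations."""
    def __init__(self):
        self.triples = set()
        self.out_edges = defaultdict(list)  # head -> [(relation, tail), ...]

    def add(self, head, relation, tail):
        self.triples.add((head, relation, tail))
        self.out_edges[head].append((relation, tail))

    def relations_of(self, head):
        return self.out_edges[head]

kg = TripleStore()
kg.add("Zhou Jielun", "born_in", "New Taipei City, Taiwan")
kg.add("New Taipei City, Taiwan", "part_of", "Taiwan")
print(kg.relations_of("Zhou Jielun"))  # [('born_in', 'New Taipei City, Taiwan')]
```

Following an edge from a node recovers exactly the attributes and higher-level associations the text mentions.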
The entity relation extraction task is a first-stage subtask of information extraction and can be divided into two subtasks. The first is named entity recognition: identifying the head entity (also called the subject) contained in a sentence and then the tail entity (also called the object). The second is relation extraction: identifying the implied relation (predicate) between the head entity and the tail entity. Each entity pair and its relation are integrated into a triple (S, P, O), for example (Zhou Jielun, born in, Taiwan). However, the entity relation extraction task faces two types of problems, summarized as follows:
In the first category, the conventional Pipeline method first performs named entity recognition, i.e., identifies the two entities present in a sentence, and then sends them into a relation classification model to identify the relation between them. In essence, the relation extraction task is divided into two subtasks, and the output of the entity recognition model is used as the input of the relation classification model. However, this creates several problems:
(1) Error propagation: errors in the entity recognition stage affect the relation extraction performance in the relation classification stage.
(2) Entity redundancy: because the head and tail entities are extracted first, the classifier may find that the two entities have no relation at all. Such unrelated entity pairs are redundant for subsequent knowledge-graph tasks such as entity linking, and a question-answering system that uses unrelated candidate entities increases the amount of computation and reduces model accuracy.
(3) Missing interaction: the entity recognition task and the relation extraction task may share associations or parameters; simply feeding the output of one subtask into the next loses this interaction.
In the second category, a pipeline-based method sometimes extracts an entity pair that has no relation, and because it performs triple extraction on a sentence only once, it also fails to extract all of the multiple triples a sentence may contain. Most importantly, if a sentence contains overlapping entities or relations, neither the traditional model nor a joint extraction model can extract the triples completely. As shown in fig. 7.
(1) Single Entity Overlap (SEO): for example, as shown in Table 1, in the sentence 'Zhou Jielun starred in "Initial D" and "The Secret That Cannot Be Told".', the head entity "Zhou Jielun" corresponds to two tail entities, "Initial D" and "The Secret That Cannot Be Told", forming two triples that share the same head entity and relation but differ in the tail entity; the head entity overlaps. This problem exists in many sentences.
(2) Entity Pair Overlap (EPO): the sentence '"The Secret That Cannot Be Told", directed by and starring Zhou Jielun, sold well at the box office.' contains two triples over the same entity pair (Zhou Jielun, The Secret That Cannot Be Told) with different relations, one "actor" and one "director"; this is a typical entity-pair overlap.
The invention provides a joint entity relation extraction method aimed at these problems. In view of the good performance of the encoder-decoder framework on other natural language processing tasks, the invention improves the traditional LSTM network within this framework and builds the AT-BiOnLSTM-Pointer network decoding model with an added perturbation term to extract entity relation triples.
Disclosure of Invention
The invention provides an entity relation extraction method based on ordered-structure encoding and pointer-network decoding, aiming to improve the precision, recall, and F1 score of the entity relation extraction task and the ability to extract overlapping entity triples. The method comprises the following steps:
(1) Select features at the input layer to construct the initial sentence vector and represent the sentence as vectors.
(2) Capture hierarchical structure information at the encoding layer to obtain the hidden embedding of each word of the sentence.
(3) Integrate the preceding features at the decoding layer, extract abstract features with a pointer network, and extract the sentence triples.
Drawings
FIG. 1 is an overall framework diagram of the entity relationship extraction model of the present invention.
FIG. 2 is an example of an entity to be extracted and a relational data set in a sentence according to the present invention.
Fig. 3 is a schematic diagram of adding an AT perturbation term after a presentation layer, which is adopted by the present invention.
FIG. 4 is a diagram illustrating hierarchical granularity in a sentence according to the present invention.
FIG. 5 is a schematic diagram of the structure of the On-LSTM unit employed in the present invention.
FIG. 6 is a schematic diagram of a pointer network employed in the present invention.
FIG. 7 is an example of the entity overlap type problem of the present invention.
FIG. 8 is a diagram of NYT data set information as used in the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
As shown in fig. 1, the invention is based on an encoder-decoder framework and constructs a pointer network with an added perturbation term by improving the traditional LSTM to extract entity relations. The model mainly comprises an Input Layer, an Encoder Layer, and a Decoder Layer (including a head-entity pointer tagging layer and a tail-entity-and-relation pointer tagging layer). The specific implementation is as follows:
Step one: input layer
The improved joint entity relation extraction model is evaluated on the standard English dataset NYT (New York Times) and its derived versions. The corpus used in the experiments is the NYT dataset obtained by Zeng et al. by aligning raw data with relations in Freebase; it has 24 relations, its test set is manually labeled, and many overlapping entity relations have been added. As shown in fig. 8(a).
The number of overlapping-type triples in the NYT dataset is shown in fig. 8(b); each sentence contains 1.5 triples on average, in both the training set and the test set. The overlapping entity types are divided into three classes: Normal indicates sentences with no overlapping entities or entity pairs, EPO (Entity Pair Overlap) indicates sentences in which whole entity pairs overlap, and SEO (Single Entity Overlap) indicates sentences in which only a single entity (either head or tail) overlaps.
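The three-way sentence typing above can be sketched as a small classifier over a sentence's gold triples. This is a hedged reconstruction of the convention (function name and priority order are assumptions): a sentence is EPO if some (head, tail) pair occurs in more than one triple, SEO if an entity recurs without a full pair overlap, Normal otherwise:

```python
def overlap_type(triples):
    """Classify a sentence's gold (s, p, o) triples as EPO / SEO / Normal."""
    pairs = [(s, o) for s, p, o in triples]
    if len(pairs) != len(set(pairs)):
        return "EPO"                      # same entity pair, multiple relations
    entities = [e for s, p, o in triples for e in (s, o)]
    if len(entities) != len(set(entities)):
        return "SEO"                      # one entity shared across triples
    return "Normal"

print(overlap_type([("Zhou Jielun", "actor", "Secret"),
                    ("Zhou Jielun", "director", "Secret")]))   # EPO
print(overlap_type([("Zhou Jielun", "actor", "Initial D"),
                    ("Zhou Jielun", "actor", "Secret")]))      # SEO
```

A sentence can exhibit both phenomena; here EPO is checked first, a choice not specified by the patent.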
As shown in fig. 2, the example of the entity to be extracted and the relationship data set in the NYT data set applied by the entity relationship extraction task is shown.
Sentence vectorization is completed at the input layer: first, word embedding is performed on the input sentence using word vectors trained by a pre-training model; then, between the vector representation layer and the encoder, adversarial training is added to generate negative examples of the sentence vector representation, enhancing model training performance.
We convert each word into a vector consisting of the following two parts.
1. Word vector
The entity relation extraction task needs context to find the entity words and relation words; identifying them depends on the surrounding text. Therefore, the invention uses context-dependent word vectors trained by the BERT pre-trained language model to map the input sentence into vector space.
The advantage of BERT over word2vec is that BERT's word vectors are not static, i.e., the semantics are not fixed, so sentences containing ambiguous words can be represented well. Compared with ELMo, which splices features from two unidirectional LSTMs, BERT's integrated feature fusion is naturally stronger; GPT, being a unidirectional language model, is likewise naturally weaker than BERT.
BERT employs a two-stage training model consistent with GPT: first, language pre-training is performed, and second, Fine-Tuning (Fine-Tuning) is used when applied to downstream tasks.
First, the input sentence sequence can be expressed as X = {x1, x2, …, xi, …, xn}, where xi represents the i-th character in the sequence. We then use the pre-trained BERT word vectors to map each xi to a vector ei of dimension d.
Then the word vector matrix for the entire sentence is as shown in equation 1.
E=[e1,e2,…,en] (1)
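A minimal sketch of Equation 1's stacking step follows. The random lookup table is a stand-in assumption for real contextual BERT outputs (which depend on the whole sentence, not a static table); the point is only the shape of E = [e1, …, en]:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (illustrative)
vocab = {"Zhou": 0, "Jielun": 1, "was": 2, "born": 3, "in": 4, "Taiwan": 5}
table = rng.normal(size=(len(vocab), d))  # placeholder for BERT outputs

def embed(tokens):
    """Map each token x_i to e_i and stack into the sentence matrix E."""
    return np.stack([table[vocab[t]] for t in tokens])  # shape (n, d)

E = embed(["Zhou", "Jielun", "was", "born", "in", "Taiwan"])
print(E.shape)  # (6, 8)
```

In the actual model this matrix is what the perturbation of the next subsection is added to.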
2. Counter training
Adversarial training (AT) was first proposed in image processing, aiming to improve the robustness of classifiers in image recognition. In natural language processing, variants have been developed for different tasks, such as text classification and part-of-speech tagging. Adversarial training can in fact be regarded as a regularization method; but unlike many regularization methods, which introduce random noise, adversarial training improves model performance by generating perturbations that the classifier easily misclassifies.
To improve the performance of the entity relation extraction model, the invention adds adversarial training at the word embedding layer, generating negative examples of the original input by adding noise to the concatenated word vector representation, as shown in fig. 3.
The input representation layer combines the word vectors with adversarial training by adding a small perturbation to the training data, as shown in Equation 2:

η_adv = ε · g / ‖g‖₂,  g = ∇_ω L(ω; θ̂)  (2)

That is, the worst-case perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters. The original examples and the generated negative examples are then trained jointly, so the final loss is as shown in Equation 3:

L_final = L(ω; θ) + L(ω + η_adv; θ)  (3)
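The worst-case perturbation described above reduces, in the fast-gradient form used for embedding-level adversarial training, to the L2-normalised loss gradient. A minimal sketch (the toy gradient and ε value are assumptions, not values from the patent):

```python
import numpy as np

def fgm_perturbation(grad, epsilon=1.0):
    """eta_adv = epsilon * g / ||g||_2: the loss gradient w.r.t. the
    embedding, rescaled to length epsilon (zero if the gradient vanishes)."""
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

g = np.array([3.0, 4.0])            # pretend gradient of the loss w.r.t. omega
eta = fgm_perturbation(g, epsilon=1.0)
print(eta)                          # [0.6 0.8]: same direction, unit length
```

Training then runs a second forward/backward pass on ω + η_adv and sums the two losses, which is the joint objective of Equation 3.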
Step two: coding layer
For tasks in different fields, different combinations can be chosen for the encoding and decoding layers: on image processing tasks, a convolutional neural network usually forms the encoding layer, while for natural language processing tasks such as event element extraction, a recurrent neural network is usually selected.
In Chinese text processing there is a notion of hierarchy: characters are the lowest level, then words, then sentences, paragraphs, and so on. The higher the level, the coarser the granularity and the larger the information span within the sentence. FIG. 4 is a schematic diagram of hierarchical granularity.
However, the neurons of conventional recurrent networks such as LSTM are unordered, so they cannot learn and extract hierarchical structure information. Therefore, the invention selects the bidirectional ordered long short-term memory network (Bi-OnLSTM) as the basic structure of the encoding layer, so that high-level information is kept over longer spans while low-level information is forgotten over shorter intervals; these differing propagation spans form the hierarchical structure of the input sequence. The forward computation of On-LSTM is shown in Equation 4, and FIG. 5 is a schematic diagram of the On-LSTM unit.

f̃_t = cumax(W_f̃ x_t + U_f̃ h_(t−1) + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_(t−1) + b_ĩ)  (4)

Compared with the traditional LSTM, On-LSTM mainly adds a master forget gate f̃_t and a master input gate ĩ_t, obtained by cumax (cumulative softmax) operations in the left-to-right and right-to-left directions respectively.
The invention designs the introduced On-LSTM as a bidirectional network. In the entity relation extraction task, one-directional left-to-right context alone is not enough to support the task; a right-to-left On-LSTM layer is needed to capture the following context, so the encoding layer of the improved joint extraction model is Bi-OnLSTM. The forward On-LSTM computes the left state →h_t of word x_t at time t (the final hidden state of the forward layer), the backward On-LSTM computes the right state ←h_t (the final hidden state of the backward layer), and the output of the encoding layer for word x_t at time t is h_t = [→h_t; ←h_t].
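The cumax activation behind the master gates can be sketched directly: cumax(z) = cumsum(softmax(z)) produces a monotonically non-decreasing gate in [0, 1], which is what lets high-level information persist longer. A minimal numeric sketch (the input values are illustrative):

```python
import numpy as np

def cumax(z):
    """Cumulative softmax: monotone non-decreasing values in (0, 1]."""
    e = np.exp(z - z.max())           # numerically stable softmax
    return np.cumsum(e / e.sum())

z = np.array([0.5, 2.0, -1.0, 0.3])
f_master = cumax(z)                   # master forget gate, left-to-right
i_master = 1.0 - cumax(z[::-1])[::-1] # master input gate, via the reversed direction
print(f_master)
```

The index where f_master first approaches 1 acts as the "hierarchy level" above which the previous cell state is preserved.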
Step three: decoding layer
Because the Bi-OnLSTM encoding layer captures all hierarchical and sequential information, the invention performs joint entity relation extraction at the decoding layer and solves the entity-relation overlap problem using the decoding idea of a pointer network.
Unlike existing methods that first extract entities and then judge the relation between them, the invention adopts an improved extraction mechanism. Fig. 6 is a schematic view of the pointer network. The task is divided into two stages: the first stage marks the possible candidate head entities in the sentence; the second stage marks the tail entities and relations according to the semantic and positional features of each candidate head entity. This solves the problem of one head entity overlapping with multiple tail entities and relations, and because triples are derived from a head entity's semantic and positional features, extraction of meaningless triples is avoided and redundant information is reduced.
Then, the conventional triple extraction formula becomes a conditional probability formula, as shown in formula 5.
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
In the formula, Sen is the sentence representation and (s, p, o) is an entity relation triple. First, the head-entity tagger p(s|Sen) identifies the head entities in a sentence; then, for each relation p, the tail-entity tagger p(p, o|s, Sen) identifies the tail entities that stand in that relation to the head entity.
The extraction of the abstract feature triples by the joint entity relationship extraction decoding layer is composed of the following two modules.
1. Head entity extraction module
The encoding vector h_i output by the Bi-On-LSTM encoding layer is decoded by the head-entity extraction module to identify all possible head entities. First, a head-entity tagging layer is added on top of the encoder output, using two token-level classifiers, a start layer and an end layer, to recognize the start and end positions of head entities. Concretely, each token of the sentence is given a binary label (0, 1): a token carrying a "1" label in the start layer marks a start position, and a token carrying a "1" label in the end layer marks an end position. The head-entity tagging layer computes the probability of a head entity at each position as shown in Equation 6:

p_i^start_s = σ(W_start h_i^N + b_start)
p_i^end_s = σ(W_end h_i^N + b_end)  (6)

where a Bi-OnLSTM layer is first added inside the start tagging layer and the pre-decoding h_i is fed into it to obtain the further hidden state h_i^N; p_i^start_s is the probability that token i is the start position of a head entity, p_i^end_s the probability that it is the end position, and σ is the sigmoid activation function.
Then the maximum likelihood over all possible start and end positions of head entities in the input token representation x (x_i = h_N[i]) is computed, giving the head-entity likelihood shown in Equation 7:

p_θ(s|Sen) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{L} (p_i^t)^{f{y_i^t = 1}} (1 − p_i^t)^{f{y_i^t = 0}}  (7)

where L is the length of the sentence in tokens, y_i^t is the binary tag of token i for tagger t, and f{x} = 1 when x holds, f{x} = 0 otherwise.
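The start/end tagging and span decoding above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the 0.5 threshold follows the binary (0, 1) tags, the pair-with-nearest-following-end rule is an assumed decoding heuristic, and the logits are hand-picked:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_spans(start_logits, end_logits, thresh=0.5):
    """Tag tokens with sigmoid start/end probabilities, then pair each
    start position with the nearest end position at or after it."""
    starts = np.where(sigmoid(start_logits) > thresh)[0]
    ends = np.where(sigmoid(end_logits) > thresh)[0]
    spans = []
    for s in starts:
        later = ends[ends >= s]
        if later.size:
            spans.append((int(s), int(later[0])))
    return spans

# Scores for "Zhou Jielun was born in Taiwan": entity spans (0,1) and (5,5)
start = np.array([4.0, -3.0, -3.0, -3.0, -3.0, 4.0])
end   = np.array([-3.0, 4.0, -3.0, -3.0, -3.0, 4.0])
print(decode_spans(start, end))  # [(0, 1), (5, 5)]
```

Because each position is scored independently with a sigmoid rather than a sentence-level softmax, several head entities can be marked in one sentence, which is what permits overlap.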
2. Tail entity and relation extraction module
The head-entity vector representation v_sub output by the head-entity tagging layer and the hidden state representation x_i = h_N[i] output by the encoding layer are fed into the tail-entity-and-relation tagging layer. Similarly, the probability of a tail entity under each relation p is as shown in Equation 8:

p_i^start_o = σ(W_start^p (h_i^N + v_sub) + b_start^p)
p_i^end_o = σ(W_end^p (h_i^N + v_sub) + b_end^p)  (8)
The tail-entity maximum likelihood is shown in Equation 9:

p_θ(p, o|s, Sen) = ∏_{t ∈ {start_o, end_o}} ∏_{i=1}^{L} (p_i^t)^{f{y_i^t = 1}} (1 − p_i^t)^{f{y_i^t = 0}}  (9)

Finally, the loss function is computed from Equations 7 and 9 as their joint negative log-likelihood, as shown in Equation 10:

L = − Σ_{Sen} [ Σ_s log p_θ(s|Sen) + Σ_{(p,o)} log p_θ(p, o|s, Sen) ]  (10)
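For a single tagger, the negative log of the product likelihoods in Equations 7 and 9 is exactly token-level binary cross-entropy; the joint loss of Equation 10 sums this over the head and tail taggers. A minimal sketch (the probabilities and tags are made-up examples):

```python
import numpy as np

def tag_nll(probs, labels):
    """-sum_i [ y_i*log p_i + (1-y_i)*log(1-p_i) ] over sentence tokens."""
    probs = np.clip(probs, 1e-12, 1 - 1e-12)  # guard log(0)
    return -np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

p = np.array([0.9, 0.1, 0.8])   # predicted start-tag probabilities
y = np.array([1.0, 0.0, 1.0])   # gold binary tags
loss = tag_nll(p, y)            # small when predictions match the tags
print(loss)
```

Flipping the predictions to 1 − p makes the loss much larger, which is what drives both taggers toward the gold boundary tags during training.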
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the scope of the invention is not limited to these specific embodiments. All obvious variations and all inventions utilizing the concepts of the present invention are intended to be protected.
Claims (4)
1. An entity relation extraction method based on ordered structure encoding pointer network decoding is characterized in that the method aims at identifying and extracting a triple composed of entities and relations in a sentence, and the method comprises the following steps:
step 1: selecting characteristics on an input layer to construct an initial sentence vector, and vectorizing and representing the sentence;
step 2: capturing hierarchical structure information on a coding layer, and acquiring hidden embedding of each word of a sentence;
and step 3: and (4) further extracting abstract features by using a pointer network from the features before the integration of the decoding layer, and extracting sentence triples.
2. The entity relation extraction method based on ordered-structure-encoded pointer network decoding as claimed in claim 1, wherein constructing the sentence initial vector in step 1 specifically means: in the entity relation extraction task, word vectors are combined with adversarial-training negative examples to represent the sentence;
step 1.1: training word vector
With X = {x1, x2, …, xi, …, xn} denoting the input sentence sequence, where xi represents the i-th character in the sequence, context-dependent word vectors trained by the BERT pre-trained language model are used to map the input sentence into vector space, adopting a two-stage training scheme: first, language pre-training; second, Fine-Tuning when applied to the downstream task; the pre-trained BERT word vectors then map each xi to a vector ei of dimension d;
then the word vector matrix of the whole sentence is as shown in equation 1;
E=[e1,e2,…,en] (1)
step 1.2: counter training
in order to improve the performance of the entity relation extraction model, adversarial training is added at the word embedding layer, and negative examples of the original input are generated by adding noise to the concatenated word vector representation, as shown in fig. 3;
the input representation layer model comprises word vectors and countermeasure training, and a small disturbance function is added in the training data; as shown in equation 2;
i.e., the worst-case perturbation η_adv = ε · g / ‖g‖₂ with g = ∇_ω L(ω; θ̂) (Equation 2) is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters; the original examples and the generated negative examples are then trained jointly, so the final loss is L_final = L(ω; θ) + L(ω + η_adv; θ), as shown in Equation 3.
3. The entity relationship extraction method based on ordered structure coded pointer network decoding as claimed in claim 2, wherein the capturing of hierarchical structure information and sequence information at the coding layer in step 2 specifically refers to:
for tasks in different fields, different combination modes can be selected for the coding layer and the decoding layer, for example, on an image processing task, a convolutional neural network is usually used for forming the coding layer, and for the natural language processing field task of extracting event elements, a cyclic neural network is usually selected;
in Chinese text processing there is a concept of hierarchy: characters are the lowest level, words come next, and then sentences, paragraphs, and so on; the higher the level, the coarser the granularity and the larger the span of the information within the sentence; FIG. 4 is a schematic diagram of hierarchical granularity;
however, the neurons of conventional recurrent neural networks such as LSTM are unordered, so they cannot learn and extract hierarchical structure information; the invention therefore selects the bidirectional ordered long short-term memory network (Bi-OnLSTM) as the basic structure of the encoding layer, so that high-level information is retained over longer spans while low-level information is forgotten over shorter intervals, and these different information propagation spans form the hierarchical structure of the input sequence; the forward computation formula of On-LSTM is shown in equation 4, and FIG. 5 is a schematic structural diagram of the On-LSTM unit;
wherein, compared with the traditional LSTM, On-LSTM mainly adds a master forget gate f̃t and a master input gate ĩt, obtained by the rightward and leftward cumsum operations, respectively;
the introduced On-LSTM is designed into a bidirectional network; in the entity relationship extraction task, only acquiring unidirectional left-to-right upper information is not enough to support the entity relationship extraction task, a layer of right-to-left On-LSTM is needed to acquire the lower information, and then the improved coding layer structure of the combined entity relationship extraction model is Bi-OnLSTM; computing word x at t time by forward On-LSTM tLeft state(final hidden state of forward propagation layer), and then utilizing backward On-LSTM to calculate word x at time ttRight state(final hidden state of the counter-propagating layer), then the word xtThe output result at the coding layer at time t is
4. The entity relation extraction method based on ordered-structure-encoding pointer-network decoding as claimed in claim 3, wherein synthesizing the features before the decoding layer and further extracting abstract features using the pointer network in step 4 specifically refers to:
because the Bi-OnLSTM of the encoding layer captures all hierarchical and sequential information, the invention performs joint entity relationship extraction at the decoding layer and solves the entity relationship overlapping problem using the decoding idea of a pointer network;
unlike prior methods that first extract entities and then judge the relationships between them, the invention adopts an improved extraction mechanism; FIG. 6 is a schematic diagram of the pointer network; the task is divided into two stages: the first stage marks possible candidate head entities in the sentence, and the second stage marks tail entities and relations according to the semantic and positional features of each candidate head entity; this solves the overlapping problem in which one head entity corresponds to multiple tail entities and relations, and because triples are obtained from a head entity's semantic and positional features, the extraction of meaningless triples is avoided and redundant information is reduced;
therefore, the conventional triple extraction formula becomes a conditional probability formula, as shown in equation 5;
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
in the formula, Sen denotes the sentence and (s, p, o) is the entity relationship triple; the head entity tagger p(s|Sen) first identifies head entities in the sentence, and then for each relation r the tail entity tagger p(p, o|s, Sen) identifies the tail entity corresponding to the head entity;
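A toy numeric illustration of the decomposition in equation 5, with made-up candidate entities and probabilities:

```python
# Toy illustration of equation 5 with made-up numbers:
# p(s, p, o | Sen) = p(s | Sen) * p(p, o | s, Sen).
p_head = {"张三": 0.9, "北京": 0.8}  # p(s | Sen): candidate head entities
p_rel_tail = {                       # p(p, o | s, Sen), per candidate head entity
    "张三": {("出生地", "北京"): 0.7},
    "北京": {},                      # no relation/tail found for this candidate
}
triples = {(s, p, o): p_head[s] * pr
           for s, rels in p_rel_tail.items()
           for (p, o), pr in rels.items()}
# Only "张三" yields a triple; "北京" yields none, so no meaningless
# triples are produced for it — this is the redundancy reduction described above.
```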
the decoding layer of the joint entity relationship extraction model extracts the abstract feature triples and consists of the following two modules;
step 4.1: head entity extraction
The encoding vector hi output by the Bi-OnLSTM encoding layer is sent to the head entity extraction module of the decoding layer, which identifies all vectors that may be head entities; first, a head entity tagging layer is added on top of the encoding layer output, i.e., a two-layer classifier (tagging layer) is used: a start layer and an end layer identify the starting and ending positions of the head entity; specifically, each token of the sentence representation is labeled with a binary tag (0, 1): a token is given the tag '1' in the start layer if it is a starting position and '0' otherwise, and the end layer is labeled analogously for ending positions; the probability of a head entity existing in the sentence is computed by the head entity tagging layer as shown in equation 6;
wherein a Bi-LSTM layer is added in the start tagging layer; hi is sent to this layer before decoding to obtain a further hidden-state vector for the head entity start position, yielding the probability that a token is the starting position of a head entity; the end-position hidden-state vector likewise yields the probability that a token is the ending position of a head entity; σ is an activation function;
then the maximum likelihood function is calculated over all possible starting and ending positions of the head entity, so that for the input sentence token representation x (xi = hN[i]) the head entity range is obtained as shown in equation 7; where L is the length of the sentence in tokens, f{x} = 1 when x = 1, and f{x} = 0 when x = 0;
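Head-entity span decoding can be sketched as follows; the 0.5 threshold and the nearest-end pairing strategy are assumptions for illustration, not details stated in the text:

```python
# Sketch of head-entity span decoding from the start/end tag probabilities
# of equation 6 (thresholding and pairing strategy assumed): threshold both
# probability sequences at 0.5, then pair each start position with the
# nearest end position at or after it.
def decode_spans(p_start, p_end, thresh=0.5):
    starts = [i for i, p in enumerate(p_start) if p >= thresh]
    ends = [i for i, p in enumerate(p_end) if p >= thresh]
    spans = []
    for s in starts:
        for e in ends:
            if e >= s:            # nearest end at or after this start
                spans.append((s, e))
                break
    return spans

#          token index:  0    1    2    3    4
p_start = [0.9, 0.1, 0.2, 0.8, 0.1]
p_end   = [0.1, 0.9, 0.1, 0.1, 0.7]
spans = decode_spans(p_start, p_end)  # two head-entity spans: (0, 1) and (3, 4)
```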
step 4.2: tail entity and relationship extraction
The head entity vector representation output by the head entity tagging layer and the hidden-state vector representation xi = hN[i] output by the encoding layer are sent to the tail entity and relation tagging layer; similarly, the probability of a possible tail entity is shown in equation 8;
the maximum likelihood function of the tail entity is shown in equation 9;
finally, the loss function is calculated according to equations 8 and 9 as shown in equation 10.
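A sketch of how such a combined loss could be computed; the exact form of equation 10 is not reproduced in the text, so a simple sum of binary cross-entropy terms over the head-entity tags and the per-relation tail-entity tags is assumed:

```python
import math

# Sketch of a combined training loss (equation 10, exact form assumed):
# binary cross-entropy over the head-entity start/end tag probabilities
# (equation 6) plus binary cross-entropy over the tail-entity tag
# probabilities (equation 8), summed into one joint objective.
def bce(p, y):
    eps = 1e-12  # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def joint_loss(head_probs, head_tags, tail_probs, tail_tags):
    head = sum(bce(p, y) for p, y in zip(head_probs, head_tags))
    tail = sum(bce(p, y) for p, y in zip(tail_probs, tail_tags))
    return head + tail

# Toy predictions vs. gold binary tags for two tokens in each tagger.
loss = joint_loss([0.9, 0.1], [1, 0], [0.8, 0.2], [1, 0])
```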
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110338079.9A CN113553850A (en) | 2021-03-30 | 2021-03-30 | Entity relation extraction method based on ordered structure encoding pointer network decoding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113553850A true CN113553850A (en) | 2021-10-26 |
Family
ID=78101730
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113553850A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408812A (en) * | 2018-09-30 | 2019-03-01 | 北京工业大学 | A method of the sequence labelling joint based on attention mechanism extracts entity relationship |
US20190370325A1 (en) * | 2018-06-04 | 2019-12-05 | Infosys Limited | Extraction of tokens and relationship between tokens to form an entity relationship map |
CN111914091A (en) * | 2019-05-07 | 2020-11-10 | 四川大学 | Entity and relation combined extraction method based on reinforcement learning |
CN111950297A (en) * | 2020-08-26 | 2020-11-17 | 桂林电子科技大学 | Abnormal event oriented relation extraction method |
CN112183103A (en) * | 2020-10-27 | 2021-01-05 | 杭州电子科技大学 | Convolutional neural network entity relationship extraction method fusing different pre-training word vectors |
Non-Patent Citations (2)
Title |
---|
QIANQIAN ZHANG et al.: "A Review on Entity Relation Extraction", 2017 Second International Conference on Mechanical, Control and Computer Engineering *
ZHANG Xinyi et al.: "An entity recognition and relation extraction model for coal mines", Computer Applications *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051929A (en) * | 2021-03-23 | 2021-06-29 | 电子科技大学 | Entity relationship extraction method based on fine-grained semantic information enhancement |
CN113869049A (en) * | 2021-12-03 | 2021-12-31 | 北京大学 | Fact extraction method and device with legal attribute based on legal consultation problem |
CN114298052A (en) * | 2022-01-04 | 2022-04-08 | 中国人民解放军国防科技大学 | Entity joint labeling relation extraction method and system based on probability graph |
CN115169326A (en) * | 2022-04-15 | 2022-10-11 | 山西长河科技股份有限公司 | Chinese relation extraction method, device, terminal and storage medium |
CN114691895A (en) * | 2022-05-31 | 2022-07-01 | 南京航天数智科技有限公司 | Criminal case entity relationship joint extraction method based on pointer network |
CN116226408A (en) * | 2023-03-27 | 2023-06-06 | 中国科学院空天信息创新研究院 | Agricultural product growth environment knowledge graph construction method and device and storage medium |
CN116226408B (en) * | 2023-03-27 | 2023-12-19 | 中国科学院空天信息创新研究院 | Agricultural product growth environment knowledge graph construction method and device and storage medium |
CN117408247A (en) * | 2023-12-15 | 2024-01-16 | 南京邮电大学 | Intelligent manufacturing triplet extraction method based on relational pointer network |
CN117408247B (en) * | 2023-12-15 | 2024-03-29 | 南京邮电大学 | Intelligent manufacturing triplet extraction method based on relational pointer network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021147726A1 (en) | Information extraction method and apparatus, electronic device and storage medium | |
CN113553850A (en) | Entity relation extraction method based on ordered structure encoding pointer network decoding | |
US20220050967A1 (en) | Extracting definitions from documents utilizing definition-labeling-dependent machine learning background | |
CN113177124B (en) | Method and system for constructing knowledge graph in vertical field | |
CN114064918B (en) | Multi-modal event knowledge graph construction method | |
CN114020936B (en) | Construction method and system of multi-modal affair map and readable storage medium | |
CN113254610B (en) | Multi-round conversation generation method for patent consultation | |
US12002276B2 (en) | Document distinguishing based on page sequence learning | |
CN112100332A (en) | Word embedding expression learning method and device and text recall method and device | |
CN115034224A (en) | News event detection method and system integrating representation of multiple text semantic structure diagrams | |
KR102379660B1 (en) | Method for utilizing deep learning based semantic role analysis | |
Perez-Martin et al. | A comprehensive review of the video-to-text problem | |
CN113312912A (en) | Machine reading understanding method for traffic infrastructure detection text | |
CN111881292A (en) | Text classification method and device | |
CN114492661B (en) | Text data classification method and device, computer equipment and storage medium | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN114880307A (en) | Structured modeling method for knowledge in open education field | |
CN116628207A (en) | Training method and device for text classification model, electronic equipment and storage medium | |
CN116384403A (en) | Multi-mode social media named entity recognition method based on scene graph | |
CN114519353B (en) | Model training method, emotion message generation method and device, equipment and medium | |
CN115964497A (en) | Event extraction method integrating attention mechanism and convolutional neural network | |
CN116186241A (en) | Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium | |
CN114911940A (en) | Text emotion recognition method and device, electronic equipment and storage medium | |
Divya et al. | An Empirical Study on Fake News Detection System using Deep and Machine Learning Ensemble Techniques | |
Zhang | Exploration of Cross‐Modal Text Generation Methods in Smart Justice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20211026 |