CN113468325A - Document level relation extraction method based on associated sentence selection and relation graph reasoning - Google Patents
- Publication number
- CN113468325A (application number CN202110643706.XA)
- Authority
- CN
- China
- Prior art keywords
- entity
- sentence
- document
- relationship
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a document-level relation extraction method based on associated sentence selection and relation graph reasoning. For a head entity and a tail entity in a document whose relation is to be determined, the method comprises the following steps: acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity, and selecting from the two sets a plurality of sentences that lie close to each other in the document as associated sentences; and text-encoding each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting the fused encoding as an entity node of the relation graph model, and acquiring the shortest semantic dependency character related to the entity as a dependency node input of the relation graph model. Associated sentence selection retains only the sentences that help determine the entity relation, and relation graph reasoning based on a graph convolutional neural network then improves accuracy.
Description
Technical Field
The invention belongs to the field of relation extraction, and particularly relates to a document level relation extraction method based on associated sentence selection and relation graph reasoning.
Background
Relation extraction determines the relation between given entities in a text and expresses it as an entity relation triple such as (Beijing, located in, China), where Beijing is the head entity, China is the tail entity, and "located in" is the relation. One option is a rule-based approach, for example deciding that a particular relation holds between two entities whenever certain keywords appear in the text. This approach is fast and accurate on the texts it was designed for, but rules must be hand-crafted for each kind of text, so the labor cost is high. The other option is a deep-learning approach, which judges the relation between two entities by combining context information and generalizes well.
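The rule-based approach described above can be illustrated with a minimal sketch. The rule table, the trigger phrases, and the function name `rule_based_relation` are assumptions made for illustration; they are not taken from this application.

```python
# Hypothetical keyword rules: relation label -> trigger phrases (illustrative only).
RULES = {
    "located in": ["is located in", "is the capital of"],
}

def rule_based_relation(text, head, tail, rules=RULES):
    """Return the first relation whose trigger phrase occurs between head and tail."""
    head_pos = text.find(head)
    tail_pos = text.find(tail)
    # Give up when an entity is missing or the head does not precede the tail.
    if head_pos == -1 or tail_pos == -1 or head_pos >= tail_pos:
        return None
    between = text[head_pos + len(head):tail_pos]
    for relation, triggers in rules.items():
        if any(trigger in between for trigger in triggers):
            return relation
    return None

relation = rule_based_relation("Beijing is the capital of China.", "Beijing", "China")
triple = ("Beijing", relation, "China")
```

Every new text genre would need its own entries in `RULES`, which is the labor cost the paragraph above refers to.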
Relation extraction can be divided into sentence-level relation extraction and document-level relation extraction. The input to sentence-level relation extraction is generally a short sentence in which the logical relations between entities are relatively simple, so the relations can be determined well. The input to document-level relation extraction, however, is a long document, and applying existing methods to it directly raises the following problems:
1. a document contains a large amount of redundant information and entities are sparse relative to the rest of the text, so relations among entities are hard to extract and computation takes too long;
2. entity relations in a document are more complex than those within a sentence, and the relation between entities can often be determined accurately only by reasoning jointly over multiple sentences.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a document-level relation extraction method based on associated sentence selection and relation graph reasoning.
To achieve this purpose, the invention adopts the following technical solution:
A document-level relation extraction method based on associated sentence selection and relation graph reasoning, for a head entity and a tail entity in a document whose relation is to be determined, comprises the following steps: acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity in the document, and selecting from the first sentence set and the second sentence set a plurality of sentences at nearby positions in the document as associated sentences; text-encoding each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting the fused encoding as an entity node of the relation graph model, and acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model; initializing the link weights among all nodes; aggregating the features of neighbor nodes with a graph convolutional neural network so that the features of all nodes are fused with one another; and acquiring the features of the two entity nodes corresponding to the head entity and the tail entity, and determining the relation between the head entity and the tail entity.
Preferably, text-encoding each character in the associated sentences comprises: performing the text encoding with a pre-trained language model.
Preferably, acquiring the shortest semantic dependency character related to the entity comprises: acquiring the shortest semantic dependency character related to the entity with a semantic dependency analyzer.
Preferably, initializing the link weights among all nodes comprises: initializing the link weights among all nodes using an attention mechanism and the matrix-tree theorem.
Preferably, aggregating the features of neighbor nodes with a graph convolutional neural network comprises: aggregating the features of the neighbor nodes with a multi-layer graph convolutional neural network.
Preferably, determining the relation between the head entity and the tail entity comprises: determining the relation between the head entity and the tail entity with a classification model.
A storage medium having stored thereon a computer program for implementing any of the extraction methods described herein.
A document-level relation extraction apparatus based on associated sentence selection and relation graph reasoning comprises: an associated sentence selection module, configured to acquire a first sentence set containing the head entity and a second sentence set containing the tail entity in a document, and to select from the first sentence set and the second sentence set a plurality of sentences at nearby positions in the document as associated sentences; a node construction module, configured to text-encode each character in the associated sentences, to fuse the entity encodings in the associated sentences and input the fused encoding as an entity node of the relation graph model, and to acquire the shortest semantic dependency character related to the entity and input it as a dependency node of the relation graph model; a relation graph reasoning module, configured to initialize the link weights among all nodes and to aggregate the features of neighbor nodes with a graph convolutional neural network so that the features of all nodes are fused with one another; and an entity relation determination module, configured to acquire the features of the two entity nodes corresponding to the head entity and the tail entity and to determine the relation between the head entity and the tail entity.
Compared with the prior art, the invention has the following beneficial effects: 1. associated sentence selection accurately locates the sentences that are useful for relation extraction, and filtering out redundant sentences both increases running speed and reduces interference, which improves accuracy; 2. determining the relation between entities may require complex logical reasoning across multiple sentences, and introducing a graph convolutional neural network greatly improves this reasoning capability.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
As shown in FIG. 1, a document-level relation extraction method is provided for a head entity and a tail entity whose relation is to be determined. It mainly comprises the following four parts:
First, associated sentence selection
The full text is searched first to obtain a sentence set containing the head entity and a sentence set containing the tail entity. Then, following a distance-priority principle, a plurality of sentences from the two sets that lie close to each other are selected as associated sentences.
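The selection step can be sketched as follows. This is only one possible reading of the distance-priority principle: sentence pairs are ranked by the gap between their sentence indices and the closest pairs are kept. The substring test for entity mentions, the `max_pairs` parameter, and the function name are assumptions for illustration.

```python
def select_associated_sentences(sentences, head, tail, max_pairs=2):
    """Return indices of sentences from the closest (head sentence, tail sentence) pairs."""
    # Sentence sets containing the head entity and the tail entity.
    head_ids = [i for i, s in enumerate(sentences) if head in s]
    tail_ids = [i for i, s in enumerate(sentences) if tail in s]
    # Distance priority: rank every pair by the gap between sentence positions.
    pairs = sorted((abs(i - j), i, j) for i in head_ids for j in tail_ids)
    selected = set()
    for _, i, j in pairs[:max_pairs]:
        selected.update((i, j))
    return sorted(selected)

sentences = [
    "Alice was born in Paris",
    "She studied physics",
    "Paris is in France",
    "The weather is mild",
    "Alice later moved to France",
]
kept = select_associated_sentences(sentences, "Alice", "France", max_pairs=2)
```

With `max_pairs=1` only sentence 4 is kept, because it mentions both entities and therefore has distance zero; raising `max_pairs` admits the next closest pairs.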
Second, node construction
After the associated sentences are selected, each character in them is text-encoded with a pre-trained language model. The entity encodings in the associated sentences are fused and input as entity nodes of the relation graph model. A semantic dependency analyzer is used to acquire the shortest semantic dependency character related to each entity, which is input as a dependency node of the relation graph model.
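The text does not specify how the entity encodings are fused. A minimal sketch, assuming mean pooling over the character vectors of every mention of the entity, is shown below. The toy vectors stand in for the output of a pre-trained language model, and the function name and span format are hypothetical.

```python
def fuse_entity(char_vectors, mention_spans):
    """Average the vectors of all characters covered by an entity's mention spans.

    char_vectors: one vector (list of floats) per character of the associated sentences.
    mention_spans: (start, end) character offsets, end exclusive, one per mention.
    """
    rows = [char_vectors[k] for start, end in mention_spans for k in range(start, end)]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

# Toy 2-dimensional character encodings (assumed, not produced by a real model).
char_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 0.0]]
single_mention = fuse_entity(char_vectors, [(0, 2)])
two_mentions = fuse_entity(char_vectors, [(0, 1), (2, 3)])
```

Other fusion choices, such as max pooling or log-sum-exp pooling, would fit the same interface.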
Third, relation graph reasoning
After the entity nodes and the dependency nodes are obtained, the link weights among all nodes are first initialized using an attention mechanism and the matrix-tree theorem. A multi-layer graph convolutional neural network then aggregates the features of neighbor nodes so that the features of all nodes are fused with one another.
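A simplified sketch of this part is given below. It initializes the link weights with scaled dot-product attention only and deliberately omits the matrix-tree theorem step, so it is not the full initialization described above. The ReLU activation, the projection matrix, and all names are assumptions for illustration.

```python
import math

def attention_weights(features):
    """Row-wise softmax over scaled dot products between node features."""
    dim = len(features[0])
    weights = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in features]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def gcn_layer(weights, features, projection):
    """One graph convolution step, ReLU(A * H * W), written with plain lists."""
    n, d_in, d_out = len(features), len(features[0]), len(projection[0])
    # Aggregate neighbor features according to the link weights.
    aggregated = [
        [sum(weights[i][j] * features[j][d] for j in range(n)) for d in range(d_in)]
        for i in range(n)
    ]
    # Project the aggregated features and apply ReLU.
    return [
        [max(0.0, sum(row[d] * projection[d][o] for d in range(d_in))) for o in range(d_out)]
        for row in aggregated
    ]

node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy entity and dependency nodes
identity = [[1.0, 0.0], [0.0, 1.0]]                   # stand-in for a learned projection
link_weights = attention_weights(node_features)
hidden = gcn_layer(link_weights, node_features, identity)
hidden = gcn_layer(link_weights, hidden, identity)     # a second layer, i.e. multi-layer
```

Stacking layers lets information travel more than one hop, which is what allows features from different sentences to influence each other.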
Fourth, entity relation determination
The features of the two entity nodes (the head entity node and the tail entity node) are obtained, and a classification model is used to determine the relation between the entities.
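The classification model is not specified further. One common choice, assumed here purely for illustration, is a linear layer with softmax over the concatenated head and tail node features. The weights, labels, and function name below are hypothetical.

```python
import math

def classify_relation(head_vec, tail_vec, weight, bias, labels):
    """Concatenate the two entity node features and apply a linear softmax classifier."""
    pair = head_vec + tail_vec  # list concatenation
    logits = [sum(w * x for w, x in zip(row, pair)) + b for row, b in zip(weight, bias)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

labels = ["no relation", "located in"]
weight = [[0.0, 0.0, 0.0, 0.0],   # toy parameters, one row per label
          [1.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
predicted, probs = classify_relation([2.0, 0.0], [0.0, 1.0], weight, bias, labels)
```

In a trained system the weights would be learned jointly with the graph network rather than set by hand.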
Although the present invention has been described in detail with respect to the above embodiments, it will be understood by those skilled in the art that modifications or improvements based on the disclosure of the present invention may be made without departing from the spirit and scope of the invention, and these modifications and improvements are within the spirit and scope of the invention.
Claims (8)
1. A document-level relation extraction method based on associated sentence selection and relation graph reasoning, for a head entity and a tail entity in a document whose relation is to be determined, characterized by comprising the following steps:
acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity in the document, and selecting from the first sentence set and the second sentence set a plurality of sentences at nearby positions in the document as associated sentences;
text-encoding each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting the fused encoding as an entity node of the relation graph model, and acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model;
initializing the link weights among all nodes;
aggregating the features of neighbor nodes with a graph convolutional neural network so that the features of all nodes are fused with one another;
and acquiring the features of the two entity nodes corresponding to the head entity and the tail entity, and determining the relation between the head entity and the tail entity.
2. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 1, wherein text-encoding each character in the associated sentences comprises: performing the text encoding with a pre-trained language model.
3. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 2, wherein acquiring the shortest semantic dependency character related to the entity comprises: acquiring the shortest semantic dependency character related to the entity with a semantic dependency analyzer.
4. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 3, wherein initializing the link weights among all nodes comprises: initializing the link weights among all nodes using an attention mechanism and the matrix-tree theorem.
5. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 4, wherein aggregating the features of neighbor nodes with a graph convolutional neural network comprises: aggregating the features of the neighbor nodes with a multi-layer graph convolutional neural network.
6. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 5, wherein determining the relation between the head entity and the tail entity comprises: determining the relation between the head entity and the tail entity with a classification model.
8. A document-level relation extraction apparatus based on associated sentence selection and relation graph reasoning, the apparatus comprising:
an associated sentence selection module, configured to acquire a first sentence set containing the head entity and a second sentence set containing the tail entity in a document, and to select from the first sentence set and the second sentence set a plurality of sentences at nearby positions in the document as associated sentences;
a node construction module, configured to text-encode each character in the associated sentences, to fuse the entity encodings in the associated sentences and input the fused encoding as an entity node of the relation graph model, and to acquire the shortest semantic dependency character related to the entity and input it as a dependency node of the relation graph model;
a relation graph reasoning module, configured to initialize the link weights among all nodes and to aggregate the features of neighbor nodes with a graph convolutional neural network so that the features of all nodes are fused with one another; and
an entity relation determination module, configured to acquire the features of the two entity nodes corresponding to the head entity and the tail entity and to determine the relation between the head entity and the tail entity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110643706.XA CN113468325A (en) | 2021-06-09 | 2021-06-09 | Document level relation extraction method based on associated sentence selection and relation graph reasoning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110643706.XA CN113468325A (en) | 2021-06-09 | 2021-06-09 | Document level relation extraction method based on associated sentence selection and relation graph reasoning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113468325A (en) | 2021-10-01 |
Family
ID=77869684
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110643706.XA (Pending; published as CN113468325A) | 2021-06-09 | 2021-06-09 | Document level relation extraction method based on associated sentence selection and relation graph reasoning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113468325A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114610903A (en) * | 2022-03-29 | 2022-06-10 | 科大讯飞(苏州)科技有限公司 | Text relation extraction method, device, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209836A (en) * | 2019-05-17 | 2019-09-06 | 北京邮电大学 | Remote supervisory Relation extraction method and device |
CN111444713A (en) * | 2019-01-16 | 2020-07-24 | 清华大学 | Method and device for extracting entity relationship in news event |
CN111831783A (en) * | 2020-07-07 | 2020-10-27 | 北京北大软件工程股份有限公司 | Chapter-level relation extraction method |
US20210019370A1 (en) * | 2019-07-19 | 2021-01-21 | Siemens Aktiengesellschaft | Neural relation extraction within and across sentence boundaries |
CN112347761A (en) * | 2020-11-27 | 2021-02-09 | 北京工业大学 | Bert-based drug relationship extraction method |
CN112883199A (en) * | 2021-03-09 | 2021-06-01 | 重庆大学 | Collaborative disambiguation method based on deep semantic neighbor and multi-entity association |
-
2021
- 2021-06-09 CN CN202110643706.XA patent/CN113468325A/en active Pending
Non-Patent Citations (2)
Title |
---|
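Guoshun Nan et al.: "Reasoning with Latent Structure Refinement for Document-Level Relation Extraction", arXiv:2005.06312v3 * |
Wu Xiaoyang: "Research and Implementation of Entity Relation Extraction for Massive Internet Chinese Text", China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology * |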
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN108416058B (en) | Bi-LSTM input information enhancement-based relation extraction method | |
CN102799577B (en) | A kind of Chinese inter-entity semantic relation extraction method | |
Wu et al. | Chinese micro-blog sentiment analysis based on multiple sentiment dictionaries and semantic rule sets | |
CN106096664B (en) | A kind of sentiment analysis method based on social network data | |
CN106776562A (en) | A kind of keyword extracting method and extraction system | |
CN110532328B (en) | Text concept graph construction method | |
CN108415953A (en) | A kind of non-performing asset based on natural language processing technique manages knowledge management method | |
US20170052950A1 (en) | Extracting information from structured documents comprising natural language text | |
CN107463658A (en) | File classification method and device | |
CN106776548A (en) | A kind of method and apparatus of the Similarity Measure of text | |
CN102214166A (en) | Machine translation system and machine translation method based on syntactic analysis and hierarchical model | |
CN113707339B (en) | Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases | |
WO2024036840A1 (en) | Open-domain dialogue reply method and system based on topic enhancement | |
CN111476031A (en) | Improved Chinese named entity recognition method based on Lattice-LSTM | |
CN108491581A (en) | A kind of design process knowledge reuse method and system based on design concept model | |
CN115688776A (en) | Relation extraction method for Chinese financial text | |
CN105117386A (en) | Semantic association method based on book content structures | |
CN114997288A (en) | Design resource association method | |
Chen et al. | TRG-DAtt: The target relational graph and double attention network based sentiment analysis and prediction for supporting decision making | |
Yu et al. | Student sentiment classification model based on GRU neural network and TF-IDF algorithm | |
CN113468325A (en) | Document level relation extraction method based on associated sentence selection and relation graph reasoning | |
Fischbach et al. | Fine-grained causality extraction from natural language requirements using recursive neural tensor networks | |
Ding et al. | A Knowledge-Enriched and Span-Based Network for Joint Entity and Relation Extraction. | |
Luo et al. | Multi-featured cyberbullying detection based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211001 |