CN113468325A - Document level relation extraction method based on associated sentence selection and relation graph reasoning - Google Patents

Publication number
CN113468325A
Authority
CN
China
Prior art keywords
entity
sentence
document
relationship
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110643706.XA
Other languages
Chinese (zh)
Inventor
董贇
张希翔
梁仲峰
黄琦
蒙琦
杜春辉
高翔
岳小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Power Grid Co Ltd
Original Assignee
Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Power Grid Co Ltd
Priority to CN202110643706.XA
Publication of CN113468325A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a document-level relation extraction method based on associated sentence selection and relation graph reasoning. For a head entity and a tail entity in a document whose relation is to be determined, the method comprises the following steps: acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity, and selecting, from the two sets, a plurality of sentences with similar positions in the document as associated sentences; and performing text encoding on each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting them as entity nodes of the relation graph model, and acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model. Through associated sentence selection, the method retains only the sentences that help determine the entity relation, and it then improves accuracy through relation graph reasoning based on a graph convolutional neural network.

Description

Document level relation extraction method based on associated sentence selection and relation graph reasoning
Technical Field
The invention belongs to the field of relation extraction, and particularly relates to a document level relation extraction method based on associated sentence selection and relation graph reasoning.
Background
Relation extraction refers to determining the relation between given entities in a text. For example, in the entity relation triple (Beijing, located in, China), Beijing is the head entity, China is the tail entity, and "located in" is the relation between them. Relation extraction may use a rule-based approach, for example by assuming that a given relation holds between two entities whenever certain keywords appear in the text. This approach is fast and accurate on specific texts, but rules must be designed specially for each kind of text, so the labor cost is high. The other approach is based on deep learning: it judges the relation between two entities by combining context information, and it generalizes well.
Relation extraction can be divided into sentence-level relation extraction and document-level relation extraction. The input text of sentence-level relation extraction is generally a short sentence, and the logical relation between the entities is relatively simple, so the relation between the entities can be judged well. However, the input text of document-level relation extraction is a long document, and directly applying existing methods raises the following problems:
1. a document contains a large amount of redundant information, and entities are sparse relative to the rest of the text, so the relations among entities are difficult to extract and the calculation time is too long;
2. entity relations in a document are more complex than those in a single sentence, and the relation between entities may only be judged accurately by reasoning over a plurality of sentences together.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a document-level relation extraction method based on associated sentence selection and relation graph reasoning.
To achieve this purpose, the invention adopts the following technical scheme:
A document-level relation extraction method based on associated sentence selection and relation graph reasoning, for a head entity and a tail entity in a document whose relation is to be determined, comprises the following steps: acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity in the document, and selecting, from the first sentence set and the second sentence set, a plurality of sentences with similar positions in the document as associated sentences; performing text encoding on each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting them as entity nodes of the relation graph model, and acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model; initializing link weights among all nodes; aggregating the features of neighbor nodes using a graph convolutional neural network, so that the features of all nodes are fused with one another; and acquiring the features of the two entity nodes corresponding to the head entity and the tail entity, and judging the relation between the head entity and the tail entity.
Preferably, the text encoding of each character in the associated sentences includes: performing text encoding using a pre-trained language model.
Preferably, acquiring the shortest semantic dependency character related to the entity includes: acquiring the shortest semantic dependency character related to the entity using a semantic dependency analyzer.
Preferably, initializing the link weights among all nodes includes: initializing the link weights among all nodes using an attention mechanism and the matrix-tree theorem.
Preferably, using the graph convolutional neural network on the features of the neighbor nodes includes: aggregating the features of the neighbor nodes using a multi-layer graph convolutional neural network.
Preferably, judging the relation between the head entity and the tail entity includes: judging the relation between the head entity and the tail entity using a classification model.
A storage medium having stored thereon a computer program for implementing any of the extraction methods described herein.
A document-level relation extraction apparatus based on associated sentence selection and relation graph reasoning, the apparatus comprising: an associated sentence selection module, used for acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity in a document, and for selecting, from the first sentence set and the second sentence set, a plurality of sentences with similar positions in the document as associated sentences; a node construction module, used for performing text encoding on each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting them as entity nodes of the relation graph model, and also used for acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model; a relation graph reasoning module, used for initializing the link weights among all nodes and aggregating the features of neighbor nodes using a graph convolutional neural network, so that the features of all nodes are fused with one another; and an entity relation judging module, used for acquiring the features of the two entity nodes corresponding to the head entity and the tail entity and judging the relation between the head entity and the tail entity.
Compared with the prior art, the invention has the following beneficial effects: 1. associated sentence selection locates the sentences that are useful for relation extraction, and filtering out redundant sentences both improves running speed and reduces interference, which improves accuracy; 2. judging the relation between entities may require complex logical reasoning across a plurality of sentences, and introducing the graph convolutional neural network improves this reasoning ability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
As shown in FIG. 1, a document-level relation extraction method is provided for a head entity and a tail entity whose relation is to be determined. It mainly includes the following four parts:
First, associated sentence selection
The full text is first searched to obtain a sentence set containing the head entity and a sentence set containing the tail entity. Then, based on a distance priority principle, a plurality of sentences from the two sets that have similar positions in the document are selected as associated sentences.
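The selection step can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not define the "distance priority principle", so the sketch assumes it means ranking (head sentence, tail sentence) pairs by how far apart they are in the document and keeping the closest ones. The function name `select_associated_sentences`, the `max_pairs` parameter, and plain substring matching for entity mentions are all hypothetical choices.

```python
from itertools import product


def select_associated_sentences(sentences, head, tail, max_pairs=2):
    """Return sorted indices of sentences kept for a (head, tail) pair.

    Assumption: "distance priority" is read here as preferring pairs of
    (head sentence, tail sentence) whose document positions are closest.
    """
    # Sentence sets containing the head entity and the tail entity.
    head_ids = [i for i, s in enumerate(sentences) if head in s]
    tail_ids = [i for i, s in enumerate(sentences) if tail in s]
    # Rank every cross pair by positional distance (ties broken by index).
    pairs = sorted(product(head_ids, tail_ids),
                   key=lambda p: (abs(p[0] - p[1]), p))
    selected = set()
    for h, t in pairs[:max_pairs]:
        selected.update((h, t))
    return sorted(selected)


doc = [
    "Beijing hosted the 2008 Olympic Games.",
    "The weather was mild that summer.",
    "It is the capital of China.",
    "Many tourists visit Beijing every year.",
    "China has a long history.",
]
print(select_associated_sentences(doc, "Beijing", "China"))  # [2, 3, 4]
```

In this toy document the sentence at index 0 mentions the head entity but is dropped, because closer head/tail sentence pairs exist; this is the redundancy filtering the method relies on.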
Second, node construction
After the associated sentences are selected, each character in them is text-encoded using a pre-trained language model. The entity encodings in the associated sentences are fused and input as entity nodes of the relation graph model. A semantic dependency analyzer is used to acquire the shortest semantic dependency character related to the entity, which is input as a dependency node of the relation graph model.
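The entity-fusion part of this step can be illustrated with a short sketch. The text does not say how entity encodings are fused, so averaging the character vectors of all mentions is an assumption made here, and the function name `fuse_entity_code` and the toy vectors are hypothetical. In practice the character vectors would come from the pre-trained language model; dependency-node extraction is not shown.

```python
def fuse_entity_code(char_vectors, mention_spans):
    """Average the character vectors covered by every mention of one entity.

    char_vectors: list of equal-length float lists, one per character.
    mention_spans: list of (start, end) index pairs, end exclusive.
    Assumption: mean pooling stands in for the unspecified fusion step.
    """
    rows = [char_vectors[i]
            for start, end in mention_spans
            for i in range(start, end)]
    if not rows:
        raise ValueError("entity has no mention characters")
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Toy 2-dimensional encodings for a 5-character text (made-up numbers).
char_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [2.0, 4.0], [5.0, 5.0]]
# The entity is mentioned at characters 0-1 and again at character 3.
entity_node = fuse_entity_code(char_vectors, [(0, 2), (3, 4)])
print(entity_node)  # [2.0, 2.0]
```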
Third, relationship graph reasoning
After the entity nodes and the dependency nodes are obtained, the link weights among all nodes are first initialized using an attention mechanism and the matrix-tree theorem. A multi-layer graph convolutional neural network is then used to aggregate the features of neighbor nodes, so that the features of all nodes are fused with one another.
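The aggregation step can be sketched as a plain graph-convolution layer. This is only an illustration under stated assumptions: the link weights are supplied by hand rather than computed with attention and the matrix-tree theorem, the row normalization and ReLU are common choices that the text does not specify, and the names `matmul`, `gcn_layer`, and the toy graph are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(link_weights, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W).

    link_weights: n x n non-negative edge weights, self-loops included
    (so no row sums to zero). features: n x d node features.
    weight: d x k projection matrix (hypothetical, normally learned).
    """
    # Row-normalise so each node takes a weighted average of its neighbours.
    normalised = [[w / sum(row) for w in row] for row in link_weights]
    aggregated = matmul(normalised, features)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Nodes: 0 = head entity, 1 = dependency node, 2 = tail entity.
link_weights = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
features = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

h1 = gcn_layer(link_weights, features, identity)
h2 = gcn_layer(link_weights, h1, identity)
print(h1[0], h2[0])
```

After one layer the head node has seen only its direct neighbor, so its second feature is still zero; after two layers information from the tail node has reached it through the dependency node. This is why stacking several layers lets features from different sentences fuse.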
Fourth, entity relation judgment
The features of the two entity nodes (the head entity node and the tail entity node) are obtained, and a classification model is used to judge the relation between the entities.
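A minimal sketch of the final step follows. The text only says "a classification model", so the linear layer with softmax over the concatenated head and tail features is an assumption, and the label set, weights, and the name `classify_relation` are hypothetical values chosen for illustration.

```python
import math


def classify_relation(head_vec, tail_vec, weights, biases, labels):
    """Score each relation label from the concatenated entity features.

    Assumption: a single linear layer followed by softmax stands in for
    the unspecified classification model.
    """
    pair = head_vec + tail_vec  # list concatenation of the two node features
    logits = [sum(w * x for w, x in zip(row, pair)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["no_relation", "located_in"]
weights = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0]]  # made-up parameters
biases = [0.0, 0.0]
label, probs = classify_relation([1.0, 0.0], [0.0, 1.0], weights, biases, labels)
print(label)  # located_in
```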
Although the present invention has been described in detail with respect to the above embodiments, it will be understood by those skilled in the art that modifications or improvements based on the disclosure of the present invention may be made without departing from the spirit and scope of the invention, and these modifications and improvements are within the spirit and scope of the invention.

Claims (8)

1. A document-level relation extraction method based on associated sentence selection and relation graph reasoning, for a head entity and a tail entity in a document whose relation is to be determined, characterized by comprising the following steps:
acquiring a first sentence set containing the head entity and a second sentence set containing the tail entity in the document, and selecting, from the first sentence set and the second sentence set, a plurality of sentences with similar positions in the document as associated sentences;
performing text encoding on each character in the associated sentences, fusing the entity encodings in the associated sentences and inputting them as entity nodes of the relation graph model, and acquiring the shortest semantic dependency character related to the entity and inputting it as a dependency node of the relation graph model;
initializing link weights among all nodes;
aggregating the features of neighbor nodes using a graph convolutional neural network, so that the features of all nodes are fused with one another; and
acquiring the features of the two entity nodes corresponding to the head entity and the tail entity, and judging the relation between the head entity and the tail entity.
2. The method of claim 1, wherein the text encoding of each character in the associated sentences comprises: performing text encoding using a pre-trained language model.
3. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 2, wherein acquiring the shortest semantic dependency character related to the entity comprises: acquiring the shortest semantic dependency character related to the entity using a semantic dependency analyzer.
4. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 3, wherein initializing the link weights among all nodes comprises: initializing the link weights among all nodes using an attention mechanism and the matrix-tree theorem.
5. The document-level relation extraction method based on associated sentence selection and relation graph reasoning as claimed in claim 4, wherein using the graph convolutional neural network on the features of the neighbor nodes comprises: aggregating the features of the neighbor nodes using a multi-layer graph convolutional neural network.
6. The method of claim 5, wherein judging the relation between the head entity and the tail entity comprises: judging the relation between the head entity and the tail entity using a classification model.
7. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is executed to implement the extraction method according to any one of claims 1 to 6.
8. A document-level relationship extraction apparatus based on associative sentence selection and relationship graph inference, the apparatus comprising:
the relevant sentence selection module is used for acquiring a first sentence set containing a head entity and a second sentence set containing a tail entity in a document, and selecting a plurality of sentences with similar positions in the document in the first sentence set and the second sentence set as relevant sentences;
the node construction module is used for performing text coding on each character in the related sentence, fusing entity codes in the related sentence and inputting the fused entity codes as an entity node of the relationship graph model, and is also used for acquiring the shortest semantic dependency character related to the entity and inputting the shortest semantic dependency character as a dependency node of the relationship graph model;
the relational graph reasoning module is used for initializing the link weights among all the nodes and aggregating the features of the neighbor nodes by using a graph convolutional neural network, so that the features of all the nodes are fused with one another; and
the entity relationship judging module is used for acquiring the features of the two entity nodes corresponding to the head entity and the tail entity and judging the relationship between the head entity and the tail entity.
CN202110643706.XA 2021-06-09 2021-06-09 Document level relation extraction method based on associated sentence selection and relation graph reasoning Pending CN113468325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110643706.XA CN113468325A (en) 2021-06-09 2021-06-09 Document level relation extraction method based on associated sentence selection and relation graph reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110643706.XA CN113468325A (en) 2021-06-09 2021-06-09 Document level relation extraction method based on associated sentence selection and relation graph reasoning

Publications (1)

Publication Number Publication Date
CN113468325A true CN113468325A (en) 2021-10-01

Family

ID=77869684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110643706.XA Pending CN113468325A (en) 2021-06-09 2021-06-09 Document level relation extraction method based on associated sentence selection and relation graph reasoning

Country Status (1)

Country Link
CN (1) CN113468325A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610903A (en) * 2022-03-29 2022-06-10 科大讯飞(苏州)科技有限公司 Text relation extraction method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervisory Relation extraction method and device
CN111444713A (en) * 2019-01-16 2020-07-24 清华大学 Method and device for extracting entity relationship in news event
CN111831783A (en) * 2020-07-07 2020-10-27 北京北大软件工程股份有限公司 Chapter-level relation extraction method
US20210019370A1 (en) * 2019-07-19 2021-01-21 Siemens Aktiengesellschaft Neural relation extraction within and across sentence boundaries
CN112347761A (en) * 2020-11-27 2021-02-09 北京工业大学 Bert-based drug relationship extraction method
CN112883199A (en) * 2021-03-09 2021-06-01 重庆大学 Collaborative disambiguation method based on deep semantic neighbor and multi-entity association

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
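GUOSHUN NAN et al.: "Reasoning with Latent Structure Refinement for Document-Level Relation Extraction", 《ARXIV:2005.06312V3》 *
WU XIAOYANG (武晓阳): "Research and Implementation of Entity Relation Extraction for Massive Internet Chinese Text", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》 *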

Similar Documents

Publication Publication Date Title
CN107133213B (en) Method and system for automatically extracting text abstract based on algorithm
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
CN102799577B (en) A kind of Chinese inter-entity semantic relation extraction method
Wu et al. Chinese micro-blog sentiment analysis based on multiple sentiment dictionaries and semantic rule sets
CN106096664B (en) A kind of sentiment analysis method based on social network data
CN106776562A (en) A kind of keyword extracting method and extraction system
CN110532328B (en) Text concept graph construction method
CN108415953A (en) A kind of non-performing asset based on natural language processing technique manages knowledge management method
US20170052950A1 (en) Extracting information from structured documents comprising natural language text
CN107463658A (en) File classification method and device
CN106776548A (en) A kind of method and apparatus of the Similarity Measure of text
CN102214166A (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN113707339B (en) Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases
WO2024036840A1 (en) Open-domain dialogue reply method and system based on topic enhancement
CN111476031A (en) Improved Chinese named entity recognition method based on Lattice-LSTM
CN108491581A (en) A kind of design process knowledge reuse method and system based on design concept model
CN115688776A (en) Relation extraction method for Chinese financial text
CN105117386A (en) Semantic association method based on book content structures
CN114997288A (en) Design resource association method
Chen et al. TRG-DAtt: The target relational graph and double attention network based sentiment analysis and prediction for supporting decision making
Yu et al. Student sentiment classification model based on GRU neural network and TF-IDF algorithm
CN113468325A (en) Document level relation extraction method based on associated sentence selection and relation graph reasoning
Fischbach et al. Fine-grained causality extraction from natural language requirements using recursive neural tensor networks
Ding et al. A Knowledge-Enriched and Span-Based Network for Joint Entity and Relation Extraction.
Luo et al. Multi-featured cyberbullying detection based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211001