CN113449084A

CN113449084A - Relationship extraction method based on graph convolution

Info

Publication number: CN113449084A
Application number: CN202111021201.6A
Authority: CN
Inventors: 陶建华; 张华�; 张大伟; 杨国花; 刘通
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2021-09-28

Abstract

The invention provides a relation extraction method based on graph convolution, comprising the following steps. Language analysis preprocessing: word segmentation and dependency syntax analysis are performed on an original sentence in a data set by means of a natural language analysis tool to obtain the word segmentation result of the original sentence; a dependency syntax tree representing the semantic dependency relationships between the words of the original sentence is constructed; and an adjacency matrix is generated from the topological relationships between the nodes of the dependency syntax tree. Word vector lookup: each word of the original sentence is converted into its corresponding word vector by querying a word vector table, yielding a vectorized representation of the original sentence. Feature extraction by a graph convolutional neural network: the adjacency matrix and the vectorized representation of each word are input into a graph convolutional network, which learns feature representations. Relation classification: the feature representations are concatenated and fed into a learning neural network to obtain a final representation, from which the probability distribution of the entity pair over the relations is obtained; the relation with the highest predicted probability is the relation type that the model predicts between the subject entity and the object entity in the sentence.

Description

Relationship extraction method based on graph convolution

Technical Field

The invention relates to the field of text data relation extraction, in particular to a relation extraction method based on graph convolution.

Background

In the era of information explosion, a great amount of text data, such as news reports, blogs, research documents and social media comments, appears on the internet every day, and quickly and effectively mining valuable information from this massive text data has become an urgent challenge. Relation extraction identifies the semantic relationships between named entities, given a text sentence and the named entities marked in it.

Existing relation extraction techniques generally take the features of the sentence and of the words near the entities as model input, obtain an overall representation after a series of processing steps, and finally pass it through a trained classifier to obtain the relation classification probabilities.

Disadvantages of the prior art

The traditional feature-based method must explicitly convert relation instances into feature vectors that a classifier can accept; research focuses on how to extract discriminative features, and lexical, syntactic and semantic features are generally combined to produce various local and global features describing the relation instances. The kernel-based method directly takes the structure tree as its processing object and uses a kernel function to compute the distance between relations. The deep-learning-based method generally converts an input sentence into word vectors through a word vector matrix as model input, then further extracts and fuses local lexical features and global sentence features, and finally obtains a representation used for relation classification. Statistical learning methods based on feature engineering and kernel functions suffer from poor model scalability. Moreover, the extraction of manually designed features depends on natural language processing tools, and feature extraction is itself a pipeline process in which the result of one processing step is the input of the next, so these tools readily cause errors to accumulate and propagate, making the extracted features inaccurate. In addition, for less widely used languages, these methods are severely limited by the lack of suitable natural language processing tools.

Disclosure of Invention

In view of the above, the present invention provides a relationship extraction method based on graph convolution, including:

language analysis preprocessing: performing word segmentation and dependency syntax analysis on an original sentence in a data set by means of a natural language analysis tool to obtain a word segmentation result of the original sentence, constructing a dependency syntax tree that represents the semantic dependency relationships between the words of the original sentence, and generating an adjacency matrix according to the topological relationships among the nodes of the dependency syntax tree;

word vector lookup: converting each word of the original sentence into its corresponding word vector by querying a word vector table, so as to obtain a vectorized representation of the original sentence;

feature extraction by a graph convolutional neural network: inputting the adjacency matrix and the vectorized representation of each word in the sentence into a graph convolutional network, and learning feature representations;

relation classification: concatenating the feature representations, feeding the concatenated feature representations into a learning neural network to obtain a final representation, and obtaining the probability distribution of the entity pair over the relations according to the feature representations, wherein the relation with the highest predicted probability is the relation type that the model predicts between the subject entity and the object entity in the sentence.
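The word vector lookup step listed above can be illustrated with a minimal Python sketch. The vocabulary, the 4 x 3 table, the `<unk>` fallback for out-of-vocabulary words and the function name `lookup_word_vectors` are all hypothetical choices made for illustration; the source does not specify how the word vector table is stored or how unknown words are handled.

```python
import numpy as np

def lookup_word_vectors(tokens, vocab, table, unk_index=0):
    """Map each token to its row in the word-vector table."""
    # Unknown words fall back to the <unk> row (an assumption, not from the source).
    indices = [vocab.get(tok, unk_index) for tok in tokens]
    return table[indices]  # shape: (n_tokens, embedding_dim)

# Hypothetical toy vocabulary and a 4 x 3 word-vector table.
vocab = {"<unk>": 0, "Alice": 1, "founded": 2, "Acme": 3}
table = np.arange(12, dtype=float).reshape(4, 3)

tokens = ["Alice", "founded", "Acme", "yesterday"]
sentence_vectors = lookup_word_vectors(tokens, vocab, table)
print(sentence_vectors.shape)  # (4, 3)
```

The resulting matrix, one row per word, plays the role of the vectorized sentence representation that is later fed to the graph convolutional network together with the adjacency matrix.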

In some embodiments, in particular, the method further comprises:

performing entity recognition on the original sentence with an entity recognition tool, and designating the obtained entities as the subject entity and the object entity according to the order in which they appear in the original sentence.

In some embodiments, in particular, the method further comprises:

dependency syntax tree pruning: pruning the dependency syntax tree to the subtree formed by the subject entity, the object entity and their nearest common ancestor in the sentence, and generating a pruned adjacency matrix according to the topological relationships among the nodes of the pruned dependency syntax tree.

In some embodiments, in particular, the feature representation comprises:

an overall sentence representation, a subject entity representation and an object entity representation, each fusing context and semantic dependency relationships.

In some embodiments, in particular, the learning neural network is a feed-forward neural network.

In some embodiments, specifically, the probability distribution of the entity pair over the relations is obtained from the feature representation as follows:

the obtained final representation is input into a linear layer, and the probability distribution of the entity pair over the relations is then obtained through a Softmax operation.

In some embodiments, specifically, the adjacency matrix and the vectorized representation of each word in the sentence are input into the graph convolutional network to learn the overall sentence representation fusing context and semantic dependency relationships, as follows:

$$h_{sent} = f\left(h^{(L)}\right) = f\left(\mathrm{GCN}\left(h^{(0)}\right)\right)$$

wherein

$h^{(L)}$ denotes the overall hidden representation output by the L-th layer of the graph convolutional neural network;

GCN(·) denotes the L-layer graph convolutional neural network;

$h^{(0)}$ denotes the input layer of the graph convolutional neural network;

f(·) denotes the max pooling function.

In some embodiments, specifically, the subject entity representation is obtained as:

$$h_s = f\left(h^{(L)}_{s_1:s_2}\right)$$

wherein $s_1:s_2$ denotes the index interval, in the segmented word sequence of the original sentence, of the words that form the subject entity; and $h^{(L)}_{s_1:s_2}$ denotes the hidden representation of the subject entity output by the L-th layer of the graph convolutional neural network, obtained with that index interval as input.

In some embodiments, specifically, the object entity representation is obtained as:

$$h_o = f\left(h^{(L)}_{o_1:o_2}\right)$$

wherein $o_1:o_2$ denotes the index interval, in the segmented word sequence of the original sentence, of the words that form the object entity; and $h^{(L)}_{o_1:o_2}$ denotes the hidden representation of the object entity output by the L-th layer of the graph convolutional neural network, obtained with that index interval as input.

In some embodiments, specifically, each node in the dependency syntax tree is influenced only by neighbors that are no more than L edges away in the dependency tree.

Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages:

The dependency syntax tree helps the relation extraction model capture long-distance semantic relationships between entities. Compared with traditional statistical learning methods and sequence-based deep learning models, relation extraction based on a graph convolutional network and dependency tree pruning can learn better entity and sentence representations in context at a higher semantic level, and the resulting model is more effective at extracting semantic relationships between entities that are far apart in a sentence.
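The statement in this section that a node is influenced only by neighbors at most L edges away can be checked numerically. The sketch below assumes that each graph convolution layer aggregates over the adjacency matrix with self-loops added, so that L stacked layers reach exactly the nodes for which the L-th power of that matrix is non-zero. The chain-shaped toy graph and the variable names are illustrative assumptions.

```python
import numpy as np

n, L = 6, 2

# Toy undirected chain 0 - 1 - 2 - 3 - 4 - 5 standing in for a dependency tree.
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Self-loops let every node keep its own features at each layer.
A_tilde = A + np.eye(n)

# After L layers, node i can receive information from node j
# only if the (i, j) entry of A_tilde ** L is non-zero.
reach = np.linalg.matrix_power(A_tilde, L) > 0

# On a chain, the number of edges between i and j is |i - j|.
hops = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
print(np.array_equal(reach, hops <= L))  # True
```

This is why a small number of layers already covers a pruned dependency tree: entities that are far apart in the word sequence are often only a few edges apart in the tree.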

Drawings

Fig. 1 is a flowchart of a relationship extraction method based on graph convolution according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

As shown in fig. 1, the method for extracting a relationship based on graph convolution according to the embodiment of the present application includes:

segmenting the original sentence in the data set by means of a natural language analysis tool to obtain its word segmentation result, so that the sentence is represented by its segmentation $X = [X_1, \dots, X_n]$; carrying out entity recognition on the original sentence with an entity recognition tool, and designating the obtained entities as the subject entity and the object entity according to the order in which they appear in the original sentence; performing dependency syntax analysis on the original sentence in the data set by means of a natural language analysis tool, representing each word as a node and each semantic dependency relationship between words as an edge between the corresponding nodes, thereby constructing a dependency syntax tree representing the semantic dependency relationships among the words of the original sentence, and generating an adjacency matrix according to the topological relationships among the nodes of the dependency syntax tree, specifically as follows:

assuming that the original sentence is cut into n words during word segmentation, an adjacency matrix $A$ with n rows and n columns is constructed, corresponding to the n nodes of the dependency syntax tree, wherein $A_{ij} = 1$ indicates that the dependency syntax tree has an edge from node i to node j, i.e. a semantic dependency relationship exists between the i-th word and the j-th word of the original sentence; otherwise $A_{ij} = 0$, indicating that the dependency syntax tree has no edge from node i to node j, i.e. no semantic dependency relationship exists between the i-th word and the j-th word of the original sentence;
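As a minimal sketch of the adjacency matrix construction described above, the following Python snippet builds the n x n matrix from a list of head indices such as a dependency parser might return. The `heads` encoding (0-based, -1 for the root), the head-to-dependent edge direction, the `directed` flag and the toy sentence are illustrative assumptions rather than details given in the source.

```python
import numpy as np

def adjacency_from_heads(heads, directed=True):
    """Build the n x n adjacency matrix of a dependency tree.

    heads[j] is the 0-based index of the head of word j, or -1 for the root.
    """
    n = len(heads)
    A = np.zeros((n, n), dtype=np.float32)
    for child, head in enumerate(heads):
        if head >= 0:
            A[head, child] = 1.0      # edge from head node i to dependent node j
            if not directed:
                A[child, head] = 1.0  # optional symmetric edge
    return A

# Toy parse of "Alice founded Acme": "founded" is the root and heads both nouns.
heads = [1, -1, 1]
A = adjacency_from_heads(heads)
print(A.astype(int).tolist())  # [[0, 0, 0], [1, 0, 1], [0, 0, 0]]
```

Passing `directed=False` yields a symmetric matrix, which is a common choice when information should flow in both directions along a dependency edge.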

dependency syntax tree pruning: pruning the dependency syntax tree to the subtree formed by the subject entity, the object entity and their nearest common ancestor in the sentence, and generating a pruned adjacency matrix according to the topological relationships among the nodes of the pruned dependency syntax tree;
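The pruning step can be sketched as follows. This is one possible reading of the description, in which every edge outside the subtree rooted at the nearest common ancestor of the entity words is removed; the source does not fix the exact pruning rule. The head-index encoding (-1 for the root), the half-open `(start, end)` entity spans and all function names are hypothetical.

```python
import numpy as np

def ancestors(heads, node):
    """Return the chain [node, head(node), ..., root]."""
    chain = [node]
    while heads[chain[-1]] >= 0:
        chain.append(heads[chain[-1]])
    return chain

def lowest_common_ancestor(heads, nodes):
    """Deepest node lying on the root path of every node in `nodes`."""
    common = set(ancestors(heads, nodes[0]))
    for node in nodes[1:]:
        common &= set(ancestors(heads, node))
    # Climbing from an entity word, the first common node met is the deepest one.
    return next(a for a in ancestors(heads, nodes[0]) if a in common)

def prune_to_lca_subtree(heads, subj_span, obj_span):
    """Keep only edges inside the subtree rooted at the entities' common ancestor."""
    entity_words = list(range(*subj_span)) + list(range(*obj_span))
    lca = lowest_common_ancestor(heads, entity_words)
    keep = [lca in ancestors(heads, node) for node in range(len(heads))]
    A = np.zeros((len(heads), len(heads)), dtype=np.float32)
    for child, head in enumerate(heads):
        if head >= 0 and keep[child] and keep[head]:
            A[head, child] = 1.0
    return A, lca

# Toy parse of "He said Alice founded Acme"; "said" (index 1) is the root.
heads = [1, -1, 3, 1, 3]
A_pruned, lca = prune_to_lca_subtree(heads, subj_span=(2, 3), obj_span=(4, 5))
print(lca, int(A_pruned.sum()))  # 3 2
```

In the toy sentence the common ancestor of "Alice" and "Acme" is "founded", so the words "He" and "said" are disconnected and only the two edges leaving "founded" remain.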

feature extraction by a graph convolutional neural network: most of the information needed to judge the semantic relationship between the entity pair is usually contained in the subtree rooted at the nearest common ancestor of the subject entity and the object entity, and the subject entity representation and the object entity representation can be learned by aggregating neighborhood information with a graph convolutional network; the adjacency matrix and the vectorized representation of each word in the sentence are input into an L-layer graph convolutional network, which learns the overall sentence representation, the subject entity representation and the object entity representation fusing context and semantic dependency relationships; each node is influenced only by neighbors that are no more than L edges away in the dependency syntax tree, and L takes a value of 2 or 3;

the L-layer graph convolutional network is expressed as:

$$h_i^{(l)} = \sigma\left(\sum_{j=1}^{n} \tilde{A}_{ij} W^{(l)} h_j^{(l-1)} / d_i + b^{(l)}\right)$$

wherein $h_i^{(l-1)}$ and $h_i^{(l)}$ denote the input vector and the output vector, respectively, of the i-th node in the l-th layer of the graph convolutional network;

the matrix $\tilde{A} = A + I$ is the sum of the adjacency matrix $A$ and the identity matrix $I$;

$d_i = \sum_{j=1}^{n} \tilde{A}_{ij}$ is the degree of node i in the graph; $W^{(l)}$ and $b^{(l)}$ are the model parameters learned by the graph convolutional network, namely the weight matrix and the bias term of the l-th layer; σ is a nonlinear activation function;
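A single graph convolution layer of the kind described above can be sketched in NumPy. The sketch assumes the usual form of such a layer: self-loops are added to the adjacency matrix, each node sums the linearly transformed vectors of its neighbors, the sum is divided by the node degree, a bias is added and a nonlinearity is applied. The choice of ReLU for σ, the row-vector layout (one row per word) and the function names are assumptions, since the source only says that σ is a nonlinear activation function.

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """One graph convolution layer with self-loops and degree normalisation."""
    A_tilde = A + np.eye(A.shape[0])        # adjacency matrix plus identity
    d = A_tilde.sum(axis=1, keepdims=True)  # degree of each node, always >= 1
    Z = (A_tilde @ H @ W) / d + b           # normalised neighborhood aggregation
    return np.maximum(Z, 0.0)               # ReLU as sigma (an assumption)

def gcn(A, H0, params):
    """Stack len(params) layers; params is a list of (W, b) pairs."""
    H = H0
    for W, b in params:
        H = gcn_layer(A, H, W, b)
    return H

# Toy graph with two connected nodes and 2-dimensional features.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H0 = np.eye(2)
params = [(np.eye(2), np.zeros(2)), (np.eye(2), np.zeros(2))]  # L = 2 layers
H_L = gcn(A, H0, params)
print(H_L.shape)  # (2, 2)
```

Because of the self-loops every degree is at least 1, so the division is always well defined, even for words that are disconnected after pruning.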

the adjacency matrix and the vectorized representation of each word in the sentence are input into the graph convolutional network to learn the overall sentence representation fusing context and semantic dependency relationships, specifically as follows:

$$h_{sent} = f\left(h^{(L)}\right) = f\left(\mathrm{GCN}\left(h^{(0)}\right)\right)$$

wherein

$h^{(L)}$ denotes the overall hidden representation output by the L-th layer of the graph convolutional neural network, and $h^{(0)}$ denotes its input layer;

GCN(·) denotes the L-layer graph convolutional neural network;

f(·) denotes the max pooling function;

the subject entity representation is obtained as:

$$h_s = f\left(h^{(L)}_{s_1:s_2}\right)$$

wherein $s_1:s_2$ denotes the index interval, in the segmented word sequence of the original sentence, of the words that form the subject entity; and $h^{(L)}_{s_1:s_2}$ denotes the hidden representation of the subject entity output by the L-th layer of the graph convolutional neural network, obtained with that index interval as input;

the object entity representation is obtained as:

$$h_o = f\left(h^{(L)}_{o_1:o_2}\right)$$

wherein $o_1:o_2$ denotes the index interval, in the segmented word sequence of the original sentence, of the words that form the object entity; and $h^{(L)}_{o_1:o_2}$ denotes the hidden representation of the object entity output by the L-th layer of the graph convolutional neural network, obtained with that index interval as input;
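The three pooled representations described above (whole sentence, subject entity, object entity) can be sketched as max pooling over rows of the last-layer output. The sketch treats the index intervals as inclusive on both ends, which is an assumption; the matrix `H_L`, the span values and the function names are illustrative only.

```python
import numpy as np

def max_pool(H):
    """Element-wise maximum over the word axis."""
    return H.max(axis=0)

def pooled_representations(H_L, subj_span, obj_span):
    """Return sentence, subject and object vectors from last-layer outputs."""
    s1, s2 = subj_span
    o1, o2 = obj_span
    h_sent = max_pool(H_L)          # pool over every word in the sentence
    h_s = max_pool(H_L[s1:s2 + 1])  # pool over the subject words s1..s2
    h_o = max_pool(H_L[o1:o2 + 1])  # pool over the object words o1..o2
    return h_sent, h_s, h_o

# Toy last-layer output: 4 words, 2 hidden dimensions.
H_L = np.array([[1, 5], [2, 0], [0, 3], [4, 1]])
h_sent, h_s, h_o = pooled_representations(H_L, subj_span=(0, 1), obj_span=(3, 3))
print(h_sent.tolist(), h_s.tolist(), h_o.tolist())  # [4, 5] [2, 5] [4, 1]
```

Max pooling gives every entity a fixed-size vector regardless of how many words it spans, which is what allows the three vectors to be concatenated in the classification step.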

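relation classification: the overall sentence representation, the subject entity representation and the object entity representation fusing context and semantic dependency relationships are concatenated and fed into a feed-forward neural network to obtain the final representation, specifically as follows:

$$h_{final} = \mathrm{FFNN}\left(\left[h_{sent}; h_s; h_o\right]\right)$$

wherein $h_{sent}$, $h_s$ and $h_o$ denote the overall sentence representation, the subject entity representation and the object entity representation, respectively, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and FFNN(·) denotes the feed-forward neural network;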

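the obtained final representation is input into a linear layer, and the probability distribution of the entity pair over the relations is then obtained through a Softmax operation, wherein the Softmax formula is:

$$\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{k'=1}^{K} e^{z_{k'}}}$$

where $z$ is the output vector of the linear layer and $K$ is the number of relation types;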

the relationship with the highest prediction probability is the relationship type of the subject entity and the object entity in the sentence predicted by the model.
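The classification step can be sketched as follows: the three pooled vectors are concatenated, passed through a feed-forward network, projected by a linear layer and normalised with Softmax, and the relation with the highest probability is returned. The single hidden layer with ReLU, the randomly initialised weights, the dimensions and all names are assumptions made for illustration; the source does not state the depth or activation of the feed-forward network.

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax over a 1-D vector of scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(h_sent, h_s, h_o, W_ff, b_ff, W_out, b_out):
    """Return the relation probabilities and the index of the predicted relation."""
    h_cat = np.concatenate([h_sent, h_s, h_o])      # concatenate the three vectors
    h_final = np.maximum(W_ff @ h_cat + b_ff, 0.0)  # one-layer FFNN with ReLU (assumption)
    probs = softmax(W_out @ h_final + b_out)        # linear layer followed by Softmax
    return probs, int(np.argmax(probs))

# Hypothetical sizes: 4-dim pooled vectors, 5 hidden units, 3 relation types.
rng = np.random.default_rng(0)
h_sent, h_s, h_o = rng.standard_normal((3, 4))
W_ff, b_ff = rng.standard_normal((5, 12)), np.zeros(5)
W_out, b_out = rng.standard_normal((3, 5)), np.zeros(3)

probs, predicted = classify(h_sent, h_s, h_o, W_ff, b_ff, W_out, b_out)
print(probs.shape)  # (3,)
```

In a trained model the weights would be learned jointly with the graph convolutional network; here they are random, so only the shapes and the normalisation are meaningful.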

Examples

The input data set: the ACE 2005 data set is downloaded from the official website of the Linguistic Data Consortium; the ACE 2005 corpus folder contains data in three languages, Arabic, English and Chinese, with multiple data sources for each language.

Suppose that the original sentence is cut into n words during word segmentation; an adjacency matrix $A$ with n rows and n columns is constructed, corresponding to the n nodes of the dependency syntax tree, wherein $A_{ij} = 1$ indicates that the dependency syntax tree has an edge from node i to node j, i.e. a semantic dependency relationship exists between the i-th word and the j-th word of the original sentence; otherwise $A_{ij} = 0$, indicating that the dependency syntax tree has no edge from node i to node j, i.e. no semantic dependency relationship exists between the i-th word and the j-th word of the original sentence.

The application also discloses a readable storage medium, which stores one or more programs, and the one or more programs can be executed by one or more processors to implement the graph convolution-based relationship extraction method described in the above embodiment.

The application also discloses computer equipment, which comprises a processor and a memory, wherein the memory is used for storing computer programs; the processor is configured to implement the steps of the graph convolution-based relationship extraction method when executing the computer program stored in the memory.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for extracting relationships based on graph convolution, the method comprising:

2. The graph convolution-based relationship extraction method according to claim 1, further comprising:

3. The graph convolution-based relationship extraction method according to claim 2, further comprising:

4. The graph convolution-based relationship extraction method according to claim 2, wherein the feature representation includes:

5. The graph convolution-based relationship extraction method of claim 1, wherein the learning neural network is a feed-forward neural network.

6. The graph convolution-based relationship extraction method according to claim 5, wherein the specific method for obtaining the probability distribution of the entity pairs on each relationship according to the feature representation includes:

7. The graph convolution-based relationship extraction method according to claim 4, wherein the adjacency matrix and the vectorized representation of each word in the sentence are input into the graph convolutional network to learn the overall sentence representation fusing context and semantic dependency relationships, specifically as follows:

$$h_{sent} = f\left(h^{(L)}\right) = f\left(\mathrm{GCN}\left(h^{(0)}\right)\right)$$

wherein

$h^{(L)}$ denotes the overall hidden representation output by the L-th layer of the graph convolutional neural network, and $h^{(0)}$ denotes its input layer;

GCN(·) denotes the L-layer graph convolutional neural network;

f(·) denotes the max pooling function.

8. The graph convolution-based relationship extraction method according to claim 7, wherein the specific method for obtaining the subject entity representation is:

9. The graph convolution-based relationship extraction method according to claim 8, wherein a specific method for obtaining the object entity representation is:

10. The graph convolution-based relationship extraction method of claim 9, wherein a distance of a range of each node in the dependency syntax tree affected by a neighborhood does not exceed L edges in the dependency tree.