CN112131879A - Relationship extraction system, method and device

Info

Publication number
CN112131879A
CN112131879A
Authority
CN
China
Prior art keywords
sentence
vector
layer
matrix
sentences
Prior art date
Legal status
Withdrawn
Application number
CN201910556214.XA
Other languages
Chinese (zh)
Inventor
张鹏
Current Assignee
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd filed Critical Potevio Information Technology Co Ltd
Priority to CN201910556214.XA priority Critical patent/CN112131879A/en
Publication of CN112131879A publication Critical patent/CN112131879A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a relationship extraction system, method and device. The system comprises: a BERT layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences; a convolutional neural network (CNN) layer for converting the vector matrices of sentences classified as entity-related into sentence vectors; an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors; a linear layer for reducing the dimensionality of the output matrix to obtain an output vector; and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.

Description

Relationship extraction system, method and device
Technical Field
The invention belongs to the technical field of information extraction, and in particular relates to a relationship extraction system, method and device.
Background
Information extraction aims to extract structured information from large-scale unstructured or semi-structured natural language text. Relationship extraction is one of its important subtasks; its main purpose is to identify entities in text and extract the semantic relationships between them. Relationship extraction refers to the task of determining the relationship between two given entities from the sentences in which they occur. For example, the sentence "Bill Gates is the founder of Microsoft Corporation" contains one entity pair (Bill Gates, Microsoft Corporation), and the relationship between the two entities is "founder".
Deep learning is one of the most active fields of machine learning research. Its main idea is to model the human brain's neural networks to establish learning models that extract useful information from data such as speech, images or text. Typical deep learning methods include convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and these models have achieved significant results in text classification, machine translation, intelligent question answering and other areas.
Currently, the conventional relationship extraction model is a CNN + Attention model. Based on the multi-instance learning assumption, this method extracts features from each input sentence to obtain sentence-level features, then scores each sentence through an attention mechanism so that sentences with incorrect relation labels are suppressed, improving the accuracy of the overall relation classification. FIG. 1 is a schematic diagram of a prior-art relationship extraction system based on the CNN + Attention model.
However, this method is limited by the size of the corpus and cannot learn richer linguistic features. Moreover, an attention mechanism alone cannot effectively remove the noise in the input sentences, which affects the final classification accuracy.
Disclosure of Invention
The embodiment of the invention provides a relationship extraction system, method and device.
The technical scheme of the embodiment of the invention is as follows:
a relationship extraction system, comprising:
a Bidirectional Encoder Representations from Transformers (BERT) layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences;
a CNN layer for converting the vector matrix of sentences classified as entity-related sentences into sentence vectors;
an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors;
a linear layer for reducing the dimensionality of the output matrix to obtain an output vector;
and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.
In one embodiment, further comprising:
a word vector calculation tool for segmenting the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; converting each of the 100 words into a 50-dimensional word vector; and generating the vector matrix of the input sentence from the word vectors.
In one embodiment, the CNN layer is configured to convert the vector matrices of sentences classified as entity-related into sentence vectors of dimension 230; the number of relation vectors is N; the output matrix is an N×230 matrix;
and the linear layer is configured to convert the N×230 matrix into an N-dimensional output vector.
In one embodiment, the BERT layer is configured to attach a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and the CNN layer is configured to convert the vector matrices with label value 1 into sentence vectors.
In one embodiment, the loss function of the relationship extraction system is L = L1 + L2, where L1 is the sentence classification loss function and L2 is the relation classification loss function.
A relationship extraction method, comprising:
enabling a BERT layer to classify the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences;
enabling a CNN layer to convert the vector matrices of sentences classified as entity-related into sentence vectors;
enabling an attention layer to calculate the similarity between the sentence vectors and the relation vectors, set weights for the sentence vectors based on the similarity, and obtain an output matrix of the relation vectors from the weighted summation of the sentence vectors;
enabling a linear layer to reduce the dimensionality of the output matrix to obtain an output vector;
enabling a Softmax layer to perform a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.
In one embodiment, the method further comprises:
enabling a word vector calculation tool to segment the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; to convert each of the 100 words into a 50-dimensional word vector; and to generate the vector matrix of the input sentence from the word vectors.
In one embodiment, enabling the BERT layer to classify the vector matrices of input sentences includes: enabling the BERT layer to attach a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and enabling the CNN layer to convert the vector matrices of sentences classified as entity-related into sentence vectors includes: enabling the CNN layer to convert the vector matrices with label value 1 into sentence vectors.
A relationship extraction apparatus comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to execute the relationship extraction method as defined in any one of the above.
A computer readable storage medium having computer readable instructions stored therein for performing a relationship extraction method as described in any one of the above.
According to the technical scheme, the system comprises: a BERT layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences; a convolutional neural network (CNN) layer for converting the vector matrices of sentences classified as entity-related into sentence vectors; an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors; a linear layer for reducing the dimensionality of the output matrix to obtain an output vector; and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences. Because the BERT layer is used to classify the vector matrices of the input sentences, noise in the input sentences is effectively removed and classification accuracy is improved.
Moreover, the BERT layer captures richer linguistic features, further improving classification accuracy.
In addition, randomly introducing sentences unrelated to the two entities as noise can further improve the performance of relation extraction.
Drawings
FIG. 1 is a schematic diagram of a prior art CNN + Attention model-based relationship extraction system.
FIG. 2 is a schematic diagram of a relationship extraction system according to the present invention.
FIG. 3 is a flow chart of a relationship extraction method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
For simplicity and clarity of description, the invention is described below through several representative embodiments. Numerous details are set forth to provide an understanding of the principles of the invention. It will be apparent, however, that the invention may be practiced without these specific details. Some embodiments are not described in detail, but rather only as frameworks, in order to avoid unnecessarily obscuring aspects of the invention. Hereinafter, "including" means "including but not limited to", and "according to ..." means "at least according to ..., but not limited to only ...". In keeping with the conventions of the Chinese language, where the number of a component is not specified below, the component may be one or more, i.e., at least one.
In the embodiment of the invention, a pre-training mechanism is introduced, and the model structure for relation extraction is BERT + CNN + Attention, where the pre-trained BERT classifies sentences and acts as a filter, removing noise sentences from the input sentence bag. In addition, in order to improve BERT's classification performance, a negative sampling technique can be adopted: sentences in which the entities have no relationship are randomly introduced into the input sentences as noise so that the accuracy of BERT's classification can be tested, and the resulting loss term is added to the loss function. This effectively improves the classification accuracy of BERT and thereby the performance of relation extraction.
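As an illustration of this negative sampling step, the following Python sketch shows one way the noise injection could be implemented; the bag structure, the pool of unrelated sentences, and the noise ratio are assumptions made for illustration, not details given in the patent:

```python
import random

def inject_noise(bag, unrelated_pool, noise_ratio=0.1):
    """Randomly mix sentences that express no relation between the entities
    into a bag, labeling genuine sentences 1 and injected noise 0."""
    labeled = [(sentence, 1) for sentence in bag]        # true instances
    num_noise = max(1, int(len(bag) * noise_ratio))      # assumed ratio
    for sentence in random.sample(unrelated_pool, num_noise):
        labeled.append((sentence, 0))                    # injected noise
    random.shuffle(labeled)
    return labeled  # BERT's 0/1 predictions on these feed the extra loss term
```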
The relationship extraction system of the present invention includes:
a BERT layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences;
a CNN layer for converting the vector matrices of sentences classified as entity-related into sentence vectors;
an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors;
a linear layer for reducing the dimensionality of the output matrix to obtain an output vector;
and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.
In one embodiment, the relationship extraction system further comprises:
a word vector calculation tool for segmenting the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; converting each of the 100 words into a 50-dimensional word vector; and generating the vector matrix of the input sentence from the word vectors.
It can be seen that, based on the word vector calculation tool, a vector matrix can be generated for each input sentence. These vector matrices constitute the input of the BERT layer.
Preferably, the word vector calculation tool may be implemented as Word2vec. Word2vec is a group of related models used to generate word vectors: shallow two-layer neural networks trained to reconstruct the linguistic contexts of words. Under the bag-of-words assumption in word2vec, the order of words is not important; after training, the word2vec model can map each word to a vector representing word-to-word relationships, the vector being the hidden layer of the neural network. Word2vec relies on skip-grams or continuous bag of words (CBOW) to establish neural word embeddings.
While the foregoing exemplary description uses Word2vec as the word vector calculation tool, those skilled in the art will appreciate that this description is exemplary only and is not intended to limit the scope of the embodiments of the present invention.
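As a concrete sketch of the 100-word, 50-dimension scheme described above, the following Python snippet uses the gensim Word2Vec implementation; the `tokenized_corpus` variable and the handling of out-of-vocabulary words are assumptions for illustration:

```python
import numpy as np
from gensim.models import Word2Vec

# Train 50-dimensional word vectors on a tokenized corpus (a list of word lists).
w2v = Word2Vec(sentences=tokenized_corpus, vector_size=50, window=5, min_count=1)

def sentence_to_matrix(words, max_len=100, dim=50):
    """Truncate or zero-pad a sentence to 100 words and stack the 50-dim
    word vectors into a 100 x 50 matrix, as described above."""
    matrix = np.zeros((max_len, dim), dtype=np.float32)  # shortfall stays zero
    for i, word in enumerate(words[:max_len]):           # excess words cut off
        if word in w2v.wv:                               # assumed OOV handling
            matrix[i] = w2v.wv[word]
    return matrix
```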
In one embodiment, the CNN layer converts the vector matrices of sentences classified as entity-related into sentence vectors of dimension 230; the number of relation vectors is N; the output matrix is an N×230 matrix; and the linear layer converts the N×230 matrix into an N-dimensional output vector.
In one embodiment, the BERT layer attaches a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and the CNN layer converts the vector matrices with label value 1 into sentence vectors.
The loss function of the relationship extraction system is L = L1 + L2, where L1 is the sentence classification loss function and L2 is the relation classification loss function. Preferably, the sentence classification and the relation classification each employ a cross-entropy loss function, with L1 being the loss function of the BERT layer and L2 the loss function of the attention layer:

L1 = -(p1 log q1 + (1 - p1) log(1 - q1)); L2 = -(p2 log q2 + (1 - p2) log(1 - q2))

where p1 denotes the probability distribution of the correct samples during BERT layer training, q1 the probability distribution of the predicted samples output by the BERT layer, p2 the probability distribution of the correct samples during attention layer training, and q2 the probability distribution of the predicted samples output by the attention layer.
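Expressed in code, the composite loss might look like the following PyTorch sketch; the patent does not name a framework, and the tensor names are assumed for illustration:

```python
import torch.nn.functional as F

def total_loss(bert_logits, sentence_labels, rel_logits, rel_labels):
    # L1: binary cross-entropy for BERT's noise-vs-related sentence classification
    l1 = F.binary_cross_entropy_with_logits(bert_logits, sentence_labels.float())
    # L2: cross-entropy for the final relation classification
    l2 = F.cross_entropy(rel_logits, rel_labels)
    return l1 + l2  # L = L1 + L2
```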
FIG. 2 is a schematic diagram of a relationship extraction system according to the present invention.
In FIG. 2, X1, X2, X3 … XN represent the N vector matrices corresponding to N input sentences. The vector matrix for each input sentence is generated as follows: each word in the input sentence is converted into a 50-dimensional word vector using the word2vec tool; each input sentence is forced to 100 words, with the excess truncated and the shortfall padded with zeros.
The N vector matrices are each input to the pre-trained BERT layer.
BERT essentially learns a good feature representation for words by running a self-supervised learning method on a large corpus; self-supervised learning refers to supervised learning performed on data without human labels. In a specific downstream NLP task, the features produced by BERT can be used directly as the word embedding features of that task. BERT thus provides a model for transfer learning to other tasks: depending on the task, the model may be fine-tuned or frozen and used as a feature extractor. The network architecture of BERT uses the multi-layer Transformer structure proposed in "Attention Is All You Need". Its defining characteristic is that it abandons the traditional RNN and CNN: through the attention mechanism, the distance between two words at any positions is reduced to 1, which effectively solves the troublesome long-term dependency problem in NLP. The Transformer structure has been widely applied in the NLP field.
The BERT layer uses the same two-phase approach as GPT: first, a language model is pre-trained; second, fine-tuning is used to solve the downstream task. Developed by Google, BERT is trained on large amounts of unlabeled text to learn the linguistic knowledge contained in unstructured text, yielding a pre-trained network. After pre-training is complete, the BERT layer can be used to solve various natural language processing problems. The BERT layer further increases the generalization capability of the word vector model, fully describing character-level, word-level, sentence-level and even inter-sentence relationship features. BERT moves a large amount of the work traditionally done in specific downstream NLP tasks into the pre-training of word vectors; once the BERT word vectors are obtained, only a simple MLP or linear classifier needs to be added on top.
Here, the BERT layer has already completed pre-training and is used for a classification task, i.e., determining from each input vector matrix whether the corresponding sentence is a noise sentence. If the sentence is useful for relation classification and truly expresses the relationship between the two entities, the BERT layer outputs label 1; otherwise it outputs label 0.
To further test the performance of the BERT layer within the overall network architecture, labeled sentences unrelated to the entities can be randomly introduced into the training data as noise, so that the accuracy of the BERT layer's classification can be measured. These labeled sentences are also included in the final loss function, which further forces the network to adjust and improves accuracy.
Thus, the output of the BERT layer is a series of 0 and 1 labels corresponding to the vector matrices of the input sentences. If the label of a vector matrix is 0, that matrix does not participate in the subsequent steps, i.e., it is filtered out and not input to the CNN layer. The input to the CNN layer is therefore the vector matrices labeled 1. Assume that m vector matrices are labeled 1, i.e., the BERT layer filters out (N - m) vector matrices.
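A minimal sketch of this filtering step, assuming a generic classifier callable that maps sentence matrices (torch tensors) to two-class logits; the patent specifies only that the pre-trained BERT layer emits 0/1 labels:

```python
def filter_noise(sentence_matrices, bert_classifier):
    """Keep only the matrices that the pre-trained BERT layer labels 1
    (entity-related); label-0 (noise) matrices never reach the CNN layer."""
    logits = bert_classifier(sentence_matrices)   # (N, 2) two-class logits
    labels = logits.argmax(dim=-1)                # 0 = noise, 1 = related
    return sentence_matrices[labels == 1]         # the m surviving matrices
```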
In the CNN layer, the vector matrices labeled 1 pass through convolution, max pooling, a nonlinear activation function and other steps to produce sentence vectors. The CNN layer converts the vector matrix of each sentence into a 230-dimensional vector, i.e., a sentence vector. Thus, after the CNN layer, m sentence vectors S1, S2, S3 … Sm corresponding to the m sentences are obtained.
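The convolution, max-pooling, and activation steps that yield a 230-dimensional sentence vector could be sketched as follows in PyTorch; the kernel width of 3 and the tanh activation are assumptions, while the 100×50 input and 230-dimensional output follow the text:

```python
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    """Convert a 100 x 50 sentence matrix into a 230-dim sentence vector."""
    def __init__(self, word_dim=50, num_filters=230, kernel_width=3):
        super().__init__()
        self.conv = nn.Conv1d(word_dim, num_filters, kernel_width, padding=1)

    def forward(self, x):               # x: (batch, 100, 50)
        x = x.transpose(1, 2)           # -> (batch, 50, 100) for Conv1d
        x = self.conv(x)                # convolution: (batch, 230, 100)
        x = x.max(dim=2).values         # max pooling over word positions
        return torch.tanh(x)            # nonlinear activation -> (batch, 230)
```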
The m sentence vectors S1, S2, S3 … Sm are then input into the attention layer, which stores the relation vectors of the defined relation categories; there may be multiple relation vectors. For each relation vector, the similarity between each sentence vector and that relation vector is calculated to assign a weight to each sentence vector, and the output for that relation vector is finally obtained by weighted summation. For example, when the CNN layer converts the vector matrices of entity-related sentences into 230-dimensional sentence vectors and the number of relation vectors is N, the output matrix of the attention layer is an N×230 matrix.
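The weighting scheme admits a compact sketch; dot-product similarity is assumed here, as the patent says only "similarity":

```python
import torch

def attention_output(sentence_vecs, relation_vecs):
    """sentence_vecs: (m, 230); relation_vecs: (N, 230). For each relation
    vector, weight the m sentence vectors by similarity and return the
    weighted sums stacked into an N x 230 output matrix."""
    sims = relation_vecs @ sentence_vecs.T      # (N, m) similarity scores
    weights = torch.softmax(sims, dim=-1)       # per-relation sentence weights
    return weights @ sentence_vecs              # (N, 230) output matrix
```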
Then, the output matrix from the attention layer is reduced in dimension by the linear layer (for example, when the output matrix is an N×230 matrix, it is converted into an N-dimensional output vector), and the Softmax layer performs a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences, forming the final classification. For parameter optimization, the Adam algorithm is used to optimize the loss function; the learning rate decays exponentially every 3000 steps with a decay base of 0.1, and other parameters use default settings.
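The optimization schedule described above corresponds to a step-wise exponential decay; a sketch assuming PyTorch, with `model`, `data_loader`, and `compute_loss` as illustrative placeholders:

```python
import torch

optimizer = torch.optim.Adam(model.parameters())   # other parameters left default
# Multiply the learning rate by 0.1 every 3000 steps, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3000, gamma=0.1)

for batch in data_loader:                          # data_loader is assumed
    loss = compute_loss(model, batch)              # L = L1 + L2, as defined above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                               # advance the decay schedule
```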
Therefore, the loss function comprises two parts: the loss function of the relation classification, and the loss function introduced by the sentence classification in the BERT layer.
Based on the above description, the embodiment of the invention also provides a relationship extraction method.
FIG. 3 is a flow chart of a relationship extraction method according to the present invention.
As shown in fig. 3, the method includes:
step 301: enabling a BERT layer to classify a vector matrix of input sentences, wherein the classified categories comprise entity-unrelated noise sentences and entity-related sentences;
step 302: enabling the CNN layer to convert the vector matrix of the sentences classified into entity relation into sentence vectors;
step 303: enabling an attention layer to calculate similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors based on weighted summation results of the sentence vectors;
step 304: enabling a linear layer to reduce the dimension of the output matrix to obtain an output vector;
step 305: enabling a Softmax layer to perform a probability planning calculation on the output vector to obtain a relational probability distribution of the input sentence.
In one embodiment, the method further comprises:
enabling a word vector calculation tool to segment the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; to convert each of the 100 words into a 50-dimensional word vector; and to generate the vector matrix of the input sentence from the word vectors.
In one embodiment, enabling the BERT layer to classify the vector matrices of input sentences includes: enabling the BERT layer to attach a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and enabling the CNN layer to convert the vector matrices of sentences classified as entity-related into sentence vectors includes: enabling the CNN layer to convert the vector matrices with label value 1 into sentence vectors.
The embodiment of the invention also provides a relationship extraction device. The relationship extraction device comprises a processor and a memory; the memory stores an application executable by the processor for causing the processor to execute the relationship extraction method described in any of the above embodiments.
The memory may be embodied as various storage media such as an Electrically Erasable Programmable Read Only Memory (EEPROM), a Flash memory (Flash memory), and a Programmable Read Only Memory (PROM). The processor may be implemented to include one or more central processors or one or more field programmable gate arrays, wherein the field programmable gate arrays integrate one or more central processor cores. In particular, the central processor or central processor core may be implemented as a CPU or MCU.
To sum up, the system according to the embodiment of the present invention includes: a BERT layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences; a convolutional neural network (CNN) layer for converting the vector matrices of sentences classified as entity-related into sentence vectors; an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors; a linear layer for reducing the dimensionality of the output matrix to obtain an output vector; and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences. Because the BERT layer is used to classify the vector matrices of the input sentences, noise in the input sentences is effectively removed and classification accuracy is improved.
Moreover, the BERT layer captures richer linguistic features, further improving classification accuracy. In addition, randomly introducing sentences unrelated to the two entities as noise can further improve the performance of relation extraction.
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structure described in the above embodiments may be a physical structure or a logical structure, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities, or some components in a plurality of independent devices may be implemented together.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include a specially designed permanent circuit or logic device (e.g., a special purpose processor such as an FPGA or ASIC) for performing specific operations. A hardware module may also include programmable logic devices or circuits (e.g., including a general-purpose processor or other programmable processor) that are temporarily configured by software to perform certain operations. The implementation of the hardware module in a mechanical manner, or in a dedicated permanent circuit, or in a temporarily configured circuit (e.g., configured by software), may be determined based on cost and time considerations.
The present invention also provides a machine-readable storage medium storing instructions for causing a machine to perform a method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which a software program code that realizes the functions of any of the embodiments described above is stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program code stored in the storage medium. Further, part or all of the actual operations may be performed by an operating system or the like operating on the computer by instructions based on the program code. The functions of any of the above-described embodiments may also be implemented by writing the program code read out from the storage medium to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causing a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on the instructions of the program code.
Examples of the storage medium for supplying the program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud via a communication network.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A relationship extraction system, comprising:
a Bidirectional Encoder Representations from Transformers (BERT) layer for classifying the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences;
a convolutional neural network (CNN) layer for converting the vector matrices of sentences classified as entity-related into sentence vectors;
an attention layer for calculating the similarity between the sentence vectors and the relation vectors, setting weights for the sentence vectors based on the similarity, and obtaining an output matrix of the relation vectors from the weighted summation of the sentence vectors;
a linear layer for reducing the dimensionality of the output matrix to obtain an output vector;
and a Softmax layer for performing a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.
2. The relationship extraction system according to claim 1, further comprising:
a word vector calculation tool for segmenting the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; converting each of the 100 words into a 50-dimensional word vector; and generating the vector matrix of the input sentence from the word vectors.
3. The relationship extraction system according to claim 2,
the CNN layer is configured to convert the vector matrices of sentences classified as entity-related into sentence vectors of dimension 230; the number of relation vectors is N; the output matrix is an N×230 matrix;
and the linear layer is configured to convert the N×230 matrix into an N-dimensional output vector.
4. The relationship extraction system according to claim 1,
the BERT layer is configured to attach a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and the CNN layer is configured to convert the vector matrices with label value 1 into sentence vectors.
5. The relationship extraction system according to claim 1,
the loss function of the relationship extraction system is L = L1 + L2,
where L1 is the sentence classification loss function and L2 is the relation classification loss function.
6. A relationship extraction method, comprising:
enabling a Bidirectional Encoder Representations from Transformers (BERT) layer to classify the vector matrices of input sentences, where the classification categories include entity-unrelated noise sentences and entity-related sentences;
enabling a CNN layer to convert the vector matrices of sentences classified as entity-related into sentence vectors;
enabling an attention layer to calculate the similarity between the sentence vectors and the relation vectors, set weights for the sentence vectors based on the similarity, and obtain an output matrix of the relation vectors from the weighted summation of the sentence vectors;
enabling a linear layer to reduce the dimensionality of the output matrix to obtain an output vector;
enabling a Softmax layer to perform a probability normalization calculation on the output vector to obtain the relation probability distribution of the input sentences.
7. The relationship extraction method as claimed in claim 6, further comprising:
enabling a word vector calculation tool to segment the input sentence into 100 words, where words beyond the first 100 are truncated and sentences shorter than 100 words are zero-padded; to convert each of the 100 words into a 50-dimensional word vector; and to generate the vector matrix of the input sentence from the word vectors.
8. The relationship extraction method according to claim 6,
enabling the BERT layer to classify the vector matrices of input sentences includes: enabling the BERT layer to attach a label value of 0 to the vector matrix of a sentence classified as noise, and a label value of 1 to the vector matrix of a sentence classified as entity-related;
and enabling the CNN layer to convert the vector matrices of sentences classified as entity-related into sentence vectors includes: enabling the CNN layer to convert the vector matrices with label value 1 into sentence vectors.
9. A relationship extraction apparatus comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the relationship extraction method as claimed in any one of claims 6-8.
10. A computer-readable storage medium having computer-readable instructions stored therein for performing the relationship extraction method of any one of claims 6-8.
CN201910556214.XA 2019-06-25 2019-06-25 Relationship extraction system, method and device Withdrawn CN112131879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910556214.XA CN112131879A (en) 2019-06-25 2019-06-25 Relationship extraction system, method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910556214.XA CN112131879A (en) 2019-06-25 2019-06-25 Relationship extraction system, method and device

Publications (1)

Publication Number Publication Date
CN112131879A (en) 2020-12-25

Family

ID=73849760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910556214.XA Withdrawn CN112131879A (en) 2019-06-25 2019-06-25 Relationship extraction system, method and device

Country Status (1)

Country Link
CN (1) CN112131879A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861509A (en) * 2021-02-08 2021-05-28 青牛智胜(深圳)科技有限公司 Role analysis method and system based on multi-head attention mechanism
CN112861509B (en) * 2021-02-08 2023-05-12 青牛智胜(深圳)科技有限公司 Role analysis method and system based on multi-head attention mechanism
KR20220151777A (en) * 2021-05-07 2022-11-15 연세대학교 산학협력단 Method and device for classifying building defect using multi task channel attention
KR102501730B1 (en) 2021-05-07 2023-02-21 연세대학교 산학협력단 Method and device for classifying building defect using multi task channel attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201225