CN108829722B

CN108829722B - Remote supervision Dual-Attention relation classification method and system

Info

Publication number: CN108829722B
Application number: CN201810432079.3A
Authority: CN
Inventors: 贺敏; 毛乾任; 王丽宏; 李晨
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2020-10-02
Anticipated expiration: 2038-05-08
Also published as: CN108829722A

Abstract

The invention relates to a remote supervision Dual-Attention relation classification method and system. The method comprises the following steps: aligning entity pairs in a knowledge base to a news corpus through remote supervision, and constructing a sentence set for each entity pair; performing word-level vector coding on each sentence with a Bi-LSTM model based on a word-level attention mechanism to obtain a semantic feature coding vector of the sentence; coding and denoising the semantic features of the sentences with a Bi-LSTM model based on a sentence-level attention mechanism to obtain a sentence set feature coding vector; and packing the sentence set feature coding vector together with the entity pair translation vector, and performing entity pair relation classification on the resulting packet features. The technical scheme reduces the noisy data in model training and avoids manually labeled data and the error propagation it causes. Entity alignment is performed using open-domain text and a large-scale knowledge base, which effectively addresses the problem of the scale of labeled data for relation extraction.

Description

Remote supervision Dual-Attention relation classification method and system

Technical Field

The invention belongs to the field of relation classification, and particularly relates to a remote supervision Dual-Attention relation classification method and system.

Background

With the development of internet technology, the amount of text information on the World Wide Web is growing rapidly, and technology for automatically extracting knowledge from that text is receiving more and more attention and has become a current hotspot. The current mainstream relation extraction method is relation classification based on neural network learning, which mainly faces three problems: the difficulty of representing and mining semantic features, error propagation caused by manual labeling, and the influence of noise on model training. Among neural-network-based relation classification methods, the best-performing ones are found in supervised learning and in remote supervision. Taking these two learning methods as starting points, improved models have appeared that target the three problems, mainly: a supervised relation extraction method based on the bidirectional long short-term memory network (Bi-LSTM); a remote supervision relation classification method based on the Convolutional Neural Network (CNN); and a CNN-based relation classification method with a sentence-set-level attention mechanism.

Faced with the three major problems of relation classification, each mainstream neural network method improves on one specific problem. However, problems remain: the methods depend on knowledge of a specific field, and the robustness and application scenarios of the models are relatively limited.

First, relation classification with Bi-LSTM alone does encode the long-distance semantic features in the text effectively. However, the method still depends on manually labeled data sets, the model selects only one sentence for learning and prediction without considering noisy sentences, and it is limited to knowledge of a specific field.

Second, the weakly supervised remote supervision method rests on the premise that if two entities have a certain relationship in the knowledge base, every sentence containing both entities expresses that relationship. This assumption is not completely correct, so the automatically generated training data contains wrongly labeled examples, which brings noise into the training process. In addition, when the model is trained, only the sentence with the highest probability of expressing the relationship of the entity pair is selected for training. Selecting only the maximum-probability sentence does not make full use of all sentences containing the two entities as training corpus, and a large amount of information is lost.

In addition, the attention-based remote supervision relation classification method built on Convolutional Neural Networks (CNNs) reduces the influence of wrong labels and encodes the local semantic features of the text effectively. However, each layer of the CNN model has a fixed span, so it can only model semantic information over a limited distance and is suitable only for relation extraction in short texts. Some improved convolutional models capture information over a larger span by stacking K-segment max-pooling structures, for example the three-segment pooling used in PCNNs (Piecewise CNNs), but compared with Bi-LSTM the max-pooling approach has a higher cost and relatively weaker performance when extracting long-dependency semantic features from long texts.

Therefore, it is necessary to provide a remote supervised Dual-Attention relationship classification method and system to solve the deficiencies of the prior art.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a remote supervision Dual-Attention relation classification method and system, which automatically acquire labeled corpus data from the knowledge base WikiData and find sentences in which an entity pair co-occurs in open-domain text as training corpus. A neural network learning model then completes the relation extraction task, with classification into predefined relations as the target.

A remote supervised Dual-Attention relationship classification method comprises the following steps:

aligning entity pairs in a knowledge base to news corpora through remote supervision, and constructing an entity-pair sentence set;

carrying out word-level vector coding on the sentence based on a Bi-LSTM model of a word-level attention mechanism to obtain a semantic feature coding vector of the sentence;

coding and denoising the semantic features of the sentences based on a Bi-LSTM model of a sentence-level attention mechanism to obtain a sentence set feature coding vector;

and packaging the sentence set feature coding vector and the entity pair translation vector, and carrying out entity pair relation classification on the obtained packet features.

Further, performing word-level vector coding on the sentence based on a Bi-LSTM model of a word-level attention mechanism to obtain a semantic feature coding vector of the sentence, including:

processing the sentence by adopting a text depth representation model to obtain a word vector of each word in the sentence;

inputting the word vector into a Bi-LSTM model to obtain a coding vector of the word vector;

and adding a word level attention mechanism into the coding vector of the word vector to obtain a semantic feature coding vector of each sentence.

Further, inputting the word vector into a Bi-LSTM model to obtain an encoded vector of the word vector, including:

inputting the word vector into a Bi-LSTM model;

the forward LSTM of the model obtains the preceding-context feature information of the word vector, and the backward LSTM of the model obtains the following-context feature information of the word vector;

and finally, obtaining the context coding vector of the word vector.

Further, adding a word-level attention mechanism to the coding vector of the word vector to obtain a semantic feature coding vector of each sentence, including:

adding the word-level attention mechanism to the coding vector;

connecting each time node in the LSTM by a weight vector by calculating attention probability distribution;

and obtaining a semantic feature coding vector of each sentence.

Further, the method for coding and denoising semantic features of a sentence based on a Bi-LSTM model of a sentence level attention mechanism to obtain a sentence set feature coding vector includes:

inputting the semantic feature coding vector of the sentence into a Bi-LSTM model to obtain a feature coding vector of a sentence set;

and adding a sentence level attention mechanism into the feature coding vector of the sentence set to obtain the noise-reduced sentence set feature coding vector.

Further, adding a sentence level attention mechanism to the feature coding vector of the sentence set to obtain a noise-reduced sentence set feature coding vector, including:

adding a sentence-level attention weight to each sentence, so that the weight of a valid sentence is large and the weight of a noisy sentence is small;

and obtaining the noise-reduced sentence set feature coding vector.

Further, the sentence set feature encoding vector and the entity pair translation vector are packed, and the obtained packet features are subjected to entity pair relationship classification, including:

introducing a translation vector of an entity pair translation model, giving different weights to sentences with different confidence degrees, and reducing the noise of a sentence set;

introducing the difference of the entity pair vectors into the sentence set as a further feature for measuring sentence similarity, to obtain the packet features;

and carrying out relation classification on the packet features by using a multi-example learning method.

Further, the relationship classification of the packet features by using a multi-example learning method comprises the following steps:

if at least one sentence in a packet is judged positive by the classifier, the sentences in the packet are positive example data; if all sentences in a packet are judged negative by the classifier, the sentences in the packet are negative example data;

carrying out multi-example learning on the labeled sentences to obtain a feature representation containing the information of multiple feature relations;

and predicting which relationship the entity pair has by a Softmax relation classification method, to obtain a probability ranking of the relationships.

A remotely supervised Dual-Attention relationship classification system, comprising:

the building module is used for aligning the entity pairs in the knowledge base to news corpora through remote supervision and building an entity-pair sentence set;

the first vector module is used for carrying out word-level vector coding on the sentence based on a Bi-LSTM model of a word-level attention mechanism to obtain a semantic feature coding vector of the sentence;

the second vector module is used for coding and denoising the semantic features of the sentence based on a Bi-LSTM model of the sentence level attention mechanism to obtain a sentence set feature coding vector;

and the relation classification module is used for packing the sentence set feature coding vector and the entity pair translation vector and carrying out entity pair relation classification on the obtained packet features.

Compared with the closest prior art, the technical scheme provided by the invention has the following advantages:

The technical scheme provided by the invention reduces the noisy data in model training and avoids manually labeled data and the error propagation it causes. Entity alignment is performed using open-domain text and a large-scale knowledge base, which effectively addresses the problem of the scale of labeled data for relation extraction.

The technical scheme provided by the invention combines Bi-LSTM word-level and sentence-level feature coding to construct a packet feature training method, and adds a sentence-set attention weight mechanism and an entity pair translation vector R_relation, which reduce the weight of invalid sentences and construct the packet feature codes of possible relations for multi-example learning. Combining the sentence attention weights with packet feature multi-example learning training on the translation vector yields an effective vector representation of the relation's semantic information and improves the accuracy of the relation extraction task.

The technical scheme provided by the invention constructs an end-to-end relation extraction task that does not depend on complex manually labeled features such as part of speech or dependency syntax: the model takes the word vectors of a sentence as input and outputs the relation of the entity pair together with the corresponding probability value. Within this end-to-end process, the Dual-Attention coding method effectively encodes the important word features of a sentence at the sentence level, and at the sentence set level it reduces the influence on model training of the noise introduced by the remote supervision method, so that the trained model is more accurate.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a detailed flow chart of an embodiment of the present invention;

FIG. 3 is a diagram of the Dual-Attention relationship classification model in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

Example 1

As shown in Fig. 1, an embodiment of the present invention provides a remote supervision Dual-Attention relation classification method, including:

Fig. 2 shows a detailed flowchart of an embodiment of the present invention.

Preferably, aligning the entity pairs in the knowledge base to a corpus by remote supervision, and constructing the entity pair sentence set includes:

The premise of the remote supervision method (often called distant supervision) is as follows: if two entities have a certain relationship in the knowledge base, then any unstructured sentence containing both entities is assumed to express that relationship. For example, "Jack Ma" and "Alibaba" have the relationship "founder" in WikiData, so the unstructured text "Jack Ma is the founder of Alibaba", which contains both entities, can be used as a training example for the model. The specific implementation steps of the data construction method are as follows:

Step one: entity pairs that have relationships, such as "Jack Ma" and "Alibaba" here, are extracted from the knowledge base. The relations in the knowledge base are known, e.g. R₁ = "founder", R₂ = "CEO", R₃ = "Boss", R₄ … R_p, and so on.

Step two: sentences containing the entity pairs are extracted from unstructured text as training samples: a sentence set {S₁, S₂, …, S_n-1, S_n} containing the entity pair is crawled from news texts, forming the initial corpus.
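The data construction steps above can be sketched in a few lines of Python. This is a minimal illustration only: the toy triples, the toy sentences, the function name `build_entity_pair_bags` and the plain substring matching are all assumptions made for the example, not the alignment procedure of the invention.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, relation, tail entity) triples.
KB_TRIPLES = [
    ("Jack Ma", "founder", "Alibaba"),
    ("Jack Ma", "CEO", "Alibaba"),
]

# Hypothetical toy corpus standing in for crawled news sentences.
SENTENCES = [
    "Jack Ma is the founder of Alibaba.",
    "Alibaba CEO Jack Ma spoke at a conference last week.",
    "Alibaba opened a new office in Hangzhou.",
]


def build_entity_pair_bags(triples, sentences):
    """Group sentences by the entity pair they mention (remote supervision)."""
    # Collect the known relations of every entity pair in the knowledge base.
    relations = defaultdict(set)
    for head, relation, tail in triples:
        relations[(head, tail)].add(relation)

    # A sentence joins a pair's sentence set when both entity names occur in it
    # (simple substring matching, an assumption of this sketch).
    bags = {}
    for pair, labels in relations.items():
        head, tail = pair
        matched = [s for s in sentences if head in s and tail in s]
        bags[pair] = {"relations": sorted(labels), "sentences": matched}
    return bags


bags = build_entity_pair_bags(KB_TRIPLES, SENTENCES)
```

Note that both matching sentences are attached to every relation of the pair, which is exactly where the labeling noise of remote supervision comes from.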

Preferably, the performing word-level vector coding on the sentence based on a Bi-LSTM model of the word-level attention mechanism to obtain a semantic feature coding vector of the sentence includes:

In the main corpora used for relation classification, such as unstructured news text, sentences are generally long, and the entity pair and the words expressing its relation are far apart in terms of word position; that is, the semantic relation shows long-distance dependency. The Bi-LSTM model is therefore selected to mine the strong semantic features of the sentences effectively, realizing semantic feature coding of the sentences in the sentence set that contain the entities. The model structure is shown in FIG. 3, in which three quantities are marked for each word:

the embedding feature of the 1st word of the nth sentence;

the hidden vector encoding of the 1st word of the nth sentence, which carries contextual features;

the context coding vector of the 1st word of the nth sentence, obtained by combining the hidden vector encodings after the Attention weight is added. The detailed processing steps are as follows:

Step one: a sentence in which the entities co-occur is used as input. The word2vec word embedding method is used to map each word in the sentence to a low-dimensional vector, giving an embedding vector for each word.

Step two: taking the word vectors obtained in step one as input, the Bi-LSTM model extracts the strong semantic features of the sentence, where strong features means the long-distance dependency semantic features found in long texts. The bidirectional long short-term memory network has a forward LSTM and a backward LSTM at the hidden layer; the forward LSTM captures the feature information of the preceding context, and the backward LSTM captures the feature information of the following context, giving the context coding vectors of the l_n words, where l_n denotes the length of the sentence, i.e. its number of word vectors.

Step three: the Attention mechanism is added to the context coding vectors obtained in step two, and the time steps of the LSTM are connected through a weight vector by calculating an attention probability distribution. This step highlights the influence of key inputs on the output, captures the words carrying important features in the sentence, and weights the output features of the bidirectional LSTM by attention probability.

Step four: the semantic feature vector encodings [S′₁, S′₂, …, S′_n] of the sentences are obtained.
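The word-level attention step can be illustrated with a small pooling routine. The text does not give the scoring formula, so this sketch assumes a common form (score each time step as w · tanh(h_t), normalise with Softmax, take the weighted sum); the hidden states, the weight vector `w` and the function names are made up for the example, and the Bi-LSTM itself is taken as given.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(hidden_states, weight_vector):
    """Pool per-word hidden states into one sentence vector.

    hidden_states: one vector per word, assumed to be the already combined
        forward and backward LSTM outputs.
    weight_vector: attention parameter w; learned in a real model, fixed by
        hand here (assumption).
    """
    # Score each time step as w . tanh(h_t) (assumed scoring function).
    scores = [
        sum(w * math.tanh(h) for w, h in zip(weight_vector, state))
        for state in hidden_states
    ]
    alphas = softmax(scores)  # attention probability distribution over words
    dim = len(hidden_states[0])
    sentence_vector = [
        sum(alpha * state[d] for alpha, state in zip(alphas, hidden_states))
        for d in range(dim)
    ]
    return sentence_vector, alphas


# Three words with 2-dimensional hidden states (made-up numbers).
H = [[0.1, 0.0], [2.0, 1.5], [0.2, -0.1]]
w = [1.0, 1.0]
s_prime, alphas = word_attention(H, w)
```

With these numbers the second word receives the largest attention weight, so the sentence vector is dominated by its hidden state.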

Preferably, the method for coding and denoising semantic features of a sentence based on a Bi-LSTM model of a sentence-level attention mechanism to obtain a sentence set feature coding vector includes:

For the training sentence corpus, it is assumed that at least one of the sentences of each entity pair reflects the relationship of that pair. The sentences containing the entity pair are therefore selected and packed together, but the noisy sentences of the entity pair need to be filtered out during training. For example, if the relationship to be extracted is "founder" but a sentence containing "Jack Ma" and "Alibaba" does not express that "Jack Ma" is the founder of "Alibaba", that sentence is a noisy sentence for "founder". This problem is solved by a neural network model based on a sentence-set-level attention mechanism, which assigns a weight to each sentence of an entity pair according to a specific relationship, so that through continuous learning valid sentences obtain higher weights and noisy sentences obtain lower weights.

The model is shown in the upper part of FIG. 3, where S′_i is the feature coding vector of sentence S_i output by the first model, h_i is the feature coding vector of the possible relation of each sentence in the Bi-LSTM training model, and the vector R_relation = e₁ - e₂ contains the features of the relation R. If a sentence instance expresses the relation R, it should have high similarity with the vector R_relation, which can be used as the similarity constraint for training positive examples. A_i represents the weight of each sentence. The specific steps are as follows:

Step one: all sentence feature vectors [S′₁, S′₂, …, S′_n] containing the entity pair are input to a Bi-LSTM model to obtain the feature codes at the sentence set level. For example, when a "founder" relation classification model is trained and the relation triple (Jack Ma, founder, Alibaba) exists in the relational database, then under the remote supervision assumption (S_i, Jack Ma, founder, Alibaba) is a positive example of the relation and the vector weight of that sentence should be high; the feature codes at the sentence set level are obtained by continuously learning from positive sentences.

Step two: each sentence is assigned a Sentence-level Attention weight, so that through continuous learning valid sentences get higher weights and noisy sentences get lower weights. Because the core assumption of remote supervision does not always hold, sentences that do not express the relationship between the entities may be wrongly labeled. After the features of the entity pair's sentences are obtained, a selective attention mechanism is therefore used to give the sentences different weights and reduce the influence of invalid sentences.

Preferably, the step of packing the sentence set feature encoding vector and the entity pair translation vector and performing entity pair relationship classification on the obtained packet features includes:

Introducing a translation vector R_relation: the sentence vector S_i obtained from the previous model contains the semantic information of the relation R implied by the entity pair and is the feature encoding of a sentence. Each instance sentence in the sentence set packed according to the entity pair (e₁, e₂) may express the relation R or some other relation. During model training, a sentence vector that encodes the features of this relation should therefore have very high similarity with the translation vector R_relation. Here, the Sentence-level Attention weight and the translation vector R_relation act together on each sentence to reduce the coding impact of invalid sentences.
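How the sentence weights and the translation vector might work together can be sketched as follows. The text states only that a sentence expressing R should be highly similar to R_relation = e₁ - e₂; the dot-product similarity, the Softmax normalisation, the toy embeddings and the function name `sentence_attention` are assumptions of this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sentence_attention(sentence_vectors, e1, e2):
    """Weight sentences by similarity to R_relation = e1 - e2 and pool them."""
    r_relation = [a - b for a, b in zip(e1, e2)]
    # Similarity of each sentence encoding h_i with the translation vector
    # (dot product is an assumed similarity measure).
    scores = [dot(h, r_relation) for h in sentence_vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # A_i, one weight per sentence
    dim = len(sentence_vectors[0])
    bag_feature = [
        sum(a * h[d] for a, h in zip(weights, sentence_vectors))
        for d in range(dim)
    ]
    return bag_feature, weights


# Made-up 2-dimensional entity embeddings and sentence encodings.
e1, e2 = [1.0, 0.5], [0.0, 0.5]  # R_relation = [1.0, 0.0]
h = [[2.0, 0.1], [1.8, -0.2], [-1.0, 0.3]]  # the third sentence plays the noisy one
bag_feature, A = sentence_attention(h, e1, e2)
```

The third sentence points away from R_relation, so it ends up with a small weight and contributes little to the pooled feature.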

Multi-example learning training on the packet features: the semantic features obtained by all the encoding steps above are packed, and the packet features are learned with the following multi-example learning procedure. The label of each sample in a labeled packet B is initialized to the label of its packet, a set U is initialized to be empty, and all samples are added to U. The following process is then repeated: train on the labeled samples in U to obtain a classification function f_B; predict the labels of all samples with f_B and empty U; for each positive packet, select the sample with the highest f_B prediction score and add it to U; for each negative packet, select the sample with the highest f_B prediction score and add it to U. When the end condition is satisfied, f_B is returned. The packet features obtained by multi-example learning contain the semantic coding information of the possible relation R, that is, an implicit semantic feature representation of the possible relation.
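The iterative procedure above can be sketched with a toy classifier. The text does not name the classifier or the end condition, so this example assumes a simple centroid-based linear scorer and a fixed number of rounds; all names and data are hypothetical, and the sketch needs at least one positive and one negative packet.

```python
def train_centroid_classifier(samples):
    """Fit a toy linear scorer f(x) = x . (mean_pos - mean_neg) - offset."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    dim = len(samples[0][0])
    mean_pos = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mean_neg = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    direction = [p - n for p, n in zip(mean_pos, mean_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mean_pos, mean_neg)]
    offset = sum(w * m for w, m in zip(direction, midpoint))
    return lambda x: sum(w * v for w, v in zip(direction, x)) - offset


def multi_instance_train(bags, rounds=5):
    """bags: list of (bag_label, [sample_vector, ...]); returns the scorer."""
    # Initialise every sample with its packet's label and put all samples in U.
    U = [(x, label) for label, xs in bags for x in xs]
    f_b = None
    for _ in range(rounds):  # fixed round count stands in for the end condition
        f_b = train_centroid_classifier(U)
        U = []  # empty U, then refill it with one sample per packet
        for label, xs in bags:
            best = max(xs, key=f_b)  # highest-scoring sample of this packet
            U.append((best, label))
    return f_b


# Made-up 2-dimensional sample vectors grouped into labeled packets.
bags = [
    (1, [[2.0, 2.0], [-1.0, -1.0]]),  # positive packet with one noisy sample
    (1, [[1.5, 2.5]]),
    (0, [[-2.0, -1.5], [-1.0, -2.0]]),
    (0, [[-1.5, -1.0]]),
]
f_b = multi_instance_train(bags)
```

After the first round the noisy sample in the first positive packet is no longer selected, so it stops influencing the classifier.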

Softmax classification is then performed on the obtained packet features: after continuous learning, Softmax maps a bag containing sentence-set-level features onto the several candidate relation categories. The goal of training here is to maximize the accuracy of the classification.
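A minimal sketch of the Softmax step, assuming a single linear layer over the packet (bag) feature: the weight rows in `W`, the feature values and the function name are made up for illustration, whereas a trained model would learn the weights.

```python
import math


def softmax_relation_scores(bag_feature, relation_weights):
    """Return (relation, probability) pairs ranked by Softmax probability."""
    names = list(relation_weights)
    logits = [
        sum(w * x for w, x in zip(relation_weights[name], bag_feature))
        for name in names
    ]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-set weight rows for three candidate relations.
W = {"founder": [1.0, 0.2], "CEO": [0.3, 0.9], "Boss": [-0.5, 0.1]}
ranking = softmax_relation_scores([2.0, 0.5], W)
```

The result is a probability ranking over the candidate relations of the entity pair, with the most likely relation first.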

Model training uses the relation classes (relationship.txt), training data (train.txt), test data (test.txt) and word vectors (vec.txt). The training data and the test data may be the raw data in random order, split into 80% for training and 20% for testing. The hyper-parameters are adjusted to achieve the best prediction of the predefined relationships, until the probability values of the different relationships of the same entity pair are finally obtained.
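The 80%/20% split of randomly ordered data can be done as below. The function name, the fixed seed and the placeholder examples are assumptions of this sketch; reading and writing the actual data files is left out.

```python
import random


def split_train_test(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples and cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder examples standing in for labeled sentences.
examples = [f"sentence {i}" for i in range(10)]
train, test = split_train_test(examples)
```

Using a seeded generator keeps the split stable across runs, which helps when comparing hyper-parameter settings.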

Example 2

Based on the same inventive concept, the invention also provides a remote-supervised Dual-Attention relation classification system, which comprises:

Preferably, the building module comprises:

and processing the sentence by adopting a text depth representation model to obtain a word vector of each word in the sentence.

Inputting the word vector into a Bi-LSTM model to obtain a coding vector of the word vector, and inputting the word vector into the Bi-LSTM model;

and finally, obtaining the context coding vector of the word vector.

Adding a word-level attention mechanism into the coding vector of the word vector to obtain a semantic feature coding vector of each sentence, wherein the word-level attention mechanism is added into the coding vector;

and obtaining a semantic feature coding vector of each sentence.

Preferably, the first vector module comprises:

adding a sentence-level attention mechanism to the feature coding vector of the sentence set to obtain a denoised sentence set feature coding vector, wherein a sentence-level attention weight is added to each sentence so that the weight of a valid sentence is large and the weight of a noisy sentence is small;

and obtaining the noise-reduced sentence set feature coding vector.

The second vector module includes:

carrying out relation classification on the packet features by using a multi-example learning method, wherein if at least one sentence in a packet is judged positive by the classifier, the sentences in the packet are positive example data; if all sentences in a packet are judged negative by the classifier, the sentences in the packet are negative example data;

Example 3

The knowledge base WikiData contains the entity pair "Jack Ma" and "Alibaba" together with its relation set, which includes "founder", "CEO" and other relations. Several sentences containing the entity pair "Jack Ma" and "Alibaba" are collected from internet data; four sentences in which the two entities co-occur are given here as examples.

Sentence 1: "Female executives are Alibaba's secret sauce, founder Jack Ma says."

Sentence 2: "At a conference hosted by All Things D last week, Alibaba CEO Jack Ma said that he was interested in Yahoo."

Sentence 3: "Internet entrepreneur Jack Ma started a Chinese version of the Yellow Pages that was Alibaba's precursor in Hangzhou, China."

Sentence 4: "Alibaba has brought more small U.S. businesses on to the company's sites, but this is the first time Ma has disclosed specific targets."

Sentences 3 and 4 do not express a predefined relation of the knowledge base. One purpose of the invention is to train the model on a large number of sentences in which entities co-occur, so as to classify the relation of the co-occurring entities in a sentence and calculate its probability. In the model output, the relation "founder" has the highest probability for sentence 1, with the other relations ranked below it; the relation "CEO" has the highest probability for sentence 2, with the other relations ranked below it. Provided the training corpus is sufficient, for sentences 3 and 4 the model can also decide which relation holds from the maximum of the obtained relation probabilities, thereby classifying the relation of the co-occurring entity pair in sentences 3 and 4.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A remote supervised Dual-Attention relationship classification method is characterized by comprising the following steps:

packing the sentence set feature coding vector and the entity pair translation vector, and carrying out entity pair relation classification on the obtained packet features;

the method for obtaining the semantic feature coding vector of the sentence comprises the following steps of carrying out word-level vector coding on the sentence based on a Bi-LSTM model of a word-level attention mechanism, wherein the word-level vector coding comprises the following steps:

2. The remotely supervised Dual-Attention relationship classification method of claim 1, wherein inputting the word vector into a Bi-LSTM model to obtain an encoding vector of the word vector, comprises:

inputting the word vector into a Bi-LSTM model;

and finally, obtaining the context coding vector of the word vector.

3. The remotely supervised Dual-Attention relationship classification method of claim 1, wherein adding a word level Attention mechanism to the coding vector of the word vector to obtain a semantic feature coding vector of each sentence, comprises:

said adding a word-level attention mechanism to said encoded vector;

and obtaining a semantic feature coding vector of each sentence.

4. The remotely supervised Dual-Attention relationship classification method of claim 1, wherein the Bi-LSTM model based on the sentence level Attention mechanism encodes and de-noises semantic features of the sentence to obtain a sentence set feature encoding vector, comprising:

5. The remotely supervised Dual-Attention relationship classification method of claim 4, wherein adding a sentence level Attention mechanism to the feature coding vectors of the sentence set to obtain denoised sentence set feature coding vectors comprises:

and obtaining the noise-reduced sentence set feature coding vector.

6. The remotely supervised Dual-Attention relationship classification method of claim 1, wherein the sentence set feature coding vector and the entity pair translation vector are packed, and the obtained package features are subjected to entity pair relationship classification, comprising:

7. The remotely supervised Dual-Attention relationship classification method of claim 6, wherein the relationship classification of the package features using multi-instance learning comprises:

8. A remotely supervised Dual-Attention relationship classification system, comprising:

the relation classification module is used for packing the sentence set feature coding vector and the entity pair translation vector and carrying out entity pair relation classification on the obtained packet features;

wherein, the first vector module is specifically configured to: