Background
With the rapid development of artificial intelligence and the explosion of techniques in the field of information extraction, entity relation extraction has attracted the attention of more and more scholars as an important research topic in information extraction. Its main purpose is to extract the semantic relations between labeled entity pairs in sentences, that is, to determine the relation categories between entity pairs in unstructured text on the basis of entity recognition, and to form structured data for storage and retrieval. The results of entity relation extraction can be used to construct a knowledge graph or an ontology knowledge base, and can provide data support for building automatic question-answering systems. In addition, entity relation extraction has important research significance for semantic network annotation, discourse understanding and machine translation.
Early relation extraction was mainly based on grammatical rules, analyzing the grammatical structure of sentences as the basis for generating relations. Although such methods achieve good precision, their recall is difficult to improve because the rules are strict and require professional grammatical knowledge and a literature foundation, so their applicability is limited. With the continuous development of the technology, relation extraction methods have come to be divided into supervised, semi-supervised and unsupervised methods. Given the content of the invention, supervised relation extraction is the main focus here. Supervised relation extraction can mostly be regarded as a classification problem, for which there are two main families of methods: shallow structure models and deep learning models.
A shallow structure generally has only one hidden layer, or none at all; examples include the support vector machine and maximum entropy models. Shallow structures in relation extraction often rely on feature engineering or kernel functions. Traditional feature-engineering methods mainly depend on skillfully designed feature sets produced by a language processing pipeline. Most of these methods rely either on a large number of manually designed features or on carefully designed kernel functions. Despite the assistance of many excellent NLP tools, there is still a risk of performance degradation due to errors such as inaccurate word segmentation and syntactic parsing mistakes. More importantly, the low portability of these carefully designed features or kernel functions greatly limits their scalability.
In recent years, a great deal of research has been devoted to relation extraction based on deep learning. These methods build on models such as CNNs and RNNs and achieve excellent results. Many neural-network-based methods demonstrate the advantages of neural networks over traditional shallow structures, but most of these results are obtained on well-balanced English datasets and use many external features as aids. Chinese grammatical structure, by contrast, is complex, and linguistic ambiguity is more severe.
Disclosure of Invention
The invention provides a Chinese relation extraction method based on a neural network. By setting hidden layers of different sizes for the long short-term memory model, the method can automatically extract abstract features of different dimensions and dependency information from the raw input, and it captures global information with an attention mechanism. Experiments show that, compared with a multi-kernel convolutional neural network and a single long short-term memory-attention model, the method significantly improves Chinese relation extraction and obtains better results on the ACE RDC2005 Chinese dataset, which proves its effectiveness. The model framework is shown in Figure 1.
The technical scheme of the invention is as follows. A Chinese relation extraction method based on a neural network comprises the following steps: Step 1, construct a BiLSTMA unit to extract deep semantic information and global dependency information from sentences; Step 2, construct a Multi-BiLSTMA model to acquire semantic information with dependency relations at different granularities; Step 3, verify the validity of the method on real data.
Step 1 makes full use of the advantages of the bidirectional long short-term memory model (BiLSTM) in handling long-distance dependency problems, and of the ability of the attention mechanism (Attention) to capture global dependency information, constructing a BiLSTMA unit (BiLSTM-Attention) to extract deep semantic information and dependency information from sentences.
Step 2 sets hidden layers of different sizes in the BiLSTMA units and combines BiLSTMA units of different sizes to construct the Multi-BiLSTMA model, which can obtain semantic information with dependency relations at different granularities.
Step 3 verifies the effectiveness of the method by measuring its recognition performance on the ACE RDC2005 Chinese dataset.
Advantageous effects
The invention has the beneficial effects that:
the key point of the invention is that, drawing on the ability of a multi-kernel CNN to learn features of different granularities, a Multi-BiLSTMA model is constructed from BiLSTMs of different sizes together with an attention mechanism; experiments prove that the method performs excellently on the ACE RDC2005 Chinese dataset.
The invention provides a Chinese relation extraction method based on a Multi-BiLSTM-Attention neural network model. Experiments prove that the method achieves higher performance on the ACE dataset, demonstrating its effectiveness. The method effectively exploits the property of the multi-kernel CNN that features of different granularities can be learned, and combines it with the BiLSTM, thereby fully exploiting the automatic feature extraction of neural network models. Setting several hidden layers of different sizes in the bidirectional BiLSTM channels prevents feature sparsity to a certain extent, effectively acquires and utilizes the semantic information of characters, and automatically obtains abstract features of different dimensions. On this basis, an attention mechanism is added: local and global features of the sentence are utilized, and the weights are adjusted by these features, so that noise is reduced and accuracy is improved.
The method combines the fact that a single long short-term memory model can only learn one specific dimension with the fact that multiple convolution kernels in a convolutional neural network can learn different dimensions, proposing the Multi-BiLSTM-Attention model, which achieves excellent performance in Chinese relation extraction and works well in practice.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
For a sentence containing two entities, the task of relation extraction is to identify the candidate relation between them. The bidirectional long short-term memory neural network (BiLSTM) model is a variant of the recurrent neural network (RNN); it can effectively process long-distance information and mitigate the gradient vanishing and explosion problems, and because BiLSTM and Attention complement each other well, they are used in combination. However, a single, fixed BiLSTM can only learn information of one particular dimension, so a Multi-BiLSTM model is constructed by setting BiLSTMs of different sizes. The model can learn features with dependency information in multiple dimensions.
First, the input layer of the model consists of word vectors obtained by mapping words through a randomly initialized lookup table. If the sentence length is L, the sentence mapped into vectors can be represented as X = [x1, x2, ..., xL], where xi ∈ R^D is the vector of the i-th word wi and D is the dimension of the vector. If the dictionary size is V, the Embedding layer can be expressed as a lookup table in R^(V×D). This process can be written as: X = Embedding(S).
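A minimal sketch of this input layer, assuming TensorFlow/Keras (the vocabulary size V below is an illustrative placeholder; only L = 50 and D = 100 come from the experimental setup later in this description):

```python
import tensorflow as tf

L = 50      # maximum sentence length (set in the experiments below)
V = 20000   # dictionary size; placeholder value, not given in the text
D = 100     # word-vector dimension (set in the experiments below)

# S: integer word indices of shape (batch, L). The lookup table is
# randomly initialized and fine-tuned during training.
S = tf.keras.Input(shape=(L,), dtype="int32", name="S")
X = tf.keras.layers.Embedding(input_dim=V, output_dim=D,
                              name="lookup_table")(S)
# X has shape (batch, L, D): one D-dimensional vector xi per word wi.
```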
Next, the Multi-BiLSTMA layer of the invention is composed of three BiLSTMA units, each consisting of one BiLSTM layer and one Attention layer. As shown in Figure 1(b), the BiLSTMA unit receives the output of the Embedding layer and uses a forward LSTM and a backward LSTM to form a BiLSTM layer, which extracts deeper features from the Embedding. This process can be summarized as: H = LSTM_forward(X) ⊕ LSTM_backward(X), where ⊕ represents element-by-element addition. The Attention layer then combines the information at each time step of the BiLSTM layer and, through calculation, obtains the information that most strongly influences the extraction result. This process can be summarized as: A = Attention(H).
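A hedged sketch of one BiLSTMA unit under the same Keras assumption. The scoring function of the Attention layer is not specified in the text, so a simple additive scoring followed by a softmax-weighted sum is used here as one plausible choice; merge_mode="sum" mirrors the element-by-element addition of the forward and backward LSTM outputs:

```python
def bilstma_unit(X, hidden_size):
    """One BiLSTMA unit: a BiLSTM layer followed by an Attention layer
    that weights the time steps and sums them into a single vector."""
    # H = LSTM_forward(X) (+) LSTM_backward(X), combined element-wise.
    H = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(hidden_size, return_sequences=True),
        merge_mode="sum")(X)                                 # (batch, L, hidden_size)
    # Score each time step, normalize over time, and take the weighted
    # sum A = sum_t alpha_t * h_t (assumed scoring, see lead-in above).
    scores = tf.keras.layers.Dense(1, activation="tanh")(H)  # (batch, L, 1)
    alphas = tf.keras.layers.Softmax(axis=1)(scores)         # weights over the L steps
    A = tf.keras.layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([alphas, H])
    return A                                                 # (batch, hidden_size)
```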
The next step is the fully connected layer of the model. After the outputs of the three BiLSTMA units are concatenated, the information learned by the model is classified through one fully connected (Dense) layer, whose hidden size equals the number of relation types, namely 7. This process can be summarized as: D = Dense(A).
Finally, to obtain a better experimental result, the output of the fully connected layer is normalized with a softmax layer to produce the final classification. This process can be summarized as: Y = softmax(D).
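Putting the pieces together, again as a sketch building on the two snippets above: the three unit sizes (100, 200, 300) and the 7 output classes (6 relation types plus "Other") come from the experimental setup; the rest follows the description directly.

```python
# Three BiLSTMA units with hidden layers of different sizes.
A1 = bilstma_unit(X, 100)
A2 = bilstma_unit(X, 200)
A3 = bilstma_unit(X, 300)

A = tf.keras.layers.Concatenate()([A1, A2, A3])  # splice the three outputs
D_out = tf.keras.layers.Dense(7)(A)              # D = Dense(A); 7 relation types
Y = tf.keras.layers.Softmax(name="Y")(D_out)     # Y = softmax(D)

model = tf.keras.Model(inputs=S, outputs=Y)
```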
The effectiveness of the method is verified on real data: the ACE RDC2005 standard Chinese dataset is selected, and the data is first preprocessed.
The invention uses the publicly released ACE RDC2005 Chinese dataset for relation extraction. After screening out irregular documents, the experiments use 628 documents in total. This dataset contains 6 entity relation types (collectively, the positive examples): "PART-WHOLE", "PHYS", "ORG-AFF", "GEN-AFF", "PER-SOC" and "ART". Since the relations in the dataset are directional, if, for example, the entity pair (A, B) has an "ART" relation in the dataset but no relation type is labeled between the entity pair (B, A), all such cases are collectively called negative examples, and their relation type is labeled "Other". Since relation extraction is mainly performed at the sentence level, the text in the dataset is cut into sentences at the 5 Chinese punctuation marks "。", "！", "？", "；" and "：". Sentences without entity pairs are discarded, and sentences duplicated between positive and negative examples are removed (because the same sentence cannot be both a positive and a negative example), yielding 101056 sentences in total, comprising 9244 positive-example sentences and 91812 negative-example sentences. The ACE RDC2005 Chinese dataset has an unbalanced distribution: the relation types are unevenly represented, and negative examples in particular account for 90.85%. To be closer to the real situation and to reduce the influence of the large amount of negative-example data, only the positive-example results are evaluated.
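A small sketch of this sentence-level preprocessing (the regular expression assumes the five full-width Chinese marks named above, and the entity-pair check is simplified to substring matching for illustration):

```python
import re

SENT_SPLIT = re.compile(r"[。！？；：]")  # the 5 Chinese sentence delimiters

def to_sentences(document_text):
    """Cut a document into sentences at the Chinese punctuation marks."""
    return [s for s in SENT_SPLIT.split(document_text) if s.strip()]

def has_entity_pair(sentence, entity_a, entity_b):
    """Discard sentences that do not contain both entity mentions."""
    return entity_a in sentence and entity_b in sentence
```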
Second, for word-vector processing, a randomly initialized Lookup Table is used, which is continuously adjusted during training, and the word-vector dimension is set to 100. Because the neural network requires fixed-size input, the average sentence length for each relation type was analyzed. To balance extraction quality against training cost, a sentence length of 50 is chosen as the maximum input length: sentences shorter than 50 are padded with '0' up to 50, and sentences longer than 50 are truncated to 50. AdaDelta is selected as the optimization function, with the learning rate left at its default of 1.0. Further, the batch size is set to 50 and the number of iterations to 100. In the experiments, three BiLSTMA units are used, with hidden layers of sizes 100, 200 and 300 respectively.
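The fixed-length input and training configuration might look as follows (whether padding and truncation are applied at the front or the back of the sentence is not stated, so "post" is an assumption; index_sequences and labels stand in for the preprocessed data):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 50
# Pad sentences shorter than 50 with 0 and truncate longer ones to 50.
padded = pad_sequences(index_sequences, maxlen=MAX_LEN, value=0,
                       padding="post", truncating="post")

# AdaDelta at its default learning rate of 1.0; batch size 50, 100 iterations.
model.compile(optimizer=tf.keras.optimizers.Adadelta(learning_rate=1.0),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(padded, labels, batch_size=50, epochs=100)
```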
Finally, three tasks were designed on the same data to demonstrate the effectiveness of the method of the invention. The first task uses the multi-kernel CNN for relation extraction and serves as the baseline model. The second task uses a single-layer BiLSTMA for relation extraction; experiments show that, by combining BiLSTM with Attention, its effect is superior to the plain multi-kernel CNN method. The third task uses the Multi-BiLSTMA model for relation extraction, showing that the model behaves like the multi-kernel CNN in learning multiple granularities while fully exploiting the advantages of BiLSTM and Attention, and clearly improves on the results of the first two.
After 5-fold cross-validation experiments, the performance is shown in Table 1 (the F values of the three models are shown in bold).
Table 1. Relation extraction task performance
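The 5-fold protocol behind Table 1 could be run along these lines (a sketch; build_model is a hypothetical helper that rebuilds the Multi-BiLSTMA model above, and only the positive-example F values would be reported, per the evaluation rule stated earlier):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_f1 = []
for train_idx, test_idx in skf.split(padded, labels):
    model = build_model()  # hypothetical helper: rebuilds the Multi-BiLSTMA model
    model.fit(padded[train_idx], labels[train_idx], batch_size=50, epochs=100)
    pred = np.argmax(model.predict(padded[test_idx]), axis=-1)
    # Per-class F values; the "Other" class is excluded when reporting.
    fold_f1.append(f1_score(labels[test_idx], pred, average=None))
print(np.mean(fold_f1, axis=0))  # average F value per relation type
```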
The number of instances of each relation type is unbalanced, and this is reflected directly in the results in Table 1. Overall, the types with more instances also score higher, which matches the characteristics of neural networks: in general, for the same data quality and the same model, the larger the data volume, the more sufficient the training, the less likely overfitting, and the better the result. It can also be seen from the results that the F values of the three types "PART-WHOLE", "ORG-AFF" and "GEN-AFF" are significantly higher than those of the other three positive types, which is likewise determined by the large data volumes of these three types.
Meanwhile, as can be seen from Table 1, the performance of the single-layer BiLSTMA is superior to the plain multi-kernel CNN, because BiLSTMA can capture dependency information and key features in sentences more effectively than the CNN, thereby achieving a better extraction effect. The Multi-BiLSTMA combines the characteristics of both, so its performance is clearly superior to either. In conclusion, the Chinese relation extraction method based on a neural network provided by the invention performs excellently.
What is not described in detail in the present invention is known to those skilled in the art. Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such changes are covered by the claims of the present invention.