CN111291556B - Chinese entity relation extraction method based on character and word feature fusion of entity sense items (Google Patents)
Chinese entity relation extraction method based on character and word feature fusion of entity sense items
- Publication number: CN111291556B
- Application number: CN201911298675.8A
- Authority: CN (China)
- Prior art keywords: word, entity, vector, sense, sentence
- Legal status: Active
Abstract
The invention relates to a Chinese entity relation extraction method based on the fusion of character and word features with entity sense items. The method introduces entity sense items to expand each sentence into a triple <sentence, entity 1 sense item, entity 2 sense item>, enriching the granularity of the input, and maps the three sequences of the triple into character vector matrices. The sentence of the triple is fed into two models in parallel: one learns character features with an attention-based bidirectional long short-term memory network (Att-BLSTM), while the other first learns local features with a convolutional neural network (CNN) and then learns word features with an Att-BLSTM. Character-based features of the entity 1 sense item and of the entity 2 sense item are likewise learned with Att-BLSTM. The four features are fused into a single feature that comprehensively represents the semantic information and is used for relation extraction. The method avoids word segmentation errors, alleviates the problem of polysemy, effectively improves the accuracy of Chinese entity relation extraction, and can be widely applied to the construction of knowledge graphs.
Description
Technical Field
The invention belongs to the technical field of natural language processing and relates to a Chinese entity relation extraction method based on the fusion of character and word features with entity sense items.
Background
With the development of network technology, the information age relies heavily on text, images and other media, and obtaining useful information from large amounts of unstructured text has become particularly important. The main purpose of entity relation extraction is, building on entity recognition, to determine the relation category between entity pairs in unstructured text and to form structured data for storage and retrieval. For example, for the sample "[orchid]e1 grows in the [valley]e2, blooming though no one sees it." with the two marked entities "orchid" and "valley", the relation extraction task is to acquire the semantic information of the sample through machine learning and identify the relation between the entity pair, forming the structured triple <orchid, located, valley> used to construct a large-scale knowledge graph. A knowledge graph is a semantic network composed of concepts, entities, entity attributes and entity relations; it is a structured representation of the real world and is widely applied in search systems. In Chinese, semantic relations are more complex, so the role of entity relation extraction is all the more evident. Research on Chinese entity relation extraction is therefore essential.
Conventional relation extraction mainly includes feature-based and kernel-based methods. The feature-based approach, as its name implies, mines a large number of lexical, syntactic and semantic features and then identifies relations between entities in the text using a suitable classifier. The kernel-based approach instead puts its effort into kernel design, and such methods are usually built on dependency structures. Although both kinds of methods have proven effective to some extent, both feature engineering and kernel design depend heavily on the output of NLP tools, which inevitably introduces errors and degrades model performance.
In recent years, deep learning has been applied to relation extraction. Zeng et al. first applied a convolutional neural network (CNN) to semantic learning, and research on deep learning for this task has since become intensely active. However, owing to the lack of Chinese datasets, research on Chinese entity relation extraction remains limited; existing Chinese methods are mainly realized by improving the model under word-vector-matrix input, which makes them overly dependent on word segmentation quality. The mainstream network frameworks currently in use include multi-scale convolutional neural networks (Multi-scale CNNs), bidirectional long short-term memory networks (BLSTMs) and improved GRU networks; meanwhile, the attention mechanism has been widely applied to them and has achieved a certain effect. However, these methods focus only on improving the model itself and ignore the fact that different input granularities have a significant impact on the relation extraction model. Character-based models cannot exploit word-level information, so they capture fewer features than word-based models, while the performance of word-based models is too dependent on segmentation quality. Some methods have been proposed to combine character and word information in other natural language processing tasks; for example, Tai et al. proposed a tree-structured LSTM model to improve semantic representation, which has been widely used in various tasks such as human action recognition and voice tagging. Beyond the incomplete character and word feature representation of Chinese text, the polysemy of Chinese words still seriously affects the relation extraction task; in other words, the above extraction methods cannot handle a word whose meaning changes with the language environment. The invention therefore introduces entity sense items as external linguistic knowledge to support the semantic information of the entities in the sentence, which helps resolve the polysemy of entity words, and constructs different networks over the input character vector matrix to learn character features and word features separately, thereby enriching the input granularity.
Disclosure of Invention
The purpose of the invention is as follows: targeting the SanWen dataset released by Peking University, and in order to reduce the dependency of existing entity relation extraction models on word segmentation quality and improve the model's ability to correctly identify entity semantic information, character vector matrices are used as input, entity sense items are introduced to enrich the input granularity, the semantic information of the sentence is represented at several levels, and a relation extractor that simultaneously learns character features, word features and entity sense item features is constructed.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A Chinese entity relation extraction method based on the fusion of character and word features with entity sense items, comprising the following steps:
A. Training;
Step 1, sentence preprocessing;
take m sentences from the SanWen dataset as training samples, the m sentences covering the ten relation categories in the SanWen dataset;
process each of the m sentences into a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; this means that every character and punctuation mark in the sentence is treated as an individual unit and arranged in order; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m};
for the sequence set of the m sentences, assign index numbers sequentially starting from 1 at the first character of the first sentence; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the sequence set of the m sentences, compute the character length of each sentence sequence and take the statistical maximum, denoted n, which is used to standardize the lengths of the m sentence sequences; standardizing means that, among the m sentence sequences, any sequence shorter than n characters is padded with the number 0 up to length n;
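A minimal sketch of this preprocessing follows, assuming toy sentences; the function and variable names are illustrative, not taken from the patent:

```python
# Step 1 sketch: character-level splitting, vocabulary indexing, zero-padding.
def build_sequences(sentences):
    # Split each sentence into individual characters (punctuation included).
    return [list(s) for s in sentences]

def build_vocab(sequences):
    # Assign index numbers starting from 1; repeated characters keep their
    # first-assigned number. 0 is reserved for padding.
    vocab = {}
    for seq in sequences:
        for ch in seq:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab

def pad_and_index(sequences, vocab):
    n = max(len(seq) for seq in sequences)          # maximum character length
    return [[vocab[ch] for ch in seq] + [0] * (n - len(seq))
            for seq in sequences]

sequences = build_sequences(["兰生幽谷。", "兰花很香。"])
vocab = build_vocab(sequences)
padded = pad_and_index(sequences, vocab)            # every row has length n
```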
Step 2, acquire the entity 1 sense item and entity 2 sense item corresponding to each sentence;
in the m sentences, use entity 1 of each sentence as a search term on an encyclopedia website;
1) if the search term for entity 1 is not recorded on the encyclopedia website, take entity 1 itself as the entity 1 sense item corresponding to entity 1;
2) if the search term for entity 1 is recorded on the encyclopedia website, acquire all entity 1 sense items corresponding to entity 1 using web crawler technology;
compute the semantic similarity between each sentence and each of its candidate entity 1 sense items, and retain the entity 1 sense item with the highest similarity;
in the m sentences, entity 1 of each sentence thus corresponds to one entity 1 sense item, which is either the sense item with the highest similarity or entity 1 itself;
process the entity 1 sense item corresponding to entity 1 in each sentence into a sequence Sense(e1)_j, j = 1,2,…,m, in which each character exists as a separate unit; as before, every character and punctuation mark is treated as an individual unit and arranged in order; the m entities 1 in the m sentences correspond to a set of m entity 1 sense item sequences, denoted {Sense(e1)_1, Sense(e1)_2, …, Sense(e1)_m};
for the set of m entity 1 sense item sequences, assign index numbers sequentially starting from 1 at the first character of the first entity 1 sense item; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the m entity 1 sense item sequences, compute the character length of each sequence and take the statistical maximum, denoted m1, which is used to standardize their lengths; standardizing means that any entity 1 sense item sequence shorter than m1 characters is padded with the number 0 up to length m1;
in the same manner as for entity 1, obtain the entity 2 sense item corresponding to entity 2 in each sentence; the corresponding entity 2 sense item is either the one with the highest similarity or entity 2 itself;
in the same manner as for entity 1, process the entity 2 sense item of each sentence into a character sequence Sense(e2)_j, j = 1,2,…,m; the m entities 2 in the m sentences correspond to a set of m entity 2 sense item sequences, denoted {Sense(e2)_1, Sense(e2)_2, …, Sense(e2)_m};
for the set of m entity 2 sense item sequences, assign index numbers sequentially starting from 1 at the first character of the first entity 2 sense item; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
in the same manner as for entity 1, obtain the maximum character length of the m entity 2 sense item sequences, denoted m2, which is used to standardize their lengths; any entity 2 sense item sequence shorter than m2 characters is padded with the number 0 up to length m2;
Step 3, expanding the triple < statement, entity 1 meaning item and entity 2 meaning item >;
for each sentence sequence SjExtended as triplets<Sj,Sense(e1)j,Sense(e2)j>;
said SjThe word vector matrix in (1) is formed by splicing word self vectors and distance vectors, Sense (e)1)jThe word vector matrix in (1), i.e. the word itself vector, Sense (e)2)jThe word vector matrix in (1), namely the vector of the word itself;
the distance vector is a distance vector from a word to an entity 1 and a distance vector from the word to an entity 2;
the splicing refers to adding dimensions of specified vectors to synthesize a vector;
Step 5, for the sequence S_j in the triple, learn with Att-BLSTM to obtain the character-based sentence feature vector, denoted h_c*;
Step 6, for the sequence S_j in the triple, first learn local features with a CNN and then learn with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w*;
learning the character vector matrix of S_j with the CNN yields local feature vectors; a local feature vector characterizes the semantic information between characters in the sentence and is regarded as a word feature;
Step 7, for the sequence Sense(e1)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 1 sense item feature vector, denoted h_e1*; for the sequence Sense(e2)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 2 sense item feature vector, denoted h_e2*;
Step 8, feature fusion;
concatenating the word-based sentence feature vectors and basesThe sentence characteristic vector of the word obtains the characteristic vector of the sentence semantic information, and the characteristic vector is marked as hs *:
hs *=[hc *;hw *];
Splicing the word-based entity 1-meaning item feature vector and the word-based entity 2-meaning item feature vector to obtain a feature vector of entity semantic information, and marking the feature vector as he *:
he *=[he1 *;he2 *]
H is to bes *Inputting the result into the hidden layer of the full-connection network to obtain a new sentence characteristic vector os;
H is to bee *Inputting the data into the hidden layer of the full-connection network to obtain a new semantic item feature vector oe;
To o issAnd oeWeighting and summing to obtain final characteristic vector o, wherein weights are eta and1-η。
Step 9, relation extraction;
feed the final feature vector o into a softmax layer to obtain the probability of each class; the class corresponding to the maximum probability is the relation extraction result;
B. Input a target Chinese sentence and identify the relation;
1) if the target Chinese sentence contains exactly two marked entities, identify the relation between the entities in the sentence;
2) if the target Chinese sentence contains fewer than two marked entities, report an error;
3) if the target Chinese sentence contains more than two marked entities, report an error;
if there is more than one target Chinese sentence, automatically split the text into sentences and then identify the relation between the entities in each target Chinese sentence according to steps 1)-3).
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 1 comprises:
the m is 17227, which is all the training samples in the SanWen dataset.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 2 comprises:
computing semantic similarity means computing similarity with the cosine similarity algorithm;
the cosine similarity algorithm maps each character of the sentence sequence S_j into a character vector using the Word2Vec method, adds the corresponding elements of the character vectors of all characters in the sequence and divides by the number of character vectors to obtain the vector of S_j; the vector of the entity 1 sense item sequence Sense(e1)_j is obtained in the same way; the cosine of the angle between the two vectors in vector space is then computed as a measure of the difference between the two sequences: a cosine close to 1 (angle tending to 0°) indicates that the two sequences are more similar, while a cosine close to 0 (angle tending to 90°) indicates that they are more dissimilar.
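A minimal sketch of this sense-item selection, assuming `char_vecs` is a pretrained Word2Vec-style lookup from character to vector; the names are illustrative:

```python
import numpy as np

def sequence_vector(chars, char_vecs, dim=100):
    # Average the character vectors: element-wise sum divided by the count.
    vecs = [char_vecs[c] for c in chars if c in char_vecs]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def best_sense(sentence, senses, char_vecs):
    # Keep the candidate sense item most similar to the sentence.
    sv = sequence_vector(list(sentence), char_vecs)
    return max(senses,
               key=lambda s: cosine(sv, sequence_vector(list(s), char_vecs)))
```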
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 4 comprises:
1) map S_j into a basic character vector matrix, in which the basic vector of each character is formed by concatenating the character's own vector with its distance vectors; concatenation means joining the specified vectors along their dimensions into a single vector;
for the character's own vector, each character is mapped with the Word2Vec method into a low-dimensional real-valued vector tx_i of dimension d_w, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the vector dimension;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from that character to entity 2.
We define the distance of the i-th character to entity 1 as p_i^1 and the distance of the i-th character to entity 2 as p_i^2; p_i^1 and p_i^2 are computed in the same way, and p_i^1 is defined as
p_i^1 = i - b_1 if i < b_1; 0 if b_1 ≤ i ≤ e_1; i - e_1 if i > e_1,
where i denotes the position index of the i-th character, b_1 denotes the start position index of entity 1, and e_1 denotes the end position index of entity 1.
The computed p_i^1 and p_i^2 are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 the distance vector from the i-th character to entity 2.
Concatenating the character's own vector with the distance vectors gives the basic vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2·d_d. S_j is thus mapped into a basic character vector matrix, denoted S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_i denotes the basic vector of the i-th character and T denotes matrix transposition; since each basic character vector is a column vector of dimension d, the transposed matrix has dimensions n × d (a code sketch follows after item 3) below).
2) Map Sense(e1)_j into a basic character vector matrix, in which the basic vector of each character is simply the character's own vector.
Using the character's own vectors of 1), Sense(e1)_j is mapped into the basic character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_i denotes the basic vector of the i-th character of Sense(e1)_j and T denotes matrix transposition; since each character vector has dimension d_w, the transposed matrix has dimensions m1 × d_w.
3) Map Sense(e2)_j into a basic character vector matrix in the same way: using the character's own vectors of 1), Sense(e2)_j is mapped into the basic character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_i denotes the basic vector of the i-th character of Sense(e2)_j; since each character vector has dimension d_w, the transposed matrix has dimensions m2 × d_w.
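A minimal sketch of step 4-1) above: building the basic character vector matrix of a sentence. `char_emb` and `pos_emb` are assumed learned lookup tables, and the distance clipping bound is an illustrative assumption:

```python
import numpy as np

def rel_pos(i, b, e):
    # p_i = i - b if i < b; 0 if b <= i <= e; i - e if i > e.
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def basic_char_matrix(char_ids, ent1, ent2, char_emb, pos_emb, max_dist=60):
    # char_emb: (vocab, d_w); pos_emb: (2*max_dist+1, d_d); ent = (begin, end).
    rows = []
    for i, cid in enumerate(char_ids):
        p1 = int(np.clip(rel_pos(i, *ent1), -max_dist, max_dist)) + max_dist
        p2 = int(np.clip(rel_pos(i, *ent2), -max_dist, max_dist)) + max_dist
        # v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2*d_d
        rows.append(np.concatenate([char_emb[cid], pos_emb[p1], pos_emb[p2]]))
    return np.stack(rows)                     # matrix S_jv of shape (n, d)

def sense_char_matrix(char_ids, char_emb):
    # Sense item matrices (steps 4-2, 4-3) use only the characters' own vectors.
    return np.stack([char_emb[cid] for cid in char_ids])
```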
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 5 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, character features are learned with Att-BLSTM; Att-BLSTM refers to a bidirectional long short-term memory network with an attention mechanism;
at each time step t (t = 1,2,…,n), the basic character vector v_t is input to the BLSTM, which learns the character vector from the forward and backward directions to obtain a forward hidden feature vector and a backward hidden feature vector; adding the elements of the two hidden feature vectors one-to-one gives the bidirectional hidden feature vector; after the character vector matrix S_jv passes through the BLSTM, the bidirectional hidden feature vector of every character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_i denotes the bidirectional hidden feature vector of the i-th character;
2) the attention mechanism automatically assigns a weight coefficient to the bidirectional hidden feature vector of each character in H_Sjv; combining each character's bidirectional hidden feature vector with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector, denoted h_c*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 6 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, the CNN is used to learn local feature vectors; such a vector characterizes the semantic information between characters in the sentence and is regarded as a word feature; through k different CNN filters, k different local feature vectors are obtained, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_i denotes the word feature derived by the i-th CNN filter;
2) for H_w = [h_1, h_2, …, h_i, …, h_k], the word features are learned with the Att-BLSTM of step 5, yielding the word-based sentence feature vector, denoted h_w*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 7 comprises:
1) for the character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of Sense(e1)_j, the entity 1 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 1 sense item feature vector, denoted h_e1*;
2) for the character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of Sense(e2)_j, the entity 2 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 2 sense item feature vector, denoted h_e2*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 8 comprises:
the weight η is 0.9, a hyperparameter obtained through training and continual adjustment.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein the training comprises:
building a relation extractor according to steps 1-9 and randomly initializing all parameters of the model; during the whole training process, 10 triples <S_j, Sense(e1)_j, Sense(e2)_j> are input at a time from the m triples (i.e., batch_size = 10), and one pass over all triples is recorded as one training epoch, 100 such epochs being performed in total (i.e., epoch = 100); taking the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continuously with the stochastic gradient descent method to update the parameters; meanwhile, to prevent overfitting, a dropout mechanism is adopted during training, with each neuron dropped at a probability of 50% (i.e., in each training step a random half of the hidden-layer nodes do not participate in the computation); after training is finished, the trained entity relation extractor is obtained.
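A minimal training-loop sketch matching the stated settings (batch_size = 10, epoch = 100, stochastic gradient descent, dropout 0.5); `model`, `triples` and `collate` are illustrative stand-ins for the extractor of steps 1-9 and its prepared data, and the learning rate is an assumption:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):                      # 100 training epochs in total
    model.train()                             # enables the 50% dropout layers
    for i in range(0, len(triples), 10):      # batch_size = 10
        s, sense1, sense2, labels = collate(triples[i:i + 10])
        logits = model(s, sense1, sense2)
        loss = F.cross_entropy(logits, labels)  # cross entropy vs. true labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                      # stochastic gradient descent step
```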
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein inputting a target Chinese sentence and identifying the relation comprises:
given a target Chinese sentence: if it contains exactly two marked entities, directly identify the relation between the entities in the sentence; if it contains fewer than two marked entities, report an error; if it contains more than two marked entities, report an error; if there is more than one target Chinese sentence, automatically split the text into sentences and then identify the relation between the entities in each target Chinese sentence according to the steps for a single sentence.
Owing to the adoption of the above technical scheme, compared with the prior art the invention has the following advantages and positive effects: character vector input avoids the errors caused by Chinese word segmentation; on top of the character vector input, and in order to capture sentence features comprehensively and resolve the polysemy of entity words, a network framework that simultaneously captures character features, word features and entity sense item features is constructed, representing the semantic information at several levels. The method is practical, recognizes that the input granularity of the text has a great influence on relation extraction, and achieves high precision.
Drawings
FIG. 1 is a system framework diagram of the entity relation extraction model proposed in the invention.
FIG. 2 is a flowchart of the entity relation extraction method proposed in the invention.
FIG. 3 is a flow diagram of importing entity sense items from Baidu Encyclopedia.
FIG. 4 is a schematic diagram of the Att-BLSTM network.
FIG. 5 is a schematic diagram of the attention mechanism.
FIG. 6 is a schematic diagram of learning character-level sentence feature vectors.
FIG. 7 is a schematic diagram of the CNN network.
FIG. 8 is a schematic diagram of learning word-level sentence feature vectors.
FIG. 9 shows the performance comparison experiment between character features and word features proposed in the invention.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the invention. Further, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to it, and such equivalents likewise fall within the scope defined by the claims appended to this application.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items comprises: learning character features from the input character vector matrix with an Att-BLSTM neural network; meanwhile, performing a convolution operation on the input character vector matrix with a CNN network to generate word-level vectors and learning word features with an Att-BLSTM neural network; and introducing entity sense items, whose features are learned automatically with an Att-BLSTM neural network. Fusing the character features, word features and sense item features enriches the input granularity and fully represents the semantic information; character vector input avoids the influence of Chinese word segmentation errors, and introducing entity sense items eliminates the ambiguity caused by entity polysemy (see FIG. 1 and FIG. 2). The method comprises the following steps:
Step 1, sentence preprocessing
1) m sentences are taken from the SanWen dataset as training samples, covering the ten relation categories in the dataset (see Table 1). Each of the m sentences has a known relation label and two marked entities; m is taken as 17227.
TABLE 1
2) In each of the m sentences, every character and punctuation mark is treated as an individual unit and arranged in order, giving a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m}, and this sequence set is used to build the character table.
3) Index numbers are assigned to the sequence set sequentially starting from 1, repeated units being labeled with their already-assigned numbers, which gives the character table of the sequence set. The character length of each sentence sequence in the set is computed and the statistical maximum, denoted n, is used to standardize the lengths of the m sentence sequences: any sequence shorter than n characters is padded with the number 0 up to length n.
Step 2, acquire the entity 1 sense item and entity 2 sense item corresponding to each sentence
1) The invention creatively introduces entity sense items into the relation extraction task, providing additional supporting information for the entities in a sentence and helping to resolve entity polysemy. A sense item is one division of the rational meaning of a word: a word often has multiple meanings, and each meaning is one sense item.
2) In the m sentences, entity 1 of each sentence is used as a search term on an encyclopedia website: if the search term is not recorded on the website, entity 1 itself is taken as its sense item; if it is recorded, all candidate entity 1 sense items are acquired with web crawler technology, the semantic similarity between the sentence and each candidate sense item is computed, and the sense item with the highest similarity is retained (see FIG. 3). In the m sentences, entity 1 of each sentence thus corresponds to one entity 1 sense item, and the m entities 1 correspond to m entity 1 sense items.
3) For the m entity 1 sense items, every character and punctuation mark in each sense item is treated as an individual unit and arranged in order, giving a character sequence Sense(e1)_j, j = 1,2,…,m; the sequence set of the m entity 1 sense items is denoted {Sense(e1)_1, Sense(e1)_2, …, Sense(e1)_m}. Index numbers are assigned to the sequence set sequentially starting from 1, repeated units being labeled with their already-assigned numbers, which gives the character table of the sequence set. The character length of each entity 1 sense item sequence is computed and the statistical maximum, denoted m1, is used to standardize the lengths: any sequence shorter than m1 characters is padded with the number 0 up to length m1.
4) Following the method of 2), for each of the m sentences the entity 2 sense item corresponding to entity 2 is obtained; the m entities 2 in the m sentences correspond to m entity 2 sense items.
5) Following the method of 3), the entity 2 sense item sequences Sense(e2)_j, j = 1,2,…,m are obtained, with the sequence set denoted {Sense(e2)_1, Sense(e2)_2, …, Sense(e2)_m}; the character table of the entity 2 sense item sequence set is obtained; and the maximum character length of the set, denoted m2, is obtained and used to standardize the lengths of the m entity 2 sense item sequences.
Step 3, expand each sentence into the triple <sentence, entity 1 sense item, entity 2 sense item>
In the m sentences, each sentence sequence S_j is expanded into the triple <S_j, Sense(e1)_j, Sense(e2)_j>, where S_j has character length n, Sense(e1)_j has character length m1, and Sense(e2)_j has character length m2.
Step 4, map the sequences of the triple into character vector matrices
Each individual character of the sentence is mapped into a low-dimensional vector, which avoids the problem of word segmentation errors.
1) Map S_j into a basic character vector matrix, in which the basic vector of each character is formed by concatenating the character's own vector with its distance vectors. Concatenation means joining the specified vectors along their dimensions into a single vector.
For the character's own vector, each character is mapped with the Word2Vec method into a low-dimensional real-valued vector tx_i of dimension d_w, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the vector dimension.
The distance vectors are the distance vector from a character to entity 1 and the distance vector from that character to entity 2.
We define p_i^1 and p_i^2 as the relative distances from the i-th character to entity 1 and entity 2 respectively. p_i^1 and p_i^2 are computed in the same way, and p_i^1 is defined as
p_i^1 = i - b_1 if i < b_1; 0 if b_1 ≤ i ≤ e_1; i - e_1 if i > e_1,
where i denotes the position index of the i-th character; b_1 denotes the start position index of entity 1; e_1 denotes the end position index of entity 1.
After obtaining the relative distances p_i^1 and p_i^2 from the i-th character to entities 1 and 2, the two values are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 the distance vector from the i-th character to entity 2.
Concatenating the character's own vector with the two distance vectors gives the basic vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], a column vector of dimension d = d_w + 2·d_d.
S_j is then mapped into the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_i denotes the basic vector of the i-th character and T denotes matrix transposition; since each basic character vector is a column vector of dimension d, the transposed matrix has dimensions n × d.
2) Map Sense(e1)_j into a basic character vector matrix, in which the basic vector of each character is simply the character's own vector.
Each character is converted with the Word2Vec method into a low-dimensional real-valued vector sx_i of dimension d_w; Sense(e1)_j is mapped into the matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_i denotes the own vector of the i-th character and T denotes matrix transposition; since each character vector has dimension d_w, the transposed matrix has dimensions m1 × d_w.
3) Map Sense(e2)_j into a basic character vector matrix in the same way: each character is converted with the Word2Vec method into a low-dimensional real-valued vector vx_i of dimension d_w, and Sense(e2)_j is mapped into the matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_i denotes the own vector of the i-th character; the transposed matrix has dimensions m2 × d_w.
Step 5, for the sequence S_j in the triple, learn with Att-BLSTM to obtain the character-based sentence feature vector h_c*
1) Att-BLSTM refers to a bidirectional long short-term memory network with an attention mechanism (see FIG. 4).
The LSTM is used to learn long-distance semantic information and generate hidden feature vectors; the specific calculation formulas are:
i_t = σ(W_xi·x_t + W_hi·h_{t-1} + W_ci·c_{t-1} + b_i)
f_t = σ(W_xf·x_t + W_hf·h_{t-1} + W_cf·c_{t-1} + b_f)
g_t = tanh(W_xc·x_t + W_hc·h_{t-1} + W_cc·c_{t-1} + b_c)
c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}
o_t = σ(W_xo·x_t + W_ho·h_{t-1} + W_co·c_t + b_o)
h_t = o_t ⊙ tanh(c_t)
where x_t denotes the input of the LSTM at time t, h_{t-1} denotes the hidden feature vector output by the LSTM at the previous time step, and c_{t-1} denotes the cell state of the LSTM at the previous time step; i_t is the input gate of the LSTM, W_xi, W_hi, W_ci are the weight matrices corresponding to the input gate, b_i is its bias parameter, and σ denotes the sigmoid function; f_t is the forget gate of the LSTM, W_xf, W_hf, W_cf are the weight matrices corresponding to the forget gate, and b_f is its bias parameter; W_xc, W_hc, W_cc are the weight matrices corresponding to the candidate gate, b_c is its bias parameter, tanh denotes the hyperbolic tangent function, and c_t is the current cell state; W_xo, W_ho, W_co are the weight matrices corresponding to the output gate, and b_o is its bias parameter; h_t is the hidden feature vector output at time t.
A sentence sequence is in fact a time sequence. For the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, at time t (t = 1,2,…,n) the basic character vector v_t is input, and running the LSTM in the forward direction yields the hidden feature vector corresponding to that character vector, i.e., the h_t of the formulas above, where v_t corresponds to the input x_t of the LSTM at time t and h_t denotes the hidden feature vector learned for the t-th character.
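A sketch of one LSTM step implementing the gate equations above (the Att-BLSTM variant with c_{t-1} peephole terms); the weight dictionary is an illustrative stand-in for the trained parameters, not the patent's values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W['xi'], W['hi'], W['ci'], ... are the gate weight matrices of the text.
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + W['ci'] @ c_prev + b['i'])
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + W['cf'] @ c_prev + b['f'])
    g_t = np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + W['cc'] @ c_prev + b['c'])
    c_t = i_t * g_t + f_t * c_prev            # c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + W['co'] @ c_t + b['o'])
    h_t = o_t * np.tanh(c_t)                  # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t
```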
2) To capture both the past and the future semantic information of the sequence, learning is performed from the forward and backward directions, giving a forward hidden feature vector and a backward hidden feature vector for each character vector; adding the elements of the two vectors one-to-one gives the bidirectional hidden feature vector. After the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T passes through the BLSTM, the bidirectional hidden feature vector of each character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_i denotes the bidirectional hidden feature vector of the i-th character.
3) Within a sequence, the characters differ in how important they are to the sequence semantics: some play a critical role while others contribute almost nothing. The attention mechanism learns the importance of each character to the sequence semantics and automatically assigns each character a weight coefficient measuring that importance; the attention mechanism is computed as follows:
M = tanh(H)
α = softmax(ω^T M)
r = H α^T
h* = tanh(r)
where H denotes the output of the BLSTM, i.e., the above H_Sjv, a matrix of dimensions d_a × n, with d_a the dimension of the bidirectional hidden feature vectors and n the character length; ω is a randomly initialized vector of dimension d_a, and ω^T denotes its transpose; α denotes the weight vector obtained by learning, of dimension n; r is the feature vector obtained by the linear weighted summation over the input matrix, of dimension d_a; h* is the sentence feature vector obtained by applying the tanh function to r, of dimension d_a.
With the attention mechanism (see FIG. 5), a weight coefficient is automatically assigned to the bidirectional hidden feature vector of each character in H_Sjv; combining each character's bidirectional hidden feature vector with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector h_c* (see FIG. 6).
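A sketch of the Att-BLSTM block (forward and backward hidden states summed element-wise, then the attention of the formulas above), using PyTorch for brevity; the sizes and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttBLSTM(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hid_dim, bidirectional=True,
                             batch_first=True)
        self.omega = nn.Parameter(torch.randn(hid_dim))   # ω, dimension d_a

    def forward(self, x):                 # x: (batch, n, in_dim)
        out, _ = self.blstm(x)            # (batch, n, 2*d_a)
        fwd, bwd = out.chunk(2, dim=-1)
        H = (fwd + bwd).transpose(1, 2)   # element-wise sum -> (batch, d_a, n)
        M = torch.tanh(H)
        alpha = torch.softmax(self.omega @ M, dim=-1)      # α: (batch, n)
        r = torch.bmm(H, alpha.unsqueeze(-1)).squeeze(-1)  # r: (batch, d_a)
        return torch.tanh(r)              # h*, the attention-weighted feature
```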
Step 6, for the sequence S_j in the triple, first learn local features with a CNN and then learn with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w*
1) For the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of the sequence S_j, a convolution operation is first performed with a filter (CNN) parameterized by a weight vector ω_k (see FIG. 7), where ω_k denotes the k-th filter; c × d denotes the size of the filter, with d the length of the filter, corresponding to the dimension of the character vectors, and c the width of the filter. For the input character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T, the output of the convolution layer of the k-th filter is obtained by the following formula:
h_k = f(ω_k · v_{i:i+c-1} + b_k)
where v_{i:i+c-1} denotes the concatenation of the feature vectors v_i … v_{i+c-1}; i = 1,2,…,n-c+1; f denotes the ReLU activation function; b_k is a bias term; ω_k and b_k are parameters learned during training and remain the same for all i = 1,2,…,n-c+1 for a given k; h_k, the local feature vector of the word output by the k-th filter, has dimension n-c+1.
Each time the character vector matrix of the sequence S_j is convolved by a filter, one local feature vector is obtained, denoted h_k; this vector characterizes local semantic information between characters in the sentence and is regarded as a word feature vector. Because each filter learns its own parameters ω_k and b_k, each convolution can learn different semantic information. Through k different filters, k different local feature vectors are obtained, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_i denotes the feature vector of the word output by the i-th filter.
2) For H_w = [h_1, h_2, …, h_i, …, h_k], the word features are then learned with the Att-BLSTM of step 5, yielding the word-based sentence feature vector h_w*, as shown in FIG. 8.
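A sketch of step 6: k convolution filters produce k local (word-level) feature vectors, which are then treated as a sequence and fed to the AttBLSTM class sketched above; the filter count and width are illustrative choices:

```python
import torch
import torch.nn as nn

class CNNThenAttBLSTM(nn.Module):
    def __init__(self, d, n, k=100, c=3, hid_dim=100):
        super().__init__()
        # Each of the k output channels is one filter ω_k of size c × d.
        self.conv = nn.Conv1d(in_channels=d, out_channels=k, kernel_size=c)
        # H_w is a length-k sequence of (n-c+1)-dimensional local features.
        self.att_blstm = AttBLSTM(in_dim=n - c + 1, hid_dim=hid_dim)

    def forward(self, x):                 # x: (batch, n, d) character matrix
        h = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, k, n-c+1)
        return self.att_blstm(h)          # h_w*, word-based sentence feature
```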
Step 7, for the sequence Sense(e1)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 1 sense item feature vector h_e1*; for the sequence Sense(e2)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 2 sense item feature vector h_e2*.
1) For the character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of the entity 1 sense item sequence, the entity 1 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 1 sense item feature vector h_e1*.
2) For the character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of the entity 2 sense item sequence, the entity 2 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 2 sense item feature vector h_e2*.
Step 8, feature fusion
The invention creatively constructs the three models of steps 5-7 to learn character features, word features and entity sense item features, expressing the semantic information from multiple sides, enriching the input granularity and effectively improving the accuracy of relation extraction.
The character-based sentence feature vector h_c* and the word-based sentence feature vector h_w* are concatenated to obtain the feature vector of the sentence semantic information, denoted h_s* = [h_c*; h_w*]. The character-based entity 1 sense item feature vector h_e1* and the character-based entity 2 sense item feature vector h_e2* are concatenated to obtain the feature vector of the entity semantic information, denoted h_e* = [h_e1*; h_e2*]. The concatenated sentence feature vector h_s* is fed into the hidden layer of a fully connected network to obtain a new sentence feature vector o_s, and the entity feature vector h_e* is fed into the hidden layer of a fully connected network to obtain a new sense item feature vector o_e. The weighted sum of o_s and o_e gives the final feature vector o, the weights being η and 1-η respectively; η is taken as 0.9.
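A sketch of step 8 (feature fusion); the hidden sizes are illustrative, and the AttBLSTM outputs are assumed to share the dimension d_a:

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    def __init__(self, d_a, out_dim, eta=0.9):
        super().__init__()
        self.fc_s = nn.Linear(2 * d_a, out_dim)    # hidden layer for h_s*
        self.fc_e = nn.Linear(2 * d_a, out_dim)    # hidden layer for h_e*
        self.eta = eta                             # weight η = 0.9

    def forward(self, h_c, h_w, h_e1, h_e2):
        h_s = torch.cat([h_c, h_w], dim=-1)        # h_s* = [h_c*; h_w*]
        h_e = torch.cat([h_e1, h_e2], dim=-1)      # h_e* = [h_e1*; h_e2*]
        o_s = self.fc_s(h_s)
        o_e = self.fc_e(h_e)
        return self.eta * o_s + (1 - self.eta) * o_e   # final feature o
```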
Step 9, relation extraction
Inputting the final feature vector o into a softmax layer, and calculating the probability that the statement belongs to each class:
p(y)=softmax(o)
wherein p (y) refers to the probability value of each class of sentence;representing the maximum probability value. The category corresponding to the maximum probability value is the relationship category extracted by the relationship extracting means.
The loss function is defined as the cross entropy between the true class label and the predicted class:
J(θ) = -Σ_{i=1}^{M} y_i · log(ŷ_i) + λ‖θ‖²
where y_i is the one-hot encoding of the true label and ŷ_i is the estimated probability of each category produced by the relation extractor; M denotes the total number of categories, there being 10 relation categories in total in the SanWen dataset; λ is the L2 regularization parameter, and θ denotes all the parameters in the model.
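A sketch of the softmax classification and the loss above; the fused feature size (100) and λ are illustrative values, not from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(100, 10)        # 10 relation categories in SanWen

def loss_fn(o, labels, model, lam=1e-5):
    logits = classifier(o)
    ce = F.cross_entropy(logits, labels)            # -Σ y_i · log(ŷ_i)
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return ce + lam * l2                            # + λ‖θ‖²

def predict(o):
    p = F.softmax(classifier(o), dim=-1)            # p(y) = softmax(o)
    return p.argmax(dim=-1)                         # ŷ = argmax_y p(y)
```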
The invention comprises two stages: training and recognition.
A relation extractor is built according to steps 1-9, and all parameters of the model are randomly initialized. During the whole training process, 10 triples <S_j, Sense(e1)_j, Sense(e2)_j> are input at a time from the m triples (i.e., batch_size = 10); one pass over all triples constitutes one training epoch, and 100 such epochs are performed in total (i.e., epoch = 100). With the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continuously with the stochastic gradient descent method to update the parameters. Meanwhile, to prevent overfitting, a dropout mechanism is adopted during training: each neuron is dropped with a probability of 50% (i.e., in each training step a random half of the hidden-layer nodes do not participate in the computation). After training is finished, the trained entity relation extractor is obtained.
In the recognition phase, a target Chinese sentence is given: if it contains exactly two marked entities, the relation between the entities is identified directly; if it contains fewer than two marked entities, an error is reported; if it contains more than two marked entities, an error is reported. If there is more than one target Chinese sentence, the text is automatically split into sentences and the relation between the entities in each sentence is identified according to the single-sentence procedure.
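A sketch of the recognition-phase entity check; the bracket marker format (e.g. "[兰花]e1 … [幽谷]e2") and the `model.predict` interface are illustrative assumptions, not the patent's notation:

```python
import re

def extract_relation(sentence, model):
    entities = re.findall(r"\[([^\]]+)\]e\d", sentence)
    if len(entities) < 2:
        raise ValueError("error: fewer than two marked entities")
    if len(entities) > 2:
        raise ValueError("error: more than two marked entities")
    # Build the triple <sentence, sense item 1, sense item 2> and classify.
    return model.predict(sentence, entities[0], entities[1])
```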
The relation extractor provided by the invention can learn different semantic information at the character level and the word level of the sentence; the added entity sense items supply extra supporting information for the semantics of the entities in the sentence. Character features, word features and sense item features are learned by constructing different networks, which enriches the input granularity, avoids word segmentation errors, alleviates the polysemy of words, and improves the accuracy of relation extraction.
Given an input Chinese sentence with two marked entities, the relation extractor can identify the relation between the entities. The triple <entity 1, relation, entity 2> established from the entities and their relation can be used to construct a knowledge graph and applied in search systems.
Example 1
In this embodiment, the performance of the model that learns character features and word features simultaneously is studied within the relation extractor and compared with a model that learns only character features and a model that learns only word features. The experiment with the combined model follows the relevant steps of the invention; the comparison of the three is shown in Table 2 and FIG. 9.
As can be seen from Table 2 and FIG. 9, the model learning only character features performs better than the model learning only word features, while the proposed model learning both character and word features performs better than either single-feature model. Because words convey the grammatical and syntactic structure of a Chinese sentence, a model that learns character and word features simultaneously captures the semantic information of the sentence more comprehensively and further improves the accuracy of relation extraction. A higher F1 value in Table 2 means better entity relation extraction; likewise, the higher the curve in FIG. 9, i.e., the larger the area enclosed with the two coordinate axes, the better the entity relation extraction.
TABLE 2
Example 2
In this embodiment, entity sense items are added to the relation extractor that learns character and word features, and the model is tested with and without them to illustrate the effect of introducing entity sense items. Meanwhile, the proposed relation extractor based on the fusion of character and word features with entity sense items is compared with the model that learns character and word features simultaneously. The experiment on the fusion of character and word features with entity sense items follows the specific steps given in the summary of the invention. The comparison is shown in Table 3.
As can be seen from Table 3, the model with entity sense items performs better than the model without them, indicating that introducing entity sense items is helpful for entity relation extraction and can improve its performance. Meanwhile, entity relation extraction based on the fusion of character and word features with entity sense items performs best, demonstrating the importance of input granularity for entity relation extraction: learning character features, word features and entity sense item features and fusing them can effectively express the semantic information of a sentence.
TABLE 3
Claims (10)
1. A Chinese entity relation extraction method based on the fusion of character and word features with entity sense items, comprising the following steps:
A. Training;
Step 1, sentence preprocessing;
taking m sentences from the SanWen dataset as training samples, the m sentences covering the ten relation categories in the SanWen dataset;
processing each of the m sentences into a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; this means that every character and punctuation mark in the sentence is treated as an individual unit and arranged in order; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m};
for the sequence set of the m sentences, assigning index numbers sequentially starting from 1 at the first character of the first sentence; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the sequence set of the m sentences, computing the character length of each sentence sequence and taking the statistical maximum, denoted n, to standardize the lengths of the m sentence sequences; standardizing means that, among the m sentence sequences, any sequence shorter than n characters is padded with the number 0 up to length n;
step 2, acquiring the entity 1 sense item and the entity 2 sense item corresponding to each sentence;
in the m sentences, taking entity 1 of each sentence as a search entry of an encyclopedia website;
1) if the search entry of entity 1 is not recorded in the encyclopedia website, taking entity 1 itself as the entity 1 sense item corresponding to entity 1;
2) if the search entry of entity 1 is recorded in the encyclopedia website, acquiring all entity 1 sense items corresponding to entity 1 by web crawler technology;
respectively calculating the semantic similarity between each sentence and each entity 1 sense item corresponding to that sentence, and retaining the entity 1 sense item with the highest similarity;
thus, in the m sentences, entity 1 of each sentence corresponds to one entity 1 sense item, which is either the sense item with the highest similarity or entity 1 itself;
processing the entity 1 sense item corresponding to entity 1 in each sentence into a sequence Sense(e_1)_j (j = 1,2,…,m) that exists independently in units of characters; processing into a sequence existing independently in units of characters means that every character and punctuation mark is regarded as an individual, arranged in order; the m entities 1 in the m sentences correspond to a set of m entity 1 sense item sequences, denoted {Sense(e_1)_1, Sense(e_1)_2, …, Sense(e_1)_m};
for the set of m entity 1 sense item sequences, serial numbers are added sequentially before each individual, starting from 1 at the first character of the first entity 1 sense item; repeated individuals are not given serial numbers again, and each individual is marked according to its serial number;
for the set of m entity 1 sense item sequences, the character length of each entity 1 sense item sequence is calculated and the maximum character length is obtained by counting, denoted m_1, which is used to standardize the character lengths of the m entity 1 sense item sequences; standardizing means that, among the m entity 1 sense item sequences, any sequence whose character length is less than m_1 is padded with the number 0 up to character length m_1;
in the same manner as for entity 1, acquiring the entity 2 sense item corresponding to entity 2 in each sentence, the corresponding entity 2 sense item being either the entity 2 sense item with the highest similarity or entity 2 itself;
in the same manner as for entity 1, processing the entity 2 sense item corresponding to entity 2 of each sentence into a sequence Sense(e_2)_j (j = 1,2,…,m) that exists independently in units of characters; the m entities 2 in the m sentences correspond to a set of m entity 2 sense item sequences, denoted {Sense(e_2)_1, Sense(e_2)_2, …, Sense(e_2)_m};
for the set of m entity 2 sense item sequences, serial numbers are added sequentially before each individual, starting from 1 at the first character of the first entity 2 sense item; repeated individuals are not given serial numbers again, and each individual is marked according to its serial number;
in the same manner as for entity 1, obtaining the maximum character length of the set of m entity 2 sense item sequences, denoted m_2, which is used to standardize the character lengths of the m entity 2 sense item sequences; standardizing means that, among the m entity 2 sense item sequences, any sequence whose character length is less than m_2 is padded with the number 0 up to character length m_2;
step 3, expanding into triples <sentence, entity 1 sense item, entity 2 sense item>;
each sentence sequence S_j is expanded into the triple <S_j, Sense(e_1)_j, Sense(e_2)_j>;
step 4, mapping the three sequences in the triple into character vector matrices;
the character vector matrix of S_j is formed by splicing each character's own vector with its distance vectors; the character vector matrix of Sense(e_1)_j consists of the characters' own vectors, and likewise the character vector matrix of Sense(e_2)_j consists of the characters' own vectors;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from the character to entity 2;
splicing means appending the dimensions of the specified vectors to form one vector;
step 5, for the sequence S_j in the triple, obtaining the character-based sentence feature vector by Att-BLSTM learning, denoted h_c^*;
step 6, for the sequence S_j in the triple, first learning local features with a CNN and then learning with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w^*;
learning the character vector matrix of S_j with the CNN yields local feature vectors, which represent semantic information between characters within the sentence and are regarded as word features;
step 7, for the sequence Sense(e_1)_j in the triple, obtaining the character-based entity 1 sense item feature vector by Att-BLSTM learning, denoted h_e1^*; for the sequence Sense(e_2)_j in the triple, obtaining the character-based entity 2 sense item feature vector by Att-BLSTM learning, denoted h_e2^*;
step 8, feature fusion;
splicing the character-based sentence feature vector and the word-based sentence feature vector to obtain the feature vector of the sentence semantic information, denoted h_s^*:
h_s^* = [h_c^*; h_w^*];
splicing the entity 1 sense item feature vector and the entity 2 sense item feature vector to obtain the feature vector of the entity semantic information, denoted h_e^*:
h_e^* = [h_e1^*; h_e2^*];
inputting h_s^* into the hidden layer of a fully connected network to obtain a new sentence feature vector o_s;
inputting h_e^* into the hidden layer of a fully connected network to obtain a new sense item feature vector o_e;
performing a weighted summation of o_s and o_e with weights η and 1−η to obtain the final feature vector o;
step 9, relation extraction;
inputting the final feature vector o into a softmax layer to obtain the probability value of each class, the class corresponding to the maximum probability value being the relation extraction result;
B. inputting a target Chinese sentence and identifying the relation;
1) if a target Chinese sentence contains exactly two marked entities, identifying the relationship between the entities in the target Chinese sentence;
2) if a target Chinese sentence contains fewer than two marked entities, reporting an error;
3) if a target Chinese sentence contains three or more marked entities, reporting an error;
if there are two or more target Chinese sentences, automatically cutting the input into sentences and then identifying the relationship between the entities in each target Chinese sentence according to steps 1)-3).
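By way of illustration only, and not as part of the claims, part B's checks can be sketched in Python as follows; the sentence-cutting regex, the entity-marking helper, and the model interface are assumptions, not taken from the patent:

```python
import re

def split_sentences(text):
    # Automatic sentence cutting on Chinese sentence-final punctuation.
    return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]

def identify_relations(text, model, find_marked_entities):
    # Per target sentence: exactly two marked entities are required,
    # otherwise an error is reported, mirroring cases 1)-3) above.
    results = []
    for sent in split_sentences(text):
        entities = find_marked_entities(sent)   # assumed helper
        if len(entities) < 2:
            raise ValueError(f"fewer than two marked entities in: {sent}")
        if len(entities) > 2:
            raise ValueError(f"three or more marked entities in: {sent}")
        results.append(model.predict(sent, entities[0], entities[1]))
    return results
```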
2. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 1 comprises:
m is 17227, i.e. all training samples in the SanWen dataset.
3. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 2 comprises:
calculating the semantic similarity means calculating the similarity with a cosine similarity algorithm;
the cosine similarity algorithm uses the Word2Vec method to map each character of the sentence sequence S_j into a character vector, adds the corresponding elements of the character vectors of all the characters in the sequence and divides by the total number of character vectors to obtain the vector of the sequence S_j; the vector of the entity 1 sense item sequence Sense(e_1)_j is obtained in the same way; the cosine of the angle between the two vectors in the vector space is then calculated as a measure of the difference between the two sequences: a cosine value close to 1, with the angle tending to 0°, indicates that the two sequences are more similar, while a cosine value close to 0, with the angle tending to 90°, indicates that they are more dissimilar.
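For illustration, a minimal Python sketch of this similarity computation, assuming pretrained Word2Vec vectors loaded with gensim; the file name and helper names are illustrative:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical pretrained Word2Vec character vectors; the path is illustrative.
w2v = KeyedVectors.load("zh_char_word2vec.kv")

def sequence_vector(tokens):
    # Add the vectors of all tokens element-wise, then divide by their
    # number, as the claim describes (i.e. the mean vector).
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: near 1 = similar, near 0 = not.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def best_sense(sentence_tokens, sense_token_lists):
    # Retain the sense item whose gloss is most similar to the sentence,
    # matching the selection rule of step 2.
    s = sequence_vector(sentence_tokens)
    sims = [cosine_similarity(s, sequence_vector(g)) for g in sense_token_lists]
    return sense_token_lists[int(np.argmax(sims))]
```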
4. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 4 comprises:
1) mapping S_j into a basic character vector matrix, where the basic character vector of each character is formed by splicing the character's own vector with its distance vectors; splicing means appending the dimensions of the specified vectors to form one vector;
the character's own vector is obtained by mapping each character into a low-dimensional real-valued vector tx_i of dimension d_w with the Word2Vec method, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the dimension of the vector;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from the character to entity 2;
the distance from the i-th character to entity 1 is defined as p_i^1, and the distance from the i-th character to entity 2 as p_i^2; p_i^1 and p_i^2 are calculated in the same way, and p_i^1 is defined as:
p_i^1 = i − b_1 if i < b_1; p_i^1 = 0 if b_1 ≤ i ≤ e_1; p_i^1 = i − e_1 if i > e_1;
where i denotes the position index of the i-th character, b_1 denotes the start position index of entity 1, and e_1 denotes the end position index of entity 1;
the calculated p_i^1 and p_i^2 are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 denotes the distance vector from the i-th character to entity 2;
splicing the character's own vector with the distance vectors gives the basic character vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2·d_d; S_j is thereby mapped into a basic character vector matrix, denoted S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_1 denotes the basic character vector of the 1st character, v_i that of the i-th character, and v_n that of the n-th character; T denotes matrix transposition, and since each basic character vector is a column vector of dimension d, the transposed matrix has dimension n × d;
2) mapping Sense(e_1)_j into a basic character vector matrix, where the basic character vector of each character is the character's own vector;
according to the character's own vector in 1), Sense(e_1)_j is mapped into a basic character vector matrix, denoted Sense(e_1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_1 denotes the basic character vector of the 1st character, sx_i that of the i-th character, and sx_m1 that of the m_1-th character; T denotes matrix transposition, and since each character vector has dimension d_w, the transposed matrix has dimension m_1 × d_w;
3) mapping Sense(e_2)_j into a basic character vector matrix, where the basic character vector of each character is the character's own vector;
according to the character's own vector in 1), Sense(e_2)_j is mapped into a basic character vector matrix, denoted Sense(e_2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_1 denotes the basic character vector of the 1st character, vx_i that of the i-th character, and vx_m2 that of the m_2-th character; T denotes matrix transposition, and since each character vector has dimension d_w, the transposed matrix has dimension m_2 × d_w.
5. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 5 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, character features are learned with Att-BLSTM; Att-BLSTM is a bidirectional long short-term memory network with an attention mechanism;
at time t (t = 1,2,…,n), the basic character vector v_t is input into the BLSTM and learned in the forward and backward directions to obtain a forward hidden feature vector and a backward hidden feature vector, denoted h_t^f and h_t^b respectively; the elements of h_t^f and h_t^b are added in one-to-one correspondence to obtain the bidirectional hidden feature vector, denoted h_t; after the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T passes through the BLSTM, the bidirectional hidden feature vector of every character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_1 denotes the bidirectional hidden feature vector of the 1st character, h_i that of the i-th character, and h_n that of the n-th character;
2) the attention mechanism automatically assigns a weight coefficient to the bidirectional hidden feature vector of each character in H_Sjv; combining the bidirectional hidden feature vector of each character with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector, denoted h_c^*.
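For illustration, a PyTorch sketch of this Att-BLSTM (summed forward/backward states followed by an attention-weighted sum); the attention parameterization is an assumption in the style of Zhou et al.:

```python
import torch
import torch.nn as nn

class AttBLSTM(nn.Module):
    # Bidirectional LSTM whose forward and backward hidden states are summed
    # element-wise, followed by an attention layer that assigns a weight to
    # each time step and returns the weighted sum as the feature vector.
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        self.att = nn.Linear(d_hid, 1, bias=False)  # attention scorer

    def forward(self, x):                       # x: (batch, n, d_in)
        h, _ = self.lstm(x)                     # (batch, n, 2 * d_hid)
        d = h.size(-1) // 2
        h = h[..., :d] + h[..., d:]             # h_t = h_t^f + h_t^b
        alpha = torch.softmax(self.att(torch.tanh(h)), dim=1)  # weights
        return (alpha * h).sum(dim=1)           # weighted sum -> h_c^*
```

The same module, applied to the sense item matrices of claim 7 and to the CNN outputs of claim 6, would yield h_e1^*, h_e2^*, and h_w^* respectively.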
6. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 6 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, a CNN is used to learn local feature vectors, which represent semantic information between characters in the sentence and are regarded as word features; k different local feature vectors can be obtained through k different CNNs, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_1 denotes the word feature obtained by the 1st CNN, h_i that obtained by the i-th CNN, and h_k that obtained by the k-th CNN;
2) for H_w = [h_1, h_2, …, h_i, …, h_k], the word-based sentence feature vector, denoted h_w^*, is obtained by learning the word features with the Att-BLSTM of step 5.
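A sketch of this local-feature stage, assuming k one-dimensional convolutions with max-pooling over time; the filter width and the pooling are assumptions not fixed by the claim. The (batch, k, d) output can be fed to the Att-BLSTM sketch above as a length-k sequence:

```python
import torch
import torch.nn as nn

class LocalFeatures(nn.Module):
    # k different convolutions over the basic character vector matrix; each
    # kernel spans a few adjacent characters, so its pooled output is read
    # as one word-level local feature vector h_i.
    def __init__(self, d, k, width=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d, d, kernel_size=width, padding=width // 2)
            for _ in range(k))

    def forward(self, x):                        # x: (batch, n, d)
        x = x.transpose(1, 2)                    # Conv1d expects (batch, d, n)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.stack(feats, dim=1)         # H_w: (batch, k, d)
```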
7. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 7 comprises:
1) for the character vector matrix Sense(e_1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of Sense(e_1)_j, the character-based entity 1 sense item feature vector, denoted h_e1^*, is obtained by learning the entity 1 sense item features with the Att-BLSTM of step 5;
2) for the character vector matrix Sense(e_2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of Sense(e_2)_j, the character-based entity 2 sense item feature vector, denoted h_e2^*, is obtained by learning the entity 2 sense item features with the Att-BLSTM of step 5.
8. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic item as claimed in claim 1, wherein the step 8 comprises:
the weight η is 0.9, a hyperparameter obtained through training and continual adjustment.
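Steps 8-9 with η = 0.9 can be sketched as follows; the layer sizes, the tanh nonlinearity, and the dropout placement are assumptions:

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    # Splice, project through fully connected hidden layers, combine with
    # weights eta and 1 - eta, then classify through a softmax layer.
    def __init__(self, d, n_relations, eta=0.9):
        super().__init__()
        self.fc_s = nn.Linear(2 * d, d)       # hidden layer for h_s* = [h_c*; h_w*]
        self.fc_e = nn.Linear(2 * d, d)       # hidden layer for h_e* = [h_e1*; h_e2*]
        self.drop = nn.Dropout(0.5)           # dropout mechanism of claim 9
        self.out = nn.Linear(d, n_relations)  # softmax layer over relation classes
        self.eta = eta

    def forward(self, h_c, h_w, h_e1, h_e2):
        o_s = torch.tanh(self.fc_s(torch.cat([h_c, h_w], dim=-1)))
        o_e = torch.tanh(self.fc_e(torch.cat([h_e1, h_e2], dim=-1)))
        o = self.drop(self.eta * o_s + (1 - self.eta) * o_e)  # weighted sum
        return torch.softmax(self.out(o), dim=-1)  # probability of each class
```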
9. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic item as claimed in claim 1, wherein the training comprises:
a relation extraction device is built according to steps 1-9, and all parameters in the model are randomly initialized; throughout the training of the model, 10 triples <S_j, Sense(e_1)_j, Sense(e_2)_j> at a time are taken from the m triples for training, one complete pass over all triples being recorded as one training process, with 100 training processes in total; with the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continually with the stochastic gradient descent method to update the parameters; meanwhile, to prevent overfitting, a dropout mechanism is adopted during training, closing each neuron with a probability of 50% (i.e. in each training step half of the hidden-layer nodes are randomly excluded from the calculation); after training is finished, the trained entity relation extraction device is obtained.
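A minimal training loop matching this description (batches of 10 triples, 100 passes, stochastic gradient descent on the cross entropy); `model` and `loader` are assumed to exist, and the learning rate is an assumption:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr not given by the claim

for epoch in range(100):              # 100 training processes
    model.train()                     # activates the 50% dropout
    for batch in loader:              # 10 triples at a time
        optimizer.zero_grad()
        probs = model(batch["sentence"], batch["sense1"], batch["sense2"])
        # cross entropy between predicted probabilities and the true labels
        loss = -torch.log(probs[torch.arange(len(probs)), batch["relation"]]).mean()
        loss.backward()               # stochastic gradient descent update
        optimizer.step()
```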
10. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic terms as claimed in claim 1, wherein the input of the target Chinese sentence and the relationship identification comprises:
given a target Chinese sentence: if the target Chinese sentence contains exactly two marked entities, the relationship between the entities in the target Chinese sentence is identified directly; if it contains fewer than two marked entities, an error is reported; if it contains three or more marked entities, an error is reported; if there are two or more target Chinese sentences, the input is first automatically cut into sentences and the relationship between the entities in each target Chinese sentence is then identified according to the steps for a single target Chinese sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911298675.8A CN111291556B (en) | 2019-12-17 | 2019-12-17 | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291556A (en) | 2020-06-16
CN111291556B (en) | 2021-10-26
Family
ID=71021179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911298675.8A Active CN111291556B (en) | 2019-12-17 | 2019-12-17 | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291556B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107180247A (en) * | 2017-05-19 | 2017-09-19 | 中国人民解放军国防科学技术大学 | Relation grader and its method based on selective attention convolutional neural networks |
CN107194422A (en) * | 2017-06-19 | 2017-09-22 | 中国人民解放军国防科学技术大学 | A kind of convolutional neural networks relation sorting technique of the forward and reverse example of combination |
CN109344244A (en) * | 2018-10-29 | 2019-02-15 | 山东大学 | A kind of the neural network relationship classification method and its realization system of fusion discrimination information |
CN109710932A (en) * | 2018-12-22 | 2019-05-03 | 北京工业大学 | A kind of medical bodies Relation extraction method based on Fusion Features |
CN109918671A (en) * | 2019-03-12 | 2019-06-21 | 西南交通大学 | Electronic health record entity relation extraction method based on convolution loop neural network |
CN110334354A (en) * | 2019-07-11 | 2019-10-15 | 清华大学深圳研究生院 | A kind of Chinese Relation abstracting method |
CN110532549A (en) * | 2019-08-13 | 2019-12-03 | 青岛理工大学 | Text emotion analysis method based on dual-channel deep learning model |
Non-Patent Citations (2)
Title |
---|
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; 207-212 *
Combining Word-Level and Character-Level Representations for Relation Classification of Informal Text; Dongyun Liang et al.; Proceedings of the 2nd Workshop on Representation Learning for NLP; 2017-08-03; 43-47 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |