CN111291556B - Chinese entity relation extraction method based on character and word feature fusion of entity sense items (Google Patents)
Chinese entity relation extraction method based on character and word feature fusion of entity sense items
- Publication number: CN111291556B
- Application number: CN201911298675.8A
- Authority: CN (China)
- Prior art keywords: word, entity, vector, sense, sentence
- Legal status: Active
Abstract
The invention relates to a Chinese entity relation extraction method based on the fusion of character and word features with entity sense items. The method introduces entity sense items to expand each sentence into a triple <sentence, entity 1 sense item, entity 2 sense item>, enriching the granularity of the input, and maps the three sequences of the triple into character vector matrices. The sentence of the triple is fed into two models in parallel: one learns character features with an attention-based bidirectional long short-term memory network (Att-BLSTM), while the other first learns local features with a convolutional neural network (CNN) and then learns word features with an Att-BLSTM. Character-based features of the entity 1 sense item and of the entity 2 sense item are likewise learned with Att-BLSTM. The four features are fused into a single feature that comprehensively represents the semantic information and is used for relation extraction. The method avoids word segmentation errors, alleviates the problem of polysemy, effectively improves the accuracy of Chinese entity relation extraction, and can be widely applied to the construction of knowledge graphs.
Description
Technical Field
The invention belongs to the technical field of natural language processing and relates to a Chinese entity relation extraction method based on the fusion of character and word features with entity sense items.
Background
With the development of network technology, the information age relies heavily on text, images and other media, and obtaining useful information from large amounts of unstructured text has become particularly important. The main purpose of entity relation extraction is, building on entity recognition, to determine the relation category between entity pairs in unstructured text and to form structured data for storage and retrieval. For example, for the sample "[orchid]e1 grows in the [valley]e2, blooming though no one sees it." with the two marked entities "orchid" and "valley", the relation extraction task is to acquire the semantic information of the sample through machine learning and identify the relation between the entity pair, forming the structured triple <orchid, located, valley> used to construct a large-scale knowledge graph. A knowledge graph is a semantic network composed of concepts, entities, entity attributes and entity relations; it is a structured representation of the real world and is widely applied in search systems. In Chinese, semantic relations are more complex, so the role of entity relation extraction is all the more evident. Research on Chinese entity relation extraction is therefore essential.
Conventional relation extraction mainly includes feature-based and kernel-based methods. The feature-based approach, as its name implies, mines a large number of lexical, syntactic and semantic features and then identifies relations between entities in the text using a suitable classifier. The kernel-based approach instead puts its effort into kernel design, and such methods are usually built on dependency structures. Although both kinds of methods have proven effective to some extent, both feature engineering and kernel design depend heavily on the output of NLP tools, which inevitably introduces errors and degrades model performance.
In recent years, deep learning has been applied to relation extraction. Zeng et al. first applied a convolutional neural network (CNN) to semantic learning, and research on deep learning for this task has since become intensely active. However, owing to the lack of Chinese datasets, research on Chinese entity relation extraction remains limited; existing Chinese methods are mainly realized by improving the model under word-vector-matrix input, which makes them overly dependent on word segmentation quality. The mainstream network frameworks currently in use include multi-scale convolutional neural networks (Multi-scale CNNs), bidirectional long short-term memory networks (BLSTMs) and improved GRU networks; meanwhile, the attention mechanism has been widely applied to them and has achieved a certain effect. However, these methods focus only on improving the model itself and ignore the fact that different input granularities have a significant impact on the relation extraction model. Character-based models cannot exploit word-level information, so they capture fewer features than word-based models, while the performance of word-based models is too dependent on segmentation quality. Some methods have been proposed to combine character and word information in other natural language processing tasks; for example, Tai et al. proposed a tree-structured LSTM model to improve semantic representation, which has been widely used in various tasks such as human action recognition and voice tagging. Beyond the incomplete character and word feature representation of Chinese text, the polysemy of Chinese words still seriously affects the relation extraction task; in other words, the above extraction methods cannot handle a word whose meaning changes with the language environment. The invention therefore introduces entity sense items as external linguistic knowledge to support the semantic information of the entities in the sentence, which helps resolve the polysemy of entity words, and constructs different networks over the input character vector matrix to learn character features and word features separately, thereby enriching the input granularity.
Disclosure of Invention
The purpose of the invention is as follows: targeting the SanWen dataset released by Peking University, and in order to reduce the dependency of existing entity relation extraction models on word segmentation quality and improve the model's ability to correctly identify entity semantic information, character vector matrices are used as input, entity sense items are introduced to enrich the input granularity, the semantic information of the sentence is represented at several levels, and a relation extractor that simultaneously learns character features, word features and entity sense item features is constructed.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A Chinese entity relation extraction method based on the fusion of character and word features with entity sense items, comprising the following steps:
A. Training;
Step 1, sentence preprocessing;
take m sentences from the SanWen dataset as training samples, the m sentences covering the ten relation categories in the SanWen dataset;
process each of the m sentences into a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; this means that every character and punctuation mark in the sentence is treated as an individual unit and arranged in order; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m};
for the sequence set of the m sentences, assign index numbers sequentially starting from 1 at the first character of the first sentence; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the sequence set of the m sentences, compute the character length of each sentence sequence and take the statistical maximum, denoted n, which is used to standardize the lengths of the m sentence sequences; standardizing means that, among the m sentence sequences, any sequence shorter than n characters is padded with the number 0 up to length n;
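A minimal sketch of this preprocessing follows, assuming toy sentences; the function and variable names are illustrative, not taken from the patent:

```python
# Step 1 sketch: character-level splitting, vocabulary indexing, zero-padding.
def build_sequences(sentences):
    # Split each sentence into individual characters (punctuation included).
    return [list(s) for s in sentences]

def build_vocab(sequences):
    # Assign index numbers starting from 1; repeated characters keep their
    # first-assigned number. 0 is reserved for padding.
    vocab = {}
    for seq in sequences:
        for ch in seq:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab

def pad_and_index(sequences, vocab):
    n = max(len(seq) for seq in sequences)          # maximum character length
    return [[vocab[ch] for ch in seq] + [0] * (n - len(seq))
            for seq in sequences]

sequences = build_sequences(["兰生幽谷。", "兰花很香。"])
vocab = build_vocab(sequences)
padded = pad_and_index(sequences, vocab)            # every row has length n
```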
Step 2, acquire the entity 1 sense item and entity 2 sense item corresponding to each sentence;
in the m sentences, use entity 1 of each sentence as a search term on an encyclopedia website;
1) if the search term for entity 1 is not recorded on the encyclopedia website, take entity 1 itself as the entity 1 sense item corresponding to entity 1;
2) if the search term for entity 1 is recorded on the encyclopedia website, acquire all entity 1 sense items corresponding to entity 1 using web crawler technology;
compute the semantic similarity between each sentence and each of its candidate entity 1 sense items, and retain the entity 1 sense item with the highest similarity;
in the m sentences, entity 1 of each sentence thus corresponds to one entity 1 sense item, which is either the sense item with the highest similarity or entity 1 itself;
process the entity 1 sense item corresponding to entity 1 in each sentence into a sequence Sense(e1)_j, j = 1,2,…,m, in which each character exists as a separate unit; as before, every character and punctuation mark is treated as an individual unit and arranged in order; the m entities 1 in the m sentences correspond to a set of m entity 1 sense item sequences, denoted {Sense(e1)_1, Sense(e1)_2, …, Sense(e1)_m};
for the set of m entity 1 sense item sequences, assign index numbers sequentially starting from 1 at the first character of the first entity 1 sense item; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the m entity 1 sense item sequences, compute the character length of each sequence and take the statistical maximum, denoted m1, which is used to standardize their lengths; standardizing means that any entity 1 sense item sequence shorter than m1 characters is padded with the number 0 up to length m1;
in the same manner as for entity 1, obtain the entity 2 sense item corresponding to entity 2 in each sentence; the corresponding entity 2 sense item is either the one with the highest similarity or entity 2 itself;
in the same manner as for entity 1, process the entity 2 sense item of each sentence into a character sequence Sense(e2)_j, j = 1,2,…,m; the m entities 2 in the m sentences correspond to a set of m entity 2 sense item sequences, denoted {Sense(e2)_1, Sense(e2)_2, …, Sense(e2)_m};
for the set of m entity 2 sense item sequences, assign index numbers sequentially starting from 1 at the first character of the first entity 2 sense item; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
in the same manner as for entity 1, obtain the maximum character length of the m entity 2 sense item sequences, denoted m2, which is used to standardize their lengths; any entity 2 sense item sequence shorter than m2 characters is padded with the number 0 up to length m2;
Step 3, expanding the triple < statement, entity 1 meaning item and entity 2 meaning item >;
for each sentence sequence SjExtended as triplets<Sj,Sense(e1)j,Sense(e2)j>;
said SjThe word vector matrix in (1) is formed by splicing word self vectors and distance vectors, Sense (e)1)jThe word vector matrix in (1), i.e. the word itself vector, Sense (e)2)jThe word vector matrix in (1), namely the vector of the word itself;
the distance vector is a distance vector from a word to an entity 1 and a distance vector from the word to an entity 2;
the splicing refers to adding dimensions of specified vectors to synthesize a vector;
Step 5, for the sequence S_j in the triple, learn with Att-BLSTM to obtain the character-based sentence feature vector, denoted h_c*;
Step 6, for the sequence S_j in the triple, first learn local features with a CNN and then learn with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w*;
learning the character vector matrix of S_j with the CNN yields local feature vectors; a local feature vector characterizes the semantic information between characters in the sentence and is regarded as a word feature;
Step 7, for the sequence Sense(e1)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 1 sense item feature vector, denoted h_e1*; for the sequence Sense(e2)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 2 sense item feature vector, denoted h_e2*;
Step 8, feature fusion;
concatenating the word-based sentence feature vectors and basesThe sentence characteristic vector of the word obtains the characteristic vector of the sentence semantic information, and the characteristic vector is marked as hs *:
hs *=[hc *;hw *];
Splicing the word-based entity 1-meaning item feature vector and the word-based entity 2-meaning item feature vector to obtain a feature vector of entity semantic information, and marking the feature vector as he *:
he *=[he1 *;he2 *]
H is to bes *Inputting the result into the hidden layer of the full-connection network to obtain a new sentence characteristic vector os;
H is to bee *Inputting the data into the hidden layer of the full-connection network to obtain a new semantic item feature vector oe;
To o issAnd oeWeighting and summing to obtain final characteristic vector o, wherein weights are eta and1-η。
Step 9, relation extraction;
feed the final feature vector o into a softmax layer to obtain the probability of each class; the class corresponding to the maximum probability is the relation extraction result;
B. Input a target Chinese sentence and identify the relation;
1) if the target Chinese sentence contains exactly two marked entities, identify the relation between the entities in the sentence;
2) if the target Chinese sentence contains fewer than two marked entities, report an error;
3) if the target Chinese sentence contains more than two marked entities, report an error;
if there is more than one target Chinese sentence, automatically split the text into sentences and then identify the relation between the entities in each target Chinese sentence according to steps 1)-3).
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 1 comprises:
the m is 17227, which is all the training samples in the SanWen dataset.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 2 comprises:
computing semantic similarity means computing similarity with the cosine similarity algorithm;
the cosine similarity algorithm maps each character of the sentence sequence S_j into a character vector using the Word2Vec method, adds the corresponding elements of the character vectors of all characters in the sequence and divides by the number of character vectors to obtain the vector of S_j; the vector of the entity 1 sense item sequence Sense(e1)_j is obtained in the same way; the cosine of the angle between the two vectors in vector space is then computed as a measure of the difference between the two sequences: a cosine close to 1 (angle tending to 0°) indicates that the two sequences are more similar, while a cosine close to 0 (angle tending to 90°) indicates that they are more dissimilar.
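A minimal sketch of this sense-item selection, assuming `char_vecs` is a pretrained Word2Vec-style lookup from character to vector; the names are illustrative:

```python
import numpy as np

def sequence_vector(chars, char_vecs, dim=100):
    # Average the character vectors: element-wise sum divided by the count.
    vecs = [char_vecs[c] for c in chars if c in char_vecs]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def best_sense(sentence, senses, char_vecs):
    # Keep the candidate sense item most similar to the sentence.
    sv = sequence_vector(list(sentence), char_vecs)
    return max(senses,
               key=lambda s: cosine(sv, sequence_vector(list(s), char_vecs)))
```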
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 4 comprises:
1) map S_j into a basic character vector matrix, in which the basic vector of each character is formed by concatenating the character's own vector with its distance vectors; concatenation means joining the specified vectors along their dimensions into a single vector;
for the character's own vector, each character is mapped with the Word2Vec method into a low-dimensional real-valued vector tx_i of dimension d_w, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the vector dimension;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from that character to entity 2.
We define the distance of the i-th character to entity 1 as p_i^1 and the distance of the i-th character to entity 2 as p_i^2; p_i^1 and p_i^2 are computed in the same way, and p_i^1 is defined as
p_i^1 = i - b_1 if i < b_1; 0 if b_1 ≤ i ≤ e_1; i - e_1 if i > e_1,
where i denotes the position index of the i-th character, b_1 denotes the start position index of entity 1, and e_1 denotes the end position index of entity 1.
The computed p_i^1 and p_i^2 are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 the distance vector from the i-th character to entity 2.
Concatenating the character's own vector with the distance vectors gives the basic vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2·d_d. S_j is thus mapped into a basic character vector matrix, denoted S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_i denotes the basic vector of the i-th character and T denotes matrix transposition; since each basic character vector is a column vector of dimension d, the transposed matrix has dimensions n × d (a code sketch follows after item 3) below).
2) Map Sense(e1)_j into a basic character vector matrix, in which the basic vector of each character is simply the character's own vector.
Using the character's own vectors of 1), Sense(e1)_j is mapped into the basic character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_i denotes the basic vector of the i-th character of Sense(e1)_j and T denotes matrix transposition; since each character vector has dimension d_w, the transposed matrix has dimensions m1 × d_w.
3) Map Sense(e2)_j into a basic character vector matrix in the same way: using the character's own vectors of 1), Sense(e2)_j is mapped into the basic character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_i denotes the basic vector of the i-th character of Sense(e2)_j; since each character vector has dimension d_w, the transposed matrix has dimensions m2 × d_w.
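A minimal sketch of step 4-1) above: building the basic character vector matrix of a sentence. `char_emb` and `pos_emb` are assumed learned lookup tables, and the distance clipping bound is an illustrative assumption:

```python
import numpy as np

def rel_pos(i, b, e):
    # p_i = i - b if i < b; 0 if b <= i <= e; i - e if i > e.
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def basic_char_matrix(char_ids, ent1, ent2, char_emb, pos_emb, max_dist=60):
    # char_emb: (vocab, d_w); pos_emb: (2*max_dist+1, d_d); ent = (begin, end).
    rows = []
    for i, cid in enumerate(char_ids):
        p1 = int(np.clip(rel_pos(i, *ent1), -max_dist, max_dist)) + max_dist
        p2 = int(np.clip(rel_pos(i, *ent2), -max_dist, max_dist)) + max_dist
        # v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2*d_d
        rows.append(np.concatenate([char_emb[cid], pos_emb[p1], pos_emb[p2]]))
    return np.stack(rows)                     # matrix S_jv of shape (n, d)

def sense_char_matrix(char_ids, char_emb):
    # Sense item matrices (steps 4-2, 4-3) use only the characters' own vectors.
    return np.stack([char_emb[cid] for cid in char_ids])
```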
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 5 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, character features are learned with Att-BLSTM; Att-BLSTM refers to a bidirectional long short-term memory network with an attention mechanism;
at each time step t (t = 1,2,…,n), the basic character vector v_t is input to the BLSTM, which learns the character vector from the forward and backward directions to obtain a forward hidden feature vector and a backward hidden feature vector; adding the elements of the two hidden feature vectors one-to-one gives the bidirectional hidden feature vector; after the character vector matrix S_jv passes through the BLSTM, the bidirectional hidden feature vector of every character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_i denotes the bidirectional hidden feature vector of the i-th character;
2) the attention mechanism automatically assigns a weight coefficient to the bidirectional hidden feature vector of each character in H_Sjv; combining each character's bidirectional hidden feature vector with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector, denoted h_c*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 6 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, the CNN is used to learn local feature vectors; such a vector characterizes the semantic information between characters in the sentence and is regarded as a word feature; through k different CNN filters, k different local feature vectors are obtained, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_i denotes the word feature derived by the i-th CNN filter;
2) for H_w = [h_1, h_2, …, h_i, …, h_k], the word features are learned with the Att-BLSTM of step 5, yielding the word-based sentence feature vector, denoted h_w*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 7 comprises:
1) for the character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of Sense(e1)_j, the entity 1 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 1 sense item feature vector, denoted h_e1*;
2) for the character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of Sense(e2)_j, the entity 2 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 2 sense item feature vector, denoted h_e2*.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein step 8 comprises:
the weight η is 0.9, a hyperparameter obtained through training and continual adjustment.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein the training comprises:
building a relation extractor according to steps 1-9 and randomly initializing all parameters of the model; during the whole training process, 10 triples <S_j, Sense(e1)_j, Sense(e2)_j> are input at a time from the m triples (i.e., batch_size = 10), and one pass over all triples is recorded as one training epoch, 100 such epochs being performed in total (i.e., epoch = 100); taking the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continuously with the stochastic gradient descent method to update the parameters; meanwhile, to prevent overfitting, a dropout mechanism is adopted during training, with each neuron dropped at a probability of 50% (i.e., in each training step a random half of the hidden-layer nodes do not participate in the computation); after training is finished, the trained entity relation extractor is obtained.
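A minimal training-loop sketch matching the stated settings (batch_size = 10, epoch = 100, stochastic gradient descent, dropout 0.5); `model`, `triples` and `collate` are illustrative stand-ins for the extractor of steps 1-9 and its prepared data, and the learning rate is an assumption:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):                      # 100 training epochs in total
    model.train()                             # enables the 50% dropout layers
    for i in range(0, len(triples), 10):      # batch_size = 10
        s, sense1, sense2, labels = collate(triples[i:i + 10])
        logits = model(s, sense1, sense2)
        loss = F.cross_entropy(logits, labels)  # cross entropy vs. true labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                      # stochastic gradient descent step
```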
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items as claimed in claim 1, wherein inputting a target Chinese sentence and identifying the relation comprises:
given a target Chinese sentence: if it contains exactly two marked entities, directly identify the relation between the entities in the sentence; if it contains fewer than two marked entities, report an error; if it contains more than two marked entities, report an error; if there is more than one target Chinese sentence, automatically split the text into sentences and then identify the relation between the entities in each target Chinese sentence according to the steps for a single sentence.
Owing to the adoption of the above technical scheme, compared with the prior art the invention has the following advantages and positive effects: character vector input avoids the errors caused by Chinese word segmentation; on top of the character vector input, and in order to capture sentence features comprehensively and resolve the polysemy of entity words, a network framework that simultaneously captures character features, word features and entity sense item features is constructed, representing the semantic information at several levels. The method is practical, recognizes that the input granularity of the text has a great influence on relation extraction, and achieves high precision.
Drawings
FIG. 1 is a system framework diagram of the entity relation extraction model proposed in the invention.
FIG. 2 is a flowchart of the entity relation extraction method proposed in the invention.
FIG. 3 is a flow diagram of importing entity sense items from Baidu Encyclopedia.
FIG. 4 is a schematic diagram of the Att-BLSTM network.
FIG. 5 is a schematic diagram of the attention mechanism.
FIG. 6 is a schematic diagram of learning character-level sentence feature vectors.
FIG. 7 is a schematic diagram of the CNN network.
FIG. 8 is a schematic diagram of learning word-level sentence feature vectors.
FIG. 9 shows the performance comparison experiment between character features and word features proposed in the invention.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the invention. Further, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to it, and such equivalents likewise fall within the scope defined by the claims appended to this application.
The Chinese entity relation extraction method based on the fusion of character and word features with entity sense items comprises: learning character features from the input character vector matrix with an Att-BLSTM neural network; meanwhile, performing a convolution operation on the input character vector matrix with a CNN network to generate word-level vectors and learning word features with an Att-BLSTM neural network; and introducing entity sense items, whose features are learned automatically with an Att-BLSTM neural network. Fusing the character features, word features and sense item features enriches the input granularity and fully represents the semantic information; character vector input avoids the influence of Chinese word segmentation errors, and introducing entity sense items eliminates the ambiguity caused by entity polysemy (see FIG. 1 and FIG. 2). The method comprises the following steps:
Step 1, sentence preprocessing
1) m sentences are taken from the SanWen dataset as training samples, covering the ten relation categories in the dataset (see Table 1). Each of the m sentences has a known relation label and two marked entities; m is taken as 17227.
TABLE 1
2) In each of the m sentences, every character and punctuation mark is treated as an individual unit and arranged in order, giving a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m}, and this sequence set is used to build the character table.
3) Index numbers are assigned to the sequence set sequentially starting from 1, repeated units being labeled with their already-assigned numbers, which gives the character table of the sequence set. The character length of each sentence sequence in the set is computed and the statistical maximum, denoted n, is used to standardize the lengths of the m sentence sequences: any sequence shorter than n characters is padded with the number 0 up to length n.
Step 2, acquire the entity 1 sense item and entity 2 sense item corresponding to each sentence
1) The invention creatively introduces entity sense items into the relation extraction task, providing additional supporting information for the entities in a sentence and helping to resolve entity polysemy. A sense item is one division of the rational meaning of a word: a word often has multiple meanings, and each meaning is one sense item.
2) In the m sentences, entity 1 of each sentence is used as a search term on an encyclopedia website: if the search term is not recorded on the website, entity 1 itself is taken as its sense item; if it is recorded, all candidate entity 1 sense items are acquired with web crawler technology, the semantic similarity between the sentence and each candidate sense item is computed, and the sense item with the highest similarity is retained (see FIG. 3). In the m sentences, entity 1 of each sentence thus corresponds to one entity 1 sense item, and the m entities 1 correspond to m entity 1 sense items.
3) For the m entity 1 sense items, every character and punctuation mark in each sense item is treated as an individual unit and arranged in order, giving a character sequence Sense(e1)_j, j = 1,2,…,m; the sequence set of the m entity 1 sense items is denoted {Sense(e1)_1, Sense(e1)_2, …, Sense(e1)_m}. Index numbers are assigned to the sequence set sequentially starting from 1, repeated units being labeled with their already-assigned numbers, which gives the character table of the sequence set. The character length of each entity 1 sense item sequence is computed and the statistical maximum, denoted m1, is used to standardize the lengths: any sequence shorter than m1 characters is padded with the number 0 up to length m1.
4) Following the method of 2), for each of the m sentences the entity 2 sense item corresponding to entity 2 is obtained; the m entities 2 in the m sentences correspond to m entity 2 sense items.
5) Following the method of 3), the entity 2 sense item sequences Sense(e2)_j, j = 1,2,…,m are obtained, with the sequence set denoted {Sense(e2)_1, Sense(e2)_2, …, Sense(e2)_m}; the character table of the entity 2 sense item sequence set is obtained; and the maximum character length of the set, denoted m2, is obtained and used to standardize the lengths of the m entity 2 sense item sequences.
Step 3, expand each sentence into the triple <sentence, entity 1 sense item, entity 2 sense item>
In the m sentences, each sentence sequence S_j is expanded into the triple <S_j, Sense(e1)_j, Sense(e2)_j>, where S_j has character length n, Sense(e1)_j has character length m1, and Sense(e2)_j has character length m2.
Step 4, map the sequences of the triple into character vector matrices
Each individual character of the sentence is mapped into a low-dimensional vector, which avoids the problem of word segmentation errors.
1) Map S_j into a basic character vector matrix, in which the basic vector of each character is formed by concatenating the character's own vector with its distance vectors. Concatenation means joining the specified vectors along their dimensions into a single vector.
For the character's own vector, each character is mapped with the Word2Vec method into a low-dimensional real-valued vector tx_i of dimension d_w, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the vector dimension.
The distance vectors are the distance vector from a character to entity 1 and the distance vector from that character to entity 2.
We define p_i^1 and p_i^2 as the relative distances from the i-th character to entity 1 and entity 2 respectively. p_i^1 and p_i^2 are computed in the same way, and p_i^1 is defined as
p_i^1 = i - b_1 if i < b_1; 0 if b_1 ≤ i ≤ e_1; i - e_1 if i > e_1,
where i denotes the position index of the i-th character; b_1 denotes the start position index of entity 1; e_1 denotes the end position index of entity 1.
After obtaining the relative distances p_i^1 and p_i^2 from the i-th character to entities 1 and 2, the two values are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 the distance vector from the i-th character to entity 2.
Concatenating the character's own vector with the two distance vectors gives the basic vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], a column vector of dimension d = d_w + 2·d_d.
S_j is then mapped into the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_i denotes the basic vector of the i-th character and T denotes matrix transposition; since each basic character vector is a column vector of dimension d, the transposed matrix has dimensions n × d.
2) Map Sense(e1)_j into a basic character vector matrix, in which the basic vector of each character is simply the character's own vector.
Each character is converted with the Word2Vec method into a low-dimensional real-valued vector sx_i of dimension d_w; Sense(e1)_j is mapped into the matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_i denotes the own vector of the i-th character and T denotes matrix transposition; since each character vector has dimension d_w, the transposed matrix has dimensions m1 × d_w.
3) Map Sense(e2)_j into a basic character vector matrix in the same way: each character is converted with the Word2Vec method into a low-dimensional real-valued vector vx_i of dimension d_w, and Sense(e2)_j is mapped into the matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_i denotes the own vector of the i-th character; the transposed matrix has dimensions m2 × d_w.
Step 5, for the sequence S_j in the triple, learn with Att-BLSTM to obtain the character-based sentence feature vector h_c*
1) Att-BLSTM refers to a bidirectional long short-term memory network with an attention mechanism (see FIG. 4).
The LSTM is used to learn long-distance semantic information and generate hidden feature vectors; the specific calculation formulas are:
i_t = σ(W_xi·x_t + W_hi·h_{t-1} + W_ci·c_{t-1} + b_i)
f_t = σ(W_xf·x_t + W_hf·h_{t-1} + W_cf·c_{t-1} + b_f)
g_t = tanh(W_xc·x_t + W_hc·h_{t-1} + W_cc·c_{t-1} + b_c)
c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}
o_t = σ(W_xo·x_t + W_ho·h_{t-1} + W_co·c_t + b_o)
h_t = o_t ⊙ tanh(c_t)
where x_t denotes the input of the LSTM at time t, h_{t-1} denotes the hidden feature vector output by the LSTM at the previous time step, and c_{t-1} denotes the cell state of the LSTM at the previous time step; i_t is the input gate of the LSTM, W_xi, W_hi, W_ci are the weight matrices corresponding to the input gate, b_i is its bias parameter, and σ denotes the sigmoid function; f_t is the forget gate of the LSTM, W_xf, W_hf, W_cf are the weight matrices corresponding to the forget gate, and b_f is its bias parameter; W_xc, W_hc, W_cc are the weight matrices corresponding to the candidate gate, b_c is its bias parameter, tanh denotes the hyperbolic tangent function, and c_t is the current cell state; W_xo, W_ho, W_co are the weight matrices corresponding to the output gate, and b_o is its bias parameter; h_t is the hidden feature vector output at time t.
A sentence sequence is in fact a time sequence. For the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, at time t (t = 1,2,…,n) the basic character vector v_t is input, and running the LSTM in the forward direction yields the hidden feature vector corresponding to that character vector, i.e., the h_t of the formulas above, where v_t corresponds to the input x_t of the LSTM at time t and h_t denotes the hidden feature vector learned for the t-th character.
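A sketch of one LSTM step implementing the gate equations above (the Att-BLSTM variant with c_{t-1} peephole terms); the weight dictionary is an illustrative stand-in for the trained parameters, not the patent's values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W['xi'], W['hi'], W['ci'], ... are the gate weight matrices of the text.
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + W['ci'] @ c_prev + b['i'])
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + W['cf'] @ c_prev + b['f'])
    g_t = np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + W['cc'] @ c_prev + b['c'])
    c_t = i_t * g_t + f_t * c_prev            # c_t = i_t ⊙ g_t + f_t ⊙ c_{t-1}
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + W['co'] @ c_t + b['o'])
    h_t = o_t * np.tanh(c_t)                  # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t
```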
2) To capture both the past and the future semantic information of the sequence, learning is performed from the forward and backward directions, giving a forward hidden feature vector and a backward hidden feature vector for each character vector; adding the elements of the two vectors one-to-one gives the bidirectional hidden feature vector. After the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T passes through the BLSTM, the bidirectional hidden feature vector of each character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_i denotes the bidirectional hidden feature vector of the i-th character.
3) Within a sequence, the characters differ in how important they are to the sequence semantics: some play a critical role while others contribute almost nothing. The attention mechanism learns the importance of each character to the sequence semantics and automatically assigns each character a weight coefficient measuring that importance; the attention mechanism is computed as follows:
M = tanh(H)
α = softmax(ω^T M)
r = H α^T
h* = tanh(r)
where H denotes the output of the BLSTM, i.e., the above H_Sjv, a matrix of dimensions d_a × n, with d_a the dimension of the bidirectional hidden feature vectors and n the character length; ω is a randomly initialized vector of dimension d_a, and ω^T denotes its transpose; α denotes the weight vector obtained by learning, of dimension n; r is the feature vector obtained by the linear weighted summation over the input matrix, of dimension d_a; h* is the sentence feature vector obtained by applying the tanh function to r, of dimension d_a.
With the attention mechanism (see FIG. 5), a weight coefficient is automatically assigned to the bidirectional hidden feature vector of each character in H_Sjv; combining each character's bidirectional hidden feature vector with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector h_c* (see FIG. 6).
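A sketch of the Att-BLSTM block (forward and backward hidden states summed element-wise, then the attention of the formulas above), using PyTorch for brevity; the sizes and names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttBLSTM(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hid_dim, bidirectional=True,
                             batch_first=True)
        self.omega = nn.Parameter(torch.randn(hid_dim))   # ω, dimension d_a

    def forward(self, x):                 # x: (batch, n, in_dim)
        out, _ = self.blstm(x)            # (batch, n, 2*d_a)
        fwd, bwd = out.chunk(2, dim=-1)
        H = (fwd + bwd).transpose(1, 2)   # element-wise sum -> (batch, d_a, n)
        M = torch.tanh(H)
        alpha = torch.softmax(self.omega @ M, dim=-1)      # α: (batch, n)
        r = torch.bmm(H, alpha.unsqueeze(-1)).squeeze(-1)  # r: (batch, d_a)
        return torch.tanh(r)              # h*, the attention-weighted feature
```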
Step 6, for the sequence S_j in the triple, first learn local features with a CNN and then learn with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w*
1) For the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of the sequence S_j, a convolution operation is first performed with a filter (CNN) parameterized by a weight vector ω_k (see FIG. 7), where ω_k denotes the k-th filter; c × d denotes the size of the filter, with d the length of the filter, corresponding to the dimension of the character vectors, and c the width of the filter. For the input character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T, the output of the convolution layer of the k-th filter is obtained by the following formula:
h_k = f(ω_k · v_{i:i+c-1} + b_k)
where v_{i:i+c-1} denotes the concatenation of the feature vectors v_i … v_{i+c-1}; i = 1,2,…,n-c+1; f denotes the ReLU activation function; b_k is a bias term; ω_k and b_k are parameters learned during training and remain the same for all i = 1,2,…,n-c+1 for a given k; h_k, the local feature vector of the word output by the k-th filter, has dimension n-c+1.
Each time the character vector matrix of the sequence S_j is convolved by a filter, one local feature vector is obtained, denoted h_k; this vector characterizes local semantic information between characters in the sentence and is regarded as a word feature vector. Because each filter learns its own parameters ω_k and b_k, each convolution can learn different semantic information. Through k different filters, k different local feature vectors are obtained, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_i denotes the feature vector of the word output by the i-th filter.
2) For H_w = [h_1, h_2, …, h_i, …, h_k], the word features are then learned with the Att-BLSTM of step 5, yielding the word-based sentence feature vector h_w*, as shown in FIG. 8.
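A sketch of step 6: k convolution filters produce k local (word-level) feature vectors, which are then treated as a sequence and fed to the AttBLSTM class sketched above; the filter count and width are illustrative choices:

```python
import torch
import torch.nn as nn

class CNNThenAttBLSTM(nn.Module):
    def __init__(self, d, n, k=100, c=3, hid_dim=100):
        super().__init__()
        # Each of the k output channels is one filter ω_k of size c × d.
        self.conv = nn.Conv1d(in_channels=d, out_channels=k, kernel_size=c)
        # H_w is a length-k sequence of (n-c+1)-dimensional local features.
        self.att_blstm = AttBLSTM(in_dim=n - c + 1, hid_dim=hid_dim)

    def forward(self, x):                 # x: (batch, n, d) character matrix
        h = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, k, n-c+1)
        return self.att_blstm(h)          # h_w*, word-based sentence feature
```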
Step 7, for the sequence Sense(e1)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 1 sense item feature vector h_e1*; for the sequence Sense(e2)_j in the triple, learn with Att-BLSTM to obtain the character-based entity 2 sense item feature vector h_e2*.
1) For the character vector matrix Sense(e1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of the entity 1 sense item sequence, the entity 1 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 1 sense item feature vector h_e1*.
2) For the character vector matrix Sense(e2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of the entity 2 sense item sequence, the entity 2 sense item features are learned with the Att-BLSTM of step 5, yielding the character-based entity 2 sense item feature vector h_e2*.
Step 8, feature fusion
The invention creatively constructs the three models of steps 5-7 to learn character features, word features and entity sense item features, expressing the semantic information from multiple sides, enriching the input granularity and effectively improving the accuracy of relation extraction.
The character-based sentence feature vector h_c* and the word-based sentence feature vector h_w* are concatenated to obtain the feature vector of the sentence semantic information, denoted h_s* = [h_c*; h_w*]. The character-based entity 1 sense item feature vector h_e1* and the character-based entity 2 sense item feature vector h_e2* are concatenated to obtain the feature vector of the entity semantic information, denoted h_e* = [h_e1*; h_e2*]. The concatenated sentence feature vector h_s* is fed into the hidden layer of a fully connected network to obtain a new sentence feature vector o_s, and the entity feature vector h_e* is fed into the hidden layer of a fully connected network to obtain a new sense item feature vector o_e. The weighted sum of o_s and o_e gives the final feature vector o, the weights being η and 1-η respectively; η is taken as 0.9.
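A sketch of step 8 (feature fusion); the hidden sizes are illustrative, and the AttBLSTM outputs are assumed to share the dimension d_a:

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    def __init__(self, d_a, out_dim, eta=0.9):
        super().__init__()
        self.fc_s = nn.Linear(2 * d_a, out_dim)    # hidden layer for h_s*
        self.fc_e = nn.Linear(2 * d_a, out_dim)    # hidden layer for h_e*
        self.eta = eta                             # weight η = 0.9

    def forward(self, h_c, h_w, h_e1, h_e2):
        h_s = torch.cat([h_c, h_w], dim=-1)        # h_s* = [h_c*; h_w*]
        h_e = torch.cat([h_e1, h_e2], dim=-1)      # h_e* = [h_e1*; h_e2*]
        o_s = self.fc_s(h_s)
        o_e = self.fc_e(h_e)
        return self.eta * o_s + (1 - self.eta) * o_e   # final feature o
```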
Step 9, relation extraction
Inputting the final feature vector o into a softmax layer, and calculating the probability that the statement belongs to each class:
p(y)=softmax(o)
wherein p (y) refers to the probability value of each class of sentence;representing the maximum probability value. The category corresponding to the maximum probability value is the relationship category extracted by the relationship extracting means.
The loss function is defined as the cross entropy between the true class label and the predicted class:
J(θ) = -Σ_{i=1}^{M} y_i · log(ŷ_i) + λ‖θ‖²
where y_i is the one-hot encoding of the true label and ŷ_i is the estimated probability of each category produced by the relation extractor; M denotes the total number of categories, there being 10 relation categories in total in the SanWen dataset; λ is the L2 regularization parameter, and θ denotes all the parameters in the model.
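A sketch of the softmax classification and the loss above; the fused feature size (100) and λ are illustrative values, not from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(100, 10)        # 10 relation categories in SanWen

def loss_fn(o, labels, model, lam=1e-5):
    logits = classifier(o)
    ce = F.cross_entropy(logits, labels)            # -Σ y_i · log(ŷ_i)
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return ce + lam * l2                            # + λ‖θ‖²

def predict(o):
    p = F.softmax(classifier(o), dim=-1)            # p(y) = softmax(o)
    return p.argmax(dim=-1)                         # ŷ = argmax_y p(y)
```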
The invention comprises two stages: training and recognition.
A relation extractor is built according to steps 1-9, and all parameters of the model are randomly initialized. During the whole training process, 10 triples <S_j, Sense(e1)_j, Sense(e2)_j> are input at a time from the m triples (i.e., batch_size = 10); one pass over all triples constitutes one training epoch, and 100 such epochs are performed in total (i.e., epoch = 100). With the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continuously with the stochastic gradient descent method to update the parameters. Meanwhile, to prevent overfitting, a dropout mechanism is adopted during training: each neuron is dropped with a probability of 50% (i.e., in each training step a random half of the hidden-layer nodes do not participate in the computation). After training is finished, the trained entity relation extractor is obtained.
In the recognition phase, a target Chinese sentence is given: if it contains exactly two marked entities, the relation between the entities is identified directly; if it contains fewer than two marked entities, an error is reported; if it contains more than two marked entities, an error is reported. If there is more than one target Chinese sentence, the text is automatically split into sentences and the relation between the entities in each sentence is identified according to the single-sentence procedure.
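A sketch of the recognition-phase entity check; the bracket marker format (e.g. "[兰花]e1 … [幽谷]e2") and the `model.predict` interface are illustrative assumptions, not the patent's notation:

```python
import re

def extract_relation(sentence, model):
    entities = re.findall(r"\[([^\]]+)\]e\d", sentence)
    if len(entities) < 2:
        raise ValueError("error: fewer than two marked entities")
    if len(entities) > 2:
        raise ValueError("error: more than two marked entities")
    # Build the triple <sentence, sense item 1, sense item 2> and classify.
    return model.predict(sentence, entities[0], entities[1])
```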
The relation extractor provided by the invention can learn different semantic information at the character level and the word level of the sentence; the added entity sense items supply extra supporting information for the semantics of the entities in the sentence. Character features, word features and sense item features are learned by constructing different networks, which enriches the input granularity, avoids word segmentation errors, alleviates the polysemy of words, and improves the accuracy of relation extraction.
Given an input Chinese sentence with two marked entities, the relation extractor can identify the relation between the entities. The triple <entity 1, relation, entity 2> established from the entities and their relation can be used to construct a knowledge graph and applied in search systems.
Example 1
In this embodiment, the performance of the model that learns character features and word features simultaneously is studied within the relation extractor and compared with a model that learns only character features and a model that learns only word features. The experiment with the combined model follows the relevant steps of the invention; the comparison of the three is shown in Table 2 and FIG. 9.
As can be seen from Table 2 and FIG. 9, the model learning only character features performs better than the model learning only word features, while the proposed model learning both character and word features performs better than either single-feature model. Because words convey the grammatical and syntactic structure of a Chinese sentence, a model that learns character and word features simultaneously captures the semantic information of the sentence more comprehensively and further improves the accuracy of relation extraction. A higher F1 value in Table 2 means better entity relation extraction; likewise, the higher the curve in FIG. 9, i.e., the larger the area enclosed with the two coordinate axes, the better the entity relation extraction.
TABLE 2
Example 2
In this embodiment, entity sense items are added to the relation extractor that learns character and word features, and the model is tested with and without them to illustrate the effect of introducing entity sense items. Meanwhile, the proposed relation extractor based on the fusion of character and word features with entity sense items is compared with the model that learns character and word features simultaneously. The experiment on the fusion of character and word features with entity sense items follows the specific steps given in the summary of the invention. The comparison is shown in Table 3.
As can be seen from Table 3, the model with entity sense items performs better than the model without them, indicating that introducing entity sense items is helpful for entity relation extraction and can improve its performance. Meanwhile, entity relation extraction based on the fusion of character and word features with entity sense items performs best, demonstrating the importance of input granularity for entity relation extraction: learning character features, word features and entity sense item features and fusing them can effectively express the semantic information of a sentence.
TABLE 3
Claims (10)
1. A Chinese entity relation extraction method based on the fusion of character and word features with entity sense items, comprising the following steps:
A. Training;
Step 1, sentence preprocessing;
taking m sentences from the SanWen dataset as training samples, the m sentences covering the ten relation categories in the SanWen dataset;
processing each of the m sentences into a sequence S_j, j = 1,2,…,m, in which each character exists as a separate unit; this means that every character and punctuation mark in the sentence is treated as an individual unit and arranged in order; the sequence set of the m sentences is denoted {S_1, S_2, …, S_m};
for the sequence set of the m sentences, assigning index numbers sequentially starting from 1 at the first character of the first sentence; repeated units are not assigned new numbers, and every unit is labeled with its assigned number;
for the sequence set of the m sentences, computing the character length of each sentence sequence and taking the statistical maximum, denoted n, to standardize the lengths of the m sentence sequences; standardizing means that, among the m sentence sequences, any sequence shorter than n characters is padded with the number 0 up to length n;
step 2, acquiring the entity 1 sense item and the entity 2 sense item corresponding to each sentence;
in the m sentences, taking entity 1 of each sentence as a search entry of an encyclopedia website;
1) if the search entry of entity 1 is not recorded in the encyclopedia website, taking entity 1 itself as the entity 1 sense item corresponding to entity 1;
2) if the search entry of entity 1 is recorded in the encyclopedia website, acquiring all entity 1 sense items corresponding to entity 1 by web crawler technology;
respectively calculating the semantic similarity between each sentence and each entity 1 sense item corresponding to that sentence, and retaining the entity 1 sense item with the highest similarity;
thus, in the m sentences, entity 1 of each sentence corresponds to one entity 1 sense item, which is either the sense item with the highest similarity or entity 1 itself;
processing the entity 1 sense item corresponding to entity 1 in each sentence into a sequence Sense(e_1)_j (j = 1,2,…,m) that exists independently in units of characters; processing into a sequence existing independently in units of characters means that every character and punctuation mark is regarded as an individual, arranged in order; the m entities 1 in the m sentences correspond to a set of m entity 1 sense item sequences, denoted {Sense(e_1)_1, Sense(e_1)_2, …, Sense(e_1)_m};
for the set of m entity 1 sense item sequences, serial numbers are added sequentially before each individual, starting from 1 at the first character of the first entity 1 sense item; repeated individuals are not given serial numbers again, and each individual is marked according to its serial number;
for the set of m entity 1 sense item sequences, the character length of each entity 1 sense item sequence is calculated and the maximum character length is obtained by counting, denoted m_1, which is used to standardize the character lengths of the m entity 1 sense item sequences; standardizing means that, among the m entity 1 sense item sequences, any sequence whose character length is less than m_1 is padded with the number 0 up to character length m_1;
in the same manner as for entity 1, acquiring the entity 2 sense item corresponding to entity 2 in each sentence, the corresponding entity 2 sense item being either the entity 2 sense item with the highest similarity or entity 2 itself;
in the same manner as for entity 1, processing the entity 2 sense item corresponding to entity 2 of each sentence into a sequence Sense(e_2)_j (j = 1,2,…,m) that exists independently in units of characters; the m entities 2 in the m sentences correspond to a set of m entity 2 sense item sequences, denoted {Sense(e_2)_1, Sense(e_2)_2, …, Sense(e_2)_m};
for the set of m entity 2 sense item sequences, serial numbers are added sequentially before each individual, starting from 1 at the first character of the first entity 2 sense item; repeated individuals are not given serial numbers again, and each individual is marked according to its serial number;
in the same manner as for entity 1, obtaining the maximum character length of the set of m entity 2 sense item sequences, denoted m_2, which is used to standardize the character lengths of the m entity 2 sense item sequences; standardizing means that, among the m entity 2 sense item sequences, any sequence whose character length is less than m_2 is padded with the number 0 up to character length m_2;
step 3, expanding into triples <sentence, entity 1 sense item, entity 2 sense item>;
each sentence sequence S_j is expanded into the triple <S_j, Sense(e_1)_j, Sense(e_2)_j>;
step 4, mapping the three sequences in the triple into character vector matrices;
the character vector matrix of S_j is formed by splicing each character's own vector with its distance vectors; the character vector matrix of Sense(e_1)_j consists of the characters' own vectors, and likewise the character vector matrix of Sense(e_2)_j consists of the characters' own vectors;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from the character to entity 2;
splicing means appending the dimensions of the specified vectors to form one vector;
step 5, for the sequence S_j in the triple, obtaining the character-based sentence feature vector by Att-BLSTM learning, denoted h_c^*;
step 6, for the sequence S_j in the triple, first learning local features with a CNN and then learning with Att-BLSTM to obtain the word-based sentence feature vector, denoted h_w^*;
learning the character vector matrix of S_j with the CNN yields local feature vectors, which represent semantic information between characters within the sentence and are regarded as word features;
step 7, for the sequence Sense(e_1)_j in the triple, obtaining the character-based entity 1 sense item feature vector by Att-BLSTM learning, denoted h_e1^*; for the sequence Sense(e_2)_j in the triple, obtaining the character-based entity 2 sense item feature vector by Att-BLSTM learning, denoted h_e2^*;
step 8, feature fusion;
splicing the character-based sentence feature vector and the word-based sentence feature vector to obtain the feature vector of the sentence semantic information, denoted h_s^*:
h_s^* = [h_c^*; h_w^*];
splicing the entity 1 sense item feature vector and the entity 2 sense item feature vector to obtain the feature vector of the entity semantic information, denoted h_e^*:
h_e^* = [h_e1^*; h_e2^*];
inputting h_s^* into the hidden layer of a fully connected network to obtain a new sentence feature vector o_s;
inputting h_e^* into the hidden layer of a fully connected network to obtain a new sense item feature vector o_e;
performing a weighted summation of o_s and o_e with weights η and 1−η to obtain the final feature vector o;
step 9, relation extraction;
inputting the final feature vector o into a softmax layer to obtain the probability value of each class, the class corresponding to the maximum probability value being the relation extraction result;
B. inputting a target Chinese sentence and identifying the relation;
1) if a target Chinese sentence contains exactly two marked entities, identifying the relationship between the entities in the target Chinese sentence;
2) if a target Chinese sentence contains fewer than two marked entities, reporting an error;
3) if a target Chinese sentence contains three or more marked entities, reporting an error;
if there are two or more target Chinese sentences, automatically cutting the input into sentences and then identifying the relationship between the entities in each target Chinese sentence according to steps 1)-3).
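By way of illustration only, and not as part of the claims, part B's checks can be sketched in Python as follows; the sentence-cutting regex, the entity-marking helper, and the model interface are assumptions, not taken from the patent:

```python
import re

def split_sentences(text):
    # Automatic sentence cutting on Chinese sentence-final punctuation.
    return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]

def identify_relations(text, model, find_marked_entities):
    # Per target sentence: exactly two marked entities are required,
    # otherwise an error is reported, mirroring cases 1)-3) above.
    results = []
    for sent in split_sentences(text):
        entities = find_marked_entities(sent)   # assumed helper
        if len(entities) < 2:
            raise ValueError(f"fewer than two marked entities in: {sent}")
        if len(entities) > 2:
            raise ValueError(f"three or more marked entities in: {sent}")
        results.append(model.predict(sent, entities[0], entities[1]))
    return results
```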
2. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 1 comprises:
m is 17227, i.e. all training samples in the SanWen dataset.
3. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 2 comprises:
calculating the semantic similarity means calculating the similarity with a cosine similarity algorithm;
the cosine similarity algorithm uses the Word2Vec method to map each character of the sentence sequence S_j into a character vector, adds the corresponding elements of the character vectors of all the characters in the sequence and divides by the total number of character vectors to obtain the vector of the sequence S_j; the vector of the entity 1 sense item sequence Sense(e_1)_j is obtained in the same way; the cosine of the angle between the two vectors in the vector space is then calculated as a measure of the difference between the two sequences: a cosine value close to 1, with the angle tending to 0°, indicates that the two sequences are more similar, while a cosine value close to 0, with the angle tending to 90°, indicates that they are more dissimilar.
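For illustration, a minimal Python sketch of this similarity computation, assuming pretrained Word2Vec vectors loaded with gensim; the file name and helper names are illustrative:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical pretrained Word2Vec character vectors; the path is illustrative.
w2v = KeyedVectors.load("zh_char_word2vec.kv")

def sequence_vector(tokens):
    # Add the vectors of all tokens element-wise, then divide by their
    # number, as the claim describes (i.e. the mean vector).
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: near 1 = similar, near 0 = not.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def best_sense(sentence_tokens, sense_token_lists):
    # Retain the sense item whose gloss is most similar to the sentence,
    # matching the selection rule of step 2.
    s = sequence_vector(sentence_tokens)
    sims = [cosine_similarity(s, sequence_vector(g)) for g in sense_token_lists]
    return sense_token_lists[int(np.argmax(sims))]
```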
4. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 4 comprises:
1) mapping S_j into a basic character vector matrix, where the basic character vector of each character is formed by splicing the character's own vector with its distance vectors; splicing means appending the dimensions of the specified vectors to form one vector;
the character's own vector is obtained by mapping each character into a low-dimensional real-valued vector tx_i of dimension d_w with the Word2Vec method, where tx_i denotes the own vector of the i-th character of S_j and d_w denotes the dimension of the vector;
the distance vectors are the distance vector from a character to entity 1 and the distance vector from the character to entity 2;
the distance from the i-th character to entity 1 is defined as p_i^1, and the distance from the i-th character to entity 2 as p_i^2; p_i^1 and p_i^2 are calculated in the same way, and p_i^1 is defined as:
p_i^1 = i − b_1 if i < b_1; p_i^1 = 0 if b_1 ≤ i ≤ e_1; p_i^1 = i − e_1 if i > e_1;
where i denotes the position index of the i-th character, b_1 denotes the start position index of entity 1, and e_1 denotes the end position index of entity 1;
the calculated p_i^1 and p_i^2 are mapped into low-dimensional vectors, denoted x_i^p1 and x_i^p2 respectively, both of dimension d_d, where x_i^p1 denotes the distance vector from the i-th character to entity 1 and x_i^p2 denotes the distance vector from the i-th character to entity 2;
splicing the character's own vector with the distance vectors gives the basic character vector of the i-th character, denoted v_i = [tx_i; x_i^p1; x_i^p2], of dimension d = d_w + 2·d_d; S_j is thereby mapped into a basic character vector matrix, denoted S_jv = [v_1, v_2, …, v_i, …, v_n]^T, where v_1 denotes the basic character vector of the 1st character, v_i that of the i-th character, and v_n that of the n-th character; T denotes matrix transposition, and since each basic character vector is a column vector of dimension d, the transposed matrix has dimension n × d;
2) mapping Sense(e_1)_j into a basic character vector matrix, where the basic character vector of each character is the character's own vector;
according to the character's own vector in 1), Sense(e_1)_j is mapped into a basic character vector matrix, denoted Sense(e_1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T, where sx_1 denotes the basic character vector of the 1st character, sx_i that of the i-th character, and sx_m1 that of the m_1-th character; T denotes matrix transposition, and since each character vector has dimension d_w, the transposed matrix has dimension m_1 × d_w;
3) mapping Sense(e_2)_j into a basic character vector matrix, where the basic character vector of each character is the character's own vector;
according to the character's own vector in 1), Sense(e_2)_j is mapped into a basic character vector matrix, denoted Sense(e_2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T, where vx_1 denotes the basic character vector of the 1st character, vx_i that of the i-th character, and vx_m2 that of the m_2-th character; T denotes matrix transposition, and since each character vector has dimension d_w, the transposed matrix has dimension m_2 × d_w.
5. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 5 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, character features are learned with Att-BLSTM; Att-BLSTM is a bidirectional long short-term memory network with an attention mechanism;
at time t (t = 1,2,…,n), the basic character vector v_t is input into the BLSTM and learned in the forward and backward directions to obtain a forward hidden feature vector and a backward hidden feature vector, denoted h_t^f and h_t^b respectively; the elements of h_t^f and h_t^b are added in one-to-one correspondence to obtain the bidirectional hidden feature vector, denoted h_t; after the character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T passes through the BLSTM, the bidirectional hidden feature vector of every character is obtained, denoted H_Sjv = [h_1, h_2, …, h_i, …, h_n], where h_1 denotes the bidirectional hidden feature vector of the 1st character, h_i that of the i-th character, and h_n that of the n-th character;
2) the attention mechanism automatically assigns a weight coefficient to the bidirectional hidden feature vector of each character in H_Sjv; combining the bidirectional hidden feature vector of each character with its assigned weight coefficient in a weighted summation yields the character-based sentence feature vector, denoted h_c^*.
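For illustration, a PyTorch sketch of this Att-BLSTM (summed forward/backward states followed by an attention-weighted sum); the attention parameterization is an assumption in the style of Zhou et al.:

```python
import torch
import torch.nn as nn

class AttBLSTM(nn.Module):
    # Bidirectional LSTM whose forward and backward hidden states are summed
    # element-wise, followed by an attention layer that assigns a weight to
    # each time step and returns the weighted sum as the feature vector.
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        self.att = nn.Linear(d_hid, 1, bias=False)  # attention scorer

    def forward(self, x):                       # x: (batch, n, d_in)
        h, _ = self.lstm(x)                     # (batch, n, 2 * d_hid)
        d = h.size(-1) // 2
        h = h[..., :d] + h[..., d:]             # h_t = h_t^f + h_t^b
        alpha = torch.softmax(self.att(torch.tanh(h)), dim=1)  # weights
        return (alpha * h).sum(dim=1)           # weighted sum -> h_c^*
```

The same module, applied to the sense item matrices of claim 7 and to the CNN outputs of claim 6, would yield h_e1^*, h_e2^*, and h_w^* respectively.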
6. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 6 comprises:
1) for the basic character vector matrix S_jv = [v_1, v_2, …, v_i, …, v_n]^T of S_j, a CNN is used to learn local feature vectors, which represent semantic information between characters in the sentence and are regarded as word features; k different local feature vectors can be obtained through k different CNNs, denoted H_w = [h_1, h_2, …, h_i, …, h_k], where h_1 denotes the word feature obtained by the 1st CNN, h_i that obtained by the i-th CNN, and h_k that obtained by the k-th CNN;
2) for H_w = [h_1, h_2, …, h_i, …, h_k], the word-based sentence feature vector, denoted h_w^*, is obtained by learning the word features with the Att-BLSTM of step 5.
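A sketch of this local-feature stage, assuming k one-dimensional convolutions with max-pooling over time; the filter width and the pooling are assumptions not fixed by the claim. The (batch, k, d) output can be fed to the Att-BLSTM sketch above as a length-k sequence:

```python
import torch
import torch.nn as nn

class LocalFeatures(nn.Module):
    # k different convolutions over the basic character vector matrix; each
    # kernel spans a few adjacent characters, so its pooled output is read
    # as one word-level local feature vector h_i.
    def __init__(self, d, k, width=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d, d, kernel_size=width, padding=width // 2)
            for _ in range(k))

    def forward(self, x):                        # x: (batch, n, d)
        x = x.transpose(1, 2)                    # Conv1d expects (batch, d, n)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.stack(feats, dim=1)         # H_w: (batch, k, d)
```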
7. The method for extracting Chinese entity relationship based on the fusion of the character and the word feature of the entity semantic item as claimed in claim 1, wherein the step 7 comprises:
1) for the character vector matrix Sense(e_1)_jv = [sx_1, sx_2, …, sx_i, …, sx_m1]^T of Sense(e_1)_j, the character-based entity 1 sense item feature vector, denoted h_e1^*, is obtained by learning the entity 1 sense item features with the Att-BLSTM of step 5;
2) for the character vector matrix Sense(e_2)_jv = [vx_1, vx_2, …, vx_i, …, vx_m2]^T of Sense(e_2)_j, the character-based entity 2 sense item feature vector, denoted h_e2^*, is obtained by learning the entity 2 sense item features with the Att-BLSTM of step 5.
8. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic item as claimed in claim 1, wherein the step 8 comprises:
the weight η is 0.9, a hyperparameter obtained through training and continual adjustment.
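Steps 8-9 with η = 0.9 can be sketched as follows; the layer sizes, the tanh nonlinearity, and the dropout placement are assumptions:

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    # Splice, project through fully connected hidden layers, combine with
    # weights eta and 1 - eta, then classify through a softmax layer.
    def __init__(self, d, n_relations, eta=0.9):
        super().__init__()
        self.fc_s = nn.Linear(2 * d, d)       # hidden layer for h_s* = [h_c*; h_w*]
        self.fc_e = nn.Linear(2 * d, d)       # hidden layer for h_e* = [h_e1*; h_e2*]
        self.drop = nn.Dropout(0.5)           # dropout mechanism of claim 9
        self.out = nn.Linear(d, n_relations)  # softmax layer over relation classes
        self.eta = eta

    def forward(self, h_c, h_w, h_e1, h_e2):
        o_s = torch.tanh(self.fc_s(torch.cat([h_c, h_w], dim=-1)))
        o_e = torch.tanh(self.fc_e(torch.cat([h_e1, h_e2], dim=-1)))
        o = self.drop(self.eta * o_s + (1 - self.eta) * o_e)  # weighted sum
        return torch.softmax(self.out(o), dim=-1)  # probability of each class
```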
9. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic item as claimed in claim 1, wherein the training comprises:
a relation extraction device is built according to steps 1-9, and all parameters in the model are randomly initialized; throughout the training of the model, 10 triples <S_j, Sense(e_1)_j, Sense(e_2)_j> at a time are taken from the m triples for training, one complete pass over all triples being recorded as one training process, with 100 training processes in total; with the cross entropy between the model's predicted output and the true relation label as the loss function, the model is trained continually with the stochastic gradient descent method to update the parameters; meanwhile, to prevent overfitting, a dropout mechanism is adopted during training, closing each neuron with a probability of 50% (i.e. in each training step half of the hidden-layer nodes are randomly excluded from the calculation); after training is finished, the trained entity relation extraction device is obtained.
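A minimal training loop matching this description (batches of 10 triples, 100 passes, stochastic gradient descent on the cross entropy); `model` and `loader` are assumed to exist, and the learning rate is an assumption:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr not given by the claim

for epoch in range(100):              # 100 training processes
    model.train()                     # activates the 50% dropout
    for batch in loader:              # 10 triples at a time
        optimizer.zero_grad()
        probs = model(batch["sentence"], batch["sense1"], batch["sense2"])
        # cross entropy between predicted probabilities and the true labels
        loss = -torch.log(probs[torch.arange(len(probs)), batch["relation"]]).mean()
        loss.backward()               # stochastic gradient descent update
        optimizer.step()
```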
10. The method for extracting Chinese entity relationship based on the character and word feature fusion of entity semantic terms as claimed in claim 1, wherein the input of the target Chinese sentence and the relationship identification comprises:
given a target Chinese sentence: if the target Chinese sentence contains exactly two marked entities, the relationship between the entities in the target Chinese sentence is identified directly; if it contains fewer than two marked entities, an error is reported; if it contains three or more marked entities, an error is reported; if there are two or more target Chinese sentences, the input is first automatically cut into sentences and the relationship between the entities in each target Chinese sentence is then identified according to the steps for a single target Chinese sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911298675.8A CN111291556B (en) | 2019-12-17 | 2019-12-17 | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291556A (en) | 2020-06-16
CN111291556B (en) | 2021-10-26
Family
ID=71021179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911298675.8A Active CN111291556B (en) | 2019-12-17 | 2019-12-17 | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291556B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107180247A (en) * | 2017-05-19 | 2017-09-19 | 中国人民解放军国防科学技术大学 | Relation grader and its method based on selective attention convolutional neural networks |
CN107194422A (en) * | 2017-06-19 | 2017-09-22 | 中国人民解放军国防科学技术大学 | A kind of convolutional neural networks relation sorting technique of the forward and reverse example of combination |
CN109344244A (en) * | 2018-10-29 | 2019-02-15 | 山东大学 | A kind of the neural network relationship classification method and its realization system of fusion discrimination information |
CN109710932A (en) * | 2018-12-22 | 2019-05-03 | 北京工业大学 | A kind of medical bodies Relation extraction method based on Fusion Features |
CN109918671A (en) * | 2019-03-12 | 2019-06-21 | 西南交通大学 | Electronic health record entity relation extraction method based on convolution loop neural network |
CN110334354A (en) * | 2019-07-11 | 2019-10-15 | 清华大学深圳研究生院 | A kind of Chinese Relation abstracting method |
CN110532549A (en) * | 2019-08-13 | 2019-12-03 | 青岛理工大学 | Text emotion analysis method based on dual-channel deep learning model |
Non-Patent Citations (2)
Title |
---|
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-12; 207-212 *
Combining Word-Level and Character-Level Representations for Relation Classification of Informal Text; Dongyun Liang et al.; Proceedings of the 2nd Workshop on Representation Learning for NLP; 2017-08-03; 43-47 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |