CN113627192A - Relation extraction method and device based on two-layer convolutional neural network - Google Patents


Info

Publication number
CN113627192A
Authority
CN
China
Prior art keywords
sentence
convolutional neural
neural network
layer
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110864354.0A
Other languages
Chinese (zh)
Inventor
王功明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202110864354.0A priority Critical patent/CN113627192A/en
Publication of CN113627192A publication Critical patent/CN113627192A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a relation extraction method and device based on a two-layer convolutional neural network, belonging to the fields of relation extraction and convolutional neural networks. The technical problems to be solved are that, in current relation extraction, deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts, that processing samples with no entity relationship seriously wastes resources, and that optimization strategies from the deep learning field are not fully drawn upon. The invention comprises the following steps: S1: generating sentence feature vectors based on entity boundaries; S2: training a relationship existence determination model based on a convolutional neural network; S3: screening, with the relationship existence determination model, the entity pairs and sentences in which a relationship may exist; S4: generating sentence feature vectors based on the ELMO network; S5: generating relationship type vectors; S6: training a relation classification model based on a convolutional neural network; S7: predicting the relationship type of each entity pair using the relation classification model.

Description

Relation extraction method and device based on two-layer convolutional neural network
Technical Field
The invention relates to the fields of relation extraction, word vectors, context features, convolutional neural networks and the like, in particular to a relation extraction method and a device based on a two-layer convolutional neural network.
Background
A relationship is a triple describing the semantic link between a pair of entities, of the form (e1, r, e2), where e1 and e2 are entities and r is the semantic relationship between them. Relationships occur in large numbers in natural text; for example, the sentence "Country A president B visits country C" contains the relations (Country A, president, B) and (Country A, visits, C). Relation extraction is an important research topic in the field of information extraction: it establishes the links between different entities, converts unstructured text into structured or semi-structured knowledge, and forms a relationship network of that knowledge for intelligent services such as question answering, semantic search and community discovery.
Currently, four relation extraction methods are in common use: rule templates, dependency analysis, machine learning and deep learning. The first two are traditional methods whose core is template matching and syntactic analysis; their processing rules are limited, they lack extensibility and generality, and they suit closed-domain relation extraction. Machine learning methods treat relation extraction as a multi-class classification problem and predict with a trained model; they have self-learning ability, can break through the rule limitations of the former two methods, and suit relation extraction in different scenarios. However, most features used to train such models are linguistic features of the entity itself, lacking features of the context in which the entity appears, so the different senses an entity takes in different contexts cannot be represented; moreover, feature design is application-specific and highly skill-dependent, with no fixed rules to follow. Deep learning methods take the vectorization results (word embedding, part-of-speech embedding, etc.) of the sentence containing the entities as features, predict by training a deep network model that accurately describes the nonlinear characteristics of the data, and adjust model parameters via transfer learning; they flexibly match application scenarios, have good extensibility and generality, and suit open-domain relation extraction. Such methods use the ordered sequence of per-word vectorization results as features, which various vectorization models can generate directly without application-specific design, reducing the complexity of feature design.
However, the feature vectors used by deep learning methods to train the model usually come from pre-trained models, whose values are frozen and cannot reflect the ambiguity of words in different contexts. Deep networks also have complex structures and numerous parameters, so predicting a relationship is costly; if no relationship actually exists between two entities, running the prediction wastes considerable resources. In addition, deep learning is still developing and improving, and its optimization strategies should be drawn upon to improve the relation extraction effect.
Therefore, the following problems in current relation extraction need to be solved: deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts; processing samples with no entity relationship seriously wastes resources; and optimization strategies from the deep learning field are not fully drawn upon.
Disclosure of Invention
The invention provides a relation extraction method and device based on a two-layer convolutional neural network, to solve the problems that deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts, that processing samples with no entity relationship seriously wastes resources, and that optimization strategies from the deep learning field are not fully drawn upon.
The technical task of the invention is achieved as follows; the relation extraction method based on a two-layer convolutional neural network comprises the following specific steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
Preferably, generating sentence feature vectors based on entity boundaries in step S1 means adding boundary symbols at the two ends of each entity, statically vectorizing the words of the sentence containing the boundary symbols, and then concatenating the static word embeddings in sentence order to form the entity-boundary-based sentence feature vector; the specific steps are:
S11: adding boundary symbols at the two ends of the entities; specifically: in sentence S, W_Start and W_End denote the starting and ending words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B, forming a new sentence S_Bord;
S12: generating static word embeddings for the vocabulary including the boundary symbols;
S13: concatenating the static word embeddings in the word order of the sentence;
S14: returning the static word embedding sequence;
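The steps S11 to S14 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names, the entity spans, the embedding dimension and the random lookup table are all assumptions.

```python
# Hypothetical sketch of S11-S14: mark entity boundaries, then build a
# static word-embedding sequence in sentence order.
import numpy as np

def add_boundary_symbols(tokens, span_a, span_b):
    """Insert <eA>..<\\eA> and <eB>..<\\eB> around two entity spans.
    Spans are (start, end) token indices, end exclusive; span_a is
    assumed to precede span_b."""
    (a0, a1), (b0, b1) = span_a, span_b
    return (tokens[:a0] + ["<eA>"] + tokens[a0:a1] + ["<\\eA>"]
            + tokens[a1:b0] + ["<eB>"] + tokens[b0:b1] + ["<\\eB>"]
            + tokens[b1:])

def static_embedding_sequence(tokens, table, dim=4):
    """Look up one static (pre-trained) embedding per token, in order."""
    return np.stack([table.get(t, np.zeros(dim)) for t in tokens])

tokens = ["CountryA", "president", "B", "visits", "countryC"]
marked = add_boundary_symbols(tokens, (0, 1), (2, 3))  # S11: S_Bord tokens
rng = np.random.default_rng(0)
table = {t: rng.normal(size=4) for t in marked}        # stand-in embeddings
seq = static_embedding_sequence(marked, table)         # S12-S14: (9, 4)
```

The marked sentence gains four boundary tokens, so a 5-word sentence yields a 9-step embedding sequence.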
In step S2, training the convolutional neural network-based relationship existence determination model means sending the entity-boundary-based sentence feature vectors together with the relationship existence labels into the convolutional neural network, and obtaining the determination model through training; the specific steps are:
S21: generating the entity-pair relationship existence label: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
S23: initializing the parameters of the layer 1 convolutional neural network: initializing the weight matrices and bias vectors of the different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
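As a rough illustration of what the layer 1 network computes (S22 to S27), the following NumPy sketch runs a single 1-D convolution over the sentence feature vectors, applies max-over-time pooling, and emits a sigmoid score approximating R_Ext(E_A, E_B). The shapes, kernel count and absence of a training loop are simplifying assumptions, not the patent's hyper-parameters.

```python
# Minimal forward pass of a binary relation-existence CNN (assumed shapes).
import numpy as np

def relation_exists_score(seq, kernel, w_out, b_out):
    """seq: (n, d) word embeddings; kernel: (k, d, f) conv filters;
    w_out: (f,) output weights; b_out: scalar bias. Returns a score in (0, 1)."""
    n, d = seq.shape
    k, _, f = kernel.shape
    # 1-D convolution: one f-dimensional feature per window position
    conv = np.stack([np.einsum("kd,kdf->f", seq[i:i + k], kernel)
                     for i in range(n - k + 1)])      # (n-k+1, f)
    pooled = conv.max(axis=0)                          # max-over-time, (f,)
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))                # sigmoid output layer

rng = np.random.default_rng(1)
seq = rng.normal(size=(9, 4))                          # boundary-marked sentence
score = relation_exists_score(seq, rng.normal(size=(3, 4, 8)),
                              rng.normal(size=8), 0.0)
```

The score is then thresholded at 0.5 in step S35 to decide whether a relationship may exist.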
Preferably, in step S3 the relationship existence determination model processes each entity pair (E_A, E_B) in sentence S to determine whether a relationship exists between E_A and E_B; the specific steps are:
S31: reading an entity pair (E_A, E_B) from sentence S, where E_A and E_B denote any two entities in sentence S;
S32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
S33: sending the entity-boundary-based sentence feature vector to the input of the relationship existence determination model;
S34: obtaining the determination result R_Ext(E_A, E_B) from the output of the model, where R_Ext(E_A, E_B) denotes the determination result of whether a relationship exists between E_A and E_B;
S35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, executing step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jumping to step S37;
S36: marking that a relationship exists between E_A and E_B in sentence S;
S37: determining whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, returning to step S31;
(2) if not, executing step S38;
S38: returning sentence S and the related entity pairs (E_A, E_B).
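The screening loop S31 to S38 reduces to a few lines once the determination model is treated as a black box. The stub model below is a hypothetical stand-in used only to make the loop runnable.

```python
# Sketch of the S31-S38 screening loop with a stubbed existence model.
from itertools import combinations

def screen_entity_pairs(sentence, entities, model):
    kept = []
    for e_a, e_b in combinations(entities, 2):   # S31: read each pair
        score = model(sentence, e_a, e_b)        # S32-S34: model score
        if score > 0.5:                          # S35: threshold at 0.5
            kept.append((e_a, e_b))              # S36: mark as related
    return sentence, kept                        # S38: sentence + pairs

# Hypothetical model: only (CountryA, B) scores above the threshold.
stub = lambda s, a, b: 0.9 if (a, b) == ("CountryA", "B") else 0.1
_, pairs = screen_entity_pairs("CountryA president B visits countryC",
                               ["CountryA", "B", "countryC"], stub)
```

Only pairs surviving this filter are passed to the more expensive layer 2 classifier, which is the cost saving the invention claims.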
Preferably, generating ELMO-network-based sentence feature vectors in step S4 means generating, through the ELMO network, a dynamic word embedding for each word of sentence S_Bord, concatenating it after the static word embedding generated in step S1 to form the word embedding of each word, and then concatenating the word embeddings in sentence order to form the ELMO-based sentence feature vector; the specific steps are:
S41: reading the static word embedding sequence of sentence S_Bord, where S_Bord denotes the new sentence generated after adding boundary symbols at the two ends of E_A and E_B, and E_A and E_B denote entities in sentence S between which a relationship may exist;
S42: generating the dynamic word embedding sequence of S_Bord: the static word embedding sequence of S_Bord is sent into the input layer of an ELMO network, and the dynamic word embedding sequence of S_Bord, of the same length as the static sequence, is obtained from the output layer; the specific steps are:
S421: sending the static word embedding sequence into the input layer of the ELMO network;
S422: obtaining the operation result of the ELMO network output layer;
S423: extracting the dynamic word embedding of each word from the operation result;
S424: concatenating the dynamic word embeddings in the word order of the sentence;
S425: returning the dynamic word embedding sequence;
S43: generating the word embedding sequence of S_Bord: let S_Bord contain n words, with static word embedding sequence StaEmdSeq_S = [e_s1, e_s2, ..., e_si, ..., e_sn] and dynamic word embedding sequence DymEmdSeq_S = [e_d1, e_d2, ..., e_di, ..., e_dn], where e_si and e_di denote the static and dynamic word embeddings of the i-th word respectively. The word embedding of the i-th word is e_si⊕e_di, obtained by concatenating e_di after e_si; the word embedding sequence of S_Bord is then EmdSeq_S = [e_s1⊕e_d1, e_s2⊕e_d2, ..., e_sn⊕e_dn];
S44: returning the word embedding sequence of S_Bord;
The generation of the relationship type vector in step S5 is specifically as follows:
One-Hot coding is used to represent the relationship type between entities, i.e., a vector RelVec_k of length T represents the relationship Rel_k, k ∈ [1, T], where Rel_k denotes the relationship between entities E_A and E_B; in RelVec_k, the k-th bit is 1 and all other bits are 0.
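The One-Hot encoding of step S5 can be written out directly (1-based position k, as in the text):

```python
# RelVec_k: length-T vector with a 1 in position k and 0 elsewhere.
def rel_vec(k, T):
    assert 1 <= k <= T, "k must index one of the T relationship types"
    return [1 if i == k else 0 for i in range(1, T + 1)]

v = rel_vec(3, 5)
```

This vector is what the layer 2 output layer is trained against in step S64.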
Preferably, the training of the convolutional neural network-based relational classification model in step S6 specifically includes setting hyper-parameters of the convolutional neural network, initializing a corresponding weight matrix and bias vector, sending sentence feature vectors and relational type vectors based on the ELMO network to the convolutional neural network, and performing training by optimizing a loss function to obtain a relational classification model; the method comprises the following specific steps:
s61: setting hyper-parameters of a layer 2 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module; the convolution layer module adopts convolution kernels with different sizes; the pooling layer module adopts pooling based on an attention mechanism; the method comprises the following specific steps:
s611: setting input layer hyper-parameters;
s612: setting convolution kernels with different sizes;
s613: pooling based on attention mechanism was set: let's assume that t convolution kernels of different sizes are used, and the size of the jth convolution kernel is LjResult of convolution operation RjIs of length n-Lj+1 vector, corresponding attention weight AjIs a vector with the same length, and the corresponding relation type words are embedded into a matrix WjIs of scale (n-L)j+1) T matrix, corresponding "element-relation" correlation matrix GjIs a matrix of the same size, BjIs GjThe column normalization matrix of (a) is,
Figure BDA0003186811420000041
is BjThe g-th column vector of (1),
Figure BDA0003186811420000042
is that
Figure BDA0003186811420000043
The mth element of (1); the method comprises the following specific steps:
s6131: computing an "element-relationship" correlation matrix Gj=Rj Trans*Aj*Wj,Rj TransIs RjTransposing;
s6132: calculation of GjColumn normalization matrix B ofj
Figure BDA0003186811420000044
S6133: r is to bejAnd BjMaximum value of column vector inner product
Figure BDA0003186811420000045
As a result of pooling;
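A sketch of this pooling, under one dimensionally consistent reading (the original formulas are image placeholders, so the shapes here are assumptions in the style of attention-pooling CNNs): R_j is taken as the (d_c, m) convolution output with m = n-L_j+1 positions, A_j as a (d_c, T) attention weight matrix, and W_j as a (T, T) relation embedding matrix, so that G_j = R_j^Trans · A_j · W_j is (m, T); B_j is the column-wise softmax of G_j, and the pooled output keeps the per-channel maximum of R_j · B_j.

```python
# Attention-based pooling sketch for step S613 (assumed shapes).
import numpy as np

def attention_pool(R, A, W):
    G = R.T @ A @ W                                   # S6131: (m, T) correlation
    expG = np.exp(G - G.max(axis=0, keepdims=True))   # numerically stable exp
    B = expG / expG.sum(axis=0, keepdims=True)        # S6132: column softmax
    return (R @ B).max(axis=1)                        # S6133: max inner products

rng = np.random.default_rng(3)
d_c, m, T = 8, 7, 5
pooled = attention_pool(rng.normal(size=(d_c, m)),    # R_j
                        rng.normal(size=(d_c, T)),    # A_j
                        rng.normal(size=(T, T)))      # W_j
```

The result has one value per convolution channel, replacing plain max-over-time pooling with a relation-aware weighting.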
s614: setting other hyper-parameters;
S62: initializing the parameters of the layer 2 convolutional neural network: initializing the weight matrices and bias vectors of the different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
S65: training the layer 2 convolutional neural network model according to the convergence condition: training uses a distance-based loss function, defined as follows:
L_margin = log(1 + exp(ρ(d+ - g+(S)))) + log(1 + exp(ρ(d- + g-(S))));
where g+(S) denotes the score of the correct classification; g-(S) denotes the score of the wrong classification; d+ and d- denote the margin thresholds of positive and negative samples respectively; ρ denotes a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
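The distance-based loss of step S65 can be evaluated directly; the d+, d- and ρ defaults below are illustrative, not values from the patent.

```python
# L_margin = log(1 + exp(rho*(d+ - g+(S)))) + log(1 + exp(rho*(d- + g-(S))))
from math import exp, log

def margin_loss(g_pos, g_neg, d_pos=2.5, d_neg=0.5, rho=2.0):
    """g_pos/g_neg: correct- and wrong-class scores for sentence S."""
    return (log(1.0 + exp(rho * (d_pos - g_pos)))
            + log(1.0 + exp(rho * (d_neg + g_neg))))

# The loss falls as the correct-class score rises and the wrong-class
# score falls, which is what drives the margin between classes:
assert margin_loss(g_pos=3.0, g_neg=-2.0) < margin_loss(g_pos=1.0, g_neg=0.0)
```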
Preferably, predicting the relationship type of an entity pair using the relation classification model in step S7 means predicting, with the trained convolutional neural network, the relationship type of each entity pair in the sentence to be processed, where E_A' and E_B' denote any two entities in the sentence S' to be processed; the specific steps are:
S71: adding boundary symbols at the two ends of entities E_A' and E_B', obtaining a new sentence S'_Bord;
S72: generating the entity-boundary-based sentence feature vector of S'_Bord;
S73: generating the ELMO-network-based sentence feature vector of S'_Bord;
S74: sending the ELMO-based sentence feature vector into the input layer of the layer 2 convolutional neural network;
S75: running the layer 2 convolutional neural network and obtaining the output layer information;
S76: taking the relationship type corresponding to the maximum-probability output as the relationship between E_A' and E_B';
S77: determining whether any unread entity pair E_A', E_B' remains in sentence S':
(1) if yes, returning to step S71;
(2) if not, executing step S78;
S78: returning the set of relationships between the entity pairs E_A', E_B' existing in sentence S'.
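Step S76 in miniature: the predicted relation is the label whose output probability is largest. The label set and the probability vector below are illustrative assumptions.

```python
# Map the maximum-probability output position back to a relation label.
RELATION_TYPES = ["president-of", "visits", "located-in"]  # assumed label set

def predict_relation(probs):
    """probs: one probability per relationship type (layer 2 output)."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return RELATION_TYPES[k]

label = predict_relation([0.1, 0.7, 0.2])
```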
A relationship extraction apparatus based on a two-layer convolutional neural network, the apparatus comprising:
a sentence feature vector generation component based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training component for training a relation existence judgment model based on a convolutional neural network;
the entity pair and sentence generation component is used for screening out the entity pair and the sentence where the relation possibly exists according to the relation existence judgment model;
the sentence feature vector generating component is used for generating a sentence feature vector based on the ELMO network;
the relation type vector generating component is used for generating a vector corresponding to the relation type;
the relation classification model training component is used for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
Preferably, the sentence feature vector generation unit based on entity boundary includes:
the entity boundary symbol adding component is used for adding boundary symbols at two ends of the entity of the sentence;
the static word embedding generation component is used for statically vectorizing each word of the sentence containing the boundary symbols;
the static word embedding and splicing component is used for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section includes:
an entity pair relationship existence identification generation component for generating an identification whether the relationship between the entity pairs exists;
the layer 1 hyper-parameter setting component is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 input end setting component is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relation existence judgment;
the layer 1 output end setting component is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
and the layer 1 training convergence component is used for training the convolutional neural network facing the relation existence judgment according to the convergence condition.
Preferably, the ELMO network-based sentence feature vector generation component includes:
the static word embedding sequence reading component is used for reading static word embedding sequences of all words in the sentence;
the dynamic word embedding sequence generating component is used for generating dynamic word embedding sequences of all words in the sentence;
a word embedding sequence generating part for generating word embedding sequences of all words in the sentence;
the relationship classification model training component includes:
the layer 2 hyper-parameter setting component is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 input end setting component is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relation classification;
the layer 2 output end setting component is used for sending all relationship type vectors into the output end of the convolutional neural network facing relationship classification;
and the layer 2 training convergence component is used for training the convolutional neural network facing the relation classification according to the convergence condition.
A computer-readable storage medium having stored thereon computer-executable instructions, which, when executed by a computer, implement a relationship extraction method based on a two-layer convolutional neural network as described above.
The relation extraction method and device based on the two-layer convolutional neural network have the following advantages:
Firstly, the invention uses the layer 1 convolutional neural network to screen out entity pairs that cannot have a relationship, and then uses the layer 2 convolutional neural network to predict the specific relationship type, thereby avoiding relationship-type computation for a large number of unrelated entity pairs, reducing cost and improving the effect;
secondly, boundary symbols are added to the sentence entities and static word embeddings are then generated, yielding entity-boundary-based sentence feature vectors that describe the overall environment of the entities in the sentence; these are sent into the layer 1 convolutional neural network, whose binary classification model rapidly determines whether a relationship exists between the entities. Dynamic word embeddings of the sentence are obtained through an ELMO network and concatenated with the static word embeddings to obtain ELMO-based sentence feature vectors, which describe not only the overall features of the sentence but also individual features such as the ambiguity of words in different contexts and the relation between words and entity pairs; these are sent into the layer 2 convolutional neural network, which is optimized with convolution kernels of different sizes, attention-based pooling and a distance-based loss function, so that its multi-classification model accurately identifies the relationship type between entities. In addition, the static word embeddings fed into the layer 1 network are reused to generate the input data of the layer 2 network, giving good reusability;
thirdly, for all sentences containing entities, a convolutional neural network processes the entity-boundary-based sentence feature vectors and screens out entity pairs that cannot have a relationship, saving the cost of computing relationship types for a large number of unrelated entities; for sentences whose entity pairs may have a relationship, a convolutional neural network processes the ELMO-based sentence feature vectors and determines the specific relationship type. Through this two-layer refinement, limited resources are devoted to entity pairs that may actually be related and to a more accurate relation classification model, improving the effect of relation extraction;
fourthly, the two-layer convolutional neural network designed by the invention matches the actual requirements: layer 1 is designed for speed, using a conventional convolutional neural network to build a binary classification model that rapidly determines whether a relationship exists; layer 2 is designed for precision, using an optimized convolutional neural network to build a multi-classification model that accurately determines the relationship type. The static word embeddings fed into layer 1 are reusable for generating the input data of layer 2. Compared with conventional relation extraction methods, the two layers ensure speed and precision respectively, balancing these two conflicting performance indicators and improving the relation extraction effect.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart diagram of a relationship extraction method based on a two-layer convolutional neural network;
FIG. 2 is a schematic diagram of a sentence with entity boundary symbols added;
FIG. 3 is a block diagram of the process of step S1 for generating sentence feature vectors based on entity boundaries;
FIG. 4 is a block flow diagram of the step S2 of training a convolutional neural network-based relationship existence determination model;
FIG. 5 is a block flow diagram of step S3, using the relationship existence determination model to screen entity pairs and sentences in which a relationship may exist;
FIG. 6 is a block diagram of the process of step S4 for generating ELMO network-based sentence feature vectors;
FIG. 7 is a flow diagram of step S42, generating the dynamic word embedding sequence of sentence S_Bord;
FIG. 8 is a block diagram of the process of training the convolutional neural network-based relational classification model of step S6;
FIG. 9 is a block diagram of the flow of setting the hyperparameters of the layer 2 convolutional neural network of step S61;
FIG. 10 is a block flow diagram of a step S7 of predicting the relationship type of an entity pair using a relationship classification model;
FIG. 11 is a block diagram of a double layer convolutional neural network created by the present invention;
FIG. 12 is a schematic diagram of generating dynamic word embedding via an ELMO network;
fig. 13 is a block diagram of a relationship extraction apparatus based on a two-layer convolutional neural network.
Detailed Description
The method and apparatus for extracting relationships based on two-layer convolutional neural network according to the present invention are described in detail below with reference to the drawings and the embodiments.
Example 1:
as shown in fig. 1, the relationship extraction method based on two layers of convolutional neural networks of the present invention specifically comprises the following steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
In this embodiment, generating entity-boundary-based sentence feature vectors in step S1 means adding boundary symbols at the two ends of each entity, performing static vectorization on the words of the sentence containing the boundary symbols, and then splicing the static word embeddings of the words in order to form the entity-boundary-based sentence feature vector; as shown in fig. 3, the details are as follows:
s11: adding boundary symbols at both ends of the entities; specifically: in sentence S, W_Start and W_End denote the first and last words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B to form a new sentence S_Bord, as shown in fig. 2;
s12: generating vocabulary static word embedding containing boundary symbols;
s13: splicing the static word embeddings according to the word order of the sentence;
s14: returning a static word embedding sequence;
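Steps S11-S14 can be sketched in Python as follows; the token list, entity spans, and toy embedding table are illustrative assumptions, not data from the patent.

```python
def add_boundary_symbols(tokens, span_a, span_b):
    """S11: insert <eA>...<\\eA> and <eB>...<\\eB> around the two entities."""
    out = list(tokens)
    # Insert around the rightmost span first so earlier indices stay valid.
    for (s, e), open_t, close_t in sorted(
        [(span_a, "<eA>", "<\\eA>"), (span_b, "<eB>", "<\\eB>")],
        key=lambda item: -item[0][0],
    ):
        out.insert(e + 1, close_t)
        out.insert(s, open_t)
    return out

def static_embedding_sequence(tokens, table, dim=4):
    """S12-S14: look up each token's static embedding and splice in order."""
    zero = [0.0] * dim
    return [table.get(t, zero) for t in tokens]

tokens = ["Wang", "encouraged", "Wu"]
s_bord = add_boundary_symbols(tokens, (0, 0), (2, 2))     # sentence S_Bord
table = {t: [float(i)] * 4 for i, t in enumerate(s_bord)} # toy static embeddings
seq = static_embedding_sequence(s_bord, table)            # entity-boundary feature sequence
```

Here s_bord becomes ['<eA>', 'Wang', '<\eA>', 'encouraged', '<eB>', 'Wu', '<\eB>'], and seq splices the per-token static embeddings in sentence order.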
in the embodiment, the training of the relation existence determination model based on the convolutional neural network in step S2 is to send the sentence feature vector based on the entity boundary and the relation existence identification into the convolutional neural network, and obtain the relation existence determination model through training; as shown in fig. 4, the following is detailed:
s21: generating the entity-pair relation existence identifier: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
s23: initializing parameters of the layer 1 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
In this embodiment, screening entity pairs and sentences in which a relationship may exist using the relation existence determination model in step S3 means processing each entity pair (E_A, E_B) in sentence S with the model to determine whether a relationship exists between entities E_A and E_B; as shown in fig. 5, the details are as follows:
s31: reading an entity pair (E_A, E_B) from sentence S, where E_A and E_B represent any two entities in sentence S;
s32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
s33: feeding the entity-boundary-based sentence feature vector into the input of the relation existence determination model;
s34: obtaining the determination result R_Ext(E_A, E_B) from the output of the relation existence determination model, where R_Ext(E_A, E_B) is the determination result for whether a relationship exists between entities E_A and E_B;
s35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, execute step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jump to step S37;
s36: marking that a relationship exists between entities E_A and E_B in sentence S;
s37: determining whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, go to step S31;
(2) if not, go to step S38;
s38: returning sentence S and the related entity pairs (E_A, E_B).
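A pure-Python sketch of the S31-S38 screening loop, with a stand-in scoring function in place of the trained layer-1 CNN; the names and scores are illustrative.

```python
def filter_related_pairs(entity_pairs, judge):
    """Keep entity pairs whose relation-existence score R_Ext exceeds 0.5."""
    related = []
    for pair in entity_pairs:   # S31: read each entity pair (E_A, E_B)
        score = judge(pair)     # S32-S34: model output R_Ext(E_A, E_B)
        if score > 0.5:         # S35-S36: mark the pair as related
            related.append(pair)
    return related              # S38: return the related pairs

# Stand-in for the relation existence determination model.
scores = {("Wang", "Wu"): 0.91, ("Wang", "Li"): 0.12}
pairs = filter_related_pairs(list(scores), lambda p: scores[p])
```

Only the pair scoring above the 0.5 threshold survives the screen.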
In this embodiment, generating the ELMO-network-based sentence feature vector in step S4 specifically means generating the dynamic word embedding of each word of sentence S_Bord through the ELMO network, splicing each dynamic word embedding after the static word embedding generated in step S1 to form the word embedding of each word, and then splicing these word embeddings in the order of the words in the sentence to form the ELMO-network-based sentence feature vector; as shown in fig. 6, the details are as follows:
s41: reading the static word embedding sequence of sentence S_Bord, where S_Bord is the new sentence generated after adding boundary symbols at both ends of E_A and E_B, and E_A and E_B are entities having a relationship in sentence S;
s42: generating the dynamic word embedding sequence of sentence S_Bord: the static word embedding sequence of S_Bord is fed into the input layer of the ELMO network, and the dynamic word embedding sequence of S_Bord is obtained from the output layer; its length is the same as that of the static word embedding sequence; as shown in fig. 7, the details are as follows:
s421: sending the static word embedding sequence into an input layer of an ELMO network;
s422: obtaining an operation result of an ELMO network output layer;
s423: acquiring vocabulary dynamic word embedding according to the operation result;
s424: splicing dynamic word embedding according to the vocabulary sequence of sentences;
s425: returning a dynamic word embedding sequence;
s43: generating the word embedding sequence of sentence S_Bord: let the number of words in S_Bord be n, the static word embedding sequence be StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn], and the dynamic word embedding sequence be DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], where e_si and e_di denote the static and dynamic word embeddings of the i-th word, respectively; e_si e_di is the word embedding of the i-th word, the corresponding operation being to splice e_di after e_si; the word embedding sequence of sentence S_Bord is then EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn];
s44: returning the word embedding sequence of S_Bord;
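Step S43's splice of static and dynamic embeddings can be sketched with NumPy; the random arrays stand in for the GloVe and ELMO outputs, and the dimensions are illustrative.

```python
import numpy as np

n, d_static, d_dynamic = 5, 100, 100
sta = np.random.rand(n, d_static)         # StaEmdSeq_S: one row per word, e_si
dym = np.random.rand(n, d_dynamic)        # DymEmdSeq_S: one row per word, e_di
emd = np.concatenate([sta, dym], axis=1)  # EmdSeq_S: each e_di spliced after e_si
```

Each row of emd is the full word embedding e_si e_di of one word, so the sequence length is unchanged while the embedding dimension doubles.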
the generation of the relationship type vector in step S5 in this embodiment is specifically as follows:
One-Hot coding is adopted to represent the relation type between entities, i.e., a vector RelVec_k of length T represents the relation Rel_k, k ∈ [1, T], where Rel_k denotes a relationship between entities E_A and E_B; in RelVec_k, all bits are 0 except the k-th bit, which is 1.
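A minimal sketch of the One-Hot relation type vector, with T = 12 matching the relation set used later in the embodiment:

```python
def relation_vector(k, T=12):
    """RelVec_k: length-T vector whose k-th bit (1-indexed) is 1, rest 0."""
    v = [0] * T
    v[k - 1] = 1
    return v

vec = relation_vector(9)  # RelVec_9 for relation Rel_9
```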
In this embodiment, the training of the convolutional neural network-based relationship classification model in step S6 specifically includes setting hyper-parameters of the convolutional neural network, initializing a corresponding weight matrix and bias vector, sending sentence feature vectors and relationship type vectors based on the ELMO network to the convolutional neural network, and performing training by optimizing a loss function to obtain a relationship classification model; as shown in fig. 8, the following is detailed:
s61: setting hyper-parameters of a layer 2 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module; the convolution layer module adopts convolution kernels with different sizes; the pooling layer module adopts pooling based on an attention mechanism; as shown in fig. 9, the details are as follows:
s611: setting the hyper-parameters of an input layer;
s612: setting convolution kernels with different sizes;
s613: setting pooling based on the attention mechanism: suppose t convolution kernels of different sizes are used and the size of the j-th kernel is L_j; the convolution result R_j is a vector of length n-L_j+1, the corresponding attention weight A_j is a vector of the same length, the corresponding relation-type word-embedding matrix W_j is a matrix of scale (n-L_j+1)×T, the corresponding "element-relation" correlation matrix G_j is a matrix of the same size, B_j is the column-normalized matrix of G_j, b_j^g is the g-th column vector of B_j, and b_(j,m)^g is the m-th element of b_j^g; the specific steps are as follows:
s6131: computing the "element-relation" correlation matrix G_j = R_j^T · W_j, where R_j^T is the transpose of R_j;
s6132: computing the column-normalized matrix B_j of G_j, whose elements are b_(j,m)^g = exp(g_(j,m)^g) / Σ_m' exp(g_(j,m')^g), i.e., a softmax over each column of G_j;
s6133: taking the maximum over g of the inner products of R_j with the column vectors b_j^g of B_j as the pooling result;
s614: setting other hyper-parameters;
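A NumPy sketch of the attention-based pooling in S6131-S6133. The patent's exact formula for the correlation matrix G_j appears only in a figure, so the elementwise form below is an assumption; the column softmax and max-inner-product steps follow the text.

```python
import numpy as np

def attention_pool(R, W):
    """R: convolution result of length n-L_j+1; W: (n-L_j+1, T) relation-type
    word-embedding matrix. Returns the scalar pooling result."""
    G = R[:, None] * W                     # assumed "element-relation" correlation, same size as W
    B = np.exp(G) / np.exp(G).sum(axis=0)  # column-normalized matrix B_j (softmax per column)
    scores = R @ B                         # inner product of R with each column b_j^g
    return scores.max()                    # S6133: maximum as the pooling result

R = np.array([0.2, 1.0, 0.3])  # toy convolution output (n - L_j + 1 = 3)
W = np.random.rand(3, 12)      # toy relation-type embedding matrix (T = 12)
pooled = attention_pool(R, W)
```

Because each column of B_j sums to 1 with nonnegative entries, each inner product is a weighted average of R, so the pooled value stays within R's range.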
s62: parameters for initializing the layer 2 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
s65: training a layer 2 convolutional neural network model according to a convergence condition: training is performed using a distance-based loss function, which is defined as follows:
L_margin = log(1 + exp(ρ(d+ - g+(S)))) + log(1 + exp(ρ(d- + g-(S))));
where g+(S) denotes the correct-classification score, g-(S) denotes the misclassification score, d+ and d- denote the margin thresholds for positive and negative samples, respectively, and ρ denotes a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
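The distance-based loss of S65 transcribed directly into Python; the threshold and scaling values are illustrative, not from the patent.

```python
import math

def margin_loss(g_pos, g_neg, d_pos=2.5, d_neg=0.5, rho=2.0):
    """L_margin with correct-class score g_pos and wrong-class score g_neg."""
    return (math.log(1 + math.exp(rho * (d_pos - g_pos)))
            + math.log(1 + math.exp(rho * (d_neg + g_neg))))

# The loss shrinks as the correct score rises and the wrong score falls.
loss_good = margin_loss(g_pos=3.0, g_neg=-1.0)
loss_bad = margin_loss(g_pos=0.0, g_neg=1.0)
```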
In this embodiment, predicting the relation type of an entity pair using the relation classification model in step S7 means using the trained convolutional neural network to predict the relation types of entity pairs in the sentence to be processed, where E_A' and E_B' denote any two entities in the sentence S' to be processed; as shown in fig. 10, the details are as follows:
s71: adding boundary symbols at both ends of entities E_A' and E_B' to obtain a new sentence S'_Bord;
s72: generating the entity-boundary-based sentence feature vector of S'_Bord;
s73: generating the ELMO-network-based sentence feature vector of S'_Bord;
s74: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s75: operating a layer 2 convolutional neural network to obtain output layer information;
s76: taking the relation type corresponding to the output with the maximum probability as the relationship between entities E_A' and E_B';
s77: determining whether unread entities E_A' and E_B' remain in sentence S':
(1) if yes, go to step S71;
(2) if not, go to step S78;
s78: returning the set of relationships between entities E_A' and E_B' existing in sentence S'.
As shown in fig. 11, the double-layer convolutional neural network created by the present invention is as follows:
The bottom-most [e_s1, e_s2, …, e_si, …, e_sn] is the entity-boundary-based sentence feature vector, which is fed into the layer-1 convolutional neural network to obtain the relation existence determination result R_Ext(E_A, E_B) for entities E_A and E_B. Above the layer-1 convolutional neural network is a control circuit composed of the step function sgn(x) with threshold 0.5; the dotted circle above the circuit denotes multiplication of a vector by a scalar. Taking R_Ext(E_A, E_B) as x: if R_Ext(E_A, E_B) > 0.5, the control circuit outputs 1, which feeds the entity-boundary-based sentence feature vector into the ELMO network and executes the subsequent relation type computation; if R_Ext(E_A, E_B) ≤ 0.5, the control circuit outputs 0 and execution stops. The circle to the right of the ELMO network containing the concatenation symbol denotes vector concatenation, which splices the static word embeddings [e_s1, e_s2, …, e_si, …, e_sn] at the input of the ELMO network with the dynamic word embeddings [e_d1, e_d2, …, e_di, …, e_dn] at its output to obtain the ELMO-network-based sentence feature vector [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn]; after this feature vector is fed into the layer-2 convolutional neural network, the relation type between entities E_A and E_B is obtained.
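The sgn-based control circuit of fig. 11 can be sketched as a gate; the scores and the feature vector are illustrative stand-ins.

```python
def gate(r_ext):
    """Step-function control: 1 when R_Ext(E_A, E_B) > 0.5, else 0."""
    return 1 if r_ext > 0.5 else 0

def run_layer2(r_ext, feature_vec):
    """Multiply the feature vector by the gate output; a 0 halts execution."""
    g = gate(r_ext)
    return [g * x for x in feature_vec] if g else None

passed = run_layer2(0.9, [0.1, 0.2])   # gate open: vector flows to the ELMO network
halted = run_layer2(0.3, [0.1, 0.2])   # gate closed: execution stops
```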
Example 2:
The operations are executed in a Python environment. LabeledRelSet denotes a data set with known relations and UnLabeledRelSet denotes a data set with unknown relations; the method of the invention processes LabeledRelSet to obtain the relation existence determination model and the relation classification model, which are then used to extract the relations present in UnLabeledRelSet.
In the data set, each row represents one relationship between an entity pair in a sentence; if multiple relationships exist in a sentence, they are represented in multiple rows. The format is as follows:
entity1  entity2  relation  sentence.
For example: Wang X  Wu X  teachers and students  Wang X encouraged Wu X, hoping that he would again devote attention to painting, calligraphy and epigraphy, and in particular put more effort into epigraphy.
The above example shows that the relation type between the entities "Wang X" and "Wu X" is "teachers and students".
There are 12 relationships in the dataset: superior and inferior, collaboration, friends, siblings, teachers and students, lovers, couples, parents, brothers and sisters, grandparents, relatives and others.
The data set includes only entities and relations and lacks word segmentation information, so it must be preprocessed before the method of the invention is used: first remove spaces, then remove stop words according to a stop-word list, and finally perform word segmentation with jieba.
Execution of S11: add boundary symbols at both ends of the entities; taking the above sentence as an example, the processed result is "<eA>Wang X<\eA> encouraged <eB>Wu X<\eB>, hoping that he would again devote attention to painting, calligraphy and epigraphy, and in particular put more effort into epigraphy.".
Execution of S12: treating the boundary symbols as words, the static word embedding of each word is obtained using pre-trained GloVe word vectors.
Execution of S13: splice the static word embeddings in the word order of the sentence to obtain the entity-boundary-based sentence feature vector.
Execution of S22: first, a keras.layers.Embedding object is created to define the input-layer sequence; the maximum sequence length is 50 and the dimension of each element is 100.
Then, convolution kernels of the convolutional layer (Size = 2) are created using keras.layers.Conv1D, and max pooling of the convolution output is performed using keras.layers.MaxPooling1D with a pooling window size of 49.
Subsequently, the output vectors of the different pooling layers are merged using keras.backend.concatenate, the merged output vector is flattened using keras.backend.flatten, and components of the flattened result are dropped with a given probability (0.2) using keras.backend.dropout; the dimensionality of the result is 32.
Finally, a fully connected layer is defined using keras.layers.Dense.
Execution of S23: the weight matrices and bias vectors are initialized using the kernel_initializer and bias_initializer parameters of keras.layers.Dense.
Performing S24-S26: a CNN model object RelExtCheck is defined with keras.models.Model and trained; the model is compiled using the object's compile function with the following main parameters:
loss function: loss='binary_crossentropy', the cross entropy suited to binary classification;
optimizer: optimizer='sgd', selecting stochastic gradient descent;
evaluation metric: metrics=['binary_accuracy'], an evaluation function suited to binary classification.
The CNN model object is compiled according to the above parameters as follows:
RelExtCheck.compile(loss='binary_crossentropy',optimizer='sgd',metrics=['binary_accuracy']);
The set of entity-boundary-based sentence feature vectors and the set of entity-pair relation existence identifiers in LabeledRelSet are represented by BordFeatVecSet_Labeled and RelExtSet_Labeled, respectively, and the model is trained using the fit function as follows:
RelExtCheck.fit(BordFeatVecSet_Labeled,RelExtSet_Labeled,batch_size=8000,epochs=10);
Assuming the set of entity-boundary-based sentence feature vectors contains G vectors, the command means: 10 iterations in total, with ⌊G/8000⌋+1 gradient updates per iteration, each time selecting 8000 samples for training.
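The per-epoch update count quoted above follows directly from the batch size; the sample count G here is illustrative.

```python
G, batch_size, epochs = 20001, 8000, 10
updates_per_epoch = G // batch_size + 1  # floor(G/8000) + 1, last partial batch included
total_updates = epochs * updates_per_epoch
```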
Execution of S3: the set of entity-boundary-based sentence feature vectors in UnLabeledRelSet is represented by BordFeatVecSet_UnLabeled, and predictions are obtained using the predict function of the RelExtCheck object as follows:
RelExtSet_UnLabeled=RelExtCheck.predict(BordFeatVecSet_UnLabeled);
The obtained RelExtSet_UnLabeled is the set of entity-pair relation existence identifiers in UnLabeledRelSet; if the value of an element is greater than 0.5, the corresponding entity pair has a relationship.
Execution of S41: read the entity-boundary-based sentence feature vector generated in step S1 to obtain the static word embedding sequence of sentence S_Bord: StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn].
Execution of S42: the Module() function is used to load the ELMO word-embedding model; StaEmdSeq_S is input into the model to obtain the dynamic word embedding sequence of sentence S_Bord: DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], as shown in fig. 12.
Execution of S43: splice the sequences StaEmdSeq_S and DymEmdSeq_S to obtain the word embedding sequence of sentence S_Bord: EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn].
Execution of S5: the 12 relationship types are ID encoded as follows (the numbers in parentheses are the corresponding ID encodings):
superior and inferior (0), cooperation (1), friends (2), sibling (3), teachers and students (4), lovers (5), couples (6), parents (7), brothers and sisters (8), grandparents (9), relatives (10) and others (11).
ID codes are vectorized by adopting One-Hot codes, the dimension of a vector corresponding to each ID code is 12, the component value of a corresponding bit is 1, and the component values of the rest bits are 0.
For example: the ID code of the relation "brothers and sisters" is 8, and its corresponding vector is [0,0,0,0,0,0,0,0,1,0,0,0].
Executing S611: a keras.layers.Embedding object is created to define the input-layer sequence, with maximum length 50 and dimension 200 for each element.
Executing S612: convolution kernels of different sizes (Size = 2, 3, 4) are created using keras.layers.Conv1D; the number of kernels of each size is 256.
Execution of S613: write the attention-based pooling function AttPooling and call AttPooling in a custom layer created with Keras.
Executing S614: first, the output vectors of the different pooling layers are merged using keras.backend.concatenate; then the merged output vector is flattened using keras.backend.flatten; then components of the flattened result are dropped with a given probability (0.2) using keras.backend.dropout; the dimensionality of the result is 768.
Finally, a fully connected layer is defined using keras.layers.Dense.
Execution of S62: the weight matrices and bias vectors are initialized using the kernel_initializer and bias_initializer parameters of keras.layers.Dense.
Performing S63-S65: first, the custom distance-based loss function margin is implemented; then a CNN model object RelTypeCheck is defined with keras.models.Model and trained, and the model is compiled using the object's compile function with the following main parameters:
loss function: the custom distance-based loss function;
optimizer: optimizer='sgd', selecting stochastic gradient descent;
evaluation metric: metrics=['categorical_accuracy'], an evaluation function suited to multi-class classification.
The CNN model object is compiled according to the above parameters as follows:
RelTypeCheck.compile(loss='margin',optimizer='sgd',metrics=['categorical_accuracy']);
the set of ELMO network-based sentence feature vectors and relationship type vectors in LabeledRelSet is represented by ELMOFeatVecSet _ Labeled and RelTypeSet _ Labeled, respectively, and the model is trained using the fit function of the RelTypeCheck object, as follows:
RelTypeCheck.fit(ELMOFeatVecSet_Labeled,RelTypeSet_Labeled,batch_size=2000,epochs=10);
Let the number of "entity pair-sentence" feature vector sequences be G; the above command means: 10 iterations in total, with ⌊G/2000⌋+1 gradient updates per iteration, each time selecting 2000 samples for training.
Execution of S7: the set of ELMO-network-based sentence feature vectors in UnLabeledRelSet is represented by ELMOFeatVecSet_UnLabeled, and predictions are obtained using the predict function of the RelTypeCheck object:
RelTypeSet_UnLabeled=RelTypeCheck.predict(ELMOFeatVecSet_UnLabeled);
The obtained RelTypeSet_UnLabeled is the set of relation type vectors in UnLabeledRelSet; each element is a probability vector of length 12, the dimension with the largest value is the ID code of the relation type, and the corresponding relation type can be determined from this code.
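Decoding a predicted probability vector back to a relation type follows the ID coding above; the probability values here are illustrative.

```python
RELATIONS = ["superior and inferior", "cooperation", "friends", "sibling",
             "teachers and students", "lovers", "couples", "parents",
             "brothers and sisters", "grandparents", "relatives", "others"]

def decode(prob_vec):
    """The dimension with the largest value is the relation-type ID code."""
    k = max(range(len(prob_vec)), key=prob_vec.__getitem__)
    return RELATIONS[k]

p = [0.01] * 12
p[4] = 0.89   # highest probability at ID code 4
rel = decode(p)
```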
Example 3:
as shown in fig. 13, the relationship extraction device based on two layers of convolutional neural networks of the present invention includes:
a sentence feature vector generation part M1 based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training part M2 for training a relation existence judgment model based on a convolutional neural network;
an entity pair and sentence screening component M3, for screening out entity pairs and sentences in which a relationship may exist according to the relation existence determination model;
the ELMO network-based sentence feature vector generating component M4 is used for generating ELMO network-based sentence feature vectors;
a relation type vector generating component M5, configured to generate a vector corresponding to a relation type;
a relation classification model training component M6 for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component M7 is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
The sentence feature vector generation part M1 based on the entity boundary in the present embodiment includes:
an entity boundary symbol adding component M11, for adding boundary symbols at two ends of the entity of the sentence;
a static word embedding generation component M12 for performing static vectorization on each word containing the boundary symbol sentences;
a static word embedding and splicing component M13 for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section M2 in the present embodiment includes:
an entity pair relationship existence identification generating component M21 for generating an identification of whether a relationship exists between entity pairs;
the layer 1 hyper-parameter setting component M22 is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
a layer 1 parameter initialization component M23, configured to initialize parameters of different modules in the convolutional neural network oriented to the relationship existence determination;
the layer 1 input end setting component M24 is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relationship existence judgment;
the layer 1 output end setting component M25 is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
the layer 1 training convergence unit M26 is configured to train the convolutional neural network oriented to the relationship existence determination according to the convergence condition.
The ELMO network-based sentence feature vector generation component M4 in the present embodiment includes:
a static word embedding sequence reading means M41 for reading the static word embedding sequences of all words in the sentence;
a dynamic word embedding sequence generating part M42 for generating a dynamic word embedding sequence of all words in the sentence;
a word embedding sequence generating part M43 for generating word embedding sequences of all words in the sentence;
the relationship classification model training component M6 includes:
the layer 2 hyper-parameter setting component M61 is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
a layer 2 parameter initialization component M62, configured to initialize parameters of different modules in the convolutional neural network based on the relationship classification;
the layer 2 input end setting component M63 is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relational classification;
the layer 2 output end setting component M64 is used for sending all the relation type vectors into the output end of the convolutional neural network facing the relation classification;
the layer 2 trains a convergence component M65 for training the convolutional neural network facing the relationship classification according to the convergence condition.
Example 4:
the embodiment of the invention also provides a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are loaded by the processor, so that the processor executes the relationship extraction method based on the two-layer convolutional neural network in any embodiment of the invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A relation extraction method based on a two-layer convolutional neural network is characterized by comprising the following specific steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
2. The method for extracting relationship based on two-layer convolutional neural network of claim 1, wherein the step S1 of generating sentence feature vectors based on entity boundaries means adding boundary symbols at both ends of the entity, performing static vectorization on the vocabulary of sentences containing the boundary symbols, and then embedding and splicing the static words of the vocabulary in sequence to form sentence feature vectors based on entity boundaries; the method comprises the following specific steps:
s11: adding boundary symbols at both ends of the entities; specifically: in sentence S, W_Start and W_End denote the first and last words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B to form a new sentence S_Bord;
S12: generating vocabulary static word embedding containing boundary symbols;
s13: splicing the static word embeddings according to the word order of the sentence;
s14: returning a static word embedding sequence;
In the step S2, training the convolutional-neural-network-based relation existence judgment model means sending the entity-boundary-based sentence feature vectors and the relationship existence identifications into the convolutional neural network and obtaining the relation existence judgment model through training; the method comprises the following specific steps:
S21: generating the entity-pair relationship existence identification: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
s23: initializing parameters of the layer 1 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
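A toy numpy sketch of the layer-1 network of S22-S27, reduced to a single convolution filter, max pooling, and a sigmoid output; all layer sizes and the random weights are illustrative, not the patent's hyper-parameters:

```python
# Toy numpy sketch of the layer-1 CNN (S22-S27): one convolution filter over
# the boundary-based sentence features, max pooling, and a sigmoid output
# giving the relation-existence probability. Sizes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dim, kernel, n_words = 4, 2, 6
W_conv = rng.normal(size=(kernel * dim,))     # single convolution filter
w_out, b_out = rng.normal(), 0.0              # output-layer weight and bias

def existence_score(seq):
    """seq: (n_words, dim) entity-boundary-based sentence feature matrix."""
    flat = seq.reshape(-1)
    # convolution layer: slide the filter across adjacent word embeddings
    conv = [flat[i * dim:(i + kernel) * dim] @ W_conv
            for i in range(n_words - kernel + 1)]
    pooled = max(conv)                        # max-pooling layer
    return 1.0 / (1.0 + np.exp(-(w_out * pooled + b_out)))  # sigmoid output

score = existence_score(rng.normal(size=(n_words, dim)))
assert 0.0 < score < 1.0
```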
3. The method for extracting relationships based on a two-layer convolutional neural network of claim 1, wherein in step S3 the relation existence judgment model processes each entity pair (E_A, E_B) in sentence S and determines whether a relationship exists between entities E_A and E_B; the method comprises the following specific steps:
S31: reading an entity pair (E_A, E_B) from sentence S, wherein E_A and E_B represent any two entities in sentence S;
S32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
S33: sending the entity-boundary-based sentence feature vector to the input end of the relation existence judgment model;
S34: obtaining the judgment result R_Ext(E_A, E_B) from the output end of the relation existence judgment model, wherein R_Ext(E_A, E_B) represents the judgment result on whether a relationship exists between the two entities E_A and E_B;
S35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, executing step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jumping to step S37;
S36: marking that entities E_A and E_B have a relationship in sentence S;
S37: judging whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, going to step S31;
(2) if not, going to step S38;
S38: returning sentence S and the related entity pairs (E_A, E_B).
4. The method of claim 1, wherein generating the ELMO-network-based sentence feature vector in step S4 means generating, through the ELMO network, a dynamic word embedding for each word of sentence S_Bord, splicing each dynamic word embedding after the static word embedding generated in step S1 to form the word embedding of each word, and then splicing the word embeddings in the word order of the sentence to form the ELMO-network-based sentence feature vector; the method comprises the following specific steps:
S41: reading the static word embedding sequence of sentence S_Bord, wherein S_Bord denotes the new sentence generated after adding boundary symbols at both ends of E_A and E_B, and E_A and E_B denote entities having a relationship in sentence S;
S42: generating the dynamic word embedding sequence of sentence S_Bord: the static word embedding sequence of sentence S_Bord is sent to the input layer of the ELMO network, and the dynamic word embedding sequence of sentence S_Bord, whose length is the same as that of the static word embedding sequence, is obtained from the output layer; the specific steps are as follows:
S421: sending the static word embedding sequence to the input layer of the ELMO network;
S422: obtaining the operation result of the ELMO network output layer;
S423: obtaining the dynamic word embedding of each word from the operation result;
S424: splicing the dynamic word embeddings in the word order of the sentence;
S425: returning the dynamic word embedding sequence;
S43: generating the word embedding sequence of sentence S_Bord: let the number of words in sentence S_Bord be n, the static word embedding sequence be StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn], and the dynamic word embedding sequence be DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], wherein e_si and e_di respectively denote the static and dynamic word embeddings of the i-th word; e_si e_di is the word embedding of the i-th word, obtained by splicing e_di after e_si, so the word embedding sequence of sentence S_Bord is EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn];
S44: returning the word embedding sequence of S_Bord;
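The splicing rule of S43 (dynamic embedding appended after the static embedding, then concatenation in word order) can be sketched with toy vectors:

```python
# Sketch of S43: append each word's dynamic (ELMO) embedding after its static
# embedding, then concatenate the per-word embeddings in sentence order.
# Vector values are toy numbers.
import numpy as np

def word_embedding_sequence(static_seq, dynamic_seq):
    """static_seq, dynamic_seq: equal-length lists of per-word vectors."""
    per_word = [np.concatenate([es, ed])        # e_si followed by e_di
                for es, ed in zip(static_seq, dynamic_seq)]
    return np.concatenate(per_word)             # EmdSeq_S

sta = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
dym = [np.array([5.0]), np.array([6.0])]
print(word_embedding_sequence(sta, dym))  # [1. 2. 5. 3. 4. 6.]
```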
The generation of the relationship type vector in step S5 is specifically as follows: One-Hot coding is adopted to represent the relationship type between entities, namely a vector RelVec_k of length T represents the relationship Rel_k, k ∈ [1, T], wherein Rel_k denotes the relationship between entities E_A and E_B; in RelVec_k, all bits are 0 except the k-th bit, which is 1.
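A one-line sketch of the One-Hot encoding of step S5 (relation indices are 1-based, as in the claim):

```python
# Sketch of step S5: One-Hot relation type vector RelVec_k of length T,
# with a 1 only at the k-th position (k is 1-based, as in the claim).
def rel_vec(k, T):
    v = [0] * T
    v[k - 1] = 1
    return v

print(rel_vec(3, 5))  # [0, 0, 1, 0, 0]
```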
5. The method for extracting relationships based on a two-layer convolutional neural network as claimed in claim 1, wherein training the convolutional-neural-network-based relation classification model in step S6 specifically means setting the hyper-parameters of the convolutional neural network, initializing the corresponding weight matrices and bias vectors, sending the ELMO-network-based sentence feature vectors and the relationship type vectors into the convolutional neural network, and training by optimizing a loss function to obtain the relation classification model; the method comprises the following specific steps:
S61: setting the hyper-parameters of the layer-2 convolutional neural network: setting the structural parameters of the input layer module, convolution layer module, pooling layer module, fully-connected layer module and output layer module; the convolution layer module adopts convolution kernels of different sizes, and the pooling layer module adopts pooling based on an attention mechanism; the specific steps are as follows:
S611: setting the input layer hyper-parameters;
S612: setting convolution kernels of different sizes;
S613: setting the attention-based pooling: assuming that t convolution kernels of different sizes are used and the size of the j-th kernel is L_j, the convolution result R_j is a vector of length n - L_j + 1, the corresponding attention weight A_j is a vector of the same length, the corresponding relationship-type word embedding matrix W_j is a matrix of size (n - L_j + 1) × T, the corresponding "element-relation" correlation matrix G_j is a matrix of the same size, and B_j is the column normalization matrix of G_j, whose g-th column vector is denoted B_j^g and whose m-th element is denoted B_j^g[m]; the specific steps are as follows:
S6131: computing the "element-relation" correlation matrix G_j = R_j^Trans · A_j · W_j, where R_j^Trans is the transpose of R_j;
S6132: computing the column normalization matrix B_j of G_j, with B_j^g[m] = exp(G_j[m][g]) / Σ_m′ exp(G_j[m′][g]);
S6133: taking the maximum inner product of R_j with the column vectors of B_j, max_g ⟨R_j, B_j^g⟩, as the pooling result;
S614: setting the other hyper-parameters;
s62: parameters for initializing the layer 2 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
s65: training a layer 2 convolutional neural network model according to a convergence condition: training is performed using a distance-based loss function, which is defined as follows:
Lmargin=log(1+exp(ρ(d+-g+(S))))+log(1+exp(ρ(d-+g-(S))));
wherein, g+(S) represents a correct classification score; g-(S) represents a misclassification score; d+And d-Respectively representing the spacing threshold of positive and negative samples; ρ represents a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
6. The method for extracting relationships based on a two-layer convolutional neural network as claimed in any one of claims 1-5, wherein predicting the relationship type of an entity pair using the relation classification model in step S7 means using the trained convolutional neural network to predict the relationship type of each entity pair in the sentence to be processed, with E_A' and E_B' representing any two entities in the sentence S' to be processed; the method comprises the following specific steps:
S71: adding boundary symbols at both ends of entities E_A' and E_B' to obtain a new sentence S'_Bord;
S72: generating the entity-boundary-based sentence feature vector of S'_Bord;
S73: generating the ELMO-network-based sentence feature vector of S'_Bord;
S74: sending the ELMO-network-based sentence feature vector into the input layer of the layer-2 convolutional neural network;
S75: running the layer-2 convolutional neural network to obtain the output layer information;
S76: taking the relationship type corresponding to the maximum-probability output as the relationship between entities E_A' and E_B';
S77: judging whether any unread entity pair E_A' and E_B' remains in sentence S':
(1) if yes, going to step S71;
(2) if not, going to step S78;
S78: returning the set of relationships between entity pairs (E_A', E_B') existing in sentence S'.
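Step S76 is an argmax over the layer-2 output; a sketch with hypothetical relation names and probabilities:

```python
# Sketch of S74-S76: take the relation type whose output probability is
# largest. The probability vector and relation names are hypothetical.
def predict_relation(probs, relation_names):
    k = max(range(len(probs)), key=probs.__getitem__)  # argmax over outputs
    return relation_names[k]

print(predict_relation([0.1, 0.7, 0.2],
                       ["born_in", "works_for", "located_in"]))  # works_for
```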
7. A relationship extraction apparatus based on two layers of convolutional neural networks, the apparatus comprising:
a sentence feature vector generation component based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training component for training a relation existence judgment model based on a convolutional neural network;
the entity pair and sentence generation component is used for screening out, according to the relation existence judgment model, the entity pairs that may have a relationship and the sentences containing them;
the ELMO-network-based sentence feature vector generation component is used for generating a sentence feature vector based on the ELMO network;
the relation type vector generating component is used for generating a vector corresponding to the relation type;
the relation classification model training component is used for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
8. The apparatus for extracting relationships based on a two-layer convolutional neural network of claim 7, wherein the sentence feature vector generation component based on entity boundaries comprises:
the entity boundary symbol adding component is used for adding boundary symbols at two ends of the entity of the sentence;
the static word embedding generation component is used for performing static vectorization on each word of the sentence containing the boundary symbols;
the static word embedding and splicing component is used for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section includes:
an entity pair relationship existence identification generation component for generating an identification whether the relationship between the entity pairs exists;
the layer 1 hyper-parameter setting component is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 input end setting component is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relation existence judgment;
the layer 1 output end setting component is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
and the layer 1 training convergence component is used for training the convolutional neural network facing the relation existence judgment according to the convergence condition.
9. The apparatus of claim 7, wherein the ELMO network-based sentence feature vector generation component comprises:
the static word embedding sequence reading component is used for reading static word embedding sequences of all words in the sentence;
the dynamic word embedding sequence generating component is used for generating dynamic word embedding sequences of all words in the sentence;
a word embedding sequence generating part for generating word embedding sequences of all words in the sentence;
the relationship classification model training component includes:
the layer 2 hyper-parameter setting component is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 input end setting component is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relation classification;
the layer 2 output end setting component is used for sending all relationship type vectors into the output end of the convolutional neural network facing relationship classification;
and the layer 2 training convergence component is used for training the convolutional neural network facing the relation classification according to the convergence condition.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a computer, implement the method for extracting a relationship based on two-layer convolutional neural network as claimed in any one of claims 1 to 6.
CN202110864354.0A 2021-07-29 2021-07-29 Relation extraction method and device based on two-layer convolutional neural network Pending CN113627192A (en)


Publications (1)

Publication Number Publication Date
CN113627192A true CN113627192A (en) 2021-11-09


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023130688A1 (en) * 2022-01-05 2023-07-13 苏州浪潮智能科技有限公司 Natural language processing method and apparatus, device, and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination