CN113627192A - Relation extraction method and device based on two-layer convolutional neural network - Google Patents


Info

Publication number
CN113627192A
Authority
CN
China
Prior art keywords
sentence
convolutional neural
neural network
layer
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110864354.0A
Other languages
Chinese (zh)
Inventor
王功明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202110864354.0A priority Critical patent/CN113627192A/en
Publication of CN113627192A publication Critical patent/CN113627192A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a relation extraction method and device based on a two-layer convolutional neural network, belonging to the fields of relation extraction and convolutional neural networks. The technical problems to be solved are that, in current relation extraction, deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts, that processing samples with no entity relationship seriously wastes resources, and that optimization strategies from the deep learning field are not fully drawn upon. The invention comprises the following steps: S1: generating sentence feature vectors based on entity boundaries; S2: training a relationship existence determination model based on a convolutional neural network; S3: screening, with the relationship existence determination model, the entity pairs and sentences in which a relationship may exist; S4: generating sentence feature vectors based on the ELMO network; S5: generating relationship type vectors; S6: training a relation classification model based on a convolutional neural network; S7: predicting the relationship type of each entity pair using the relation classification model.

Description

Relation extraction method and device based on two-layer convolutional neural network
Technical Field
The invention relates to the fields of relation extraction, word vectors, context features, convolutional neural networks and the like, in particular to a relation extraction method and a device based on a two-layer convolutional neural network.
Background
A relationship is a triple describing the semantic link between a pair of entities, of the form (e1, r, e2), where e1 and e2 are entities and r is the semantic relationship between them. Relationships occur in large numbers in natural text; for example, the sentence "Country A president B visits country C" contains the relations (Country A, president, B) and (Country A, visits, C). Relation extraction is an important research topic in the field of information extraction: it establishes the links between different entities, converts unstructured text into structured or semi-structured knowledge, and forms a relationship network of that knowledge for intelligent services such as question answering, semantic search and community discovery.
Currently, four relation extraction methods are in common use: rule templates, dependency analysis, machine learning and deep learning. The first two are traditional methods whose core is template matching and syntactic analysis; their processing rules are limited, they lack extensibility and generality, and they suit closed-domain relation extraction. Machine learning methods treat relation extraction as a multi-class classification problem and predict with a trained model; they have self-learning ability, can break through the rule limitations of the former two methods, and suit relation extraction in different scenarios. However, most features used to train such models are linguistic features of the entity itself, lacking features of the context in which the entity appears, so the different senses an entity takes in different contexts cannot be represented; moreover, feature design is application-specific and highly skill-dependent, with no fixed rules to follow. Deep learning methods take the vectorization results (word embedding, part-of-speech embedding, etc.) of the sentence containing the entities as features, predict by training a deep network model that accurately describes the nonlinear characteristics of the data, and adjust model parameters via transfer learning; they flexibly match application scenarios, have good extensibility and generality, and suit open-domain relation extraction. Such methods use the ordered sequence of per-word vectorization results as features, which various vectorization models can generate directly without application-specific design, reducing the complexity of feature design.
However, the feature vectors used by deep learning methods to train the model usually come from pre-trained models, whose values are frozen and cannot reflect the ambiguity of words in different contexts. Deep networks also have complex structures and numerous parameters, so predicting a relationship is costly; if no relationship actually exists between two entities, running the prediction wastes considerable resources. In addition, deep learning is still developing and improving, and its optimization strategies should be drawn upon to improve the relation extraction effect.
Therefore, the following problems in current relation extraction need to be solved: deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts; processing samples with no entity relationship seriously wastes resources; and optimization strategies from the deep learning field are not fully drawn upon.
Disclosure of Invention
The invention provides a relation extraction method and device based on a two-layer convolutional neural network, to solve the problems that deep learning feature vectors built from pre-trained models cannot reflect the ambiguity of words in different contexts, that processing samples with no entity relationship seriously wastes resources, and that optimization strategies from the deep learning field are not fully drawn upon.
The technical task of the invention is achieved as follows; the relation extraction method based on a two-layer convolutional neural network comprises the following specific steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
Preferably, generating sentence feature vectors based on entity boundaries in step S1 means adding boundary symbols at the two ends of each entity, statically vectorizing the words of the sentence containing the boundary symbols, and then concatenating the static word embeddings in sentence order to form the entity-boundary-based sentence feature vector; the specific steps are:
S11: adding boundary symbols at the two ends of the entities; specifically: in sentence S, W_Start and W_End denote the starting and ending words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B, forming a new sentence S_Bord;
S12: generating static word embeddings for the vocabulary including the boundary symbols;
S13: concatenating the static word embeddings in the word order of the sentence;
S14: returning the static word embedding sequence;
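The steps S11 to S14 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names, the entity spans, the embedding dimension and the random lookup table are all assumptions.

```python
# Hypothetical sketch of S11-S14: mark entity boundaries, then build a
# static word-embedding sequence in sentence order.
import numpy as np

def add_boundary_symbols(tokens, span_a, span_b):
    """Insert <eA>..<\\eA> and <eB>..<\\eB> around two entity spans.
    Spans are (start, end) token indices, end exclusive; span_a is
    assumed to precede span_b."""
    (a0, a1), (b0, b1) = span_a, span_b
    return (tokens[:a0] + ["<eA>"] + tokens[a0:a1] + ["<\\eA>"]
            + tokens[a1:b0] + ["<eB>"] + tokens[b0:b1] + ["<\\eB>"]
            + tokens[b1:])

def static_embedding_sequence(tokens, table, dim=4):
    """Look up one static (pre-trained) embedding per token, in order."""
    return np.stack([table.get(t, np.zeros(dim)) for t in tokens])

tokens = ["CountryA", "president", "B", "visits", "countryC"]
marked = add_boundary_symbols(tokens, (0, 1), (2, 3))  # S11: S_Bord tokens
rng = np.random.default_rng(0)
table = {t: rng.normal(size=4) for t in marked}        # stand-in embeddings
seq = static_embedding_sequence(marked, table)         # S12-S14: (9, 4)
```

The marked sentence gains four boundary tokens, so a 5-word sentence yields a 9-step embedding sequence.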
In step S2, training the convolutional neural network-based relationship existence determination model means sending the entity-boundary-based sentence feature vectors together with the relationship existence labels into the convolutional neural network, and obtaining the determination model through training; the specific steps are:
S21: generating the entity-pair relationship existence label: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
S23: initializing the parameters of the layer 1 convolutional neural network: initializing the weight matrices and bias vectors of the different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
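As a rough illustration of what the layer 1 network computes (S22 to S27), the following NumPy sketch runs a single 1-D convolution over the sentence feature vectors, applies max-over-time pooling, and emits a sigmoid score approximating R_Ext(E_A, E_B). The shapes, kernel count and absence of a training loop are simplifying assumptions, not the patent's hyper-parameters.

```python
# Minimal forward pass of a binary relation-existence CNN (assumed shapes).
import numpy as np

def relation_exists_score(seq, kernel, w_out, b_out):
    """seq: (n, d) word embeddings; kernel: (k, d, f) conv filters;
    w_out: (f,) output weights; b_out: scalar bias. Returns a score in (0, 1)."""
    n, d = seq.shape
    k, _, f = kernel.shape
    # 1-D convolution: one f-dimensional feature per window position
    conv = np.stack([np.einsum("kd,kdf->f", seq[i:i + k], kernel)
                     for i in range(n - k + 1)])      # (n-k+1, f)
    pooled = conv.max(axis=0)                          # max-over-time, (f,)
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))                # sigmoid output layer

rng = np.random.default_rng(1)
seq = rng.normal(size=(9, 4))                          # boundary-marked sentence
score = relation_exists_score(seq, rng.normal(size=(3, 4, 8)),
                              rng.normal(size=8), 0.0)
```

The score is then thresholded at 0.5 in step S35 to decide whether a relationship may exist.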
Preferably, in step S3 the relationship existence determination model processes each entity pair (E_A, E_B) in sentence S to determine whether a relationship exists between E_A and E_B; the specific steps are:
S31: reading an entity pair (E_A, E_B) from sentence S, where E_A and E_B denote any two entities in sentence S;
S32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
S33: sending the entity-boundary-based sentence feature vector to the input of the relationship existence determination model;
S34: obtaining the determination result R_Ext(E_A, E_B) from the output of the model, where R_Ext(E_A, E_B) denotes the determination result of whether a relationship exists between E_A and E_B;
S35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, executing step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jumping to step S37;
S36: marking that a relationship exists between E_A and E_B in sentence S;
S37: determining whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, returning to step S31;
(2) if not, executing step S38;
S38: returning sentence S and the related entity pairs (E_A, E_B).
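The screening loop S31 to S38 reduces to a few lines once the determination model is treated as a black box. The stub model below is a hypothetical stand-in used only to make the loop runnable.

```python
# Sketch of the S31-S38 screening loop with a stubbed existence model.
from itertools import combinations

def screen_entity_pairs(sentence, entities, model):
    kept = []
    for e_a, e_b in combinations(entities, 2):   # S31: read each pair
        score = model(sentence, e_a, e_b)        # S32-S34: model score
        if score > 0.5:                          # S35: threshold at 0.5
            kept.append((e_a, e_b))              # S36: mark as related
    return sentence, kept                        # S38: sentence + pairs

# Hypothetical model: only (CountryA, B) scores above the threshold.
stub = lambda s, a, b: 0.9 if (a, b) == ("CountryA", "B") else 0.1
_, pairs = screen_entity_pairs("CountryA president B visits countryC",
                               ["CountryA", "B", "countryC"], stub)
```

Only pairs surviving this filter are passed to the more expensive layer 2 classifier, which is the cost saving the invention claims.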
Preferably, generating ELMO-network-based sentence feature vectors in step S4 means generating, through the ELMO network, a dynamic word embedding for each word of sentence S_Bord, concatenating it after the static word embedding generated in step S1 to form the word embedding of each word, and then concatenating the word embeddings in sentence order to form the ELMO-based sentence feature vector; the specific steps are:
S41: reading the static word embedding sequence of sentence S_Bord, where S_Bord denotes the new sentence generated after adding boundary symbols at the two ends of E_A and E_B, and E_A and E_B denote entities in sentence S between which a relationship may exist;
S42: generating the dynamic word embedding sequence of S_Bord: the static word embedding sequence of S_Bord is sent into the input layer of an ELMO network, and the dynamic word embedding sequence of S_Bord, of the same length as the static sequence, is obtained from the output layer; the specific steps are:
S421: sending the static word embedding sequence into the input layer of the ELMO network;
S422: obtaining the operation result of the ELMO network output layer;
S423: extracting the dynamic word embedding of each word from the operation result;
S424: concatenating the dynamic word embeddings in the word order of the sentence;
S425: returning the dynamic word embedding sequence;
S43: generating the word embedding sequence of S_Bord: let S_Bord contain n words, with static word embedding sequence StaEmdSeq_S = [e_s1, e_s2, ..., e_si, ..., e_sn] and dynamic word embedding sequence DymEmdSeq_S = [e_d1, e_d2, ..., e_di, ..., e_dn], where e_si and e_di denote the static and dynamic word embeddings of the i-th word respectively. The word embedding of the i-th word is e_si⊕e_di, obtained by concatenating e_di after e_si; the word embedding sequence of S_Bord is then EmdSeq_S = [e_s1⊕e_d1, e_s2⊕e_d2, ..., e_sn⊕e_dn];
S44: returning the word embedding sequence of S_Bord;
The generation of the relationship type vector in step S5 is specifically as follows:
One-Hot coding is used to represent the relationship type between entities, i.e., a vector RelVec_k of length T represents the relationship Rel_k, k ∈ [1, T], where Rel_k denotes the relationship between entities E_A and E_B; in RelVec_k, the k-th bit is 1 and all other bits are 0.
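The One-Hot encoding of step S5 can be written out directly (1-based position k, as in the text):

```python
# RelVec_k: length-T vector with a 1 in position k and 0 elsewhere.
def rel_vec(k, T):
    assert 1 <= k <= T, "k must index one of the T relationship types"
    return [1 if i == k else 0 for i in range(1, T + 1)]

v = rel_vec(3, 5)
```

This vector is what the layer 2 output layer is trained against in step S64.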
Preferably, the training of the convolutional neural network-based relational classification model in step S6 specifically includes setting hyper-parameters of the convolutional neural network, initializing a corresponding weight matrix and bias vector, sending sentence feature vectors and relational type vectors based on the ELMO network to the convolutional neural network, and performing training by optimizing a loss function to obtain a relational classification model; the method comprises the following specific steps:
s61: setting hyper-parameters of a layer 2 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module; the convolution layer module adopts convolution kernels with different sizes; the pooling layer module adopts pooling based on an attention mechanism; the method comprises the following specific steps:
s611: setting input layer hyper-parameters;
s612: setting convolution kernels with different sizes;
s613: pooling based on attention mechanism was set: let's assume that t convolution kernels of different sizes are used, and the size of the jth convolution kernel is LjResult of convolution operation RjIs of length n-Lj+1 vector, corresponding attention weight AjIs a vector with the same length, and the corresponding relation type words are embedded into a matrix WjIs of scale (n-L)j+1) T matrix, corresponding "element-relation" correlation matrix GjIs a matrix of the same size, BjIs GjThe column normalization matrix of (a) is,
Figure BDA0003186811420000041
is BjThe g-th column vector of (1),
Figure BDA0003186811420000042
is that
Figure BDA0003186811420000043
The mth element of (1); the method comprises the following specific steps:
s6131: computing an "element-relationship" correlation matrix Gj=Rj Trans*Aj*Wj,Rj TransIs RjTransposing;
s6132: calculation of GjColumn normalization matrix B ofj
Figure BDA0003186811420000044
S6133: r is to bejAnd BjMaximum value of column vector inner product
Figure BDA0003186811420000045
As a result of pooling;
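A sketch of this pooling, under one dimensionally consistent reading (the original formulas are image placeholders, so the shapes here are assumptions in the style of attention-pooling CNNs): R_j is taken as the (d_c, m) convolution output with m = n-L_j+1 positions, A_j as a (d_c, T) attention weight matrix, and W_j as a (T, T) relation embedding matrix, so that G_j = R_j^Trans · A_j · W_j is (m, T); B_j is the column-wise softmax of G_j, and the pooled output keeps the per-channel maximum of R_j · B_j.

```python
# Attention-based pooling sketch for step S613 (assumed shapes).
import numpy as np

def attention_pool(R, A, W):
    G = R.T @ A @ W                                   # S6131: (m, T) correlation
    expG = np.exp(G - G.max(axis=0, keepdims=True))   # numerically stable exp
    B = expG / expG.sum(axis=0, keepdims=True)        # S6132: column softmax
    return (R @ B).max(axis=1)                        # S6133: max inner products

rng = np.random.default_rng(3)
d_c, m, T = 8, 7, 5
pooled = attention_pool(rng.normal(size=(d_c, m)),    # R_j
                        rng.normal(size=(d_c, T)),    # A_j
                        rng.normal(size=(T, T)))      # W_j
```

The result has one value per convolution channel, replacing plain max-over-time pooling with a relation-aware weighting.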
s614: setting other hyper-parameters;
S62: initializing the parameters of the layer 2 convolutional neural network: initializing the weight matrices and bias vectors of the different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
S65: training the layer 2 convolutional neural network model according to the convergence condition: training uses a distance-based loss function, defined as follows:
L_margin = log(1 + exp(ρ(d+ - g+(S)))) + log(1 + exp(ρ(d- + g-(S))));
where g+(S) denotes the score of the correct classification; g-(S) denotes the score of the wrong classification; d+ and d- denote the margin thresholds of positive and negative samples respectively; ρ denotes a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
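The distance-based loss of step S65 can be evaluated directly; the d+, d- and ρ defaults below are illustrative, not values from the patent.

```python
# L_margin = log(1 + exp(rho*(d+ - g+(S)))) + log(1 + exp(rho*(d- + g-(S))))
from math import exp, log

def margin_loss(g_pos, g_neg, d_pos=2.5, d_neg=0.5, rho=2.0):
    """g_pos/g_neg: correct- and wrong-class scores for sentence S."""
    return (log(1.0 + exp(rho * (d_pos - g_pos)))
            + log(1.0 + exp(rho * (d_neg + g_neg))))

# The loss falls as the correct-class score rises and the wrong-class
# score falls, which is what drives the margin between classes:
assert margin_loss(g_pos=3.0, g_neg=-2.0) < margin_loss(g_pos=1.0, g_neg=0.0)
```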
Preferably, predicting the relationship type of an entity pair using the relation classification model in step S7 means predicting, with the trained convolutional neural network, the relationship type of each entity pair in the sentence to be processed, where E_A' and E_B' denote any two entities in the sentence S' to be processed; the specific steps are:
S71: adding boundary symbols at the two ends of entities E_A' and E_B', obtaining a new sentence S'_Bord;
S72: generating the entity-boundary-based sentence feature vector of S'_Bord;
S73: generating the ELMO-network-based sentence feature vector of S'_Bord;
S74: sending the ELMO-based sentence feature vector into the input layer of the layer 2 convolutional neural network;
S75: running the layer 2 convolutional neural network and obtaining the output layer information;
S76: taking the relationship type corresponding to the maximum-probability output as the relationship between E_A' and E_B';
S77: determining whether any unread entity pair E_A', E_B' remains in sentence S':
(1) if yes, returning to step S71;
(2) if not, executing step S78;
S78: returning the set of relationships between the entity pairs E_A', E_B' existing in sentence S'.
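Step S76 in miniature: the predicted relation is the label whose output probability is largest. The label set and the probability vector below are illustrative assumptions.

```python
# Map the maximum-probability output position back to a relation label.
RELATION_TYPES = ["president-of", "visits", "located-in"]  # assumed label set

def predict_relation(probs):
    """probs: one probability per relationship type (layer 2 output)."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    return RELATION_TYPES[k]

label = predict_relation([0.1, 0.7, 0.2])
```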
A relationship extraction apparatus based on a two-layer convolutional neural network, the apparatus comprising:
a sentence feature vector generation component based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training component for training a relation existence judgment model based on a convolutional neural network;
the entity pair and sentence generation component is used for screening out the entity pair and the sentence where the relation possibly exists according to the relation existence judgment model;
the sentence feature vector generating component is used for generating a sentence feature vector based on the ELMO network;
the relation type vector generating component is used for generating a vector corresponding to the relation type;
the relation classification model training component is used for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
Preferably, the sentence feature vector generation unit based on entity boundary includes:
the entity boundary symbol adding component is used for adding boundary symbols at two ends of the entity of the sentence;
the static word embedding generation component is used for statically vectorizing each word of the sentence containing the boundary symbols;
the static word embedding and splicing component is used for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section includes:
an entity pair relationship existence identification generation component for generating an identification whether the relationship between the entity pairs exists;
the layer 1 hyper-parameter setting component is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 input end setting component is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relation existence judgment;
the layer 1 output end setting component is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
and the layer 1 training convergence component is used for training the convolutional neural network facing the relation existence judgment according to the convergence condition.
Preferably, the ELMO network-based sentence feature vector generation component includes:
the static word embedding sequence reading component is used for reading static word embedding sequences of all words in the sentence;
the dynamic word embedding sequence generating component is used for generating dynamic word embedding sequences of all words in the sentence;
a word embedding sequence generating part for generating word embedding sequences of all words in the sentence;
the relationship classification model training component includes:
the layer 2 hyper-parameter setting component is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 input end setting component is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relation classification;
the layer 2 output end setting component is used for sending all relationship type vectors into the output end of the convolutional neural network facing relationship classification;
and the layer 2 training convergence component is used for training the convolutional neural network facing the relation classification according to the convergence condition.
A computer-readable storage medium having stored thereon computer-executable instructions, which, when executed by a computer, implement a relationship extraction method based on a two-layer convolutional neural network as described above.
The relation extraction method and device based on the two-layer convolutional neural network have the following advantages:
Firstly, the invention uses the layer 1 convolutional neural network to screen out entity pairs that cannot have a relationship, and then uses the layer 2 convolutional neural network to predict the specific relationship type, thereby avoiding relationship-type computation for a large number of unrelated entity pairs, reducing cost and improving the effect;
secondly, boundary symbols are added to the sentence entities and static word embeddings are then generated, yielding entity-boundary-based sentence feature vectors that describe the overall environment of the entities in the sentence; these are sent into the layer 1 convolutional neural network, whose binary classification model rapidly determines whether a relationship exists between the entities. Dynamic word embeddings of the sentence are obtained through an ELMO network and concatenated with the static word embeddings to obtain ELMO-based sentence feature vectors, which describe not only the overall features of the sentence but also individual features such as the ambiguity of words in different contexts and the relation between words and entity pairs; these are sent into the layer 2 convolutional neural network, which is optimized with convolution kernels of different sizes, attention-based pooling and a distance-based loss function, so that its multi-classification model accurately identifies the relationship type between entities. In addition, the static word embeddings fed into the layer 1 network are reused to generate the input data of the layer 2 network, giving good reusability;
thirdly, for all sentences containing entities, a convolutional neural network processes the entity-boundary-based sentence feature vectors and screens out entity pairs that cannot have a relationship, saving the cost of computing relationship types for a large number of unrelated entities; for sentences whose entity pairs may have a relationship, a convolutional neural network processes the ELMO-based sentence feature vectors and determines the specific relationship type. Through this two-layer refinement, limited resources are devoted to entity pairs that may actually be related and to a more accurate relation classification model, improving the effect of relation extraction;
fourthly, the two-layer convolutional neural network designed by the invention matches the actual requirements: layer 1 is designed for speed, using a conventional convolutional neural network to build a binary classification model that rapidly determines whether a relationship exists; layer 2 is designed for precision, using an optimized convolutional neural network to build a multi-classification model that accurately determines the relationship type. The static word embeddings fed into layer 1 are reusable for generating the input data of layer 2. Compared with conventional relation extraction methods, the two layers ensure speed and precision respectively, balancing these two conflicting performance indicators and improving the relation extraction effect.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart diagram of a relationship extraction method based on a two-layer convolutional neural network;
FIG. 2 is a schematic diagram of a sentence with entity boundary symbols added;
FIG. 3 is a block diagram of the process of step S1 for generating sentence feature vectors based on entity boundaries;
FIG. 4 is a block flow diagram of the step S2 of training a convolutional neural network-based relationship existence determination model;
FIG. 5 is a block flow diagram of step S3, using the relationship existence determination model to screen entity pairs and sentences in which a relationship may exist;
FIG. 6 is a block diagram of the process of step S4 for generating ELMO network-based sentence feature vectors;
FIG. 7 is a flow diagram of step S42, generating the dynamic word embedding sequence of sentence S_Bord;
FIG. 8 is a block diagram of the process of training the convolutional neural network-based relational classification model of step S6;
FIG. 9 is a block diagram of the flow of setting the hyperparameters of the layer 2 convolutional neural network of step S61;
FIG. 10 is a block flow diagram of a step S7 of predicting the relationship type of an entity pair using a relationship classification model;
FIG. 11 is a block diagram of a double layer convolutional neural network created by the present invention;
FIG. 12 is a schematic diagram of generating dynamic word embedding via an ELMO network;
fig. 13 is a block diagram of a relationship extraction apparatus based on a two-layer convolutional neural network.
Detailed Description
The method and apparatus for extracting relationships based on two-layer convolutional neural network according to the present invention are described in detail below with reference to the drawings and the embodiments.
Example 1:
as shown in fig. 1, the relationship extraction method based on two layers of convolutional neural networks of the present invention specifically comprises the following steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
In this embodiment, generating entity-boundary-based sentence feature vectors in step S1 means adding boundary symbols at the two ends of each entity, performing static vectorization on the words of the sentence containing the boundary symbols, and then splicing the static word embeddings of the words in order to form the entity-boundary-based sentence feature vector; as shown in fig. 3, the details are as follows:
s11: adding boundary symbols at both ends of the entities; specifically: in sentence S, W_Start and W_End denote the first and last words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B to form a new sentence S_Bord, as shown in fig. 2;
s12: generating vocabulary static word embedding containing boundary symbols;
s13: splicing the static word embeddings according to the word order of the sentence;
s14: returning a static word embedding sequence;
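Steps S11-S14 can be sketched in Python as follows; the token list, entity spans, and toy embedding table are illustrative assumptions, not data from the patent.

```python
def add_boundary_symbols(tokens, span_a, span_b):
    """S11: insert <eA>...<\\eA> and <eB>...<\\eB> around the two entities."""
    out = list(tokens)
    # Insert around the rightmost span first so earlier indices stay valid.
    for (s, e), open_t, close_t in sorted(
        [(span_a, "<eA>", "<\\eA>"), (span_b, "<eB>", "<\\eB>")],
        key=lambda item: -item[0][0],
    ):
        out.insert(e + 1, close_t)
        out.insert(s, open_t)
    return out

def static_embedding_sequence(tokens, table, dim=4):
    """S12-S14: look up each token's static embedding and splice in order."""
    zero = [0.0] * dim
    return [table.get(t, zero) for t in tokens]

tokens = ["Wang", "encouraged", "Wu"]
s_bord = add_boundary_symbols(tokens, (0, 0), (2, 2))     # sentence S_Bord
table = {t: [float(i)] * 4 for i, t in enumerate(s_bord)} # toy static embeddings
seq = static_embedding_sequence(s_bord, table)            # entity-boundary feature sequence
```

Here s_bord becomes ['<eA>', 'Wang', '<\eA>', 'encouraged', '<eB>', 'Wu', '<\eB>'], and seq splices the per-token static embeddings in sentence order.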
in the embodiment, the training of the relation existence determination model based on the convolutional neural network in step S2 is to send the sentence feature vector based on the entity boundary and the relation existence identification into the convolutional neural network, and obtain the relation existence determination model through training; as shown in fig. 4, the following is detailed:
s21: generating the entity-pair relation existence identifier: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
s23: initializing parameters of the layer 1 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
In this embodiment, screening entity pairs and sentences in which a relationship may exist using the relation existence determination model in step S3 means processing each entity pair (E_A, E_B) in sentence S with the model to determine whether a relationship exists between entities E_A and E_B; as shown in fig. 5, the details are as follows:
s31: reading an entity pair (E_A, E_B) from sentence S, where E_A and E_B represent any two entities in sentence S;
s32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
s33: feeding the entity-boundary-based sentence feature vector into the input of the relation existence determination model;
s34: obtaining the determination result R_Ext(E_A, E_B) from the output of the relation existence determination model, where R_Ext(E_A, E_B) is the determination result for whether a relationship exists between entities E_A and E_B;
s35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, execute step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jump to step S37;
s36: marking that a relationship exists between entities E_A and E_B in sentence S;
s37: determining whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, go to step S31;
(2) if not, go to step S38;
s38: returning sentence S and the related entity pairs (E_A, E_B).
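A pure-Python sketch of the S31-S38 screening loop, with a stand-in scoring function in place of the trained layer-1 CNN; the names and scores are illustrative.

```python
def filter_related_pairs(entity_pairs, judge):
    """Keep entity pairs whose relation-existence score R_Ext exceeds 0.5."""
    related = []
    for pair in entity_pairs:   # S31: read each entity pair (E_A, E_B)
        score = judge(pair)     # S32-S34: model output R_Ext(E_A, E_B)
        if score > 0.5:         # S35-S36: mark the pair as related
            related.append(pair)
    return related              # S38: return the related pairs

# Stand-in for the relation existence determination model.
scores = {("Wang", "Wu"): 0.91, ("Wang", "Li"): 0.12}
pairs = filter_related_pairs(list(scores), lambda p: scores[p])
```

Only the pair scoring above the 0.5 threshold survives the screen.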
In this embodiment, generating the ELMO-network-based sentence feature vector in step S4 specifically means generating the dynamic word embedding of each word of sentence S_Bord through the ELMO network, splicing each dynamic word embedding after the static word embedding generated in step S1 to form the word embedding of each word, and then splicing these word embeddings in the order of the words in the sentence to form the ELMO-network-based sentence feature vector; as shown in fig. 6, the details are as follows:
s41: reading the static word embedding sequence of sentence S_Bord, where S_Bord is the new sentence generated after adding boundary symbols at both ends of E_A and E_B, and E_A and E_B are entities having a relationship in sentence S;
s42: generating the dynamic word embedding sequence of sentence S_Bord: the static word embedding sequence of S_Bord is fed into the input layer of the ELMO network, and the dynamic word embedding sequence of S_Bord is obtained from the output layer; its length is the same as that of the static word embedding sequence; as shown in fig. 7, the details are as follows:
s421: sending the static word embedding sequence into an input layer of an ELMO network;
s422: obtaining an operation result of an ELMO network output layer;
s423: acquiring vocabulary dynamic word embedding according to the operation result;
s424: splicing dynamic word embedding according to the vocabulary sequence of sentences;
s425: returning a dynamic word embedding sequence;
s43: generating the word embedding sequence of sentence S_Bord: let the number of words in S_Bord be n, the static word embedding sequence be StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn], and the dynamic word embedding sequence be DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], where e_si and e_di denote the static and dynamic word embeddings of the i-th word, respectively; e_si e_di is the word embedding of the i-th word, the corresponding operation being to splice e_di after e_si; the word embedding sequence of sentence S_Bord is then EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn];
s44: returning the word embedding sequence of S_Bord;
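Step S43's splice of static and dynamic embeddings can be sketched with NumPy; the random arrays stand in for the GloVe and ELMO outputs, and the dimensions are illustrative.

```python
import numpy as np

n, d_static, d_dynamic = 5, 100, 100
sta = np.random.rand(n, d_static)         # StaEmdSeq_S: one row per word, e_si
dym = np.random.rand(n, d_dynamic)        # DymEmdSeq_S: one row per word, e_di
emd = np.concatenate([sta, dym], axis=1)  # EmdSeq_S: each e_di spliced after e_si
```

Each row of emd is the full word embedding e_si e_di of one word, so the sequence length is unchanged while the embedding dimension doubles.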
the generation of the relationship type vector in step S5 in this embodiment is specifically as follows:
One-Hot coding is adopted to represent the relation type between entities, i.e., a vector RelVec_k of length T represents the relation Rel_k, k ∈ [1, T], where Rel_k denotes a relationship between entities E_A and E_B; in RelVec_k, all bits are 0 except the k-th bit, which is 1.
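A minimal sketch of the One-Hot relation type vector, with T = 12 matching the relation set used later in the embodiment:

```python
def relation_vector(k, T=12):
    """RelVec_k: length-T vector whose k-th bit (1-indexed) is 1, rest 0."""
    v = [0] * T
    v[k - 1] = 1
    return v

vec = relation_vector(9)  # RelVec_9 for relation Rel_9
```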
In this embodiment, the training of the convolutional neural network-based relationship classification model in step S6 specifically includes setting hyper-parameters of the convolutional neural network, initializing a corresponding weight matrix and bias vector, sending sentence feature vectors and relationship type vectors based on the ELMO network to the convolutional neural network, and performing training by optimizing a loss function to obtain a relationship classification model; as shown in fig. 8, the following is detailed:
s61: setting hyper-parameters of a layer 2 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module; the convolution layer module adopts convolution kernels with different sizes; the pooling layer module adopts pooling based on an attention mechanism; as shown in fig. 9, the details are as follows:
s611: setting the hyper-parameters of an input layer;
s612: setting convolution kernels with different sizes;
s613: setting pooling based on the attention mechanism: suppose t convolution kernels of different sizes are used and the size of the j-th kernel is L_j; the convolution result R_j is a vector of length n-L_j+1, the corresponding attention weight A_j is a vector of the same length, the corresponding relation-type word-embedding matrix W_j is a matrix of scale (n-L_j+1)×T, the corresponding "element-relation" correlation matrix G_j is a matrix of the same size, B_j is the column-normalized matrix of G_j, b_j^g is the g-th column vector of B_j, and b_(j,m)^g is the m-th element of b_j^g; the specific steps are as follows:
s6131: computing the "element-relation" correlation matrix G_j = R_j^T · W_j, where R_j^T is the transpose of R_j;
s6132: computing the column-normalized matrix B_j of G_j, whose elements are b_(j,m)^g = exp(g_(j,m)^g) / Σ_m' exp(g_(j,m')^g), i.e., a softmax over each column of G_j;
s6133: taking the maximum over g of the inner products of R_j with the column vectors b_j^g of B_j as the pooling result;
s614: setting other hyper-parameters;
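A NumPy sketch of the attention-based pooling in S6131-S6133. The patent's exact formula for the correlation matrix G_j appears only in a figure, so the elementwise form below is an assumption; the column softmax and max-inner-product steps follow the text.

```python
import numpy as np

def attention_pool(R, W):
    """R: convolution result of length n-L_j+1; W: (n-L_j+1, T) relation-type
    word-embedding matrix. Returns the scalar pooling result."""
    G = R[:, None] * W                     # assumed "element-relation" correlation, same size as W
    B = np.exp(G) / np.exp(G).sum(axis=0)  # column-normalized matrix B_j (softmax per column)
    scores = R @ B                         # inner product of R with each column b_j^g
    return scores.max()                    # S6133: maximum as the pooling result

R = np.array([0.2, 1.0, 0.3])  # toy convolution output (n - L_j + 1 = 3)
W = np.random.rand(3, 12)      # toy relation-type embedding matrix (T = 12)
pooled = attention_pool(R, W)
```

Because each column of B_j sums to 1 with nonnegative entries, each inner product is a weighted average of R, so the pooled value stays within R's range.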
s62: parameters for initializing the layer 2 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
s65: training a layer 2 convolutional neural network model according to a convergence condition: training is performed using a distance-based loss function, which is defined as follows:
L_margin = log(1 + exp(ρ(d+ - g+(S)))) + log(1 + exp(ρ(d- + g-(S))));
where g+(S) denotes the correct-classification score, g-(S) denotes the misclassification score, d+ and d- denote the margin thresholds for positive and negative samples, respectively, and ρ denotes a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
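The distance-based loss of S65 transcribed directly into Python; the threshold and scaling values are illustrative, not from the patent.

```python
import math

def margin_loss(g_pos, g_neg, d_pos=2.5, d_neg=0.5, rho=2.0):
    """L_margin with correct-class score g_pos and wrong-class score g_neg."""
    return (math.log(1 + math.exp(rho * (d_pos - g_pos)))
            + math.log(1 + math.exp(rho * (d_neg + g_neg))))

# The loss shrinks as the correct score rises and the wrong score falls.
loss_good = margin_loss(g_pos=3.0, g_neg=-1.0)
loss_bad = margin_loss(g_pos=0.0, g_neg=1.0)
```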
In this embodiment, predicting the relation type of an entity pair using the relation classification model in step S7 means using the trained convolutional neural network to predict the relation types of entity pairs in the sentence to be processed, where E_A' and E_B' denote any two entities in the sentence S' to be processed; as shown in fig. 10, the details are as follows:
s71: adding boundary symbols at both ends of entities E_A' and E_B' to obtain a new sentence S'_Bord;
s72: generating the entity-boundary-based sentence feature vector of S'_Bord;
s73: generating the ELMO-network-based sentence feature vector of S'_Bord;
s74: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s75: operating a layer 2 convolutional neural network to obtain output layer information;
s76: taking the relation type corresponding to the output with the maximum probability as the relationship between entities E_A' and E_B';
s77: determining whether unread entities E_A' and E_B' remain in sentence S':
(1) if yes, go to step S71;
(2) if not, go to step S78;
s78: returning the set of relationships between entities E_A' and E_B' existing in sentence S'.
As shown in fig. 11, the double-layer convolutional neural network created by the present invention is as follows:
The bottom-most [e_s1, e_s2, …, e_si, …, e_sn] is the entity-boundary-based sentence feature vector, which is fed into the layer-1 convolutional neural network to obtain the relation existence determination result R_Ext(E_A, E_B) for entities E_A and E_B. Above the layer-1 convolutional neural network is a control circuit composed of the step function sgn(x) with threshold 0.5; the dotted circle above the circuit denotes multiplication of a vector by a scalar. Taking R_Ext(E_A, E_B) as x: if R_Ext(E_A, E_B) > 0.5, the control circuit outputs 1, which feeds the entity-boundary-based sentence feature vector into the ELMO network and executes the subsequent relation type computation; if R_Ext(E_A, E_B) ≤ 0.5, the control circuit outputs 0 and execution stops. The circle to the right of the ELMO network containing the concatenation symbol denotes vector concatenation, which splices the static word embeddings [e_s1, e_s2, …, e_si, …, e_sn] at the input of the ELMO network with the dynamic word embeddings [e_d1, e_d2, …, e_di, …, e_dn] at its output to obtain the ELMO-network-based sentence feature vector [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn]; after this feature vector is fed into the layer-2 convolutional neural network, the relation type between entities E_A and E_B is obtained.
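The sgn-based control circuit of fig. 11 can be sketched as a gate; the scores and the feature vector are illustrative stand-ins.

```python
def gate(r_ext):
    """Step-function control: 1 when R_Ext(E_A, E_B) > 0.5, else 0."""
    return 1 if r_ext > 0.5 else 0

def run_layer2(r_ext, feature_vec):
    """Multiply the feature vector by the gate output; a 0 halts execution."""
    g = gate(r_ext)
    return [g * x for x in feature_vec] if g else None

passed = run_layer2(0.9, [0.1, 0.2])   # gate open: vector flows to the ELMO network
halted = run_layer2(0.3, [0.1, 0.2])   # gate closed: execution stops
```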
Example 2:
The operations are executed in a Python environment. LabeledRelSet denotes a data set with known relations and UnLabeledRelSet denotes a data set with unknown relations; the method of the invention processes LabeledRelSet to obtain the relation existence determination model and the relation classification model, which are then used to extract the relations present in UnLabeledRelSet.
In the data set, each row represents one relationship between an entity pair in a sentence; if multiple relationships exist in a sentence, they are represented in multiple rows. The format is as follows:
entity1  entity2  relation  sentence.
For example: Wang X  Wu X  teachers and students  Wang X encouraged Wu X, hoping that he would again devote attention to painting, calligraphy and epigraphy, and in particular put more effort into epigraphy.
The above example shows that the relation type between the entities "Wang X" and "Wu X" is "teachers and students".
There are 12 relationships in the dataset: superior and inferior, collaboration, friends, siblings, teachers and students, lovers, couples, parents, brothers and sisters, grandparents, relatives and others.
The data set includes only entities and relations and lacks word segmentation information, so it must be preprocessed before the method of the invention is used: first remove spaces, then remove stop words according to a stop-word list, and finally perform word segmentation with jieba.
Execution of S11: add boundary symbols at both ends of the entities; taking the above sentence as an example, the processed result is "<eA>Wang X<\eA> encouraged <eB>Wu X<\eB>, hoping that he would again devote attention to painting, calligraphy and epigraphy, and in particular put more effort into epigraphy.".
Execution of S12: treating the boundary symbols as words, the static word embedding of each word is obtained using pre-trained GloVe word vectors.
Execution of S13: splice the static word embeddings in the word order of the sentence to obtain the entity-boundary-based sentence feature vector.
Execution of S22: first, a keras.layers.Embedding object is created to define the input-layer sequence; the maximum sequence length is 50 and the dimension of each element is 100.
Then, convolution kernels of the convolutional layer (Size = 2) are created using keras.layers.Conv1D, and max pooling of the convolution output is performed using keras.layers.MaxPooling1D with a pooling window size of 49.
Subsequently, the output vectors of the different pooling layers are merged using keras.backend.concatenate, the merged output vector is flattened using keras.backend.flatten, and components of the flattened result are dropped with a given probability (0.2) using keras.backend.dropout; the dimensionality of the result is 32.
Finally, a fully connected layer is defined using keras.layers.Dense.
Execution of S23: the weight matrices and bias vectors are initialized using the kernel_initializer and bias_initializer parameters of keras.layers.Dense.
Performing S24-S26: a CNN model object RelExtCheck is defined with keras.models.Model and trained; the model is compiled using the object's compile function with the following main parameters:
loss function: loss='binary_crossentropy', the cross entropy suited to binary classification;
optimizer: optimizer='sgd', selecting stochastic gradient descent;
evaluation metric: metrics=['binary_accuracy'], an evaluation function suited to binary classification.
The CNN model object is compiled according to the above parameters as follows:
RelExtCheck.compile(loss='binary_crossentropy',optimizer='sgd',metrics=['binary_accuracy']);
The set of entity-boundary-based sentence feature vectors and the set of entity-pair relation existence identifiers in LabeledRelSet are represented by BordFeatVecSet_Labeled and RelExtSet_Labeled, respectively, and the model is trained using the fit function as follows:
RelExtCheck.fit(BordFeatVecSet_Labeled,RelExtSet_Labeled,batch_size=8000,epochs=10);
Assuming the set of entity-boundary-based sentence feature vectors contains G vectors, the command means: 10 iterations in total, with ⌊G/8000⌋+1 gradient updates per iteration, each time selecting 8000 samples for training.
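The per-epoch update count quoted above follows directly from the batch size; the sample count G here is illustrative.

```python
G, batch_size, epochs = 20001, 8000, 10
updates_per_epoch = G // batch_size + 1  # floor(G/8000) + 1, last partial batch included
total_updates = epochs * updates_per_epoch
```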
Execution of S3: the set of entity-boundary-based sentence feature vectors in UnLabeledRelSet is represented by BordFeatVecSet_UnLabeled, and predictions are obtained using the predict function of the RelExtCheck object as follows:
RelExtSet_UnLabeled=RelExtCheck.predict(BordFeatVecSet_UnLabeled);
The obtained RelExtSet_UnLabeled is the set of entity-pair relation existence identifiers in UnLabeledRelSet; if the value of an element is greater than 0.5, the corresponding entity pair has a relationship.
Execution of S41: read the entity-boundary-based sentence feature vector generated in step S1 to obtain the static word embedding sequence of sentence S_Bord: StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn].
Execution of S42: the Module() function is used to load the ELMO word-embedding model; StaEmdSeq_S is input into the model to obtain the dynamic word embedding sequence of sentence S_Bord: DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], as shown in fig. 12.
Execution of S43: splice the sequences StaEmdSeq_S and DymEmdSeq_S to obtain the word embedding sequence of sentence S_Bord: EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn].
Execution of S5: the 12 relationship types are ID encoded as follows (the numbers in parentheses are the corresponding ID encodings):
superior and inferior (0), cooperation (1), friends (2), sibling (3), teachers and students (4), lovers (5), couples (6), parents (7), brothers and sisters (8), grandparents (9), relatives (10) and others (11).
ID codes are vectorized by adopting One-Hot codes, the dimension of a vector corresponding to each ID code is 12, the component value of a corresponding bit is 1, and the component values of the rest bits are 0.
For example: the ID code of the relation "brothers and sisters" is 8, and its corresponding vector is [0,0,0,0,0,0,0,0,1,0,0,0].
Executing S611: a keras.layers.Embedding object is created to define the input-layer sequence, with maximum length 50 and dimension 200 for each element.
Executing S612: convolution kernels of different sizes (Size = 2, 3, 4) are created using keras.layers.Conv1D; the number of kernels of each size is 256.
Execution of S613: write the attention-based pooling function AttPooling and call AttPooling in a custom layer created with Keras.
Executing S614: first, the output vectors of the different pooling layers are merged using keras.backend.concatenate; then the merged output vector is flattened using keras.backend.flatten; then components of the flattened result are dropped with a given probability (0.2) using keras.backend.dropout; the dimensionality of the result is 768.
Finally, a fully connected layer is defined using keras.layers.Dense.
Execution of S62: the weight matrices and bias vectors are initialized using the kernel_initializer and bias_initializer parameters of keras.layers.Dense.
Performing S63-S65: first, the custom distance-based loss function margin is implemented; then a CNN model object RelTypeCheck is defined with keras.models.Model and trained, and the model is compiled using the object's compile function with the following main parameters:
loss function: the custom distance-based loss function;
optimizer: optimizer='sgd', selecting stochastic gradient descent;
evaluation metric: metrics=['categorical_accuracy'], an evaluation function suited to multi-class classification.
The CNN model object is compiled according to the above parameters as follows:
RelTypeCheck.compile(loss='margin',optimizer='sgd',metrics=['categorical_accuracy']);
the set of ELMO network-based sentence feature vectors and relationship type vectors in LabeledRelSet is represented by ELMOFeatVecSet _ Labeled and RelTypeSet _ Labeled, respectively, and the model is trained using the fit function of the RelTypeCheck object, as follows:
RelTypeCheck.fit(ELMOFeatVecSet_Labeled,RelTypeSet_Labeled,batch_size=2000,epochs=10);
Let the number of "entity pair-sentence" feature vector sequences be G; the above command means: 10 iterations in total, with ⌊G/2000⌋+1 gradient updates per iteration, each time selecting 2000 samples for training.
Execution of S7: the set of ELMO-network-based sentence feature vectors in UnLabeledRelSet is represented by ELMOFeatVecSet_UnLabeled, and predictions are obtained using the predict function of the RelTypeCheck object:
RelTypeSet_UnLabeled=RelTypeCheck.predict(ELMOFeatVecSet_UnLabeled);
The obtained RelTypeSet_UnLabeled is the set of relation type vectors in UnLabeledRelSet; each element is a probability vector of length 12, the dimension with the largest value is the ID code of the relation type, and the corresponding relation type can be determined from this code.
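Decoding a predicted probability vector back to a relation type follows the ID coding above; the probability values here are illustrative.

```python
RELATIONS = ["superior and inferior", "cooperation", "friends", "sibling",
             "teachers and students", "lovers", "couples", "parents",
             "brothers and sisters", "grandparents", "relatives", "others"]

def decode(prob_vec):
    """The dimension with the largest value is the relation-type ID code."""
    k = max(range(len(prob_vec)), key=prob_vec.__getitem__)
    return RELATIONS[k]

p = [0.01] * 12
p[4] = 0.89   # highest probability at ID code 4
rel = decode(p)
```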
Example 3:
as shown in fig. 13, the relationship extraction device based on two layers of convolutional neural networks of the present invention includes:
a sentence feature vector generation part M1 based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training part M2 for training a relation existence judgment model based on a convolutional neural network;
an entity pair and sentence screening component M3, for screening out entity pairs and sentences in which a relationship may exist according to the relation existence determination model;
the ELMO network-based sentence feature vector generating component M4 is used for generating ELMO network-based sentence feature vectors;
a relation type vector generating component M5, configured to generate a vector corresponding to a relation type;
a relation classification model training component M6 for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component M7 is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
The sentence feature vector generation part M1 based on the entity boundary in the present embodiment includes:
an entity boundary symbol adding component M11, for adding boundary symbols at two ends of the entity of the sentence;
a static word embedding generation component M12 for performing static vectorization on each word containing the boundary symbol sentences;
a static word embedding and splicing component M13 for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section M2 in the present embodiment includes:
an entity pair relationship existence identification generating component M21 for generating an identification of whether a relationship exists between entity pairs;
the layer 1 hyper-parameter setting component M22 is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
a layer 1 parameter initialization component M23, configured to initialize parameters of different modules in the convolutional neural network oriented to the relationship existence determination;
the layer 1 input end setting component M24 is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relationship existence judgment;
the layer 1 output end setting component M25 is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
the layer 1 training convergence unit M26 is configured to train the convolutional neural network oriented to the relationship existence determination according to the convergence condition.
The ELMO network-based sentence feature vector generation component M4 in the present embodiment includes:
a static word embedding sequence reading means M41 for reading the static word embedding sequences of all words in the sentence;
a dynamic word embedding sequence generating part M42 for generating a dynamic word embedding sequence of all words in the sentence;
a word embedding sequence generating part M43 for generating word embedding sequences of all words in the sentence;
the relationship classification model training component M6 includes:
the layer 2 hyper-parameter setting component M61 is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
a layer 2 parameter initialization component M62, configured to initialize parameters of different modules in the convolutional neural network based on the relationship classification;
the layer 2 input end setting component M63 is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relational classification;
the layer 2 output end setting component M64 is used for sending all the relation type vectors into the output end of the convolutional neural network facing the relation classification;
the layer 2 trains a convergence component M65 for training the convolutional neural network facing the relationship classification according to the convergence condition.
Example 4:
the embodiment of the invention also provides a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are loaded by the processor, so that the processor executes the relationship extraction method based on the two-layer convolutional neural network in any embodiment of the invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A relation extraction method based on a two-layer convolutional neural network is characterized by comprising the following specific steps:
s1: generating a sentence feature vector based on the entity boundary;
s2: training a relation existence judgment model based on a convolutional neural network;
s3: screening entity pairs and sentences which possibly have relations by using a relation existence judgment model;
s4: generating sentence feature vectors based on the ELMO network;
s5: generating a relationship type vector;
s6: training a relation classification model based on a convolutional neural network;
s7: the relationship type of the entity pair is predicted using a relationship classification model.
2. The method for extracting relationship based on two-layer convolutional neural network of claim 1, wherein the step S1 of generating sentence feature vectors based on entity boundaries means adding boundary symbols at both ends of the entity, performing static vectorization on the vocabulary of sentences containing the boundary symbols, and then embedding and splicing the static words of the vocabulary in sequence to form sentence feature vectors based on entity boundaries; the method comprises the following specific steps:
s11: adding boundary symbols at both ends of the entities; specifically: in sentence S, W_Start and W_End denote the first and last words, and E_A and E_B denote any two entities; boundary symbols <eA>, <\eA>, <eB>, <\eB> are added at the two ends of E_A and E_B to form a new sentence S_Bord;
S12: generating vocabulary static word embedding containing boundary symbols;
s13: splicing the static word embeddings according to the word order of the sentence;
s14: returning a static word embedding sequence;
In the step S2, training the convolutional-neural-network-based relation existence judgment model means sending the entity-boundary-based sentence feature vectors and the relationship existence identifications into the convolutional neural network and obtaining the relation existence judgment model through training; the method comprises the following specific steps:
S21: generating the entity-pair relationship existence identification: R_Ext(E_A, E_B) indicates whether a relationship exists between entities E_A and E_B:
(1) if a relationship exists between the two, R_Ext(E_A, E_B) = 1;
(2) if no relationship exists between the two, R_Ext(E_A, E_B) = 0;
S22: setting hyper-parameters of a layer 1 convolutional neural network: setting structural parameters of an input layer module, a convolution layer module, a pooling layer module, a full-connection layer module and an output layer module;
s23: initializing parameters of the layer 1 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s24: sending sentence feature vectors based on the entity boundary into an input layer of a layer 1 convolutional neural network;
s25: sending the entity pair relation existence identification to an output layer of the layer 1 convolutional neural network;
s26: training a layer 1 convolutional neural network model according to a convergence condition;
s27: and returning the parameters of the layer 1 convolutional neural network.
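A toy numpy sketch of the layer-1 network of S22-S27, reduced to a single convolution filter, max pooling, and a sigmoid output; all layer sizes and the random weights are illustrative, not the patent's hyper-parameters:

```python
# Toy numpy sketch of the layer-1 CNN (S22-S27): one convolution filter over
# the boundary-based sentence features, max pooling, and a sigmoid output
# giving the relation-existence probability. Sizes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dim, kernel, n_words = 4, 2, 6
W_conv = rng.normal(size=(kernel * dim,))     # single convolution filter
w_out, b_out = rng.normal(), 0.0              # output-layer weight and bias

def existence_score(seq):
    """seq: (n_words, dim) entity-boundary-based sentence feature matrix."""
    flat = seq.reshape(-1)
    # convolution layer: slide the filter across adjacent word embeddings
    conv = [flat[i * dim:(i + kernel) * dim] @ W_conv
            for i in range(n_words - kernel + 1)]
    pooled = max(conv)                        # max-pooling layer
    return 1.0 / (1.0 + np.exp(-(w_out * pooled + b_out)))  # sigmoid output

score = existence_score(rng.normal(size=(n_words, dim)))
assert 0.0 < score < 1.0
```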
3. The method for extracting relationships based on a two-layer convolutional neural network of claim 1, wherein in step S3 the relation existence judgment model processes each entity pair (E_A, E_B) in sentence S and determines whether a relationship exists between entities E_A and E_B; the method comprises the following specific steps:
S31: reading an entity pair (E_A, E_B) from sentence S, wherein E_A and E_B represent any two entities in sentence S;
S32: generating the entity-boundary-based sentence feature vector corresponding to the entity pair (E_A, E_B);
S33: sending the entity-boundary-based sentence feature vector to the input end of the relation existence judgment model;
S34: obtaining the judgment result R_Ext(E_A, E_B) from the output end of the relation existence judgment model, wherein R_Ext(E_A, E_B) represents the judgment result on whether a relationship exists between the two entities E_A and E_B;
S35: comparing R_Ext(E_A, E_B) with 0.5:
(1) if R_Ext(E_A, E_B) > 0.5, executing step S36;
(2) if R_Ext(E_A, E_B) ≤ 0.5, jumping to step S37;
S36: marking that entities E_A and E_B have a relationship in sentence S;
S37: judging whether any unread entity pair (E_A, E_B) remains in sentence S:
(1) if yes, going to step S31;
(2) if not, going to step S38;
S38: returning sentence S and the related entity pairs (E_A, E_B).
4. The method of claim 1, wherein generating the ELMO-network-based sentence feature vector in step S4 means generating, through the ELMO network, a dynamic word embedding for each word of sentence S_Bord, splicing each dynamic word embedding after the static word embedding generated in step S1 to form the word embedding of each word, and then splicing the word embeddings in the word order of the sentence to form the ELMO-network-based sentence feature vector; the method comprises the following specific steps:
S41: reading the static word embedding sequence of sentence S_Bord, wherein S_Bord denotes the new sentence generated after adding boundary symbols at both ends of E_A and E_B, and E_A and E_B denote entities having a relationship in sentence S;
S42: generating the dynamic word embedding sequence of sentence S_Bord: the static word embedding sequence of sentence S_Bord is sent to the input layer of the ELMO network, and the dynamic word embedding sequence of sentence S_Bord, whose length is the same as that of the static word embedding sequence, is obtained from the output layer; the specific steps are as follows:
S421: sending the static word embedding sequence to the input layer of the ELMO network;
S422: obtaining the operation result of the ELMO network output layer;
S423: obtaining the dynamic word embedding of each word from the operation result;
S424: splicing the dynamic word embeddings in the word order of the sentence;
S425: returning the dynamic word embedding sequence;
S43: generating the word embedding sequence of sentence S_Bord: let the number of words in sentence S_Bord be n, the static word embedding sequence be StaEmdSeq_S = [e_s1, e_s2, …, e_si, …, e_sn], and the dynamic word embedding sequence be DymEmdSeq_S = [e_d1, e_d2, …, e_di, …, e_dn], wherein e_si and e_di respectively denote the static and dynamic word embeddings of the i-th word; e_si e_di is the word embedding of the i-th word, obtained by splicing e_di after e_si, so the word embedding sequence of sentence S_Bord is EmdSeq_S = [e_s1 e_d1, e_s2 e_d2, …, e_sn e_dn];
S44: returning the word embedding sequence of S_Bord;
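The splicing rule of S43 (dynamic embedding appended after the static embedding, then concatenation in word order) can be sketched with toy vectors:

```python
# Sketch of S43: append each word's dynamic (ELMO) embedding after its static
# embedding, then concatenate the per-word embeddings in sentence order.
# Vector values are toy numbers.
import numpy as np

def word_embedding_sequence(static_seq, dynamic_seq):
    """static_seq, dynamic_seq: equal-length lists of per-word vectors."""
    per_word = [np.concatenate([es, ed])        # e_si followed by e_di
                for es, ed in zip(static_seq, dynamic_seq)]
    return np.concatenate(per_word)             # EmdSeq_S

sta = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
dym = [np.array([5.0]), np.array([6.0])]
print(word_embedding_sequence(sta, dym))  # [1. 2. 5. 3. 4. 6.]
```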
The generation of the relationship type vector in step S5 is specifically as follows: One-Hot coding is adopted to represent the relationship type between entities, namely a vector RelVec_k of length T represents the relationship Rel_k, k ∈ [1, T], wherein Rel_k denotes the relationship between entities E_A and E_B; in RelVec_k, all bits are 0 except the k-th bit, which is 1.
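A one-line sketch of the One-Hot encoding of step S5 (relation indices are 1-based, as in the claim):

```python
# Sketch of step S5: One-Hot relation type vector RelVec_k of length T,
# with a 1 only at the k-th position (k is 1-based, as in the claim).
def rel_vec(k, T):
    v = [0] * T
    v[k - 1] = 1
    return v

print(rel_vec(3, 5))  # [0, 0, 1, 0, 0]
```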
5. The method for extracting relationships based on a two-layer convolutional neural network as claimed in claim 1, wherein training the convolutional-neural-network-based relation classification model in step S6 specifically means setting the hyper-parameters of the convolutional neural network, initializing the corresponding weight matrices and bias vectors, sending the ELMO-network-based sentence feature vectors and the relationship type vectors into the convolutional neural network, and training by optimizing a loss function to obtain the relation classification model; the method comprises the following specific steps:
S61: setting the hyper-parameters of the layer-2 convolutional neural network: setting the structural parameters of the input layer module, convolution layer module, pooling layer module, fully-connected layer module and output layer module; the convolution layer module adopts convolution kernels of different sizes, and the pooling layer module adopts pooling based on an attention mechanism; the specific steps are as follows:
S611: setting the input layer hyper-parameters;
S612: setting convolution kernels of different sizes;
S613: setting the attention-based pooling: assuming that t convolution kernels of different sizes are used and the size of the j-th kernel is L_j, the convolution result R_j is a vector of length n - L_j + 1, the corresponding attention weight A_j is a vector of the same length, the corresponding relationship-type word embedding matrix W_j is a matrix of size (n - L_j + 1) × T, the corresponding "element-relation" correlation matrix G_j is a matrix of the same size, and B_j is the column normalization matrix of G_j, whose g-th column vector is denoted B_j^g and whose m-th element is denoted B_j^g[m]; the specific steps are as follows:
S6131: computing the "element-relation" correlation matrix G_j = R_j^Trans · A_j · W_j, where R_j^Trans is the transpose of R_j;
S6132: computing the column normalization matrix B_j of G_j, with B_j^g[m] = exp(G_j[m][g]) / Σ_m′ exp(G_j[m′][g]);
S6133: taking the maximum inner product of R_j with the column vectors of B_j, max_g ⟨R_j, B_j^g⟩, as the pooling result;
S614: setting the other hyper-parameters;
s62: parameters for initializing the layer 2 convolutional neural network: initializing weight matrixes and offset vectors of different modules;
s63: sending sentence feature vectors based on the ELMO network into an input layer of a layer 2 convolutional neural network;
s64: sending the relation type vector to an output layer of a layer 2 convolutional neural network;
s65: training a layer 2 convolutional neural network model according to a convergence condition: training is performed using a distance-based loss function, which is defined as follows:
Lmargin=log(1+exp(ρ(d+-g+(S))))+log(1+exp(ρ(d-+g-(S))));
wherein, g+(S) represents a correct classification score; g-(S) represents a misclassification score; d+And d-Respectively representing the spacing threshold of positive and negative samples; ρ represents a scaling coefficient;
s66: and returning the parameters of the layer 2 convolutional neural network.
6. The method for extracting relationships based on a two-layer convolutional neural network as claimed in any one of claims 1-5, wherein predicting the relationship type of an entity pair using the relation classification model in step S7 means using the trained convolutional neural network to predict the relationship type of each entity pair in the sentence to be processed, with E_A' and E_B' representing any two entities in the sentence S' to be processed; the method comprises the following specific steps:
S71: adding boundary symbols at both ends of entities E_A' and E_B' to obtain a new sentence S'_Bord;
S72: generating the entity-boundary-based sentence feature vector of S'_Bord;
S73: generating the ELMO-network-based sentence feature vector of S'_Bord;
S74: sending the ELMO-network-based sentence feature vector into the input layer of the layer-2 convolutional neural network;
S75: running the layer-2 convolutional neural network to obtain the output layer information;
S76: taking the relationship type corresponding to the maximum-probability output as the relationship between entities E_A' and E_B';
S77: judging whether any unread entity pair E_A' and E_B' remains in sentence S':
(1) if yes, going to step S71;
(2) if not, going to step S78;
S78: returning the set of relationships between entity pairs (E_A', E_B') existing in sentence S'.
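Step S76 is an argmax over the layer-2 output; a sketch with hypothetical relation names and probabilities:

```python
# Sketch of S74-S76: take the relation type whose output probability is
# largest. The probability vector and relation names are hypothetical.
def predict_relation(probs, relation_names):
    k = max(range(len(probs)), key=probs.__getitem__)  # argmax over outputs
    return relation_names[k]

print(predict_relation([0.1, 0.7, 0.2],
                       ["born_in", "works_for", "located_in"]))  # works_for
```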
7. A relationship extraction apparatus based on two layers of convolutional neural networks, the apparatus comprising:
a sentence feature vector generation component based on the entity boundary, which is used for generating a sentence feature vector based on the entity boundary;
a relation existence judgment model training component for training a relation existence judgment model based on a convolutional neural network;
the entity pair and sentence generation component is used for screening out, according to the relation existence judgment model, the entity pairs that may have a relationship and the sentences containing them;
the ELMO-network-based sentence feature vector generation component is used for generating a sentence feature vector based on the ELMO network;
the relation type vector generating component is used for generating a vector corresponding to the relation type;
the relation classification model training component is used for training a relation classification model based on a convolutional neural network;
and the entity pair relation type prediction component is used for predicting the relation type of the entity pair in the sentence according to the relation classification model.
8. The apparatus for extracting relationships based on a two-layer convolutional neural network of claim 7, wherein the sentence feature vector generation component based on entity boundaries comprises:
the entity boundary symbol adding component is used for adding boundary symbols at two ends of the entity of the sentence;
the static word embedding generation component is used for performing static vectorization on each word of the sentence containing the boundary symbols;
the static word embedding and splicing component is used for splicing the static word embedding of each vocabulary according to the sequence of the vocabulary in the sentence;
the relationship existence determination model training section includes:
an entity pair relationship existence identification generation component for generating an identification whether the relationship between the entity pairs exists;
the layer 1 hyper-parameter setting component is used for setting structural parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relation existence judgment;
the layer 1 input end setting component is used for sending all sentence feature vectors based on entity boundaries into the input end of the convolutional neural network facing the relation existence judgment;
the layer 1 output end setting component is used for sending all entity pair relationship existence identifications to the output end of the convolutional neural network facing relationship existence judgment;
and the layer 1 training convergence component is used for training the convolutional neural network facing the relation existence judgment according to the convergence condition.
9. The apparatus of claim 7, wherein the ELMO network-based sentence feature vector generation component comprises:
the static word embedding sequence reading component is used for reading static word embedding sequences of all words in the sentence;
the dynamic word embedding sequence generating component is used for generating dynamic word embedding sequences of all words in the sentence;
a word embedding sequence generating part for generating word embedding sequences of all words in the sentence;
the relationship classification model training component includes:
the layer 2 hyper-parameter setting component is used for setting the structural parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 parameter initialization component is used for initializing parameters of different modules in the convolutional neural network facing the relationship classification;
the layer 2 input end setting component is used for sending all sentence feature vectors based on the ELMO network into the input end of the convolutional neural network facing the relation classification;
the layer 2 output end setting component is used for sending all relationship type vectors into the output end of the convolutional neural network facing relationship classification;
and the layer 2 training convergence component is used for training the convolutional neural network facing the relation classification according to the convergence condition.
10. A computer-readable storage medium having stored thereon computer-executable instructions, which when executed by a computer, implement the method for extracting a relationship based on two-layer convolutional neural network as claimed in any one of claims 1 to 6.
CN202110864354.0A 2021-07-29 2021-07-29 Relation extraction method and device based on two-layer convolutional neural network Pending CN113627192A (en)


Publications (1)

Publication Number Publication Date
CN113627192A true CN113627192A (en) 2021-11-09


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023130688A1 (en) * 2022-01-05 2023-07-13 苏州浪潮智能科技有限公司 Natural language processing method and apparatus, device, and readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination