CN113807079A - End-to-end entity and relation combined extraction method based on sequence-to-sequence - Google Patents

End-to-end entity and relation combined extraction method based on sequence-to-sequence

Info

Publication number
CN113807079A
Authority
CN
China
Prior art keywords
network
vector
sentence
entity
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010531196.2A
Other languages
Chinese (zh)
Other versions
CN113807079B (en)
Inventor
何小海
刘露平
卿粼波
罗晓东
吴晓红
任超
吴小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202010531196.2A priority Critical patent/CN113807079B/en
Publication of CN113807079A publication Critical patent/CN113807079A/en
Application granted granted Critical
Publication of CN113807079B publication Critical patent/CN113807079B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an end-to-end entity and relation joint extraction method based on sequence-to-sequence. The method adopts a sequence-to-sequence network structure to generate a sequence of triples; the network consists of an encoding network that merges syntactic dependency information, a relation decoding network, and a pointer network. The encoding network is realized on a Transformer network and merges the sentence's syntactic dependency tree information during encoding to obtain a better encoded representation and reduce noise. The relation decoder outputs a relation sequence based on a Transformer decoding network. The pointer network consists of two networks with identical structure, used respectively to extract the head entity and the tail entity. The pointer network is implemented with a multi-head attention mechanism, in which the attention matrix is used directly as a pointer for selecting the boundary positions of an entity from the input sentence. By adopting multi-head attention, the proposed network outputs the relation and the entities in parallel, which on the one hand strengthens the dependence between entities and relations and on the other hand accelerates decoding.

Description

End-to-end entity and relation combined extraction method based on sequence-to-sequence
Technical Field
The invention provides an end-to-end information extraction method based on sequence-to-sequence, belonging to the technical field of natural language processing.
Background
Information extraction is a basic and important task in natural language processing: it is the basis for constructing knowledge graphs and an essential step in converting unstructured data into structured data. Within information extraction, joint entity and relation extraction refers to directly extracting entity pairs and their corresponding relations from raw text to form valid triples. Such triples are widely used in tasks such as knowledge graph construction and structured extraction of Internet data, as well as of data in fields such as justice and medicine. Owing to these broad application prospects, triple extraction has attracted wide attention from researchers in academia and industry, and in recent years the development of deep learning has further accelerated progress. In real-world scenarios, however, the complexity and diversity of textual expression mean that triple extraction still faces challenges, among which the extraction of overlapping relations is one of the most difficult: one entity may hold different relations with several other entities, and a single entity pair may even hold several different relations. These widely occurring overlapping relations are not handled effectively by current methods.
Traditional methods treat triple extraction as two independent tasks, namely named entity recognition and relation classification: all entities in a sentence are first identified by a named entity recognition model, and the relation is then predicted for each entity pair. This pipeline is simple to implement but suffers from error propagation, i.e., errors made in the named entity recognition stage affect the subsequent relation classification; moreover, it cannot model the correlation between the two tasks. To address this, researchers proposed joint learning methods, which model the correlation between the two tasks through shared network components and thereby achieve better results, as demonstrated in many existing studies (Zheng S, Hao Y, Lu D, Bao H, Xu J, Hao H, et al. Joint entity and relation extraction based on a hybrid neural network. Neurocomputing. 2017;257:59-66; Katiyar A, Cardie C. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. ACL 2017; Miwa M, Bansal M. End-to-end relation extraction using LSTMs on sequences and tree structures. ACL 2016).
Although joint learning has advanced joint extraction to a certain extent, existing joint methods still cannot handle the extraction of overlapping relations well. Joint learning classifies the relation between each entity pair, so extracting all triples requires enumerating and classifying all entity pairs, which incurs a large computational cost. Furthermore, many entity pairs hold no relation at all, so during training a large number of entity pairs are assigned the 'None' label, making it difficult for the neural network to learn the true relations. Finally, classification models cannot efficiently handle the case where one entity pair holds two relations.
To solve these problems, the invention provides a sequence-to-sequence-based joint information extraction method. The method models triple generation as a sequence generation task, so that an entity or a relation can be generated multiple times, meeting the need to generate overlapping triples. Conventional sequence generation operates at the word level, i.e., only one word is generated at a time; in joint triple extraction a sequence of triples must be produced, so generation has to be modeled at the triple level, i.e., one triple is generated at a time. Furthermore, in joint triple extraction the entity pairs come from the input sentence, not from a vocabulary. To meet these two requirements, the invention constructs a joint information extraction method combining a self-attention mechanism with a pointer network.
Disclosure of Invention
The invention provides a joint information extraction method combining a self-attention network and a pointer network, aimed at the problem of overlapping relations in triple extraction. The network is an end-to-end structure composed of an encoding network, a relation decoding network and a pointer network, where the encoding and decoding networks are mainly implemented on the Transformer network structure. The Transformer is a network structure based on the self-attention mechanism proposed by Google; its attention mechanism allows the network to compute in parallel, increasing training and inference speed (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, Curran Associates, 2017, pp. 5998-6008). In the Transformer's self-attention, attention is computed over all words in the sentence; in triple extraction, however, a triple can usually be determined from only part of a sentence, and certain dependency relations hold among the words. Computing attention over all words of the sentence may therefore introduce noise. To address this, the method merges the sentence's dependency tree into the self-attention network as explicit prior knowledge, so that the model pays more attention to the important related words, improving performance.
The invention realizes the purpose through the following technical scheme:
1. The sequence-to-sequence-based information extraction model disclosed by the invention is shown in FIG. 1 and comprises an encoding network, a relation decoding network and a pointer network. Entity and relation extraction comprises a training process and a testing process, carried out as follows:
(1) For a sentence to be trained, the sentence is first encoded with a Transformer encoder to obtain the model's hidden-layer vector output, which represents the semantic vector of the sentence.
(2) The encoded semantic vector and the right-shifted target sequence are fed into the relation decoding network, which outputs a decoding vector; the decoding vector is passed through a fully connected layer and a Softmax classification network to obtain the relation class output.
(3) The hidden-layer vector output by the encoding network and the vector output by the decoding layer are fed into the pointer network, which decodes and outputs the corresponding entity pair.
(4) The loss between the relation class probabilities and the entity start/end position probabilities output in steps (2) and (3) and the ground-truth sequence labels is computed and fed to an optimizer to optimize the network parameters; after the model converges, it is saved.
(5) At test time, the saved model is loaded, the network identifies the relations and corresponding entities in newly input sentences, and the corresponding triple information is output.
Specifically, in step (1), encoding the sentence involves two steps: the sentence is first encoded with a Transformer encoder, the encoded result is then fed into a syntax-tree-guided encoding network, and finally the outputs of the two networks are weighted and summed to obtain the encoder output, as shown in FIG. 2. The whole encoding network consists of four components, namely a word embedding layer, a multi-head attention layer, a feed-forward layer and a syntax-guided self-attention layer, described in detail below.
1) Word embedding layer. Each sentence is represented as an N × D matrix of word vectors, where N is the number of words in the sentence and D is the dimension of the word vectors. In this invention the word vector is composed of three parts: word-level embedding, character-level embedding, and position encoding. The word-level embedding is initialized with random vectors and has dimension 300. The character-level vector is obtained by encoding the character vectors of each word with a one-dimensional convolutional neural network (CNN) whose filter window size is 3 and whose number of convolution kernels is 212; the maximum number of characters per word is 10. The computation is:
V_w = Conv1D(V_char)
where V_char is the embedding matrix of the word's characters; the resulting character-level word embedding has 212 dimensions. The 300-dimensional word-level embedding and the 212-dimensional character-level embedding are then concatenated to form a 512-dimensional word representation, to which position information is added using the Transformer's position encoding scheme to give the sentence's encoded vector representation.
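As an illustration of the embedding layer just described, the following PyTorch sketch combines a 300-dimensional word embedding, a 212-dimensional character-level CNN embedding (filter window 3, at most 10 characters per word) and Transformer-style sinusoidal position encoding. The vocabulary sizes, the character-embedding dimension and the max-pooling over character positions are assumptions for illustration, not fixed by this description.

    import math
    import torch
    import torch.nn as nn

    class WordCharEmbedding(nn.Module):
        """Word representation: 300-d word embedding + 212-d char-CNN
        embedding (window 3), plus sinusoidal position encoding."""
        def __init__(self, vocab_size, char_vocab_size, max_len=512,
                     word_dim=300, char_dim=50, char_out=212):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)        # random init
            self.char_emb = nn.Embedding(char_vocab_size, char_dim)
            # 1-D convolution over the character sequence of each word
            self.char_cnn = nn.Conv1d(char_dim, char_out, kernel_size=3, padding=1)
            d = word_dim + char_out                                   # 300 + 212 = 512
            pe = torch.zeros(max_len, d)                              # sinusoidal positions
            pos = torch.arange(max_len).unsqueeze(1).float()
            div = torch.exp(torch.arange(0, d, 2).float() * (-math.log(10000.0) / d))
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pe", pe)

        def forward(self, words, chars):
            # words: (B, N) word ids; chars: (B, N, 10) char ids per word
            B, N, C = chars.shape
            w = self.word_emb(words)                                  # (B, N, 300)
            c = self.char_emb(chars).view(B * N, C, -1).transpose(1, 2)
            c = torch.relu(self.char_cnn(c)).max(dim=2).values        # pool over chars
            c = c.view(B, N, -1)                                      # (B, N, 212)
            x = torch.cat([w, c], dim=-1)                             # (B, N, 512)
            return x + self.pe[:N].unsqueeze(0)                       # add positions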
2) Multi-head attention layer. The invention uses a total of 8 attention heads; in each head, an encoded representation of the words is first obtained by scaled dot-product attention:
head_i = Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / sqrt(d_k)) V_i
After obtaining the single-head attention outputs, all heads are concatenated and fused, and the encoded vector representation of the sentence is obtained through a fully connected layer:
m = MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
To prevent vanishing gradients, the output of the multi-head attention network further passes through a residual connection and normalization layer:
h_m = LayerNorm(m + x)
where x is the embedded input vector and m is the output of the multi-head attention layer.
3) Feed-forward network layer, which further fuses the vectors produced by the multi-head attention layer. It consists of two fully connected layers and a ReLU activation:
h = FFN(h_m) = max(0, h_m W_1 + b_1) W_2 + b_2
The feed-forward output likewise passes through a residual connection and normalization layer to give the output of the encoding layer:
h_e = LayerNorm(h + h_m)
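The sublayer equations above (multi-head attention with residual normalization, then the feed-forward block with its own residual normalization) can be sketched as a single PyTorch encoder layer; the feed-forward hidden size d_ff = 2048 is an assumed value taken from the original Transformer and is not stated in this description.

    import torch.nn as nn

    class EncoderLayer(nn.Module):
        """One encoder layer: 8-head self-attention, h_m = LayerNorm(m + x),
        then h = FFN(h_m) and h_e = LayerNorm(h + h_m)."""
        def __init__(self, d_model=512, n_heads=8, d_ff=2048):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):
            m, _ = self.attn(x, x, x)     # m = MultiHead(Q, K, V) with Q = K = V = x
            h_m = self.norm1(m + x)       # h_m = LayerNorm(m + x)
            h = self.ffn(h_m)             # h = FFN(h_m)
            return self.norm2(h + h_m)    # h_e = LayerNorm(h + h_m)

The detailed description below stacks four such layers before the syntax-guided attention layer.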
4) Syntax-tree-guided attention layer. Its main role is to merge the sentence's syntactic dependency information into the network so as to obtain a better semantic representation of the sentence. Concretely, a matrix based on the syntactic dependency tree is first constructed and then merged into the attention computation. To construct the matrix, the sentence is parsed into a syntactic dependency tree to extract the dependency relations, and the matrix is filled in according to the following rules.
a) First a matrix of size N × N is constructed, where N is the number of words in the sentence, and all its entries are initialized to 0.
b) The matrix is then assigned values according to the sentence's syntactic dependency information. For an element M_ij: if word j is an "ancestor" of word i in the syntactic dependency tree, the position is assigned 1; otherwise it is assigned 0. The process is expressed as follows:
M_ij = 1 if word j is an ancestor of word i in the dependency tree; M_ij = 0 otherwise.
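A minimal sketch of rules a) and b), assuming the dependency parse is available as a head-index list (heads[i] is the index of word i's parent, with -1 for the root):

    import torch

    def dependency_matrix(heads):
        """N x N matrix with M[i][j] = 1 iff word j is an ancestor of word i."""
        n = len(heads)
        M = torch.zeros(n, n)          # rule a): initialize all entries to 0
        for i in range(n):
            j = heads[i]
            while j != -1:             # rule b): walk up the tree to the root
                M[i][j] = 1.0
                j = heads[j]
        return M

    # e.g. "He eats apples" with heads [1, -1, 1]: rows 0 and 2 mark
    # word 1 ("eats") as an ancestor; row 1 (the root) stays all zero.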
An example of building the syntactic dependency tree matrix is shown in FIG. 3. After the matrix is obtained, the output of the encoding network is combined with it in a further multi-head attention computation:
A'_i = softmax((Q'_i K'_i^T / sqrt(d_k)) ⊙ M)
h'_i = A'_i V'
after the dependency tree matrix is merged and the attention mechanism representation is carried out, the sentence only carries out self-attention calculation on the words with the dependency relationship, and therefore irrelevant noise information in the sentence can be removed. After the output is obtained through calculation based on the syntactic dependency network, the output is weighted and summed with the output of the original transform coding network to obtain a complete sentence semantic representation, and the formula is represented as follows:
h = λ h_e + (1 − λ) h'
where λ is a hyperparameter, whose value is 0.5 in the present invention.
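The syntax-guided attention and the weighted fusion can be sketched as follows (single-head for brevity, whereas the invention uses multi-head attention). Masking non-dependent word pairs to -inf before the softmax is one common realization of restricting attention to the pairs marked in M, and adding the identity matrix so that every word can at least attend to itself is an assumption made here to avoid empty attention rows:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SyntaxGuidedAttention(nn.Module):
        """Attention restricted to dependency-related word pairs, fused with
        the plain Transformer output by the weight lambda = 0.5."""
        def __init__(self, d_model=512, lam=0.5):
            super().__init__()
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)
            self.lam = lam

        def forward(self, h_e, M):
            # h_e: (B, N, d) Transformer output; M: (B, N, N) dependency matrix
            Q, K, V = self.q(h_e), self.k(h_e), self.v(h_e)
            scores = Q @ K.transpose(-2, -1) / (Q.size(-1) ** 0.5)
            eye = torch.eye(M.size(-1), device=M.device)     # let each word see itself
            scores = scores.masked_fill((M + eye) == 0, float('-inf'))
            h_syn = F.softmax(scores, dim=-1) @ V            # h' = A' V'
            return self.lam * h_e + (1 - self.lam) * h_syn   # h = lam*h_e + (1-lam)*h'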
In step (2), the decoder decodes using the encoded vector. It likewise contains three sublayers, namely a masked multi-head attention layer, an encoding-decoding multi-head attention layer, and a feed-forward layer.
a) Masked multi-head attention layer, which encodes the sequence output before the current decoding moment. During the attention computation, future positions of the output sequence are hidden by multiplying with a mask matrix M', whose two dimensions both equal the length of the target sequence; its structure is shown in FIG. 4. The computation of this layer is:
MaskedAttention(Q, K, V) = softmax((Q K^T / sqrt(d_k)) ⊙ M') V
h_m = LayerNorm(y + MaskedMultiHead(y, y, y))
where y is the embedded, right-shifted target sequence.
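A sketch of the mask matrix M' and its use; as is usual, the masking is realized here by setting future positions to -inf before the softmax, which has the same effect as the multiply-by-M' formulation above:

    import torch

    def causal_mask(t):
        """Mask matrix of FIG. 4: 1 on and below the diagonal, 0 above it."""
        return torch.tril(torch.ones(t, t))

    def masked_attention_scores(Q, K):
        """Scores for the masked attention sublayer: each decoding step may
        only attend to itself and to previously generated positions."""
        scores = Q @ K.transpose(-2, -1) / (Q.size(-1) ** 0.5)
        m = causal_mask(Q.size(-2)).to(Q.device)
        return scores.masked_fill(m == 0, float('-inf'))     # softmax follows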
b) Encoding-decoding multi-head attention layer, which performs mutual attention between the output of the masked layer and the output vector of the encoding network:
h_ed = LayerNorm(h_m + MultiHead(h_m, H, H))
where H is the output of the encoding network and h_m is the output of the masked attention sublayer.
c) Feed-forward network layer, which performs feature fusion on the output of the second sublayer to obtain a better feature representation:
h'_ed = FFN(h_ed)
The feed-forward layer is followed by a residual connection and normalization layer:
H_ed = LayerNorm(h_ed + h'_ed)
the output of the coding network finally passes through a full connection layer and then passes through a softmax classifier to obtain the probability of the relation class, and the process is represented as follows:
Pr=softmax(He_dWo+bo)
in step (3), two identical decoders are designed, each of which is based on the structure of the multi-headed attention mechanism. The input of the decoding network is divided into the output of the coding network and the output of the relation decoding network, and the attention matrix is obtained after the two are calculated through the multi-head attention mechanism.
A^i = softmax(Q_i K_i^T / sqrt(d_k)),  i = 1, ..., h
After the attention computation, the attention matrices are used directly as pointers to select the boundaries of the corresponding entity from the sentence. In the present invention an entity has a start position and an end position, so the attention heads are split into two halves: the sum of the attention matrices of the first half of the heads indicates the start position of the entity, and the sum of the attention matrices of the second half indicates the end position, as follows.
p_start = softmax(Σ_{i=1}^{h/2} A^i)
p_end = softmax(Σ_{i=h/2+1}^{h} A^i)
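A sketch of one pointer decoder under the description above; the linear projections for queries and keys and the tensor dimensions are assumptions for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PointerNetwork(nn.Module):
        """Multi-head attention between the relation-decoder output (queries)
        and the encoder output (keys); the per-head attention matrices act as
        pointers: the first h/2 heads are summed for the start position, the
        last h/2 for the end position."""
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            assert d_model % n_heads == 0
            self.h, self.d_k = n_heads, d_model // n_heads
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)

        def forward(self, dec, enc):
            # dec: (B, T, d) relation-decoder output; enc: (B, N, d) encoder output
            B, T, _ = dec.shape
            N = enc.size(1)
            Q = self.q(dec).view(B, T, self.h, self.d_k).transpose(1, 2)   # (B,h,T,d_k)
            K = self.k(enc).view(B, N, self.h, self.d_k).transpose(1, 2)   # (B,h,N,d_k)
            A = F.softmax(Q @ K.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
            p_start = F.softmax(A[:, : self.h // 2].sum(dim=1), dim=-1)    # first half
            p_end = F.softmax(A[:, self.h // 2 :].sum(dim=1), dim=-1)      # second half
            return p_start, p_end    # each (B, T, N): one pointer per decoding step

Two such networks with identical structure are instantiated, one producing the subject's boundaries and one the object's.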
In step (4), the training objective function is designed as follows:
L = −(1/(B·T)) Σ_{b=1}^{B} Σ_{t=1}^{T} (log r_t + log e1_s + log e1_e + log e2_s + log e2_e)
where B is the batch size, 64 in the present invention, and T is the maximum number of triples in a sentence, set to 10 in the present invention; r_t is the softmax probability score of the true relation class; e1_s and e1_e are the softmax probability scores of the start and end positions of the true subject; and e2_s and e2_e are the softmax probability scores of the start and end positions of the true object. During training the network is optimized with the Adam optimizer at a learning rate of 1e-5. To prevent overfitting, an early-stopping mechanism is also adopted: if the F1 value of the network does not improve for 10 consecutive epochs, training is stopped.
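The objective can be sketched as the summed negative log-likelihood of the five softmax distributions; the gold-index dictionary keys are illustrative and padding for sentences with fewer than T triples is omitted:

    import torch

    def joint_loss(p_rel, p_s_start, p_s_end, p_o_start, p_o_end, gold):
        """Negative log-likelihood over the relation and the four entity
        boundaries, averaged over batch B and triple steps T."""
        def nll(p, idx):   # gold probability at each step, then -log
            return -torch.log(p.gather(-1, idx.unsqueeze(-1)).squeeze(-1) + 1e-9)
        loss = (nll(p_rel, gold['rel'])
                + nll(p_s_start, gold['s_start']) + nll(p_s_end, gold['s_end'])
                + nll(p_o_start, gold['o_start']) + nll(p_o_end, gold['o_end']))
        return loss.mean()   # mean over B and T realizes the 1/(B*T) factor

    # training sketch per the description: Adam, lr = 1e-5, batch size 64,
    # early stop after 10 epochs without F1 improvement
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)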
Drawings
FIG. 1 is a main framework of the network model proposed by the present invention
FIG. 2 is a network structure of a coding layer
FIG. 3 is an example of syntax tree dependency matrix generation
FIG. 4 is a schematic diagram of the shape of a mask matrix
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
FIG. 1 shows the structure of the entire network, which is composed of a syntax-guided encoding network, a relation decoding network and a pointer network. In the encoding network, the input sentence first passes through word-level and character-level embedding; the two are concatenated into a 512-dimensional word vector, to which a position encoding vector is added to form the network input. The input vector first passes through a Transformer encoder of four stacked layers, where each layer consists of a multi-head attention subnetwork and a feed-forward subnetwork, each with a residual connection and normalization. The Transformer output is then fed into the encoding network that merges syntactic dependency information to obtain a syntax-enhanced vector encoding, and the original Transformer output and the syntax-guided output are weighted and summed to give the encoder output, which is sent on to the decoding network. The decoding network is likewise made up of four stacked layers, each containing three sublayers: a masked multi-head attention sublayer, an encoding-decoding multi-head attention sublayer and a feed-forward sublayer. The shifted target vector is first encoded by the masked multi-head attention layer, then undergoes the encoding-decoding multi-head attention operation with the encoder output, and the result passes through the feed-forward network to give the decoder output. The decoder output passes through a fully connected layer and a softmax classifier to yield the relation class probabilities. Finally, the outputs of the encoding network and the decoding network are sent together to the pointer decoding network to obtain the boundary information of the entities, and the corresponding triples are extracted from the outputs of the relation decoding network and the pointer decoding network.
FIG. 2 shows the two steps of the encoding network: the input sentence vector is first encoded by the Transformer encoder, then further encoded by the syntax-guided encoding network to obtain a syntax-enhanced encoding, and finally the two are weighted and summed to give the final encoder output.
FIG. 3 is a schematic diagram of the generation of the syntactic dependency tree matrix: an N × N matrix is first constructed, where N is the number of words in the sentence, with all values set to 0; the dependent positions are then set to 1 according to the syntactic dependency tree, the specific rule being that for position (i, j), if word j is an "ancestor" of word i in the syntactic dependency tree, the position is set to 1, and otherwise to 0.
FIG. 4 is a schematic diagram of the mask matrix, which has size M × M, where M is the length of the target sequence; entries on and below the diagonal are 1, and all other entries are 0.
Table 1 shows the experimental results of the invention on the NYT24, NYT29 and WebNLG datasets; the experiments show that, compared with the best existing models, the proposed model achieves the best results on the comprehensive evaluation index F1.
TABLE 1 Experimental comparison of the network model of the present invention on NYT24, NYT29 and WebNLG datasets with other existing models
Table 2 gives some examples from the experiments, showing that for some complex scenarios the method of the invention can also output overlapping triple sequence information.
Table 2 Some actual results of the invention on the validation datasets
The above embodiments are only preferred embodiments of the present invention and are not intended to limit its technical solutions; any technical solution that can be realized on the basis of the above embodiments without creative effort shall be considered to fall within the protection scope of the present patent.

Claims (5)

1. An end-to-end entity and relation joint extraction method based on sequence-to-sequence is characterized by comprising the following steps:
(1) for a sentence to be trained, first encoding the sentence with a syntax-dependency-enhanced encoder to obtain the model's hidden-layer vector output, representing the semantic vector of the sentence;
(2) feeding the encoded semantic vector and the right-shifted target sequence into a relation decoding network, outputting a decoding vector, and passing the decoding vector through a fully connected layer and a Softmax classification network to obtain the relation class output;
(3) feeding the hidden-layer vector output by the encoding network and the vector output by the decoding layer into a pointer network, and decoding and outputting the corresponding entity pair;
(4) computing the loss between the relation class probabilities and the entity start/end position probabilities output in steps (2) and (3) and the ground-truth sequence labels, feeding the loss to an optimizer to optimize the network parameters, and saving the model after it converges;
(5) at application time, loading the saved model, performing relation and corresponding entity recognition on newly input sentences, and outputting the corresponding triple information.
2. The entity and relationship joint extraction method of claim 1, wherein: each sentence is represented as an N × D matrix of word vectors, where N is the number of words in the sentence and D is the dimension of the word vectors; the word vector is composed of three parts: word-level embedding, character-level embedding and position vector information; the word-level embedding is initialized with random vectors and has dimension 300; the character-level vector is obtained by encoding the character vectors of each word with a one-dimensional convolutional neural network (CNN) whose filter window size is 3 and whose number of convolution kernels is 212; the maximum number of characters per word is 10, and the computation is as follows:
V_w = Conv1D(V_char)
where V_char is the embedding matrix of the word's characters, and the resulting character-level word embedding has 212 dimensions; finally, the word-level and character-level embeddings are concatenated into a 512-dimensional vector representation, to which a position encoding vector is added to obtain the final vector representation, the position encoding using the position encoding method of the original Transformer encoder.
3. The entity and relationship joint extraction method of claim 1, wherein: after a sentence is encoded by the Transformer, it is input into an encoding network that merges the syntactic dependency tree to obtain a syntax-dependency-merged representation, the syntactic dependency tree matrix being constructed as follows:
a) firstly, constructing a matrix of size N × N, where N is the number of words in the sentence, and initializing all its entries to 0;
b) assigning values to the matrix according to the sentence's syntactic dependency information, the rule being, for an element M_ij: if word j is an "ancestor" of word i in the syntactic dependency tree, the position is assigned 1, and otherwise 0, expressed as follows:
M_ij = 1 if word j is an ancestor of word i in the dependency tree; M_ij = 0 otherwise.
after the syntactic dependency tree matrix is obtained, the output of the encoding network is further processed by a multi-head attention computation, as follows:
A'_i = softmax((Q'_i K'_i^T / sqrt(d_k)) ⊙ M)
h'_i = A'_i V'
after the dependency tree matrix is merged into the output of the feed-forward network, the attention computation attends, for each word of the sentence, only to words having a dependency relation with it, so that irrelevant noise information in the sentence can be removed; the output obtained by this syntax-dependency-based computation is weighted and summed with the output of the original Transformer encoding network to obtain the complete sentence semantic representation, represented as follows:
h = λ h_e + (1 − λ) h'
where λ is a hyperparameter, whose value is 0.5 in the present invention.
4. The entity and relationship joint extraction method of claim 3, wherein: two pointer decoding networks with identical structure are designed, each realized on a multi-head attention mechanism; the inputs of the decoding networks are the output of the encoding network and the output of the relation decoding network, and the attention matrices are computed as follows:
A^i = softmax(Q_i K_i^T / sqrt(d_k)),  i = 1, ..., h
after the attention computation, the attention matrices are used directly as pointers to select the corresponding entity boundaries from the sentence; in the present invention an entity has a start position and an end position, so the multi-head attention heads are split into two halves, where the sum of the attention matrices of the first half of the heads indicates the start position of the entity and the sum of the attention matrices of the second half indicates the end position, as shown below.
p_start = softmax(Σ_{i=1}^{h/2} A^i)
p_end = softmax(Σ_{i=h/2+1}^{h} A^i)
5. In training the joint information extraction network, the objective function is defined as follows:
L = −(1/(B·T)) Σ_{b=1}^{B} Σ_{t=1}^{T} (log r_t + log e1_s + log e1_e + log e2_s + log e2_e)
where B is the batch size, 64 in the present invention; T is the maximum number of triples in a sentence, set to 10 in the present invention; r_t is the softmax probability score of the true relation class; e1_s and e1_e are the softmax probability scores of the start and end positions of the subject; e2_s and e2_e are the softmax probability scores of the start and end positions of the object; during model training the network is optimized with the Adam optimizer at a learning rate of 1e-5.
CN202010531196.2A 2020-06-11 2020-06-11 Sequence-to-sequence-based end-to-end entity and relationship joint extraction method Active CN113807079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010531196.2A CN113807079B (en) 2020-06-11 2020-06-11 Sequence-to-sequence-based end-to-end entity and relationship joint extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010531196.2A CN113807079B (en) 2020-06-11 2020-06-11 Sequence-to-sequence-based end-to-end entity and relationship joint extraction method

Publications (2)

Publication Number Publication Date
CN113807079A (en) 2021-12-17
CN113807079B (en) 2023-06-23

Family

ID=78892005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010531196.2A Active CN113807079B (en) 2020-06-11 2020-06-11 Sequence-to-sequence-based end-to-end entity and relationship joint extraction method

Country Status (1)

Country Link
CN (1) CN113807079B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064938A (en) * 2022-01-17 2022-02-18 中国人民解放军总医院 Medical literature relation extraction method and device, electronic equipment and storage medium
CN115659986A (en) * 2022-12-13 2023-01-31 南京邮电大学 Entity relation extraction method for diabetes text

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024476A1 (en) * 2012-01-05 2017-01-26 Yewno, Inc. Information network with linked information nodes
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN109492113A (en) * 2018-11-05 2019-03-19 扬州大学 Entity and relation combined extraction method for software defect knowledge
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110781683A (en) * 2019-11-04 2020-02-11 河海大学 Entity relation joint extraction method
CN111241294A (en) * 2019-12-31 2020-06-05 中国地质大学(武汉) Graph convolution network relation extraction method based on dependency analysis and key words

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170024476A1 (en) * 2012-01-05 2017-01-26 Yewno, Inc. Information network with linked information nodes
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN109492113A (en) * 2018-11-05 2019-03-19 扬州大学 Entity and relation combined extraction method for software defect knowledge
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN110781683A (en) * 2019-11-04 2020-02-11 河海大学 Entity relation joint extraction method
CN111241294A (en) * 2019-12-31 2020-06-05 中国地质大学(武汉) Graph convolution network relation extraction method based on dependency analysis and key words

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064938A (en) * 2022-01-17 2022-02-18 中国人民解放军总医院 Medical literature relation extraction method and device, electronic equipment and storage medium
CN114064938B (en) * 2022-01-17 2022-04-22 中国人民解放军总医院 Medical literature relation extraction method and device, electronic equipment and storage medium
CN115659986A (en) * 2022-12-13 2023-01-31 南京邮电大学 Entity relation extraction method for diabetes text

Also Published As

Publication number Publication date
CN113807079B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN108519890A (en) A kind of robustness code abstraction generating method based on from attention mechanism
CN110619034A (en) Text keyword generation method based on Transformer model
CN112989796B (en) Text naming entity information identification method based on syntactic guidance
CN112818676A (en) Medical entity relationship joint extraction method
CN113297364A (en) Natural language understanding method and device for dialog system
CN113408430B (en) Image Chinese description system and method based on multi-level strategy and deep reinforcement learning framework
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN111611346A (en) Text matching method and device based on dynamic semantic coding and double attention
CN111914553B (en) Financial information negative main body judging method based on machine learning
CN111382574A (en) Semantic parsing system combining syntax under virtual reality and augmented reality scenes
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN113807079B (en) Sequence-to-sequence-based end-to-end entity and relationship joint extraction method
CN114510946B (en) Deep neural network-based Chinese named entity recognition method and system
CN116226357B (en) Document retrieval method under input containing error information
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
CN116629361A (en) Knowledge reasoning method based on ontology learning and attention mechanism
CN116521857A (en) Method and device for abstracting multi-text answer abstract of question driven abstraction based on graphic enhancement
CN116680407A (en) Knowledge graph construction method and device
CN110929006A (en) Data type question-answering system
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant