CN113761893A - Relation extraction method based on mode pre-training - Google Patents
- Publication number
- CN113761893A (application number CN202111331381.8A)
- Authority
- CN
- China
- Prior art keywords
- training
- entity
- text
- model
- dependency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F40/00—Handling natural language data
- G06F40/279—Recognition of textual entities
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/30—Semantic analysis
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/02—Neural networks
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
Abstract
The scheme discloses a relation extraction method based on mode pre-training, mainly used to solve the problems of entity boundary determination and unbalanced relation distribution in joint entity-relation extraction. The scheme constructs a mode for pre-training the model, and the model is built in an encoding-decoding manner; the encoding part and the decoding part are pre-trained separately. The encoding part is pre-trained with related entity data so that it learns to predict entity boundaries. The decoding part is pre-trained with related relation data: a dependency tree is constructed by syntactic dependency analysis, and part of the information is shielded through the adjacency matrix of the dependency tree, following the mask mechanism of the attention mechanism. Finally, the parameters of the whole framework are updated by supervised learning, yielding a relation extraction model with strong adaptability and expressive power.
Description
Technical Field
The invention relates to a relation extraction method based on mode pre-training, and belongs to the technical field of information extraction.
Background
With the rapid development and popularization of computers and the internet, the amount of data created by humans is growing at high speed. In this age of information explosion, how to analyze and process information rapidly and extract valuable information from text has become both a research hotspot and an urgent problem. In response to these challenges, it is imperative to develop automated information processing tools that extract valuable knowledge from vast amounts of information automatically and quickly. In this context, Information Extraction (IE) technology has become a hot topic of academic and industrial research. The purpose of information extraction is to extract specific, valuable information from semi-structured and unstructured texts and structured data, and to process the extracted information into structured data that can be easily stored and computed by computers. Information extraction includes entity recognition (Entity Recognition), relation extraction (Relation Extraction) and event extraction (Event Extraction).
As one of the important tasks of information extraction, relation extraction aims at extracting the semantic relation between two entities from a text, and has wide application value in fields such as mass data processing, automatic construction of knowledge bases, and automatic question answering. For example, large-scale knowledge base systems constructed by automatically processing large-scale Web texts with relation extraction technology can be applied to search engine optimization: Knowledge Vault, constructed by Google with more than 1.6 billion facts, is applied to improving search results and thus the user experience. Meanwhile, relation extraction technology provides infrastructure for other tasks in the natural language field, such as entity linking, automatic summarization, and sentiment analysis.
The idea of using pre-training to improve model performance has been widely applied in the field of deep learning. Natural language processing researchers have proposed methods for pre-training language models building on earlier work, the best known of which are BERT, GPT, ELMo, and the like. According to the problem addressed, the aims of pre-training models fall into two categories: the language-coding direction and the model-vector-space direction. The language-coding direction mainly solves the problem of encoding a piece of text, i.e. how to encode it; for example, a deep convolutional neural network encodes text using the vector representations of pre-trained words. The model-vector-space direction mainly addresses the long training time and high cost of deep learning by adjusting model parameters toward a problem that is easier to learn, introducing information that moves the parameters to a position from which they converge more easily to an optimal solution, including manually labeled information or additional information contained in an entity, such as the entity category. It is worth noting that these two types of approaches are not mutually exclusive and can be used together.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides an entity-relationship joint extraction method based on mode pre-training, which comprises the steps of constructing a pre-training mode by introducing a pre-training structure, training the capability of a model for positioning an entity by using an entity marking task, and extracting the relationship between two entities by an attention mechanism.
The scheme discloses an entity-relation joint extraction method based on mode pre-training. The scheme provides a supervised learning method that can mine triples of <entity h, relation, entity t> from a text. For each text segment, the model is first pre-trained through a pre-training network, where the pre-training task comprises an encoding part and a decoding part: the encoding part enhances the ability to locate entities through entity data, and the decoding part enhances the weights of entities in relation extraction through syntactic dependency analysis. After pre-training with these steps, the pre-training structure is removed and formal data is used for training, yielding the probability distribution of the relation between entity pairs in the text.
In order to achieve the purpose, the technical scheme of the invention is as follows: a relation extraction method based on mode pre-training comprises the following specific steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
a) marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;
b) the goal of pre-training the encoding-part model is to obtain the position information of an entity, with the convention that the head character of an entity represents the information of the whole entity; a BERT pre-trained language model is used to encode the text, and pre-training is carried out on the basis of this encoding mechanism so that the model obtains this information;
c) the encoding part is pre-trained; specifically, the pre-training of the encoding part is a multi-task training of word prediction and boundary prediction. That is, for a piece of text, given a character, the model predicts the head and tail positions of the entity where the character is located, and predicts the content of the whole entity (if the character is the head of the entity). In actual operation, a neural network is added on top of the BERT encoding result, probability mapping is performed, and the BIO labels corresponding to the sequence are predicted.
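As an illustration of the BIO labeling in steps a) and c), the sketch below converts entity spans into per-token BIO tags that the pre-trained encoder is asked to predict. The helper name and example sentence are illustrative assumptions, not the patent's implementation.

```python
# Hedged sketch: convert entity spans (head / interior positions) into
# the BIO labels the boundary-prediction task targets.

def bio_labels(tokens, entity_spans):
    """entity_spans: list of (start, end) token indices, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in entity_spans:
        labels[start] = "B"              # entity head
        for i in range(start + 1, end):
            labels[i] = "I"              # entity interior / tail
    return labels

tokens = ["Steve", "Jobs", "founded", "Apple"]
print(bio_labels(tokens, [(0, 2), (3, 4)]))  # ['B', 'I', 'O', 'B']
```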
Step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
a) the SDP (shortest dependency path) of the entity-related text: the shortest dependency path between entities is found by performing word segmentation, part-of-speech tagging and dependency syntactic analysis on the text containing the target entities. Because the shortest dependency path contains all the semantic information between the entities in the text, syntactic dependency analysis is performed on a large corpus, and the constructed dependency trees are sampled to obtain relations of subject-predicate and verb-object types as the training data of the decoding part;
b) for an original text, a syntax dependency tree is constructed by using a syntax dependency analysis tool to obtain a corresponding adjacent matrix, and the matrix is used as a mask matrix of a decoding part attention mechanism;
c) a mask mechanism in the attention is applied to mask vectors; in the syntactic dependency tree a node is regarded as an entity, which highlights the importance of entities in the overall task and enhances the model's ability to detect entities, and the two vectors are combined by weighted summation to obtain the vector used for decoding;
d) the prediction result of the decoding part is represented by a label representation method in the form of a matrix, the elements represented by the rows and the columns of the matrix are the same and are all entities corresponding to the input text, and the relationship between the two entities is represented by the values of the matrix, obviously, the matrix is a symmetrical matrix;
e) a prediction structure for the subject-predicate-object triple is added behind the Transformer part to finish training.
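The steps above can be sketched as follows: the syntactic dependency tree is turned into a symmetric adjacency matrix that serves as the mask matrix of the decoding-part attention. The edge list and sizes here are toy assumptions.

```python
import numpy as np

# Hedged sketch of step 2 b): build the adjacency matrix of a dependency
# tree; this matrix is then usable as the attention mask of the decoder.

def dependency_mask(n_tokens, edges):
    M = np.eye(n_tokens)                    # keep each token's self-link
    for head, dep in edges:
        M[head, dep] = M[dep, head] = 1.0   # symmetric adjacency
    return M

# toy 3-token tree: token 1 governs tokens 0 and 2
M = dependency_mask(3, [(1, 0), (1, 2)])
print(M)
```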
Step 3) training the model by using formal data:
in formal training, the network of the pre-training stage is removed, the network is trained by mini-batch gradient descent, and the parameters are updated. The fully connected layer added during pre-training is discarded; the pre-trained BERT model directly encodes the text, which is input to the Transformer part of the decoding layer, the triple is modeled as a mapping from subject to predicate, and the output node values are normalized by a softmax function to obtain the probability distribution over relations for the text.
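The final softmax normalization described above can be sketched as follows; the relation logits are toy values, not model output.

```python
import numpy as np

# Hedged sketch of step 3): output node values are normalized into a
# probability distribution over candidate relations via softmax.

def softmax(logits):
    z = logits - logits.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

relation_logits = np.array([2.0, 1.0, 0.1])   # toy scores for 3 relations
p = softmax(relation_logits)
print(p)
```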
Wherein, in step 1) a) additional information needs to be obtained in different ways.
1-1) First, word segmentation is performed on the original text with a related tool, and entity annotation data, including the category of each entity and its position, is obtained with manual fine-tuning. Word segmentation, part-of-speech tagging and dependency syntactic analysis are performed on the text containing the entity pairs. In the dependency syntax tree generated by the dependency syntactic analysis, the shortest path between two entities is found; the words and edges on this shortest connected path form the shortest dependency path of the entity pair in the text. In this way, subject-predicate relations are constructed.
1-2) Pre-train a word vector model. Word vectors for the data set are trained in advance using BERT- or GPT-style approaches. If pre-trained vectors are unavailable, the method of this scheme trains them synchronously with the model parameters; in effect, however, word vectors trained in advance on large-scale corpora retain more semantic and grammatical information.
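The shortest-dependency-path search of step 1-1) can be sketched as a breadth-first search over the dependency edges treated as an undirected graph. The edges and token indices are toy assumptions, not the output of a real parser.

```python
from collections import deque

# Hedged sketch: BFS between two entity tokens over dependency edges
# to recover the shortest dependency path (SDP).

def shortest_dependency_path(edges, src, dst):
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)
    prev, queue, seen = {src: None}, deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:                  # reconstruct path back to src
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = node
                queue.append(nxt)
    return None

# "Jobs founded Apple": the verb (1) governs both entity tokens (0 and 2)
print(shortest_dependency_path([(1, 0), (1, 2)], 0, 2))  # [0, 1, 2]
```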
Training a pre-training language model in a coding stage by using the constructed entity data in the step 1):
in the boundary prediction task, the output of the pre-trained language model is mapped through a Bi-LSTM sublayer:

h_i = [LSTM_forward(x_i); LSTM_backward(x_i)]

and the obtained mapping result is passed through a CRF layer:

p(y | h) = exp( Σ_i score(y_{i-1}, y_i, h_i) ) / Z(h)
in the word prediction task, 15% of the Tokens in an entity are masked (Mask): with 80% probability the Token is directly replaced by [MASK], with 10% probability it is replaced by any other word, and with 10% probability the original Token is retained.
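The 15% / 80-10-10 masking rule above can be sketched as follows; the vocabulary and the way "entity tokens" are selected are simplified toy assumptions.

```python
import random

# Hedged sketch: select ~15% of tokens; of those, 80% -> [MASK],
# 10% -> a random other word, 10% kept unchanged.

def mask_tokens(tokens, vocab, rng, mask_rate=0.15):
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:        # select ~15% of tokens
            targets[i] = tok                # remember original for prediction
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"           # 80%: mask symbol
            elif r < 0.9:
                out[i] = rng.choice(vocab)  # 10%: random other word
            # remaining 10%: keep the original token
    return out, targets

rng = random.Random(0)
masked, targets = mask_tokens(["entity"] * 100, vocab=["other"], rng=rng)
print(len(targets))
```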
Step 1) c) obtains, through b) in step 1), a series of vectors for the text, represented as a feature matrix X, where X is the feature-vector representation of all input nodes with dimension N x D, N is the number of nodes, and D is the dimension of each node's vector representation.
Dependency analysis is performed on the original text with a syntactic dependency tool to construct a syntactic dependency tree, and the adjacency matrix M of the text, with dimension N x N, is obtained. The operation is carried out by a multi-head attention mechanism: the mask mechanism in the Transformer shields the positions whose matrix entry is not 1, highlighting the information of related entities, and the Transformer layer is pre-trained as follows:
head_i = softmax( Q K^T / sqrt(d_k) + Mask(M) ) V

where M is the mask matrix, Mask(·) is the mask operator of the attention, Q, K and V are the three vectors of the attention mechanism, d_k is the dimension of the vector K, and head_i represents the i-th head.
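The masked attention described above can be sketched numerically: positions whose adjacency entry is 0 receive a large negative score before the softmax, so their attention weight is effectively zero. Shapes and values are toy assumptions.

```python
import numpy as np

# Hedged sketch of dependency-masked scaled dot-product attention.

def masked_attention(Q, K, V, M):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = np.where(M > 0, scores, -1e9)    # Mask(M): hide non-adjacent pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # row-wise softmax
    return w @ V, w

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((3, 4))
M = np.eye(3)
M[0, 1] = M[1, 0] = 1.0                       # only tokens 0 and 1 are linked
out, w = masked_attention(Q, K, V, M)
print(out.shape)
```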
In step 2) a), the output of step 1) is an N x F matrix Z, represented as a sequence Z = {z_0, z_1, ..., z_N}, each node being a vector of dimension F. Then the vector representation of each node is computed by weighted averaging to obtain the final vector representation. The results of each calculation are combined and passed through a feedforward neural network sublayer, and the finally obtained vector is expressed as:

h_i = FFN(z_i) = max(0, z_i W_1 + b_1) W_2 + b_2

where W_1, b_1, W_2 and b_2 are the trainable parameters of the sublayer.
It is specified here that the relation model is a function whose parameter is the subject and whose result is the object. The probabilities that the i-th character is the starting position and the ending position of a subject are:

p_i^{sub_start} = σ(W_start x_i + b_start),  p_i^{sub_end} = σ(W_end x_i + b_end)

where σ is the Sigmoid function, W_start, W_end, b_start and b_end are trainable parameters, and x_i is the word vector of the i-th position. The probabilities of the positions of the object are:

p_i^{obj_start} = σ(W_start^o (x_i + v_sub^k) + b_start^o),  p_i^{obj_end} = σ(W_end^o (x_i + v_sub^k) + b_end^o)

where v_sub^k is the vector representation of the k-th subject, σ is the Sigmoid function, and the remaining W and b terms are trainable parameters.
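The per-position sigmoid pointer above can be sketched as follows; W and b stand in for the trainable parameters and are randomly initialized toys here.

```python
import numpy as np

# Hedged sketch: p_i = sigmoid(W x_i + b) per position, giving the
# probability that position i is a subject start (or end).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def position_probs(X, W, b):
    """X: (N, D) word vectors -> (N,) start (or end) probabilities."""
    return sigmoid(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))        # 5 positions, vector dimension 8
W, b = rng.standard_normal(8), 0.0
p = position_probs(X, W, b)
print(p.shape)
```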
Step 2) is trained with the data generated in step 1); the specific loss functions are as follows.

Encoding part:

L_enc = -Σ_{i=1}^{N} log p(y_i | x_i; θ)

Decoding part, the training function is represented as:

L_sub = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{sub_t} + (1 - y_i^t) log(1 - p_i^{sub_t}) ]

where N is the length of the input text, the indicator takes 1 when the condition is true and 0 when false, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters;

L_obj = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{obj_t} + (1 - y_i^t) log(1 - p_i^{obj_t}) ]

where N is the length of the input text, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters.

The overall loss function can be expressed as:

L = L_enc + L_sub + L_obj
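The decoding-part loss above is a binary cross-entropy between 0/1 token markers and the predicted start/end probabilities; a minimal sketch with toy labels and probabilities:

```python
import numpy as np

# Hedged sketch of the binary cross-entropy term used in the decoding loss.

def bce_loss(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)     # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 0.0, 1.0])     # 0/1 marker value of each token
p = np.array([0.9, 0.1, 0.2, 0.8])     # predicted probabilities
loss = bce_loss(y, p)
print(loss)
```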
And 3) completing the pre-training of the model through the previous steps, removing the pre-trained network, and training by using formal data:
and the word-prediction head of the pre-training part is removed, and the network obtained after pre-training is trained with formal data. The code from the decoding layer is modeled as a function whose parameter is the subject and whose result is the object, as defined above; the probabilities that the i-th character is the starting position and the ending position of a subject are:

p_i^{sub_start} = σ(W_start x_i + b_start),  p_i^{sub_end} = σ(W_end x_i + b_end)

The probabilities of the positions of the object are:

p_i^{obj_start} = σ(W_start^o (x_i + v_sub^k) + b_start^o),  p_i^{obj_end} = σ(W_end^o (x_i + v_sub^k) + b_end^o)

where v_sub^k is the vector representation of the k-th subject, σ is the Sigmoid function, and the W and b terms are trainable parameters.
Wherein the specific loss functions are as follows.

Encoding part:

L_enc = -log p(y | x; θ)

where x is the corresponding text sequence and y is the sequence of corresponding labels.

Decoding part:

L_sub = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{sub_t} + (1 - y_i^t) log(1 - p_i^{sub_t}) ]

where N is the length of the input text, the indicator takes 1 when the condition is true and 0 when false, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters;

L_obj = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{obj_t} + (1 - y_i^t) log(1 - p_i^{obj_t}) ]

where N is the length of the input text, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters.

The overall loss function can be expressed as:

L = L_enc + L_sub + L_obj
Compared with the prior art, the invention has the following advantages:
1) the method introduces additional corpora containing entity category, entity representation, entity context and text path information, and can improve the coding capacity of the model compared with the traditional representation-based method and rule-based method.
2) Compared with other methods that introduce additional information, the method of this scheme introduces the additional information into the pre-trained language model by pre-training at the coding layer. The method has good adaptability: it can be applied to this task, and the pre-trained model can also be applied to other problems. Secondly, multiple kinds of extra information act simultaneously, which avoids the insufficient coverage of any single kind of information.
3) The method customizes a reasonable initialization method for all extra information, uses a scheme based on word segmentation and syntactic dependency analysis on entity and relation embedding, adopts a method based on attention mechanism on relation extraction, and can fully utilize semantic information so as to achieve better effect.
4) The method uses the SDP (shortest dependency path) of the text to introduce an attention mechanism for the computation, which preserves the original semantics of the text, adds the dependency-syntactic features, and highlights the importance of entities in text encoding. The weighted-sum approach retains the original semantics of the text while adding the dependency-syntactic features, which traditional text encoding cannot achieve.
5) The scheme compresses and screens information by using an attention mechanism on a graph, and the method has the characteristics of high efficiency and strong expression capability. The attention mechanism can effectively mine the association between the entities, can integrate high-order logic expression of an inference chain into the feature representation, and well utilizes the summary experience of human beings.
6) The scheme introduces an attention mechanism and syntactic dependency analysis at the decoding layer for further pre-training and feature extraction. The various kinds of information introduced at the encoding stage may carry a lot of noise, such as entity-category identification errors or a text path introducing irrelevant text, which does not contribute much to extracting the relevant entities. Therefore the relation data representation generated by syntactic dependency analysis is used as supervision information through the attention mechanism: the relevance of each character is computed, character features with low relevance are shielded by the attention mask, the nodes closer to entities are retained, and the original encoding result is weighted and summed. After this weighting, the features are further compressed and the noise is filtered out.
7) In the scheme, different entity pairs and different additional information have different contribution degrees to the relation extraction task, and the information is filtered by using a mask mechanism of an attention mechanism. The method has the advantages that semantic deviation caused by partial information is avoided, and the model automatically selects the most favorable information of the current entity to carry out relation classification.
Drawings
FIG. 1 is a schematic overall framework diagram of the present solution;
FIG. 2 is a schematic diagram of the encoding part of the pre-training in the present scheme;
fig. 3 is a schematic diagram of formal training in the present scheme.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
Example 1: referring to figs. 1-3, the present invention is a relation extraction method based on mode pre-training, comprising the following steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
1-1) First, word segmentation is performed on the original text with a related tool, and entity annotation data, including the category of each entity and its position, is obtained with manual fine-tuning. Word segmentation, part-of-speech tagging and dependency syntactic analysis are performed on the text containing the entity pairs. In the dependency syntax tree generated by the dependency syntactic analysis, the shortest path between two entities is found; the words and edges on this shortest connected path form the shortest dependency path of the entity pair in the text. In this way, subject-predicate relations are constructed.
1-2) Pre-train a word vector model. Word vectors for the data set are trained in advance using BERT- or GPT-style approaches. If pre-trained vectors are unavailable, the method of this scheme trains them synchronously with the model parameters; in effect, however, word vectors trained in advance on large-scale corpora retain more semantic and grammatical information.
2-1) In the boundary prediction task, the output of the pre-trained language model is mapped, and the vectors are passed through a Bi-LSTM:

h_i = [LSTM_forward(x_i); LSTM_backward(x_i)]

Sequence annotation is then predicted with a CRF (conditional random field):

p(y | h) = exp( Σ_i score(y_{i-1}, y_i, h_i) ) / Z(h)

and the result of the sequence annotation is obtained.
2-2) In the word prediction task, 15% of the Tokens of entities in the input text are masked (Mask): with 80% probability the Token is directly replaced by [MASK], with 10% probability it is replaced by any other word, and with 10% probability the original Token is retained. A corpus is repeatedly provided to the model in this way for training.
Step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
a series of vectors for the text is obtained by step 1), represented as a feature matrix X. Wherein, X is the feature vector representation of all the input nodes, the dimension is N X D, N is the number of the nodes, and D is the dimension of the vector representation of each node.
Dependency analysis is performed on the original text with a syntactic dependency tool to construct a syntactic dependency tree, and the adjacency matrix M of the text, with dimension N x N, is obtained. The operation is carried out by a multi-head attention mechanism: the mask mechanism in the Transformer shields the positions whose matrix entry is not 1, highlighting the information of related entities, and the Transformer layer is pre-trained as follows:
wherein,in order to be a mask matrix, the method comprises the following steps of,a mask operator representing the type of attention,for the three vectors of the attention mechanism,is a vectorThe dimension (c) of (a) is,represents the firstA head.
The output of the above steps is an N x F matrix Z, denoted as a sequence of nodes Z = {z_0, z_1, ..., z_N}, each node being a vector of dimension F. Then the vector representation of each node is computed by weighted averaging to obtain the final vector representation. The results of each calculation are combined and passed through a feedforward neural network sublayer, and the finally obtained vector is expressed as:

h_i = FFN(z_i) = max(0, z_i W_1 + b_1) W_2 + b_2

The relation model is modeled as a function whose parameter is the subject and whose result is the object; the probabilities that the i-th character is the starting position and the ending position of a subject are:

p_i^{sub_start} = σ(W_start x_i + b_start),  p_i^{sub_end} = σ(W_end x_i + b_end)

where σ is the Sigmoid function, W_start, W_end, b_start and b_end are trainable parameters, and x_i is the word vector of the i-th position. The probabilities of the positions of the object are:

p_i^{obj_start} = σ(W_start^o (x_i + v_sub^k) + b_start^o),  p_i^{obj_end} = σ(W_end^o (x_i + v_sub^k) + b_end^o)

where v_sub^k is the vector representation of the k-th subject, σ is the Sigmoid function, and the remaining W and b terms are trainable parameters.
Wherein the specific loss functions are as follows.

Encoding part:

L_enc = -Σ_{i=1}^{N} log p(y_i | x_i; θ)

In the decoding part the training function is represented as:

L_sub = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{sub_t} + (1 - y_i^t) log(1 - p_i^{sub_t}) ]

where N is the length of the input text, the indicator takes 1 when the condition is true and 0 when false, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters;

L_obj = -Σ_{i=1}^{N} Σ_{t ∈ {start, end}} [ y_i^t log p_i^{obj_t} + (1 - y_i^t) log(1 - p_i^{obj_t}) ]

where N is the length of the input text, y_i^t is the 0/1 marker value of the i-th token, and θ are the parameters.

The overall loss function can be expressed as:

L = L_enc + L_sub + L_obj
Step 3) training the model by using formal data;
in formal training, the fully connected layer added during pre-training is discarded. The pre-trained BERT model directly encodes the text, which is input to the Transformer part of the decoding layer, then mapped to the output layer through a fully connected layer, and the output node values are normalized by a softmax function to obtain the probability distribution over relations for the text. Through the previous steps the model has been pre-trained; the pre-training network is removed and formal data is used for training:
and removing word prediction codes of the pre-training part, and training by using formal data by using the network after the pre-training is finished. The output result is obtained by inputting the code obtained from the decoding layer into a feedforward neural network and predicting the probability by an activation function.
During the training phase, the loss function is the decoding loss defined above:

L = L_sub + L_obj

Specific parameter definitions follow step 2). The task of the training phase is to formally train the model, so training of the encoding part is not involved here.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; all equivalent replacements or substitutions made on the basis of the above technical solutions fall within the scope of the present invention.
Claims (8)
1. A relation extraction method based on mode pre-training comprises the following steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;
the expectation of pre-training the encoding-part model is to obtain the position information of entities; the head character of an entity represents the information of the whole entity; a BERT pre-trained language model is used to encode the text, and the model is pre-trained on top of this encoding mechanism so that it obtains the text information;
pre-training the encoding part, wherein the pre-training of the encoding part is a multi-task training, namely word prediction and boundary prediction; that is, for a piece of text, given a character, the BERT encoding results are fed into a neural network, probability mapping is performed, and the BIO labels corresponding to the predicted sequence are produced;
step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
the SDP (shortest dependency path) of the entity-related text: the shortest dependency path between the entities is found by performing word segmentation, part-of-speech tagging and dependency syntax analysis on the text containing the target entities; since the shortest dependency path contains all the semantic information between the entities in the text, syntactic dependency analysis is performed on a large corpus, and the constructed dependency trees are sampled to obtain relations of subject-predicate and verb-object types as the training data of the decoding part;
for an original text, a syntactic dependency tree is constructed by using a dependency analysis tool to obtain the corresponding adjacency matrix, and this matrix is used as the mask matrix of the decoding-part attention mechanism;
the mask mechanism in the attention is applied to mask the vectors; in the syntactic dependency tree each node is regarded as an entity, which highlights the importance of entities in the overall task and enhances the model's ability to detect entities; the two vectors are weighted and summed to obtain the vector used for decoding;
the prediction result of the decoding part is represented by a matrix-form labeling method: the elements represented by the rows and the columns of the matrix are the same, namely the entities corresponding to the input text, and the relation between two entities is given by the matrix value; evidently, the matrix is a symmetric matrix;
a prediction structure for subject-predicate-object triples is added after the Transformer part to complete the training;
and 3) training the model by using formal data, wherein in the formal training, the fully connected layer added in the pre-training process is discarded, the text is directly encoded by using the pre-trained BERT model and input to the Transformer part of the decoding layer, the triple is then modeled as a mapping from subject to object, and the output node values are normalized by a softmax function to obtain the probability distribution of the relation of the text pair.
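The BIO position labeling of step 1) (head "B", interior "I", outside "O") can be sketched as a small function. This is an illustrative sketch under the assumption that entity spans are given as half-open token-index ranges; the function name is not from the patent.

```python
def bio_labels(tokens, entity_spans):
    """Assign B/I/O labels: "B" marks an entity head, "I" the interior
    (including the tail), "O" tokens outside any entity.

    entity_spans: list of (start, end) token indices, end exclusive
    (an assumed input format for illustration).
    """
    labels = ["O"] * len(tokens)
    for start, end in entity_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels
```

For the characters of "张三住在北京" with entity spans (0, 2) and (4, 6), this yields B I O O B I, matching the labeling scheme of claim 2.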
2. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step of obtaining the sequence labeling information of the text in step 1) is as follows:
1-1) firstly, segmenting a section of original text by using a related tool to acquire entity position information contained in the original text, specifically, using a character "B" to represent the head of an entity, using a character "I" to represent the middle part of the entity, and using a character "O" to represent that the character does not belong to any entity;
1-2) pre-training a word vector model: the pre-trained model is acquired in advance by using a method such as BERT or GPT.
the step of obtaining the sequence labeling information of the text in the step 2) is as follows:
2-1) performing word segmentation and dependency syntax analysis on a text containing entity pairs, finding the shortest connection path between two entities in a dependency syntax tree generated by the dependency syntax analysis, wherein words and edges on the shortest connection path are used as the shortest dependency paths of the entity pairs in the text;
2-2) the dependency syntax tree contains multiple types of edges, and one or more types of paths are selected.
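The shortest-connection-path search of step 2-1) can be sketched as a breadth-first search over the dependency tree, treating dependency edges as undirected. This is an illustrative sketch: the head-array input format and the function name are assumptions, not the patent's interface, and a real pipeline would obtain the heads from a dependency parser.

```python
from collections import deque

def shortest_dependency_path(heads, a, b):
    """Shortest path between token indices a and b in a dependency tree.

    heads: heads[i] is the index of token i's syntactic head (-1 for the
    root), an assumed output format of a dependency parser.  Edges are
    treated as undirected, so the path runs up from one entity and down
    to the other.
    """
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    prev = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], b
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

In a tree the BFS path is unique, so the words and edges along it form the SDP of the entity pair.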
3. The pattern pre-training based relationship extraction method of claim 1, wherein step 1) pre-trains the BERT pre-trained language model as follows:
3-1) pre-training a BERT model to obtain a vectorized representation of the text; a vectorized representation of each entity including position information and semantic information is obtained through BERT; BERT is a pre-trained language model based on the Transformer, and its input vector is composed of three parts: token embeddings, segment embeddings and position embeddings; the vectorized representation containing semantic information is obtained through BERT;
3-2) performing further pre-training on the existing BERT model, where the task is word prediction (Masked Language Model): given a character, the model judges whether it is the first character of an entity and predicts the full word in which the character is located;
3-3) performing further pre-training on BERT based on the above steps, where the task is boundary prediction: given a character, the model judges from its vector whether the character belongs to an entity and predicts the boundary of the entity in which it is located;
3-4) modeling is carried out based on the above tasks, namely multi-task training of word prediction and boundary prediction; in the word prediction task, some words are randomly masked and replaced according to a specified probability;
3-5) modeling is carried out based on the above tasks, namely multi-task training of word prediction and boundary prediction; in the boundary prediction task, entity boundary prediction is carried out based on the obtained word codes, and the vectors are mapped using a BiLSTM:

h_i = [ LSTM_forward(x_1, …, x_i) ; LSTM_backward(x_i, …, x_n) ]

sequence labeling prediction is then performed with a CRF (conditional random field):

y* = argmax_y p(y | h_1, …, h_n)
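The random masking used by the word-prediction task of step 3-4) can be sketched as follows. This is a simplified illustration, not the patent's procedure: the function name, the fixed mask token, and collapsing BERT's 80/10/10 replacement scheme into a single mask step are all assumptions.

```python
import random

def mask_for_word_prediction(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly cover tokens for the word-prediction pre-training task.

    Each selected position is recorded as a prediction target and replaced
    by the mask token with the specified probability (a simplification of
    BERT's 80/10/10 replacement scheme).  The seed makes the sketch
    reproducible.
    """
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must recover this token
            masked[i] = mask_token
    return masked, targets
```

The returned `targets` dictionary maps each covered position to the original token, which is exactly what the word-prediction head is trained to reproduce.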
4. The pattern pre-training based relationship extraction method according to claim 1, wherein the step 2) models the decoding part as follows:
4-1) constructing a syntactic tree for the processed sequence through syntactic dependency analysis to obtain the adjacency matrix between entities as the Transformer's mask matrix;
4-2) based on the vectorized representation of the text obtained in step 3) and the mask matrix of step 4-1), computation is performed through a multi-head attention mechanism; the mask mechanism in the Transformer shields the entries that are not 1 in the matrix, highlights the information of related entities, and the Transformer layer is pre-trained; the specific steps are as follows:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
Attention(Q, K, V) = softmax( Mask( Q K^T / sqrt(d_k), M ) ) V

where M is the mask matrix, Mask(·) is the masking operator of the attention, Q, K and V are the three vectors of the attention mechanism, d_k is the dimension of the vector K, and i denotes the i-th head;
the results of each head's calculation are combined and passed through a feedforward neural network sublayer:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O
FFN(z) = max(0, z W_1 + b_1) W_2 + b_2

the finally obtained vector is expressed as:

h = LayerNorm( z + FFN(z) )
4-3) the predicted relations are expressed by a matrix whose rows and columns are the entity sequence, and the vector obtained in 4-2) is mapped through a feedforward neural network and an activation function to obtain the predicted relation result.
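The masked attention of step 4-2) can be sketched numerically for a single head. This is an illustrative sketch under assumptions: masking is realized here by adding a large negative score where the adjacency matrix is 0 (one common way to implement the Mask operator), and the toy matrices stand in for real token vectors.

```python
import numpy as np

def masked_attention(Q, K, V, M):
    """Single-head scaled dot-product attention with a dependency mask.

    M: adjacency matrix of the syntactic dependency tree (1 where two
    tokens are connected, 0 elsewhere).  Masked positions receive a large
    negative score so the softmax drives their weight to zero.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = np.where(M == 1, scores, -1e9)   # Mask(scores, M)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

With identity Q, K, V and a mask that disconnects tokens 0 and 2, the attention weight between those two positions collapses to zero while each row still sums to one, which is how the mechanism "shields" entries that are not 1 in the matrix.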
5. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 3) is as follows:
removing the networks added in pre-training, namely the word prediction and boundary prediction of the encoding layer and the relation prediction of the decoding layer, and inputting the formal corpus for training; specifically, the vector is decoded through the decoding-layer Transformer, and the obtained vector is expressed as:

h = Transformer( BERT(x) )
6. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 2) is specifically as follows:
a relation is modeled as a function that takes the subject as its argument and yields the object as its result; for the i-th character, the probabilities that it is the start and end position of a subject are:

p_i^{sub_start} = σ(W_start x_i + b_start)
p_i^{sub_end} = σ(W_end x_i + b_end)

where σ is the sigmoid function, W_start, W_end and b_start, b_end are trainable parameters, and x_i is the word vector of the i-th position;
modeled in the same way, the probabilities of the object positions are:

p_i^{obj_start} = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^{obj_end} = σ(W_end^r (x_i + v_sub^k) + b_end^r)
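The sigmoid span taggers of claim 6 can be sketched in a few lines. This is an illustrative sketch with toy one-dimensional word vectors; the function names and parameter values are assumptions, and σ(W·x_i + b) is computed exactly as the claim describes.

```python
import math

def sigmoid(z):
    """The sigmoid function sigma used by the span taggers."""
    return 1.0 / (1.0 + math.exp(-z))

def subject_span_probs(x, w_start, b_start, w_end, b_end):
    """For each position i, the probabilities that the i-th character is
    the start / end of a subject: p_i = sigmoid(w . x_i + b).

    x: word vectors, one per position; w_*, b_*: trainable parameters
    (illustrative values here).
    """
    dot = lambda w, v: sum(wi * vi for wi, vi in zip(w, v))
    starts = [sigmoid(dot(w_start, xi) + b_start) for xi in x]
    ends = [sigmoid(dot(w_end, xi) + b_end) for xi in x]
    return starts, ends
```

With a large positive weight, a position whose vector points the "right way" gets a probability near 1 and the opposite position near 0, which is how start/end positions are thresholded in practice.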
7. The pattern pre-training based relationship extraction method according to claim 1, wherein the training function of step 3) is as follows. For the encoding part:

L_enc = Σ_j log p(y_j | x_j)

where x_j is the corresponding text sequence and y_j the corresponding label sequence; the training function of the decoding part is expressed as:

p_θ(s | x) = Π_{t ∈ {start, end}} Π_{i=1}^{L} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}

where L is the length of the input text, I{z} takes 1 when z is true and 0 when false, y_i^t ∈ {0, 1} is the marker value of the i-th token, and θ are the trainable parameters;
the overall loss function can be expressed as the sum of the encoding-part and decoding-part objectives:

L = L_enc + L_dec
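The decoding-part objective above reduces, in log form, to a binary cross-entropy over the 0/1 markers. A minimal numerical sketch, assuming probabilities and markers are given as plain lists (the function name is illustrative):

```python
import math

def decoding_log_likelihood(p, y):
    """Log-likelihood of the span taggers over one sequence:
    sum over positions of y_i * log p_i + (1 - y_i) * log(1 - p_i),
    with markers y_i in {0, 1}.  Training maximizes this value
    (equivalently, minimizes its negation, the binary cross-entropy).
    """
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for pi, yi in zip(p, y))
```

Predictions that agree more closely with the markers score a strictly higher log-likelihood, which is what "maximizing this function" in claim 8 amounts to.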
8. The method for extracting relationship based on pattern pre-training as claimed in claim 7, wherein the training process of step 3) is as follows:
the word-prediction and boundary-prediction parts used in pre-training are discarded, and the modeling of the triples is re-initialized, specifically as follows: for the i-th character, the probabilities that it is the start and end position of a subject are:

p_i^{sub_start} = σ(W_start x_i + b_start)
p_i^{sub_end} = σ(W_end x_i + b_end)

where σ is the sigmoid function, W_start, W_end and b_start, b_end are trainable parameters, and x_i is the word vector of the i-th position;
modeled in the same way, the probabilities of the object positions are:

p_i^{obj_start} = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^{obj_end} = σ(W_end^r (x_i + v_sub^k) + b_end^r)

where v_sub^k is the vector expression of the k-th subject, σ is the sigmoid function, and W^r, b^r are trainable parameters;
the training function is:

p_θ(o | s, x) = Π_{t ∈ {start, end}} Π_{i=1}^{L} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
by maximizing this function, training of the model is accomplished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111331381.8A CN113761893B (en) | 2021-11-11 | 2021-11-11 | Relation extraction method based on mode pre-training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113761893A true CN113761893A (en) | 2021-12-07 |
CN113761893B CN113761893B (en) | 2022-02-11 |
Family
ID=78784893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111331381.8A Active CN113761893B (en) | 2021-11-11 | 2021-11-11 | Relation extraction method based on mode pre-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113761893B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130346069A1 (en) * | 2012-06-15 | 2013-12-26 | Canon Kabushiki Kaisha | Method and apparatus for identifying a mentioned person in a dialog |
CN109543183A (en) * | 2018-11-16 | 2019-03-29 | 西安交通大学 | Multi-tag entity-relation combined extraction method based on deep neural network and mark strategy |
CN110472235A (en) * | 2019-07-22 | 2019-11-19 | 北京航天云路有限公司 | A kind of end-to-end entity relationship joint abstracting method towards Chinese text |
CN111488726A (en) * | 2020-03-31 | 2020-08-04 | 成都数之联科技有限公司 | Pointer network-based unstructured text extraction multi-task joint training method |
CN112507699A (en) * | 2020-09-16 | 2021-03-16 | 东南大学 | Remote supervision relation extraction method based on graph convolution network |
CN112613306A (en) * | 2020-12-31 | 2021-04-06 | 恒安嘉新(北京)科技股份公司 | Method, device, electronic equipment and storage medium for extracting entity relationship |
CN113254429A (en) * | 2021-05-13 | 2021-08-13 | 东北大学 | BERT and MLM-based noise reduction method for remote supervision relationship extraction |
CN113326698A (en) * | 2021-06-18 | 2021-08-31 | 深圳前海微众银行股份有限公司 | Method for detecting entity relationship, model training method and electronic equipment |
Non-Patent Citations (1)
Title |
---|
刘雅璇等: "基于头实体注意力的实体关系联合抽取方法", 《计算机应用》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114417824A (en) * | 2022-01-14 | 2022-04-29 | 大连海事大学 | Dependency syntax pre-training model-based chapter-level relation extraction method and system |
CN114417824B (en) * | 2022-01-14 | 2024-09-10 | 大连海事大学 | Chapter-level relation extraction method and system based on dependency syntax pre-training model |
CN114519356A (en) * | 2022-02-22 | 2022-05-20 | 平安科技(深圳)有限公司 | Target word detection method and device, electronic equipment and storage medium |
CN114519356B (en) * | 2022-02-22 | 2023-07-18 | 平安科技(深圳)有限公司 | Target word detection method and device, electronic equipment and storage medium |
CN114328978A (en) * | 2022-03-10 | 2022-04-12 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Relationship extraction method, device, equipment and readable storage medium |
CN114328978B (en) * | 2022-03-10 | 2022-05-24 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Relationship extraction method, device, equipment and readable storage medium |
CN114528394A (en) * | 2022-04-22 | 2022-05-24 | 杭州费尔斯通科技有限公司 | Text triple extraction method and device based on mask language model |
CN114861600A (en) * | 2022-07-07 | 2022-08-05 | 之江实验室 | NER-oriented Chinese clinical text data enhancement method and device |
CN114861600B (en) * | 2022-07-07 | 2022-12-13 | 之江实验室 | NER-oriented Chinese clinical text data enhancement method and device |
US11972214B2 (en) | 2022-07-07 | 2024-04-30 | Zhejiang Lab | Method and apparatus of NER-oriented chinese clinical text data augmentation |
CN117807956A (en) * | 2023-12-29 | 2024-04-02 | 兰州理工大学 | ICD automatic coding method based on clinical text tree structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||