CN113761893A

CN113761893A - Relation extraction method based on mode pre-training

Info

Publication number: CN113761893A
Application number: CN202111331381.8A
Authority: CN
Inventors: 杜熙源; 高成睿; 岳元浩
Original assignee: Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Current assignee: Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2021-12-07
Anticipated expiration: 2041-11-11
Also published as: CN113761893B

Abstract

The scheme discloses a mode pre-training relationship extraction method, which is mainly used for solving the problems of entity boundary determination and relationship distribution imbalance in entity relationship joint extraction. The scheme constructs a mode for pre-training the model, and the model is modeled in an encoding-decoding mode. The column pre-trains the encoding part and the decoding part. The coded portion is pre-trained with the associated entity data to obtain information that predicts the entity boundaries. The decoding part is pre-trained by relevant relational data, a dependency tree is constructed by using syntactic dependency analysis, and part of information is shielded according to a mask mechanism in an attention mechanism through an adjacent matrix of the dependency tree. And finally, updating the parameters in the whole frame in a supervised learning mode, and finally learning a relationship extraction model with strong adaptability and strong expression capability.

Description

Relation extraction method based on mode pre-training

Technical Field

The invention relates to a relation extraction method based on mode pre-training, and belongs to the technical field of information extraction.

Background

With the rapid development and popularization of computers and the internet, the amount of data created by humans is on a high-speed growth trend. In this information explosion age, how to analyze and process information rapidly and extract valuable information from text becomes a research hotspot and an urgent problem to be solved. In response to these challenges, it is imperative to develop an automated information processing tool that automatically and quickly extracts valuable knowledge from a vast amount of information. In this context, Information Extraction (IE) technology has become a hot content of academic and industrial research, and the purpose of Information Extraction is to extract specific and valuable Information from semi-structured and unstructured texts and structured data, and process the extracted Information into structured data that can be easily stored and calculated by computers, and the Information Extraction includes Entity identification (Entity Recognition), relationship Extraction (relationship Extraction) and Event Extraction (Event Extraction).

As one of important tasks of information extraction, the relation extraction aims at extracting semantic relations contained between two entities from a text, and has wide application value in the fields of mass data processing, automatic construction of knowledge bases, automatic question answering and the like; for example, the large-scale Knowledge base systems constructed by automatically processing large-scale Web texts through a relational extraction technology can be applied to search engine optimization, for example, Knowledge Vault constructed by Google with more than 16 hundred million fact data is applied to improvement of search results, and the user experience effect is improved. Meanwhile, the relationship extraction technology provides infrastructure in other tasks in the natural language field, such as entity linking, automatic summarization, emotion analysis and the like.

The idea of using a pre-training method to improve the performance of a model has been widely applied in the field of deep learning. Natural language processing researchers have proposed methods of pre-training language models based on predecessors, the most well-known of which are BERT, GPT, ELMO, and the like. According to the different problems, the aim of the pre-training model is divided into two categories, including the direction facing language coding and the direction facing model vector space. The direction of language-oriented coding mainly solves the problem of coding a text, namely how to code a section of text, for example, a deep learning convolutional neural network codes the text by using vector representation of pre-trained words. The model-oriented vector space mainly solves the problems of long training time and high cost in deep learning by adjusting model parameters to the problem of easy learning, and introduces some information to adjust the parameters of the model to the position of easy convergence to an optimal solution, including introducing artificially labeled information or additional information contained in an entity, such as entity category and the like. It is noted that the above two types of approaches are not mutually exclusive and can be used in common.

Disclosure of Invention

Aiming at the technical problems in the prior art, the invention provides an entity-relationship joint extraction method based on mode pre-training, which comprises the steps of constructing a pre-training mode by introducing a pre-training structure, training the capability of a model for positioning an entity by using an entity marking task, and extracting the relationship between two entities by an attention mechanism.

The scheme discloses a body-relation combined extraction method based on mode pre-training. The scheme provides a method for supervising learning, and triples of < entity h, relation and entity t > can be mined from a text. For each text segment, firstly pre-training a model through a pre-training network, wherein a pre-training task comprises an encoding part and a decoding part; the coding part enhances the capability of searching the entity through the entity data; the decoding section enhances the weights of the entities in the relationship extraction by syntactic dependency analysis. The pre-training is carried out by using the steps, then the pre-training structure is removed, and the formal data is used for training, so that the probability distribution of the relationship between the entity pairs in the text is obtained.

In order to achieve the purpose, the technical scheme of the invention is as follows: a relation extraction method based on mode pre-training comprises the following specific steps:

step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:

a) marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;

b) the expectation of pre-training the coding part model is to obtain the position information of an entity, wherein the specification is that the head character of the entity represents the information of the whole entity, a BERT pre-training language model is used for coding a text, and pre-training is carried out based on the coding mechanism so that the model can obtain all the information;

c) the encoding part is pre-trained, and specifically, the pre-training in the encoding part is a multi-training task of word prediction and boundary prediction, that is, for a piece of text, given a character, the model predicts the head and tail positions of an entity where the character is located, and predicts the content of the whole entity (if the character is the head of the entity). In actual operation, a neural network is added to a BERT coding result, probability mapping is carried out, and BIO labels corresponding to sequences are predicted.

Step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:

a) in the SDP (shortest dependency path) of the entity-related text, the shortest dependency path between entities is found by performing word segmentation, part of speech tagging and dependency syntax analysis on the text containing the target entity, and the shortest dependency path contains all semantic information between the entities in the text, so that syntactic dependency analysis is performed on a large number of linguistic data by using the syntactic dependency analysis, and the dependency tree is sampled according to the constructed dependency tree to obtain the relationships of some major predicates and verb guest types as the training data of a decoding part;

b) for an original text, a syntax dependency tree is constructed by using a syntax dependency analysis tool to obtain a corresponding adjacent matrix, and the matrix is used as a mask matrix of a decoding part attention mechanism;

c) applying a mask mechanism in the attribute to mask vectors, wherein in a syntactic dependency tree, a node is regarded as an entity, the importance of the entity in the overall task is highlighted in such a way, the capability of the model for detecting the entity is enhanced, and the two vectors are subjected to weighted summation to obtain the vector for decoding;

d) the prediction result of the decoding part is represented by a label representation method in the form of a matrix, the elements represented by the rows and the columns of the matrix are the same and are all entities corresponding to the input text, and the relationship between the two entities is represented by the values of the matrix, obviously, the matrix is a symmetrical matrix;

e) and adding a prediction structure of the predicate-object triple behind the transform part to finish training.

Step 3) training the model by using formal data:

in formal training, the network in the pre-training stage is removed, the network is trained in a mini-batch gradient descending mode, and parameters are updated. The method comprises the steps of abandoning a full connection layer added in a pre-training process, directly using a BERT model which is pre-trained to encode a text, inputting the text to a transform part of a decoding layer, modeling a triple into a mapping from a subject to a predicate, and normalizing an output node value through a softmax function to obtain the probability distribution of the text-to-relation.

Wherein, in step 1) a) additional information needs to be obtained in different ways.

1-1) firstly, carrying out word segmentation on a text original text by using a related tool, and obtaining entity marking data by a manual fine adjustment method, wherein the entity marking data comprises the category of an entity and the position of the entity. And performing word segmentation, part of speech tagging and dependency syntactic analysis on the text containing the entity pairs. In the dependency syntax tree generated by the dependency syntax analysis, the shortest link between two entities is found. The words and edges on this shortest connected path will be the shortest dependent path of the entity pair in the text. In this way, the relationship of the main and the subordinate is constructed.

1-2) pre-training a word vector model. Word vectors in the data set are trained in advance using BERT or GPT approaches. If not, the method of the scheme keeps synchronous training with the model parameters. But in effect, word vectors trained on large-scale predictions in advance can hold more semantic and grammatical information.

Training a pre-training language model in a coding stage by using the constructed entity data in the step 1):

in the boundary prediction task, mapping the pre-training language model, and performing the following steps by a Bi-LSTM sublayer:

；

and passing the obtained mapping result through a CRF layer:

wherein

The calculation formula is a normalization factor:

；

in the word prediction task, 15% of tokens in an entity are masked (Mask), and are directly replaced by Mask with 80% probability, and replaced by any other word with 10% probability, and the original Token is retained with 10% probability.

Step 1) c) obtaining a series of vectors for the text, represented as feature matrices, by b) in step 1)

. Wherein,

is a feature vector representation of all nodes of the input, the dimensions are

，

Is the number of the nodes that are present,

is the dimension of the vector representation of each node.

And performing dependency analysis on the original text by using a syntactic dependency tool to construct a syntactic dependency tree, and obtaining an adjacency matrix M of the text, wherein the dimensionality is N x N. By performing operation through a multihead-attribute mechanism, a mask mechanism in a transform can shield entities which are not 1 in a matrix, highlight information of related entities, and perform pre-training on a transform layer, which is specifically as follows:

；

wherein

In order to be a mask matrix, the method comprises the following steps of,

a mask operator representing the type of attention,

for the three vectors of the attention mechanism,

is a vector

The dimension (c) of (a) is,

represents the first

A head.

In step 2) a), the output of step 1) is a matrix Z of N × F, Z being represented as a sequence Z = { Z0, Z1.., ZN }, each node being a vector of dimension F. And then, calculating the vector representation of each node according to a weighted average mode to obtain the final vector representation. Combining the results of each calculation, through a feedforward neural network sublayer:

；

the final obtained vector is expressed as follows:

；

wherein,

the original result output by the coding layer.

It is specified here that a relational model is a function, the parameters of which are the subject, the result is the object, and the probability that the ith character is the starting position and the ending position of the subject is:

；

wherein

Is a function of the Sigmod and,

is a parameter that is trainable,

is as follows

A word vector of locations.

The probability of the position of the object is:

wherein

Is a vector representation of the kth subject,

is a function of the Sigmod and,

are trainable parameters.

Step 2) training through the data generated in step 1), wherein the specific loss function is as follows:

and an encoding part:

；

a decoding part:

；

wherein

For the purpose of the corresponding text sequence,

are the corresponding labeled sequences.

The training function is represented as:

wherein,

the likelihood function for labeling the part of the subject is specifically defined as:

wherein

For inputting the length of the text when

Taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

;

The likelihood function of labeling the part of the object is specifically defined as:

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

-representing objects without correspondence of subject, marks thereof

Is always 0;

the overall loss function can be expressed as:

；

wherein

Training of the model is done by maximizing this function for the weights.

And 3) completing the pre-training of the model through the previous steps, removing the pre-trained network, and training by using formal data:

and removing word prediction codes of the pre-training part, and training by using formal data by using the network after the pre-training is finished. By modeling the code from the decoding layer as a function with the parameters subject and the result object as defined above, the probability that it is the starting and ending position of the subject for the ith character is:

；

wherein

Is a function of the Sigmod and,

are trainable parameters.

The probability of the position of the object is:

；

wherein

Is a vector representation of the kth subject,

is a function of the Sigmod and,

are trainable parameters.

Wherein the specific loss function is as follows:

and an encoding part:

；

wherein

For the purpose of the corresponding text sequence,

in order for the sequence of the corresponding labels,

a decoding part:

wherein,

；

；

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

；

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

-representing objects without correspondence of subject, marks thereof

Is always 0;

the overall loss function can be expressed as:

；

wherein

Training of the model is done by maximizing this function for the weights.

Compared with the prior art, the invention has the following advantages:

1) the method introduces additional corpora containing entity category, entity representation, entity context and text path information, and can improve the coding capacity of the model compared with the traditional representation-based method and rule-based method.

2) Compared with the method for introducing the additional information, the method of the scheme introduces the additional information into the pre-training language model by pre-training in the coding layer. The method has good adaptability, can be applied to the task, and can also apply the model which is trained in advance to other problems. Secondly, multiple kinds of extra information act on simultaneously, and the problem of insufficient coverage rate of single information can be avoided.

3) The method customizes a reasonable initialization method for all extra information, uses a scheme based on word segmentation and syntactic dependency analysis on entity and relation embedding, adopts a method based on attention mechanism on relation extraction, and can fully utilize semantic information so as to achieve better effect.

4) The method adopts the SDP (shortest dependent path) of the text to introduce an attention mechanism for operation, firstly reserves the original semantics of the text, adds the dependency syntactic characteristic and highlights the importance of the entity in text coding. The weighted sum mode retains the original semantic meaning of the text and adds the dependency syntactic characteristics, which cannot be achieved by the traditional text coding mode.

5) The scheme compresses and screens information by using an attention mechanism on a graph, and the method has the characteristics of high efficiency and strong expression capability. The attention mechanism can effectively mine the association between the entities, can integrate high-order logic expression of an inference chain into the feature representation, and well utilizes the summary experience of human beings.

6) The scheme introduces an attention mechanism and syntax dependency analysis at a decoding layer to perform further pre-training and feature extraction work. In the text, various information introduced in the encoding stage may have a lot of noises, such as entity category identification errors, or a text path introduces irrelevant text, and the information does not have a great effect on extracting relevant entities, so that a relationship data representation generated by syntactic dependency analysis is used as supervision information in a way of attention mechanism, the relevance is calculated for each character, the character features with low relevance are shielded by using mask of attention mechanism, the nodes more prone to the entities are reserved, and the original encoding result is subjected to weighted summation, so that after weighting, the features can be further compressed, and the noises are also filtered.

7) In the scheme, different entity pairs and different additional information have different contribution degrees to the relation extraction task, and the information is filtered by using a mask mechanism of an attention mechanism. The method has the advantages that semantic deviation caused by partial information is avoided, and the model automatically selects the most favorable information of the current entity to carry out relation classification.

Drawings

FIG. 1 is a schematic overall framework diagram of the present solution;

FIG. 2 is a schematic diagram of the encoding part of the pre-training in the present scheme;

fig. 3 is a schematic diagram of formal training in the present scheme.

Detailed Description

The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.

Example 1: referring to fig. 1-3, the present invention is a method for extracting remote supervised relationship of mode pre-training, comprising the following steps:

1-1) firstly, carrying out word segmentation on original text of a text by using a related tool, and obtaining entity marking data by a manual fine adjustment method, wherein the entity marking data comprises the category of an entity and the position of the entity. And performing word segmentation, part of speech tagging and dependency syntactic analysis on the text containing the entity pairs. In the dependency syntax tree generated by the dependency syntax analysis, the shortest link between two entities is found. The words and edges on this shortest connected path will be the shortest dependent path of the entity pair in the text. In this way, the relationship of the main and the subordinate is constructed.

2-1) in the boundary prediction task, mapping the pre-training language model, and mapping the vector through BilSTM:

wherein,

vector codes obtained for the BERT pre-training language model.

Sequence annotation prediction using CRF (conditional random field):

；

and obtaining the result of sequence annotation.

2-2) in the word prediction task, the Token of 15% of the entity in the input text is hidden (Mask), and is directly replaced by Mask with a probability of 80%, and replaced by any other word with a probability of 10%, and the original Token is retained with a probability of 10%. A corpus is repeatedly provided to the model in this way for training.

a series of vectors for the text is obtained by step 1), represented as a feature matrix X. Wherein, X is the feature vector representation of all the input nodes, the dimension is N X D, N is the number of the nodes, and D is the dimension of the vector representation of each node.

wherein,

in order to be a mask matrix, the method comprises the following steps of,

a mask operator representing the type of attention,

for the three vectors of the attention mechanism,

is a vector

The dimension (c) of (a) is,

represents the first

A head.

The output of the above steps is a matrix Z of N x F. Denote Z as a sequence of nodes Z = { Z0, Z1.., ZN }, each node being a vector of dimension F. And then, calculating the vector representation of each node according to a weighted average mode to obtain the final vector representation. Combining the results of each calculation, through a feedforward neural network sublayer:

；

wherein

Are parameters of the feed-forward neural network.

The final obtained vector is expressed as follows:

;

wherein

Is a real number between values (0, 1).

Wherein

The original result output by the coding layer.

A relational model is modeled as a function with the parameters subject and the result object, and the probability of the starting and ending positions of the subject for the ith character is:

wherein

Is a function of the Sigmod and,

is a parameter that is trainable,

is as follows

Word vectors of individual positions.

The probability of the position of the object is:

wherein

Is a vector representation of the kth subject,

is a function of the Sigmod and,

are trainable parameters.

Wherein the specific loss function is as follows:

and an encoding part:

the training function in the decoding part is represented as:

wherein,

wherein,

in order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

;

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

-representing objects without correspondence of subject, marks thereof

Is always 0;

the overall loss function can be expressed as:

；

wherein

Training of the model is done by maximizing this function for the weights.

Step 3) training the model by using formal data;

in formal training, the full-link layer added in the pre-training process is abandoned. Directly coding a text by using a pre-trained BERT model, inputting the text into a transform part of a decoding layer, then mapping the text to an output layer through a full connection layer, and then normalizing an output node value through a softmax function to obtain probability distribution of a text pair relation; through the previous steps, the model is pre-trained, the pre-trained network is removed, and formal data are used for training:

and removing word prediction codes of the pre-training part, and training by using formal data by using the network after the pre-training is finished. The output result is obtained by inputting the code obtained from the decoding layer into a feedforward neural network and predicting the probability by an activation function.

During the training phase, the loss function is as follows:

and (3) defining reference step (2) by specific parameters.

The task of the training phase is to formally train the model, so that training of the code part is not involved here.

It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and all equivalent substitutions or substitutions made on the basis of the above-mentioned technical solutions belong to the scope of the present invention.

Claims

1. A relation extraction method based on mode pre-training comprises the following steps:

marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;

the expectation of pre-training the coding part model is to obtain the position information of an entity, the head character of the entity represents the information of the whole entity, a BERT pre-training language model is used for coding a text, and the model is pre-trained on the basis of the coding mechanism so as to enable the model to obtain the text information;

pre-training a coding part, wherein the pre-training of the coding part is a multi-training task, namely word prediction and boundary prediction, namely, for a section of text, a character is given, results are added to a neural network through BERT coding, probability mapping is carried out, and BIO labels corresponding to prediction sequences are marked;

the SDP (short dependency path) of the entity-related text is the shortest dependency path, the shortest dependency path between the entities is found by performing word segmentation, part of speech tagging and dependency syntax analysis on the text containing the target entity, and the shortest dependency path contains all semantic information between the entities in the text, so that syntactic dependency analysis is performed on a large number of linguistic data by using the syntactic dependency analysis, and the dependency tree is sampled according to the constructed dependency tree to obtain the relationships of some major predicates and the verb guest types as the training data of a decoding part;

for an original text, a syntax dependency tree is constructed by using a syntax dependency analysis tool to obtain a corresponding adjacent matrix, and the matrix is used as a mask matrix of a decoding part attention mechanism;

applying a mask mechanism in the attribute to mask vectors, wherein in a syntactic dependency tree, a node is regarded as an entity, the importance of the entity in the overall task is highlighted in such a way, the capability of the model for detecting the entity is enhanced, and the two vectors are subjected to weighted summation to obtain the vector for decoding;

the prediction result of the decoding part is represented by a label representation method in the form of a matrix, the elements represented by the rows and the columns of the matrix are the same and are all entities corresponding to the input text, and the relationship between the two entities is represented by the values of the matrix, obviously, the matrix is a symmetrical matrix;

adding a prediction structure of a main and predicate element triple behind a transform part to finish training;

and 3) training the model by using formal data, wherein in the formal training, a full connection layer added in the pre-training process is abandoned, the text is directly encoded by using the pre-trained BERT model and is input to a transform part of a decoding layer, then the triple is modeled into the mapping from a subject to a predicate, and then the output node value is normalized by a softmax function to obtain the probability distribution of the text-to-relation.

2. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step of obtaining the sequence labeling information of the text in step 1) is as follows:

1-1) firstly, segmenting a section of original text by using a related tool to acquire entity position information contained in the original text, specifically, using a character "B" to represent the head of an entity, using a character "I" to represent the middle part of the entity, and using a character "O" to represent that the character does not belong to any entity;

1-2) pre-training a word vector model, acquiring the pre-trained model in advance by using a BERT or GPT mode,

the step of obtaining the sequence labeling information of the text in the step 2) is as follows:

2-1) performing word segmentation and dependency syntax analysis on a text containing entity pairs, finding the shortest connection path between two entities in a dependency syntax tree generated by the dependency syntax analysis, wherein words and edges on the shortest connection path are used as the shortest dependency paths of the entity pairs in the text;

2-2) the dependency syntax tree contains multiple types of edges, and one or more types of paths are selected.

3. The pattern pre-training based relationship extraction method of claim 1, wherein step 1) pre-trains the BERT pre-trained language model as follows:

3-1) pre-training a BERT model to obtain vectorized representation of a text, obtaining vectorized representation of each entity including position information and semantic information through the BERT, wherein the BERT is a pre-training language model based on a transformer, a vector of the BERT is composed of three parts, token embedding, segmentation embedding and position embedding are carried out, and vectorized representation containing the semantic information is obtained through the BERT;

3-2) performing further pre-training on the existing BERT Model, wherein the task is word prediction (Masked Language Model), a given Model is required to have a character, the Model judges whether the given Model is the first character of an entity or not, and the full text of the word where the character is located is predicted;

3-3) performing further pre-training on the BERT based on the steps, wherein the task is boundary prediction, a character of a model is required to be given, the model judges whether the character belongs to an entity according to the vector of the character, and the boundary of the entity where the character is located is predicted;

3-4) modeling is carried out based on the tasks in the steps, the tasks are multi-task training of word prediction and boundary prediction, in the word prediction task, part of words are randomly covered, and the words are replaced according to the specified probability;

3-5) modeling is carried out based on the tasks of the steps, the tasks are multitask training of word prediction and boundary prediction, in the boundary prediction task, entity boundary prediction is carried out based on obtained word codes, and vectors are mapped by using BilSTM:

wherein,

vector coding obtained for the BERT pre-training language model;

sequence labeling prediction using a CRF conditional random field:

wherein

The position in the sequence is indicated and,

the corresponding sequences are labeled.

4. The pattern pre-training based relationship extraction method according to claim 1, wherein the step 2) models the decoding part as follows:

4-1) constructing a syntactic tree for the processed sequence through syntactic dependency analysis to obtain an adjacency matrix between entities as a transform's mask matrix,

4-2) based on the vectorized expression of the text obtained in the step 3) and the mask matrix in the step 4-1), performing operation through a multihead-attention mechanism, wherein the mask mechanism in the transformer can shield entities which are not 1 in the matrix, highlight information of related entities, and pre-train a transform layer, and the method comprises the following specific steps:

wherein,

in order to be a mask matrix, the method comprises the following steps of,

a mask operator representing the type of attention,

for the three vectors of the attention mechanism,

is a vector

The dimension (c) of (a) is,

represents the first

A head;

combining the results of each calculation, through a feedforward neural network sublayer:

wherein,

in order to feed forward the parameters of the neural network,

the final obtained vector is expressed as follows:

;

wherein

The original result output by the coding layer;

4-3) expressing the predicted relation by using an x matrix taking the entity sequence as rows and columns, and mapping the vector obtained in the 4-2) through a feedforward neural network and an activation function to obtain a predicted relation result.

5. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 3) is as follows:

removing the networks added in the pre-training, namely word prediction and boundary prediction of the coding layer and relation prediction of the decoding layer, inputting formal linguistic data for training, specifically, decoding a vector through a decoding layer transformer, and expressing the obtained vector as follows:

；

wherein,

is a real number between values (0, 1).

6. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 2) is specifically as follows:

a relationship is modeled as a function with parameters as the subject and the result as the object, and for the ith character, the probability of being the starting and ending positions of the subject is:

；

wherein

Is a function of the Sigmod and,

and

is a parameter that is trainable,

is as follows

A word vector of individual positions;

in a modeling manner, the probability of the position of the object is:

wherein

Is a vector representation of the kth subject,

is a function of the Sigmod and,

and

are trainable parameters.

7. The pattern pre-training based relationship extraction method according to claim 1, wherein the training function of step 3) is as follows: and an encoding part:

；

wherein

For the purpose of the corresponding text sequence,

for the corresponding labeled sequence, the training function in the decoding part is expressed as:

wherein,

;

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

;

wherein

In order to input the length of the text,

taking 1 when true, taking 0 when false,

the value of the marker for the ith token is 0 or 1,

as a parameter

-representing objects without correspondence of subject, marks thereof

Is always 0;

the overall loss function can be expressed as:

；

wherein

Training of the model is done by maximizing this function for the weights.

8. The method for extracting relationship based on pattern pre-training as claimed in claim 7, wherein the training process of step 3) is as follows:

the method abandons a pre-trained word prediction and boundary prediction part, and re-initializes modeling of the triples, and specifically comprises the following steps: for the ith character, the probability that it is the starting and ending positions of the subject is: