CN114398489A - Entity relation joint extraction method, medium and system based on Transformer - Google Patents

Entity relation joint extraction method, medium and system based on Transformer Download PDF

Info

Publication number
CN114398489A
Authority
CN
China
Prior art keywords
entity
training
word
model
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111480107.7A
Other languages
Chinese (zh)
Inventor
张正
常光辉
黄海辉
胡新庭
陈浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111480107.7A
Publication of CN114398489A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Transformer-based entity relation joint extraction method, medium and system. The method comprises the following steps: connecting the entity-relation triples annotated in the training data to the training data with special identifiers; performing vectorized mapping of the words in the processed training data; inputting the mapped training data into an attention-based entity relation joint extraction model and training the model with the back-propagation algorithm to obtain an entity relation prediction model; and inputting sentences that require entity relation joint extraction into the trained model to predict the triples they contain. The triple extraction task is treated as a sequence-to-sequence task, and joint extraction is realized through parameter sharing.

Description

Entity relation joint extraction method, medium and system based on Transformer
Technical Field
The invention belongs to the field of deep learning and natural language processing, and particularly relates to a Transformer-based entity relation joint extraction method and system.
Background
With the advent of the big data era, the volume of data on the internet is growing rapidly, much of it natural language text that contains a large amount of hidden knowledge. How to extract this knowledge quickly and efficiently from open-domain text has become an important problem. To address it, information extraction was first proposed at the MUC-7 conference in 1998. Entity relation extraction is a core task of text mining and information extraction: it models text and automatically extracts the semantic relations between entities to obtain useful semantic knowledge.
To extract hidden knowledge from massive unstructured data, the concept of the knowledge graph was proposed. In a knowledge graph, proper nouns such as person and place names are represented as entities, the relation between two entities is represented as a relation, and the graph is built from triples of the form (main entity, relation, object entity). To extract such triples from unstructured text automatically, researchers have proposed information extraction methods; pipeline methods and joint learning methods are the two main approaches at present.
Entity relation extraction methods are currently divided into pipeline methods and joint learning methods. A pipeline method treats extraction as two subtasks: it first performs named entity recognition on the text and then identifies the relations between the recognized entities, a step called relation extraction. A joint extraction method treats named entity recognition and relation extraction as a single task and extracts the triples directly through joint learning. This avoids the error accumulation caused by imperfect named entity recognition and improves the accuracy of entity relation extraction; the work in this patent is likewise based on joint learning and provides a new approach to entity relation joint extraction.
A search found application publication No. CN111666427A, an entity relation joint extraction method, apparatus, device and medium, which includes: acquiring training sample data; training a pre-built entity relation extraction model with the training sample data to obtain a trained model, where the model includes a self-attention layer that, during training, performs attention computation based on the influence of the other triples in the sentence on the currently predicted relation; and, when a target text requiring entity relation extraction is obtained, outputting the corresponding extraction result with the trained model. By training a model that contains a self-attention layer, the influence of other triples on the currently predicted relation can be taken into account during extraction, improving the accuracy of entity relation extraction.
The BERT+CNN-based entity relation joint extraction proposed in CN111666427A has the following problems:
1. Its complexity is high, which hinders deployment of the model.
2. The word-level matrix it uses has difficulty handling the triple-overlap problem.
3. It relies on a CNN, which has well-known drawbacks on long sequences and cannot capture long-distance information.
The improvements of the present invention are as follows:
1. The invention introduces a half-pointer half-tag network, which handles the triple-overlap problem better than CN111666427A.
2. The present patent uses a Transformer model as the feature extractor, which is superior to the model proposed in CN111666427A on long sequences.
3. The model of the invention reduces model complexity while achieving better triple extraction performance than the model proposed in CN111666427A.
Application publication No. CN113157936A describes an entity relation joint extraction method, apparatus, electronic device and storage medium, where the method includes: obtaining a marker sequence; determining a semantic representation from the marker sequence; determining a feature-map matrix from the marker sequence and the semantic representation; predicting, from the feature-map matrix, a word-level matrix related to entity information, a word-level matrix related to entities and relations, and a word-level matrix related to triples; and combining the triple-related word-level matrices to obtain the target triples. That application determines the three kinds of word-level matrices in stages and extracts the target triples with a semantic segmentation framework, a multi-stage entity relation extraction method based on image semantic segmentation, to alleviate entity overlap and error accumulation and improve extraction through a multi-stage progressive scheme.
CN113157936A also proposes a BERT+CRF-based entity relation joint extraction model with a new labeling scheme; its problems are as follows:
1. The triple-overlap problem is difficult to solve: in entity relation joint extraction, several relations may hold between the same entities, and the current classifier can become confused; that invention suffers from this as well.
2. The method is also based on a BERT model, so the model complexity is high, and it depends on a CRF, a sequential model that is prone to vanishing or exploding gradients and has difficulty capturing long-distance information.
Our solution is as follows:
1. A half-pointer half-tag scheme is adopted, which avoids classifier confusion and hence the triple-overlap problem (a small illustration follows this list).
2. The model is based on the Transformer; its complexity is comparatively low, it avoids the drawbacks of sequential models, and it handles long sequences better than the model proposed in CN113157936A.
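As a small illustration of the half-pointer half-tag labels mentioned in item 1 above, the following Python sketch builds independent 0/1 start and end vectors for two entity spans; the sentence and spans are invented for illustration and are not taken from the patent:

```python
# Small illustration of the half-pointer half-tag scheme: each main entity is
# marked by independent 0/1 start and end vectors, so overlapping triples that
# share an entity do not compete for a single label. Sentence and spans are made up.
tokens = ["Jackie", "Chan", "was", "born", "in", "Hong", "Kong"]
entities = [(0, 1), (5, 6)]        # "Jackie Chan" and "Hong Kong" token spans

start_tags = [0] * len(tokens)
end_tags = [0] * len(tokens)
for s, e in entities:
    start_tags[s] = 1              # pointer to the first token of the entity
    end_tags[e] = 1                # pointer to the last token of the entity

print(start_tags)  # [1, 0, 0, 0, 0, 1, 0]
print(end_tags)    # [0, 1, 0, 0, 0, 0, 1]
```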
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. A Transformer-based entity relation joint extraction method and system are provided that extract hidden information from unstructured data, construct a knowledge graph, and at the same time improve the performance of entity relation extraction. The technical scheme of the invention is as follows:
a Transformer-based entity relation joint extraction method comprises the following steps:
the method comprises the steps of obtaining an internet data set and preprocessing it: the sentences in the data set are connected to their corresponding triples with preset identifiers, the start and end positions of the main entity, the relation and the object entity are marked, a preset separator is used when several triples are involved, and the training data also require preset start and end identifiers; the processed data take the following form, where the special separators and the special start and end identifiers refer to:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
vectorized mapping is performed on each word in the processed data set, a position vector is computed from the position of each word in its sentence, the result is input into a Transformer-based neural network model, and the model is trained with the back-propagation algorithm to obtain an attention-based entity relation joint extraction model;
and the sentences that require entity relation extraction are input into the trained Transformer-based entity relation joint extraction model, and the triples in each sentence are predicted.
Further, the training process of the Transformer-based neural network model comprises:
1) mapping each character or word in the input sentence to a corresponding word vector;
2) in the coding layer, taking the word vector corresponding to each word in the training sample as input, learning the context information of each word in the sentence with a Transformer encoder, and obtaining the representation vector H_l;
3) predicting the main entities in the training sample with a classifier, where a binary classifier separately predicts the start position p_start and the end position p_end of each main entity in the training sample together with a vector representation v_sub of the main entity;
4) in the decoding layer, combining the representation vector H_l output by the encoder with the main entity representation v_sub predicted by the binary classifier, either by a preset processing or by simple addition, to obtain a new context representation vector M_l, decoding M_l, and classifying the object entities with a binary classifier;
5) calculating, from the obtained tag vector representations, the start and end positions of the main entity, the relation and the object entity, respectively;
6) selecting the maximum likelihood function over all samples as the objective function of the model;
7) training the model with the back-propagation algorithm and updating all parameters in the model to finally obtain a converged entity relation joint extraction model.
Further, the training samples are processed with special identifiers according to the training samples and the triple information in the training set: a training sample requires at least two identifiers, a start identifier and an end identifier, and the triple information of the sample requires at least three identifiers, a start identifier, a separator and an end identifier; the data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
Further, 1) mapping each character or word in the input sentence to a corresponding word vector specifically comprises:
obtaining the word-vector representation of the training set through word2vec training, mapping each word in a training sample to its corresponding word vector, selecting the length of the longest training sample as max_len, and padding sentences shorter than max_len with a special placeholder.
Further, 3) predicting, with a binary classifier, the start position p_start and the end position p_end of each main entity in the training sample together with the vector representation v_sub of the main entity specifically comprises:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters; the main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j; I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; y_i^start and y_i^end, the tags of the i-th word, take the value 1 when that word is, respectively, the start or end position of the main entity.
Further, 6) selecting the maximum likelihood function over all samples as the objective function of the model specifically comprises:
for all training samples, training the model by maximizing the likelihood of the samples and updating the parameters in the model until convergence, the training objective Loss being defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the Transformer-based entity relation joint extraction method of any one of the above.
A Transformer-based entity relation joint extraction system comprises:
a data preprocessing module for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain the entity relation extraction model;
and a result processing module for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
The invention has the following advantages and beneficial effects:
the invention provides an entity relationship combined extraction model based on a Transformer, and simultaneously introduces a semi-pointer semi-marker network, takes an entity relationship combined extraction task as a sequence-to-sequence task, firstly extracts a main entity, and then extracts a secondary entity and a relationship. In the stage of predicting the main entity, the starting position and the ending position of the main entity in the sentence are respectively predicted through a semi-pointer semi-tagged network, the process can simultaneously predict a plurality of main entities, in the stage of predicting the auxiliary entity, each semi-pointer semi-tagged network corresponds to a triple relationship, and the starting position and the ending position of the auxiliary entity are simultaneously predicted. The invention also proposes a corresponding optimization function for the model. Compared with the traditional extraction method of the pipeline, the method has no error propagation problem, and simultaneously considers the correlation of named entity identification and relation extraction.
Drawings
FIG. 1 is a flowchart of one possible system framework provided by the preferred embodiment of the present invention;
FIG. 2 is a diagram of the Transformer-based neural network architecture according to an embodiment of the present invention;
FIG. 3 is a structural diagram of the Transformer according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention provides a Transformer-based entity relationship joint extraction method, medium and system, which comprises the following steps:
a Transformer-based entity relation joint extraction method comprises the following steps:
First, the training samples in the training data are preprocessed: each training sample is separated from its triples by a preset separator, the training sample itself needs start and end identifiers, the triples also need connecting identifiers, and the start and end positions of the main entity, the relation and the object entity are marked. The processed training samples are then vectorized at the word level; compared with character vectors, word vectors can avoid the OOV problem to some extent but have segmentation-boundary issues, so either vectorization method can be used here, chosen according to the application scenario.
The word vectors of the processed training samples are input into a Transformer-based neural network model, and the model is trained with the back-propagation algorithm to obtain the parameters it requires.
The sentences that require entity relation extraction are then input into the trained Transformer-based entity relation joint extraction model, and the candidate entity relation triples in the sentences are predicted.
Specifically, FIG. 1 is a flowchart of the Transformer-based entity relation extraction method in this embodiment; as shown in the figure, the method mainly includes three stages: a training-data preprocessing stage, a model training stage and a model prediction stage.
Step 101 processes the training samples and the triple information in the training set with preset identifiers. A training sample requires at least two identifiers, a start identifier and an end identifier; the triple information of the sample requires at least three identifiers, a start identifier, a separator and an end identifier. The data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
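A minimal Python sketch of how a sentence's annotated triples could be serialized with these identifiers; the helper function and the example triples are illustrative assumptions, only the [SOS], [S2S_SEQ] and [EOS] tokens come from the patent:

```python
# Hypothetical preprocessing sketch: serialize a sentence's triples with the
# start identifier, separator and end identifier named above.
def serialize_triples(triples, sos="[SOS]", sep="[S2S_SEQ]", eos="[EOS]"):
    """triples: list of (head_entity, relation, tail_entity) string tuples."""
    return sos + sep.join(",".join(t) for t in triples) + eos

triples = [("Chongqing", "located_in", "China"),
           ("Chongqing", "instance_of", "municipality")]
print(serialize_triples(triples))
# [SOS]Chongqing,located_in,China[S2S_SEQ]Chongqing,instance_of,municipality[EOS]
```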
Step 102: word-vector representations with semantic information are obtained by training word2vec on an unlabeled corpus and are provided to the model.
Step 103, with reference to FIG. 2, builds the Transformer-based entity relation joint extraction model; the specific steps are as follows:
Step 1: using the word-vector representation of the training set obtained through word2vec training, each word in a training sample is mapped to its corresponding word vector; the length of the longest training sample is selected as max_len, and sentences shorter than max_len are padded with a special placeholder.
Step 2: the word vectors and the position vectors are combined to give the input position information; the result is fed to the model, and a multi-layer Transformer encoder learns the context information of each word in the input sentence, yielding the corresponding representation vector H_l.
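A sketch of step 2 in PyTorch, assuming sinusoidal position vectors added to the word vectors before a stacked Transformer encoder; the head count and dimensions are illustrative, only the 12-layer depth echoes the coding layer described later:

```python
# Sketch of the encoding step: add position information to the word vectors and
# run a multi-layer Transformer encoder to obtain the context representation H_l.
import math
import torch
import torch.nn as nn

def positional_encoding(max_len, d_model):
    pe = torch.zeros(max_len, d_model)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                              # (max_len, d_model)

d_model, max_len, batch = 128, 32, 4
word_vecs = torch.randn(batch, max_len, d_model)           # word2vec embeddings
x = word_vecs + positional_encoding(max_len, d_model)      # give position information

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)      # multi-layer encoder
H = encoder(x)                                             # context representation H_l
print(H.shape)                                             # torch.Size([4, 32, 128])
```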
Step 3: after the representation vector H_l is obtained, a binary classifier separately predicts the start position p_start and the end position p_end of each main entity. This step can predict several main entities at once, which addresses the triple-overlap problem. The detailed procedure is:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters. The main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j. The tags y_i^start and y_i^end of the i-th word take the value 1 when that word is, respectively, the start or end position of the main entity.
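A minimal PyTorch sketch of the step-3 main-entity tagger, with one start classifier and one end classifier applied to every token of H_l; the module name, sizes and threshold are illustrative assumptions:

```python
# Half-pointer half-tag main-entity tagger: per-token binary classifiers over the
# encoder output H, one for start positions and one for end positions.
import torch
import torch.nn as nn

class MainEntityTagger(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.start = nn.Linear(d_model, 1)   # W_start, b_start
        self.end = nn.Linear(d_model, 1)     # W_end, b_end

    def forward(self, H):                    # H: (batch, seq_len, d_model)
        p_start = torch.sigmoid(self.start(H)).squeeze(-1)   # (batch, seq_len)
        p_end = torch.sigmoid(self.end(H)).squeeze(-1)
        return p_start, p_end

tagger = MainEntityTagger(d_model=128)
p_start, p_end = tagger(torch.randn(4, 32, 128))
candidate_starts = p_start > 0.5             # several main entities can be active at once
```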
Step 4: after the main entity is obtained, the main entity representation and the representation vector H_l produced by the Transformer encoder are combined to obtain a context representation vector that contains the main entity information; this vector is input to the decoder, and the decoded context is used by a binary classifier to predict the start position p_i^{start_o} and the end position p_j^{end_o} of the object entity. The detailed procedure is:
p_i^{start_o} = σ(W_start · m_i + b_start)
p_j^{end_o} = σ(W_end · m_j + b_end)
where p_i^{start_o} denotes the probability that the i-th position is the start of the object entity, p_j^{end_o} the probability that the j-th position is its end, m_i the decoded context representation of the i-th position, and W_start, W_end, b_start and b_end are trainable parameters. The likelihood function for the object entity and the relation is:
p_θ(o | s, x_j) = ∏_{t ∈ {start_o, end_o}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
The decoder includes a multi-head self-attention mechanism and related components, as shown in FIG. 3.
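A sketch of the step-4 object/relation stage, using simple addition of the main-entity vector to every token representation and PyTorch's standard Transformer decoder; the decoder depth, relation count and other names are illustrative assumptions, not the patent's exact configuration:

```python
# Object/relation tagger: add the main-entity vector to H_l, decode against the
# encoder output, and run one start/end binary pointer per relation.
import torch
import torch.nn as nn

class ObjectTagger(nn.Module):
    def __init__(self, d_model, num_relations, num_decoder_layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_decoder_layers)
        self.start = nn.Linear(d_model, num_relations)   # one start pointer per relation
        self.end = nn.Linear(d_model, num_relations)     # one end pointer per relation

    def forward(self, H, v_sub):
        # H: (batch, seq_len, d_model); v_sub: (batch, d_model) main-entity vector
        M = H + v_sub.unsqueeze(1)                       # subject-aware context M_l
        M = self.decoder(tgt=M, memory=H)                # decode against encoder output
        p_start = torch.sigmoid(self.start(M))           # (batch, seq_len, num_relations)
        p_end = torch.sigmoid(self.end(M))
        return p_start, p_end

obj_tagger = ObjectTagger(d_model=128, num_relations=24)
p_start, p_end = obj_tagger(torch.randn(4, 32, 128), torch.randn(4, 128))
```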
Step 5: for all training samples, the model is trained by maximizing the likelihood of the samples, and the parameters in the model are updated until convergence. The training objective Loss is defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
Step 6: the model is trained with the back-propagation algorithm and all parameters in the model are updated, finally yielding the entity relation joint extraction model.
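A sketch of one training step under the assumptions above: the log-likelihood objective reduces to per-token binary cross-entropy on the start/end tags of the two taggers, followed by an ordinary back-propagation update. The batch keys, the taggers and the optimizer are hypothetical names from the earlier sketches, not the patent's API:

```python
# One illustrative training step: sum the binary cross-entropy of the main-entity
# and object taggers, back-propagate, and update all parameters.
import torch.nn as nn

bce = nn.BCELoss()

def train_step(H, main_tagger, obj_tagger, batch, optimizer):
    ps_start, ps_end = main_tagger(H)                    # main-entity probabilities
    po_start, po_end = obj_tagger(H, batch["subject_vec"])
    loss = (bce(ps_start, batch["main_start_tags"]) +
            bce(ps_end, batch["main_end_tags"]) +
            bce(po_start, batch["obj_start_tags"]) +
            bce(po_end, batch["obj_end_tags"]))
    optimizer.zero_grad()
    loss.backward()                                      # back propagation
    optimizer.step()                                     # update all parameters
    return loss.item()
```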
Step 104: the sentences that require entity relation joint extraction are input into the obtained entity relation joint extraction model, and the candidate triples in the sentences are predicted.
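At prediction time, the entity spans can be read off the predicted probabilities; the following sketch thresholds the start/end probabilities and pairs each start with the nearest following end (the threshold and the pairing rule are illustrative assumptions):

```python
# Illustrative span decoding for one sentence: threshold the per-token start/end
# probabilities and pair each start with the nearest end at or after it.
def decode_spans(p_start, p_end, threshold=0.5):
    spans = []
    for i, ps in enumerate(p_start):
        if ps < threshold:
            continue
        for j in range(i, len(p_end)):
            if p_end[j] >= threshold:
                spans.append((i, j))          # token indices of one entity span
                break
    return spans

print(decode_spans([0.1, 0.9, 0.2, 0.8], [0.1, 0.2, 0.9, 0.6]))  # [(1, 2), (3, 3)]
```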
In this scheme, a Transformer encoder is introduced to extract deep features, addressing the insufficient use of whole-sentence relational information in existing joint extraction models; this improves the prediction performance of the model and gives the method good practicality.
Another embodiment of the present invention provides a Transformer-based entity relation joint extraction system, which includes:
a data preprocessing module, responsible for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module, responsible for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into the Transformer-based neural network model, and training it with the back-propagation algorithm to obtain the entity relation extraction model;
and a result processing module, responsible for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
In the invention, the Transformer networks of the coding layer and the decoding layer can be replaced by other neural network structures.
Preferably, another embodiment is a Transformer-based entity relation joint extraction system comprising:
An input layer: the input layer uses the same model input as the Transformer, combining word embeddings and position embeddings to obtain the corresponding text representation, which is input into the model.
A coding layer: according to previous research, the deeper a deep learning model is, the deeper the semantic representations of a sentence it can extract; a 12-layer Transformer is therefore designed for the coding layer, and the multi-head attention mechanism in the Transformer achieves an effect similar to that of a multi-channel convolutional neural network.
A decoding layer: the multi-layer decoder can overcome the influence that adding the sentence vector representation after main-entity prediction has on the prediction of the object entity and the relation, and the output layer outputs the start and end positions of the object entity.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A Transformer-based entity relation joint extraction method is characterized by comprising the following steps:
the method comprises the steps of obtaining an internet data set and preprocessing it, connecting the sentences in the data set to their corresponding triples with preset identifiers, marking the start and end positions of the main entity, the relation and the object entity, using a preset separator when several triples are involved, and adding preset start and end identifiers to the training data, the processed data taking the form described below;
performing vectorized mapping of each word in the processed data set, computing a position vector from the position of each word in its sentence, inputting the result into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain a Transformer-based entity relation joint extraction model;
and inputting the sentences that require entity relation extraction into the trained Transformer-based entity relation joint extraction model and predicting the triples in the sentences.
2. The Transformer-based entity relation joint extraction method according to claim 1, wherein the training process of the Transformer-based neural network model comprises:
1) mapping each character or word in the input sentence to a corresponding word vector;
2) in the coding layer, taking the word vector corresponding to each word in the training sample as input, learning the context information of each word in the sentence with a Transformer encoder, and obtaining the representation vector H_l;
3) predicting the main entities in the training sample with a classifier, where a binary classifier separately predicts the start position p_start and the end position p_end of each main entity in the training sample together with a vector representation v_sub of the main entity;
4) in the decoding layer, combining the representation vector H_l output by the encoder with the main entity representation v_sub predicted by the binary classifier, by concatenation or simple addition, to obtain a new context representation vector M_l, decoding M_l, and classifying the object entities with a binary classifier;
5) calculating, from the obtained tag vector representations, the start and end positions of the main entity, the relation and the object entity, respectively;
6) selecting the maximum likelihood function over all samples as the objective function of the model;
7) training the model with the back-propagation algorithm and updating all parameters in the model to finally obtain a converged entity relation joint extraction model.
3. The Transformer-based entity relation joint extraction method according to claim 1 or 2, wherein the training samples and the triple information in the training set are processed with special identifiers, the training sample requiring at least two identifiers, a start identifier and an end identifier, and the triple information of the sample requiring at least three identifiers, a start identifier, a separator and an end identifier; the data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
4. The Transformer-based entity relation joint extraction method according to claim 2, wherein 1) mapping each character or word in an input sentence to a corresponding word vector specifically comprises:
obtaining the word-vector representation of the training set through word2vec training, mapping each word in a training sample to its corresponding word vector, selecting the length of the longest training sample as max_len, and padding sentences shorter than max_len with a special placeholder.
5. The Transformer-based entity relation joint extraction method according to claim 2, wherein in 3) the start position p_start and the end position p_end of each main entity in the training sample, together with the vector representation v_sub of the main entity, are predicted by a binary classifier, specifically:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters; the main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j; I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; y_i^start and y_i^end, the tags of the i-th word, take the value 1 when that word is, respectively, the start or end position of the main entity.
6. The Transformer-based entity relation joint extraction method according to claim 2, wherein 6) selecting the maximum likelihood function over all samples as the objective function of the model specifically comprises:
for all training samples, training the model by maximizing the likelihood of the samples and updating the parameters in the model until convergence, the training objective Loss being defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
7. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the Transformer-based entity relation joint extraction method according to any one of claims 1 to 6.
8. A Transformer-based entity relationship joint extraction system is characterized by comprising:
a data preprocessing module for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain a Transformer-based entity relation extraction model;
and a result processing module for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
CN202111480107.7A 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer Pending CN114398489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111480107.7A CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111480107.7A CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Publications (1)

Publication Number Publication Date
CN114398489A true CN114398489A (en) 2022-04-26

Family

ID=81225409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111480107.7A Pending CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Country Status (1)

Country Link
CN (1) CN114398489A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098617A (en) * 2022-06-10 2022-09-23 杭州未名信科科技有限公司 Method, device and equipment for labeling triple relation extraction task and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination