CN114398489A - Entity relation joint extraction method, medium and system based on Transformer - Google Patents

Entity relation joint extraction method, medium and system based on Transformer Download PDF

Info

Publication number
CN114398489A
Authority
CN
China
Prior art keywords
entity
training
word
model
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111480107.7A
Other languages
Chinese (zh)
Inventor
张正
常光辉
黄海辉
胡新庭
陈浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111480107.7A
Publication of CN114398489A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Transformer-based entity relation joint extraction method, medium and system. The method comprises the following steps: connecting the entity-relation triples annotated in the training data to the training data with special identifiers; performing vectorized mapping of the words in the processed training data; inputting the mapped training data into an attention-based entity relation joint extraction model and training the model with the back-propagation algorithm to obtain an entity relation prediction model; and inputting sentences that require entity relation joint extraction into the trained model to predict the triples they contain. The triple extraction task is treated as a sequence-to-sequence task, and joint extraction is realized through parameter sharing.

Description

Entity relation joint extraction method, medium and system based on Transformer
Technical Field
The invention belongs to the field of deep learning and natural language processing, and particularly relates to a Transformer-based entity relation joint extraction method and system.
Background
With the advent of the big data era, the volume of data on the internet is growing rapidly, much of it natural language text that contains a large amount of hidden knowledge. How to extract this knowledge quickly and efficiently from open-domain text has become an important problem. To address it, information extraction was first proposed at the MUC-7 conference in 1998. Entity relation extraction is a core task of text mining and information extraction: it models text and automatically extracts the semantic relations between entities to obtain useful semantic knowledge.
To extract hidden knowledge from massive unstructured data, the concept of the knowledge graph was proposed. In a knowledge graph, proper nouns such as person and place names are represented as entities, the relation between two entities is represented as a relation, and the graph is built from triples of the form (main entity, relation, object entity). To extract such triples from unstructured text automatically, researchers have proposed information extraction methods; pipeline methods and joint learning methods are the two main approaches at present.
Entity relation extraction methods are currently divided into pipeline methods and joint learning methods. A pipeline method treats extraction as two subtasks: it first performs named entity recognition on the text and then identifies the relations between the recognized entities, a step called relation extraction. A joint extraction method treats named entity recognition and relation extraction as a single task and extracts the triples directly through joint learning. This avoids the error accumulation caused by imperfect named entity recognition and improves the accuracy of entity relation extraction; the work in this patent is likewise based on joint learning and provides a new approach to entity relation joint extraction.
A search found application publication No. CN111666427A, an entity relation joint extraction method, apparatus, device and medium, which includes: acquiring training sample data; training a pre-built entity relation extraction model with the training sample data to obtain a trained model, where the model includes a self-attention layer that, during training, performs attention computation based on the influence of the other triples in the sentence on the currently predicted relation; and, when a target text requiring entity relation extraction is obtained, outputting the corresponding extraction result with the trained model. By training a model that contains a self-attention layer, the influence of other triples on the currently predicted relation can be taken into account during extraction, improving the accuracy of entity relation extraction.
The BERT+CNN-based entity relation joint extraction proposed in CN111666427A has the following problems:
1. Its complexity is high, which hinders deployment of the model.
2. The word-level matrix it uses has difficulty handling the triple-overlap problem.
3. It relies on a CNN, which has well-known drawbacks on long sequences and cannot capture long-distance information.
The improvements of the present invention are as follows:
1. The invention introduces a half-pointer half-tag network, which handles the triple-overlap problem better than CN111666427A.
2. The present patent uses a Transformer model as the feature extractor, which is superior to the model proposed in CN111666427A on long sequences.
3. The model of the invention reduces model complexity while achieving better triple extraction performance than the model proposed in CN111666427A.
Application publication No. CN113157936A describes an entity relation joint extraction method, apparatus, electronic device and storage medium, where the method includes: obtaining a marker sequence; determining a semantic representation from the marker sequence; determining a feature-map matrix from the marker sequence and the semantic representation; predicting, from the feature-map matrix, a word-level matrix related to entity information, a word-level matrix related to entities and relations, and a word-level matrix related to triples; and combining the triple-related word-level matrices to obtain the target triples. That application determines the three kinds of word-level matrices in stages and extracts the target triples with a semantic segmentation framework, a multi-stage entity relation extraction method based on image semantic segmentation, to alleviate entity overlap and error accumulation and improve extraction through a multi-stage progressive scheme.
CN113157936A also proposes a BERT+CRF-based entity relation joint extraction model with a new labeling scheme; its problems are as follows:
1. The triple-overlap problem is difficult to solve: in entity relation joint extraction, several relations may hold between the same entities, and the current classifier can become confused; that invention suffers from this as well.
2. The method is also based on a BERT model, so the model complexity is high, and it depends on a CRF, a sequential model that is prone to vanishing or exploding gradients and has difficulty capturing long-distance information.
Our solution is as follows:
1. A half-pointer half-tag scheme is adopted, which avoids classifier confusion and hence the triple-overlap problem (a small illustration follows this list).
2. The model is based on the Transformer; its complexity is comparatively low, it avoids the drawbacks of sequential models, and it handles long sequences better than the model proposed in CN113157936A.
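As a small illustration of the half-pointer half-tag labels mentioned in item 1 above, the following Python sketch builds independent 0/1 start and end vectors for two entity spans; the sentence and spans are invented for illustration and are not taken from the patent:

```python
# Small illustration of the half-pointer half-tag scheme: each main entity is
# marked by independent 0/1 start and end vectors, so overlapping triples that
# share an entity do not compete for a single label. Sentence and spans are made up.
tokens = ["Jackie", "Chan", "was", "born", "in", "Hong", "Kong"]
entities = [(0, 1), (5, 6)]        # "Jackie Chan" and "Hong Kong" token spans

start_tags = [0] * len(tokens)
end_tags = [0] * len(tokens)
for s, e in entities:
    start_tags[s] = 1              # pointer to the first token of the entity
    end_tags[e] = 1                # pointer to the last token of the entity

print(start_tags)  # [1, 0, 0, 0, 0, 1, 0]
print(end_tags)    # [0, 1, 0, 0, 0, 0, 1]
```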
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. A Transformer-based entity relation joint extraction method and system are provided that extract hidden information from unstructured data, construct a knowledge graph, and at the same time improve the performance of entity relation extraction. The technical scheme of the invention is as follows:
a Transformer-based entity relation joint extraction method comprises the following steps:
the method comprises the steps of obtaining an internet data set and preprocessing it: the sentences in the data set are connected to their corresponding triples with preset identifiers, the start and end positions of the main entity, the relation and the object entity are marked, a preset separator is used when several triples are involved, and the training data also require preset start and end identifiers; the processed data take the following form, where the special separators and the special start and end identifiers refer to:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
vectorized mapping is performed on each word in the processed data set, a position vector is computed from the position of each word in its sentence, the result is input into a Transformer-based neural network model, and the model is trained with the back-propagation algorithm to obtain an attention-based entity relation joint extraction model;
and the sentences that require entity relation extraction are input into the trained Transformer-based entity relation joint extraction model, and the triples in each sentence are predicted.
Further, the training process of the Transformer-based neural network model comprises:
1) mapping each character or word in the input sentence to a corresponding word vector;
2) in the coding layer, taking the word vector corresponding to each word in the training sample as input, learning the context information of each word in the sentence with a Transformer encoder, and obtaining the representation vector H_l;
3) predicting the main entities in the training sample with a classifier, where a binary classifier separately predicts the start position p_start and the end position p_end of each main entity in the training sample together with a vector representation v_sub of the main entity;
4) in the decoding layer, combining the representation vector H_l output by the encoder with the main entity representation v_sub predicted by the binary classifier, either by a preset processing or by simple addition, to obtain a new context representation vector M_l, decoding M_l, and classifying the object entities with a binary classifier;
5) calculating, from the obtained tag vector representations, the start and end positions of the main entity, the relation and the object entity, respectively;
6) selecting the maximum likelihood function over all samples as the objective function of the model;
7) training the model with the back-propagation algorithm and updating all parameters in the model to finally obtain a converged entity relation joint extraction model.
Further, the training samples are processed with special identifiers according to the training samples and the triple information in the training set: a training sample requires at least two identifiers, a start identifier and an end identifier, and the triple information of the sample requires at least three identifiers, a start identifier, a separator and an end identifier; the data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
Further, 1) mapping each character or word in the input sentence to a corresponding word vector specifically comprises:
obtaining the word-vector representation of the training set through word2vec training, mapping each word in a training sample to its corresponding word vector, selecting the length of the longest training sample as max_len, and padding sentences shorter than max_len with a special placeholder.
Further, 3) predicting, with a binary classifier, the start position p_start and the end position p_end of each main entity in the training sample together with the vector representation v_sub of the main entity specifically comprises:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters; the main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j; I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; y_i^start and y_i^end, the tags of the i-th word, take the value 1 when that word is, respectively, the start or end position of the main entity.
Further, 6) selecting the maximum likelihood function over all samples as the objective function of the model specifically comprises:
for all training samples, training the model by maximizing the likelihood of the samples and updating the parameters in the model until convergence, the training objective Loss being defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the Transformer-based entity relation joint extraction method of any one of the above.
A Transformer-based entity relation joint extraction system comprises:
a data preprocessing module for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain the entity relation extraction model;
and a result processing module for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
The invention has the following advantages and beneficial effects:
the invention provides an entity relationship combined extraction model based on a Transformer, and simultaneously introduces a semi-pointer semi-marker network, takes an entity relationship combined extraction task as a sequence-to-sequence task, firstly extracts a main entity, and then extracts a secondary entity and a relationship. In the stage of predicting the main entity, the starting position and the ending position of the main entity in the sentence are respectively predicted through a semi-pointer semi-tagged network, the process can simultaneously predict a plurality of main entities, in the stage of predicting the auxiliary entity, each semi-pointer semi-tagged network corresponds to a triple relationship, and the starting position and the ending position of the auxiliary entity are simultaneously predicted. The invention also proposes a corresponding optimization function for the model. Compared with the traditional extraction method of the pipeline, the method has no error propagation problem, and simultaneously considers the correlation of named entity identification and relation extraction.
Drawings
FIG. 1 is a flowchart of one possible system framework provided by the preferred embodiment of the present invention;
FIG. 2 is a diagram of the Transformer-based neural network architecture according to an embodiment of the present invention;
FIG. 3 is a structural diagram of the Transformer according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention provides a Transformer-based entity relationship joint extraction method, medium and system, which comprises the following steps:
a Transformer-based entity relation joint extraction method comprises the following steps:
First, the training samples in the training data are preprocessed: each training sample is separated from its triples by a preset separator, the training sample itself needs start and end identifiers, the triples also need connecting identifiers, and the start and end positions of the main entity, the relation and the object entity are marked. The processed training samples are then vectorized at the word level; compared with character vectors, word vectors can avoid the OOV problem to some extent but have segmentation-boundary issues, so either vectorization method can be used here, chosen according to the application scenario.
The word vectors of the processed training samples are input into a Transformer-based neural network model, and the model is trained with the back-propagation algorithm to obtain the parameters it requires.
The sentences that require entity relation extraction are then input into the trained Transformer-based entity relation joint extraction model, and the candidate entity relation triples in the sentences are predicted.
Specifically, FIG. 1 is a flowchart of the Transformer-based entity relation extraction method in this embodiment; as shown in the figure, the method mainly includes three stages: a training-data preprocessing stage, a model training stage and a model prediction stage.
Step 101 processes the training samples and the triple information in the training set with preset identifiers. A training sample requires at least two identifiers, a start identifier and an end identifier; the triple information of the sample requires at least three identifiers, a start identifier, a separator and an end identifier. The data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
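A minimal Python sketch of how a sentence's annotated triples could be serialized with these identifiers; the helper function and the example triples are illustrative assumptions, only the [SOS], [S2S_SEQ] and [EOS] tokens come from the patent:

```python
# Hypothetical preprocessing sketch: serialize a sentence's triples with the
# start identifier, separator and end identifier named above.
def serialize_triples(triples, sos="[SOS]", sep="[S2S_SEQ]", eos="[EOS]"):
    """triples: list of (head_entity, relation, tail_entity) string tuples."""
    return sos + sep.join(",".join(t) for t in triples) + eos

triples = [("Chongqing", "located_in", "China"),
           ("Chongqing", "instance_of", "municipality")]
print(serialize_triples(triples))
# [SOS]Chongqing,located_in,China[S2S_SEQ]Chongqing,instance_of,municipality[EOS]
```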
Step 102: word-vector representations with semantic information are obtained by training word2vec on an unlabeled corpus and are provided to the model.
Step 103, with reference to FIG. 2, builds the Transformer-based entity relation joint extraction model; the specific steps are as follows:
Step 1: using the word-vector representation of the training set obtained through word2vec training, each word in a training sample is mapped to its corresponding word vector; the length of the longest training sample is selected as max_len, and sentences shorter than max_len are padded with a special placeholder.
Step 2: the word vectors and the position vectors are combined to give the input position information; the result is fed to the model, and a multi-layer Transformer encoder learns the context information of each word in the input sentence, yielding the corresponding representation vector H_l.
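A sketch of step 2 in PyTorch, assuming sinusoidal position vectors added to the word vectors before a stacked Transformer encoder; the head count and dimensions are illustrative, only the 12-layer depth echoes the coding layer described later:

```python
# Sketch of the encoding step: add position information to the word vectors and
# run a multi-layer Transformer encoder to obtain the context representation H_l.
import math
import torch
import torch.nn as nn

def positional_encoding(max_len, d_model):
    pe = torch.zeros(max_len, d_model)
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe                                              # (max_len, d_model)

d_model, max_len, batch = 128, 32, 4
word_vecs = torch.randn(batch, max_len, d_model)           # word2vec embeddings
x = word_vecs + positional_encoding(max_len, d_model)      # give position information

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=12)      # multi-layer encoder
H = encoder(x)                                             # context representation H_l
print(H.shape)                                             # torch.Size([4, 32, 128])
```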
Step 3: after the representation vector H_l is obtained, a binary classifier separately predicts the start position p_start and the end position p_end of each main entity. This step can predict several main entities at once, which addresses the triple-overlap problem. The detailed procedure is:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters. The main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j. The tags y_i^start and y_i^end of the i-th word take the value 1 when that word is, respectively, the start or end position of the main entity.
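A minimal PyTorch sketch of the step-3 main-entity tagger, with one start classifier and one end classifier applied to every token of H_l; the module name, sizes and threshold are illustrative assumptions:

```python
# Half-pointer half-tag main-entity tagger: per-token binary classifiers over the
# encoder output H, one for start positions and one for end positions.
import torch
import torch.nn as nn

class MainEntityTagger(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.start = nn.Linear(d_model, 1)   # W_start, b_start
        self.end = nn.Linear(d_model, 1)     # W_end, b_end

    def forward(self, H):                    # H: (batch, seq_len, d_model)
        p_start = torch.sigmoid(self.start(H)).squeeze(-1)   # (batch, seq_len)
        p_end = torch.sigmoid(self.end(H)).squeeze(-1)
        return p_start, p_end

tagger = MainEntityTagger(d_model=128)
p_start, p_end = tagger(torch.randn(4, 32, 128))
candidate_starts = p_start > 0.5             # several main entities can be active at once
```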
Step 4: after the main entity is obtained, the main entity representation and the representation vector H_l produced by the Transformer encoder are combined to obtain a context representation vector that contains the main entity information; this vector is input to the decoder, and the decoded context is used by a binary classifier to predict the start position p_i^{start_o} and the end position p_j^{end_o} of the object entity. The detailed procedure is:
p_i^{start_o} = σ(W_start · m_i + b_start)
p_j^{end_o} = σ(W_end · m_j + b_end)
where p_i^{start_o} denotes the probability that the i-th position is the start of the object entity, p_j^{end_o} the probability that the j-th position is its end, m_i the decoded context representation of the i-th position, and W_start, W_end, b_start and b_end are trainable parameters. The likelihood function for the object entity and the relation is:
p_θ(o | s, x_j) = ∏_{t ∈ {start_o, end_o}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
The decoder includes a multi-head self-attention mechanism and related components, as shown in FIG. 3.
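A sketch of the step-4 object/relation stage, using simple addition of the main-entity vector to every token representation and PyTorch's standard Transformer decoder; the decoder depth, relation count and other names are illustrative assumptions, not the patent's exact configuration:

```python
# Object/relation tagger: add the main-entity vector to H_l, decode against the
# encoder output, and run one start/end binary pointer per relation.
import torch
import torch.nn as nn

class ObjectTagger(nn.Module):
    def __init__(self, d_model, num_relations, num_decoder_layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_decoder_layers)
        self.start = nn.Linear(d_model, num_relations)   # one start pointer per relation
        self.end = nn.Linear(d_model, num_relations)     # one end pointer per relation

    def forward(self, H, v_sub):
        # H: (batch, seq_len, d_model); v_sub: (batch, d_model) main-entity vector
        M = H + v_sub.unsqueeze(1)                       # subject-aware context M_l
        M = self.decoder(tgt=M, memory=H)                # decode against encoder output
        p_start = torch.sigmoid(self.start(M))           # (batch, seq_len, num_relations)
        p_end = torch.sigmoid(self.end(M))
        return p_start, p_end

obj_tagger = ObjectTagger(d_model=128, num_relations=24)
p_start, p_end = obj_tagger(torch.randn(4, 32, 128), torch.randn(4, 128))
```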
Step 5: for all training samples, the model is trained by maximizing the likelihood of the samples, and the parameters in the model are updated until convergence. The training objective Loss is defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
Step 6: the model is trained with the back-propagation algorithm and all parameters in the model are updated, finally yielding the entity relation joint extraction model.
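A sketch of one training step under the assumptions above: the log-likelihood objective reduces to per-token binary cross-entropy on the start/end tags of the two taggers, followed by an ordinary back-propagation update. The batch keys, the taggers and the optimizer are hypothetical names from the earlier sketches, not the patent's API:

```python
# One illustrative training step: sum the binary cross-entropy of the main-entity
# and object taggers, back-propagate, and update all parameters.
import torch.nn as nn

bce = nn.BCELoss()

def train_step(H, main_tagger, obj_tagger, batch, optimizer):
    ps_start, ps_end = main_tagger(H)                    # main-entity probabilities
    po_start, po_end = obj_tagger(H, batch["subject_vec"])
    loss = (bce(ps_start, batch["main_start_tags"]) +
            bce(ps_end, batch["main_end_tags"]) +
            bce(po_start, batch["obj_start_tags"]) +
            bce(po_end, batch["obj_end_tags"]))
    optimizer.zero_grad()
    loss.backward()                                      # back propagation
    optimizer.step()                                     # update all parameters
    return loss.item()
```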
Step 104: the sentences that require entity relation joint extraction are input into the obtained entity relation joint extraction model, and the candidate triples in the sentences are predicted.
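At prediction time, the entity spans can be read off the predicted probabilities; the following sketch thresholds the start/end probabilities and pairs each start with the nearest following end (the threshold and the pairing rule are illustrative assumptions):

```python
# Illustrative span decoding for one sentence: threshold the per-token start/end
# probabilities and pair each start with the nearest end at or after it.
def decode_spans(p_start, p_end, threshold=0.5):
    spans = []
    for i, ps in enumerate(p_start):
        if ps < threshold:
            continue
        for j in range(i, len(p_end)):
            if p_end[j] >= threshold:
                spans.append((i, j))          # token indices of one entity span
                break
    return spans

print(decode_spans([0.1, 0.9, 0.2, 0.8], [0.1, 0.2, 0.9, 0.6]))  # [(1, 2), (3, 3)]
```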
In this scheme, a Transformer encoder is introduced to extract deep features, addressing the insufficient use of whole-sentence relational information in existing joint extraction models; this improves the prediction performance of the model and gives the method good practicality.
Another embodiment of the present invention provides a Transformer-based entity relation joint extraction system, which includes:
a data preprocessing module, responsible for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module, responsible for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into the Transformer-based neural network model, and training it with the back-propagation algorithm to obtain the entity relation extraction model;
and a result processing module, responsible for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
In the invention, the Transformer networks of the coding layer and the decoding layer can be replaced by other neural network structures.
Preferably, another embodiment is a Transformer-based entity relation joint extraction system comprising:
An input layer: the input layer uses the same model input as the Transformer, combining word embeddings and position embeddings to obtain the corresponding text representation, which is input into the model.
A coding layer: according to previous research, the deeper a deep learning model is, the deeper the semantic representations of a sentence it can extract; a 12-layer Transformer is therefore designed for the coding layer, and the multi-head attention mechanism in the Transformer achieves an effect similar to that of a multi-channel convolutional neural network.
A decoding layer: the multi-layer decoder can overcome the influence that adding the sentence vector representation after main-entity prediction has on the prediction of the object entity and the relation, and the output layer outputs the start and end positions of the object entity.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A Transformer-based entity relation joint extraction method is characterized by comprising the following steps:
the method comprises the steps of obtaining an internet data set and preprocessing it, connecting the sentences in the data set to their corresponding triples with preset identifiers, marking the start and end positions of the main entity, the relation and the object entity, using a preset separator when several triples are involved, and adding preset start and end identifiers to the training data, the processed data taking the form described below;
performing vectorized mapping of each word in the processed data set, computing a position vector from the position of each word in its sentence, inputting the result into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain a Transformer-based entity relation joint extraction model;
and inputting the sentences that require entity relation extraction into the trained Transformer-based entity relation joint extraction model and predicting the triples in the sentences.
2. The Transformer-based entity relation joint extraction method according to claim 1, wherein the training process of the Transformer-based neural network model comprises:
1) mapping each character or word in the input sentence to a corresponding word vector;
2) in the coding layer, taking the word vector corresponding to each word in the training sample as input, learning the context information of each word in the sentence with a Transformer encoder, and obtaining the representation vector H_l;
3) predicting the main entities in the training sample with a classifier, where a binary classifier separately predicts the start position p_start and the end position p_end of each main entity in the training sample together with a vector representation v_sub of the main entity;
4) in the decoding layer, combining the representation vector H_l output by the encoder with the main entity representation v_sub predicted by the binary classifier, by concatenation or simple addition, to obtain a new context representation vector M_l, decoding M_l, and classifying the object entities with a binary classifier;
5) calculating, from the obtained tag vector representations, the start and end positions of the main entity, the relation and the object entity, respectively;
6) selecting the maximum likelihood function over all samples as the objective function of the model;
7) training the model with the back-propagation algorithm and updating all parameters in the model to finally obtain a converged entity relation joint extraction model.
3. The Transformer-based entity relation joint extraction method according to claim 1 or 2, wherein the training samples and the triple information in the training set are processed with special identifiers, the training sample requiring at least two identifiers, a start identifier and an end identifier, and the triple information of the sample requiring at least three identifiers, a start identifier, a separator and an end identifier; the data after triple processing are as follows:
[SOS]h(1),r(1),t(1)[S2S_SEQ]
h(2),r(2),t(2)[S2S_SEQ]
...
h(n),r(n),t(n)[EOS]
where h, r and t denote the main entity, the relation and the object entity, and [SOS], [S2S_SEQ] and [EOS] denote the triple start identifier, the triple separator and the triple end identifier, respectively.
4. The Transformer-based entity relation joint extraction method according to claim 2, wherein 1) mapping each character or word in an input sentence to a corresponding word vector specifically comprises:
obtaining the word-vector representation of the training set through word2vec training, mapping each word in a training sample to its corresponding word vector, selecting the length of the longest training sample as max_len, and padding sentences shorter than max_len with a special placeholder.
5. The Transformer-based entity relation joint extraction method according to claim 2, wherein in 3) the start position p_start and the end position p_end of each main entity in the training sample, together with the vector representation v_sub of the main entity, are predicted by a binary classifier, specifically:
p_start = σ(W_start · x_i + b_start)
p_end = σ(W_end · x_j + b_end)
where p_start denotes the start position of the main entity, p_end its end position, x_i and x_j here denote the encoded representations of the i-th and j-th words, and W_start, W_end, b_start and b_end are trainable parameters; the main entity tagger is optimized with the following likelihood function:
p_θ(s | x_j) = ∏_{t ∈ {start_s, end_s}} ∏_{i=1}^{N} (p_i^t)^{I{y_i^t = 1}} (1 − p_i^t)^{I{y_i^t = 0}}
where x_j denotes the j-th sentence in the training set, i the index of a word in the sentence, N the length of the sentence, t the tag type (start_s or end_s), s the main entity of the sentence, and p_θ(s | x_j) the probability of the main entity s in sentence x_j; I{z} is the indicator function, equal to 1 when z is true and 0 otherwise; y_i^start and y_i^end, the tags of the i-th word, take the value 1 when that word is, respectively, the start or end position of the main entity.
6. The Transformer-based entity relation joint extraction method according to claim 2, wherein 6) selecting the maximum likelihood function over all samples as the objective function of the model specifically comprises:
for all training samples, training the model by maximizing the likelihood of the samples and updating the parameters in the model until convergence, the training objective Loss being defined as:
Loss = Σ_{x ∈ D} Σ_{(s,r,o) ∈ T} log p((s, r, o) | x)
where x denotes a sentence in the training set D, s a main entity, r a relation, o an object entity, and T the set of triples annotated for x.
7. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the Transformer-based entity relation joint extraction method according to any one of claims 1 to 6.
8. A Transformer-based entity relationship joint extraction system is characterized by comprising:
a data preprocessing module for identifying the start positions of the main entity, the relation and the object entity in the training data;
a model training module for mapping each word in the sentences of the training data to a corresponding word vector, inputting the word vectors into a Transformer-based neural network model, and training it with the back-propagation algorithm to obtain a Transformer-based entity relation extraction model;
and a result processing module for inputting the sentences that require entity relation extraction into the trained entity relation extraction model and extracting the candidate triple information in the sentences.
CN202111480107.7A 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer Pending CN114398489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111480107.7A CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111480107.7A CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Publications (1)

Publication Number Publication Date
CN114398489A true CN114398489A (en) 2022-04-26

Family

ID=81225409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111480107.7A Pending CN114398489A (en) 2021-12-06 2021-12-06 Entity relation joint extraction method, medium and system based on Transformer

Country Status (1)

Country Link
CN (1) CN114398489A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098617A (en) * 2022-06-10 2022-09-23 杭州未名信科科技有限公司 Method, device and equipment for labeling triple relation extraction task and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination