CN114564563A - End-to-end entity relationship joint extraction method and system based on relationship decomposition - Google Patents

End-to-end entity relationship joint extraction method and system based on relationship decomposition

Info

Publication number
CN114564563A
Authority
CN
China
Prior art keywords
entity
relationship
sentence
relation
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210166252.6A
Other languages
Chinese (zh)
Inventor
张璇
高宸
杜鲲鹏
农琼
马秋颖
袁子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN202210166252.6A priority Critical patent/CN114564563A/en
Publication of CN114564563A publication Critical patent/CN114564563A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention discloses an end-to-end entity relationship joint extraction method based on relationship decomposition, characterized by comprising the following steps. Data preprocessing: the entity and relation triples annotated in the training set are converted into vector form according to the dictionary of the BERT model. Model training: relation classification is performed on the text vectors output by the BERT model, and the resulting relation features are fused with sentence features for head- and tail-entity recognition. Result decoding: the entity tags recognized under the different relation categories are decoded and combined with the relations to obtain the entity-relation triples present in the sentence. By modeling sentence features separately under different relations, the method effectively handles the extraction of overlapping triples within a sentence, improves the performance of joint entity-relation extraction, and has good practicability.

Description

End-to-end entity relationship joint extraction method and system based on relationship decomposition
Technical Field
The invention relates to deep learning and natural language processing technologies, in particular to an end-to-end entity relationship joint extraction method and system based on relationship decomposition.
Background
Triple extraction, also called entity relation extraction, is an important component of information extraction: it acquires structured knowledge in the form (head entity, relation, tail entity) from unstructured text. It is one of the key tasks in constructing knowledge graphs and an important basis for other natural language processing tasks, such as machine translation, text summarization, and recommendation systems.
Early extraction methods mostly performed entity relation extraction in a pipeline fashion, treating the task as two independent subtasks: named entity recognition and relation classification. This approach is flexible and simplifies the processing flow, but it also has drawbacks, including error accumulation, entity redundancy, and lack of interaction between the subtasks.
To overcome the shortcomings of the pipeline approach, joint entity-relation extraction uses a single model to extract entities and relations simultaneously. Most early joint methods were feature-based models that require complex preprocessing and rely on feature-extraction tools; the process is not only cumbersome but also prone to introducing additional errors.
To reduce manual feature engineering, neural networks were introduced for end-to-end joint entity-relation extraction, in two families: joint decoding methods and parameter sharing methods. Joint decoding methods adopt a new tagging strategy that labels entities and relations uniformly, turning the original joint learning problem over the two subtasks of named entity recognition and relation classification into a sequence labeling problem. Parameter sharing methods perform joint learning by sharing the encoding-layer parameters of a joint model, so that the two subtasks can depend on each other. End-to-end joint extraction exploits the interaction between entities and relations, extracting entities and classifying the relations of entity pairs at the same time, and largely resolves the problems brought by the pipeline approach. However, conventional joint extraction schemes only consider extracting a single triple per sentence. In practice, as shown in fig. 4, a sentence often contains multiple triples, and these triples may share overlapping entities and relations.
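The overlap cases mentioned here can be sketched with the taxonomy commonly used in the literature: entity-pair overlap (EPO, two triples share the same head/tail pair), single-entity overlap (SEO, two triples share one entity), and normal sentences. The helper below is illustrative and not from the patent:

```python
def overlap_type(triples):
    """Classify the triples found in one sentence as 'EPO' (entity-pair
    overlap), 'SEO' (single-entity overlap) or 'Normal' -- the overlap
    cases a one-triple-per-sentence extractor cannot handle.
    Illustrative helper; the taxonomy is standard, the code is not
    from the patent."""
    pairs = [(h, t) for h, _, t in triples]
    if len(set(pairs)) < len(pairs):
        return "EPO"          # two relations share the same entity pair
    entities = [e for h, _, t in triples for e in (h, t)]
    if len(set(entities)) < len(entities):
        return "SEO"          # two triples share one entity
    return "Normal"
```

A sentence such as "Obama was president of and born in the USA" would yield two triples over the same pair and classify as EPO.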
Disclosure of Invention
The invention aims to address the existing problems by providing an end-to-end entity relationship joint extraction method and system based on relationship decomposition: sentence features are extracted separately under each relation, an attention mechanism is combined with a BERT pre-training model so that the information of the whole input sentence is fully used, the extraction of overlapping triples is handled, and the performance of joint entity-relation extraction is improved.
The technical scheme adopted by the invention is as follows:
The end-to-end entity relationship joint extraction method based on relationship decomposition according to the invention comprises the following steps: data preprocessing, namely converting the sentences from which entity relations are to be extracted into vector form according to the input format required by BERT and using the vectors as input to the BERT model; meanwhile, converting the triple labels into vector form; and separately marking the relations, head entities, and tail entities in the sentences;
model training: combining the text vector output by the BERT model with a sentence vector generated by an attention mechanism to obtain final vector representation of a sentence, and carrying out relation classification through a sigmoid function to identify the relation in the sentence; fusing the obtained relation characteristics with sentence characteristics for head and tail entity recognition;
and (4) decoding the result: and decoding the entity tags identified under different relation categories, and combining the entity tags with the relations to obtain entity relation triples existing in the sentences.
Preferably, each tag in the data pre-processing comprises: the relation type contained in the sentence and the position of the entity in the sentence under the corresponding relation type; and generating two groups of sentence marking sequences according to each relationship type, wherein the two groups of sentence marking sequences respectively represent the positions of the head entity and the tail entity in the triple.
Preferably, each relationship type among the predefined relationship types is represented by the two labels 0 and 1: if the relation exists in the sentence, the entry at the corresponding relation's index is marked 1, otherwise 0. The position of an entity under the corresponding relation type is represented by 0, 1, or 2 in two separate labeling sequences for head and tail entities, where 0 indicates that the word at the current position is not part of an entity, 1 indicates that the word is the start position of an entity, and 2 indicates that the word is the end position of an entity.
Preferably, the specific process of model training includes:
s21: inputting the text vector representation obtained in the data preprocessing stage into a BERT model, coding by adopting the BERT model based on a transform structure, and learning the context information of each word in a sentence;
s22: carrying out global average pooling on word vectors output by the BERT to obtain sentence-level vector representation; introducing an attention mechanism to learn word expressions having key effects on the sentence classifiers, and merging the word expressions and sentence-level vector expressions obtained after global average pooling to obtain final vector expressions of the sentences;
s23: according to the final vector representation of the sentence, carrying out multi-relation classification through a sigmoid function, and identifying the relation contained in the sentence;
s24: after the relation types contained in the sentences are obtained, one relation is randomly selected, and vector representation is obtained according to the relation embedding; combining the specific relationship vector characteristics with the sentence vector representation based on words output by the BERT model, and identifying entities under specific relationships;
s25: and constructing sentence vector representation under a specific relation for the relation identified in each sentence, and carrying out entity identification on the sentence vector.
Preferably, the method further comprises: for all training samples, maximizing the sample likelihood function, training the model through a back-propagation algorithm, and updating the parameters of the model.
Preferably, in the training process, one relation in a sentence and its corresponding triple are randomly selected for training; all triples in the training set are included in training by extending the number of training epochs.
Preferably, the relationship vector features are combined with the word-based sentence vector representation output by the BERT model, and a conditional layer normalization method is adopted: conditional level normalization is performed on the word-based sentence vector representation with the relationship vector features as conditions.
Preferably, the relationship embedding is randomly initialized according to predefined relationship categories.
Preferably, in the result decoding stage, the following three matching methods are classified according to the number of the identified head entity and tail entity: if the number of the head entities is 1, matching the head entities with all tail entities; if the number of tail entities is 1, matching the tail entities with all head entities; and if the number of the head entity and the number of the tail entity are both more than 1, matching the head entity and the tail entity according to a nearby matching principle.
The invention relates to an end-to-end entity relation joint extraction system based on relation decomposition, which comprises:
a data preprocessing module: carrying out word segmentation on the entity and relation triples labeled in the training set according to a dictionary in the BERT model to convert the entities and relation triples into a vector form;
a model training module: inputting a vector corresponding to each word in a sentence in a training set into a BERT model, wherein a text vector output by the BERT model is input into a neural network model jointly extracted based on an end-to-end entity relationship of relationship decomposition, and is trained through a back propagation algorithm to obtain a label prediction model;
a result decoding module: inputting sentences needing entity relation extraction into a trained label prediction model, predicting the relation type of the sentences and labels corresponding to each word in the sentences under the corresponding relation; and obtaining entity relationship triples existing in the sentences according to the relationship type tags and the entity tags in the corresponding relationship.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. the method solves the problem that the traditional entity relation joint extraction scheme is difficult to solve the problem of overlapping triples in sentences.
2. The invention respectively extracts the sentence characteristics under different relations, introduces a BERT pre-training model by combining an attention mechanism and fully utilizes the information of the whole input sentence.
3. The method and the device improve the performance of entity relation joint extraction and have good practicability.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of an end-to-end entity relationship joint extraction method based on relationship decomposition according to the present invention.
FIG. 2 is a diagram of an embodiment of a neural network model structure based on end-to-end entity relationship joint extraction of relationship decomposition.
FIG. 3 is a flowchart of a specific labeling process in the embodiment.
FIG. 4 is a diagram illustrating the overlapping of triples in a sentence being extracted.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification (including any accompanying claims, abstract) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
As shown in FIG. 1, the present invention discloses a method for extracting end-to-end entity relationship combination based on relationship decomposition, which comprises the following steps:
data preprocessing, namely converting sentences of the entity relationship to be extracted into a vector form according to a format required by BERT, and taking the vector form as the input of a BERT model; meanwhile, the triple label is converted into a vector form; respectively marking out the relation, the head entity and the tail entity in the sentence;
model training: combining the text vector output by the BERT model with a sentence vector generated by an attention mechanism to obtain final vector representation of a sentence, and carrying out relationship classification through a sigmoid function to identify the relationship in the sentence; fusing the obtained relation characteristics with sentence characteristics for head and tail entity recognition;
and (4) decoding the result: and decoding the entity tags identified under different relation categories, and combining the entity tags with the relations to obtain entity relation triples existing in the sentences.
The end-to-end entity relationship joint extraction method based on relationship decomposition according to the invention can extract entity-relation triples from any natural-language text; it is not limited to a specific text type and applies to data with common characteristics such as news, microblogs, and encyclopedia articles.
The concrete realization of the end-to-end entity relationship joint extraction method based on relationship decomposition comprises: a data preprocessing stage; a model training stage for the neural network model of end-to-end joint extraction based on relationship decomposition, as shown in fig. 2; and a result decoding stage, in which the predicted relation categories and entity label sequences are matched to obtain the relation-entity triples.
S1: and in the data preprocessing stage, input data are NYT and WebNLG respectively. NYT is a large data set constructed based on the news corpus of New York Times, which contains 66194 sentences and 24 types of relationship categories, wherein 56195 sentences are used as training sets, 4999 sentences are used as verification sets, and the rest 5000 sentences are used as test sets. The data of WebNLG originated from articles in Wikipedia, forming a standard dataset from annotators manual annotations, containing 6222 sentences in total and 246 types of relationship categories.
S11: and converting into a tag sequence according to the triple information given in the labeled corpus. Firstly, the initial sentence is re-divided according to the input requirement of BERT, and the words outside the built-in dictionary of the BERT are split, so as to obtain a new sentence sequence. The new sentence sequence is then vectorized, which is divided into two phases: respectively, a relationship label and an entity label. The relation marking refers to marking all relations existing in the sentence according to predefined relation types and relations contained in the triples. The entity label is divided into a head entity and a tail entity, and two entity label sequences are constructed for each sentence. The words in the sentence are represented by types of 0, 1 and 2 according to the specific positions of the words in the sentence, wherein: 0 indicates that the current word is other words, 1 indicates that the current word belongs to the beginning of the entity, and 2 indicates that the current word belongs to the end of the entity. The specific labeling process is shown in fig. 3.
S2: the model training stage comprises the following specific steps:
s21: and (3) inputting the text vector representation obtained in the data preprocessing stage into a BERT pre-training model, wherein the length of input sentences is unified to max _ len, the sentences with the length smaller than the max _ len are supplemented by filling characters, and the sentences larger than the max _ len are truncated. And (4) coding and learning the text vector through a BERT model to obtain the context information of each word in the sentence, and outputting the vector representation of the sentence. The calculation formula is as follows:
xt=Wordpiece(wi)t∈[1,n],i∈[1,m]
ht=BERT(xt)t∈[1,n]
wherein
Figure BDA0003511837890000051
And dωThe dimension representing the hidden state of BERT.
Then, H ═ H is used1,h2,…,hn]To represent sentence features based on the context level of the word.
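The length-unification step described in S21 can be sketched as below (names are illustrative; pad_id 0 corresponds to BERT's [PAD] token):

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Unify every input sequence to length max_len: shorter
    sequences are padded with a filling token, longer ones are
    truncated, as described for S21."""
    clipped = token_ids[:max_len]          # truncate if too long
    return clipped + [pad_id] * (max_len - len(clipped))  # pad if too short
```

Both branches return exactly max_len ids, so a batch of sentences can be stacked into one fixed-shape tensor.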
S22: global average pooling, root, of word vectors output by BERTA sentence-vector representation is derived from the word-based vector representation. In addition, the attention mechanism learning is introduced to express words having key action on the sentence classifier, and the words and the sentence vector expression obtained after global average pooling are combined to obtain the final vector expression of the sentence. For the k-th sentence input, the effective vector output by the BERT model is expressed by htSentence level vector S obtained after global average poolinghWith sentence vector representation S based on attention mechanismaCombining to form final sentence level vector representation S as input, and calculating vector representation of relation category labels; calculating according to the obtained vector representation of the relation category label to obtain the probability that the relation in each sentence corresponds to each category label;
the calculation formula is as follows:
Sh=GlobalAveragePooling(H)
M=tanh(H)
α=softmax(ωTM)
Sa=HαT
S=concat[Sh,Sa]
wherein
Figure BDA0003511837890000061
dωDimension representing BERT hidden state, dSRepresenting the dimensions of the sentence vector feature representation, ω being the trained parameter vector, ωTIs a transpose. The size of omega, alpha and T is dω,dα,T。
S23: and performing multi-classification task on the sentence through a sigmoid function according to the final vector representation of the sentence, thereby identifying the predefined relation category existing in the sentence. The calculation formula is as follows:
vj=σ(W1S+b1)
wherein
Figure BDA0003511837890000062
k represents the total number of relationship classes, and σ represents the sigmoid activation function. The function returns a value in the range of 0 to 1, which can be used as a predicate fingerA threshold value for whether a relationship exists is determined. According to the formula, all relationship types contained in the triples in the current sentence can be obtained.
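Steps S22-S23 can be sketched in numpy as follows. H holds the BERT token vectors as columns (d_ω × n); the parameters follow the formulas above but are random stand-ins here, not trained weights:

```python
import numpy as np

def classify_relations(H, omega, W1, b1, threshold=0.5):
    """Sentence representation (S22) and multi-label relation
    classification (S23): average pooling plus attention pooling,
    concatenated and passed through a sigmoid layer.  A sketch with
    untrained random parameters, not the patent's implementation."""
    S_h = H.mean(axis=1)                       # global average pooling
    M = np.tanh(H)
    scores = omega @ M                         # (n,) attention logits
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                # softmax weights over tokens
    S_a = H @ alpha                            # attention-weighted sentence vector
    S = np.concatenate([S_h, S_a])             # final sentence representation
    v = 1.0 / (1.0 + np.exp(-(W1 @ S + b1)))   # sigmoid score per relation class
    return v > threshold, v

rng = np.random.default_rng(0)
d, n, k = 8, 5, 3                              # hidden dim, tokens, relation classes
present, probs = classify_relations(rng.normal(size=(d, n)),
                                    rng.normal(size=d),
                                    rng.normal(size=(k, 2 * d)),
                                    np.zeros(k))
```

The thresholded boolean vector `present` plays the role of the predicted relation set; in training the raw scores `probs` feed the loss instead.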
S24: through the previous relation classification module, the relation classes contained in the sentences are obtained. Then, an entity identification module is carried out, and the specific steps are as follows:
s241: performing relation embedding, and generating vector representation Rel [ Rel ] of all relation classes1,Rel2,…,Relk,]And then acquiring corresponding relation vector representation according to the identified relation category.
S242: the sentence and the particular relationship vector representation are combined to generate a relationship-based sentence vector representation. Finally, entity identification is performed under a specific relationship. It should be noted that during the training process, the sentences are combined with one relationship at a time by randomly extracting the sentences. In the prediction process, sentences are copied according to the relation number in the sentences, and each sentence is combined with different relations. Therefore, the training round needs to be delayed to ensure that all relationships are selected for training. The calculation formula is as follows:
μ = mean(h_t),  σ = std(h_t)
γ_j = W_γ Rel_j + γ,  β_j = W_β Rel_j + β
h̃_t = γ_j ⊙ (h_t − μ) / σ + β_j
p_i^head = softmax(W_head h̃_i + b_head)
p_i^tail = softmax(W_tail h̃_i + b_tail)

where k denotes the number of relation categories, Rel_j denotes the relation vector combined with the current sentence, W_head, W_tail ∈ R^{tag_num×d_ω}, and tag_num denotes the number of tag categories, comprising the three tags 0, 1, and 2. p_i^head and p_i^tail denote the probabilities that the i-th token is predicted as a head-entity tag and a tail-entity tag, respectively, under the condition of relation Rel_j.
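The fusion mechanism the method names, conditional layer normalization, lets the relation embedding generate the gain and bias of an ordinary layer norm applied to each token vector. A minimal numpy sketch (weight names are illustrative; with zero weights it reduces to plain layer normalization):

```python
import numpy as np

def conditional_layer_norm(h, rel, W_gamma, W_beta, eps=1e-6):
    """Normalize token vector h, then scale and shift it with a gain
    and bias generated from the relation embedding rel -- a sketch of
    conditional layer normalization, not the patent's exact code."""
    gamma = 1.0 + W_gamma @ rel      # condition-dependent gain, initialised near 1
    beta = W_beta @ rel              # condition-dependent bias
    mu, sigma = h.mean(), h.std()
    return gamma * (h - mu) / (sigma + eps) + beta

h = np.array([0.0, 1.0, 2.0, 3.0])   # one token vector (d_w = 4)
rel = np.array([1.0, -1.0])          # one relation embedding
out = conditional_layer_norm(h, rel,
                             W_gamma=np.zeros((4, 2)),
                             W_beta=np.zeros((4, 2)))
```

Because γ and β depend on Rel_j, the same sentence encoding produces different entity-tagging features for each relation, which is what allows overlapping triples to be separated.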
S25: and for all training samples, training the model by maximizing the maximum likelihood function of the samples, and updating parameters in the model. Because the model comprises two parts of relation extraction and entity identification, the training loss of the model also comprises two parts, namely: a relational classification loss function and an entity identification loss function. Wherein, the entity identification loss part comprises a head entity loss and a tail entity loss. Loss of training of the model
Figure BDA00035118378900000710
(to minimize) the sum of the relationship label and the entity label negative log probability, defined as the prediction distribution, is given by the formula:
Figure BDA00035118378900000711
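The stated objective, summing the negative log-probabilities of the relation labels and of the per-token head/tail tags, can be sketched as below (argument names are illustrative):

```python
import math

def joint_loss(rel_probs, rel_gold, head_probs, head_gold, tail_probs, tail_gold):
    """Sum of negative log-probabilities: a multi-label (sigmoid)
    relation part plus per-token softmax tag parts for head and tail
    entities.  A sketch of the stated training objective."""
    loss = 0.0
    for p, y in zip(rel_probs, rel_gold):           # relation part
        loss -= math.log(p if y == 1 else 1.0 - p)
    for token_dists, gold_tags in ((head_probs, head_gold),
                                   (tail_probs, tail_gold)):
        for dist, y in zip(token_dists, gold_tags): # tag part, per token
            loss -= math.log(dist[y])
    return loss

loss = joint_loss([0.9, 0.1], [1, 0],               # 2 relation classes
                  [[0.8, 0.1, 0.1]], [0],           # 1 token, head tags
                  [[0.1, 0.8, 0.1]], [1])           # 1 token, tail tags
```

Each factor enters as −log p, so confident correct predictions contribute little and confident wrong ones dominate the gradient.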
s3: and a result decoding stage:
s31: inputting sentences needing entity relationship extraction into the combined extraction model, and identifying the relationship contained in the sentences;
s32: obtaining different relation characteristic vector representations according to the relation quantity and relation embedding predicted in the S31; copying an input sentence, combining different relation characteristic vectors, and identifying entities under different relation categories;
s33: the entity identification is divided into two parts of head entity identification and tail entity identification.
The recognition process defines two rules: 1. the head entity and tail entity of the same triple cannot contain each other; 2. the length of a head or tail entity cannot be zero or exceed 5. Head and tail entities are then matched according to their numbers to obtain the triples, realizing the extraction of overlapping triples.
The scheme shows that the method introduces a BERT pre-training model and an attention mechanism to code sentences aiming at the problem that overlapping triples are difficult to process in entity relationship joint extraction, provides an end-to-end entity relationship joint extraction model based on relationship decomposition, can effectively improve the prediction performance of the overlapping triples, and has good practicability.
The end-to-end entity relationship joint extraction method based on relationship decomposition according to the invention is compared with seven prior techniques in Table 1 below.
TABLE 1 comparison of the accuracy, recall and F1 index of the extraction method of the present invention with seven prior art techniques
As seen from Table 1, the method of the present invention achieves the best joint entity-relation extraction performance on both datasets.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (10)

1. An end-to-end entity relationship joint extraction method based on relationship decomposition is characterized by comprising the following steps:
data preprocessing, namely converting the sentences from which entity relations are to be extracted into vector form according to the format required by BERT and using the vectors as input to the BERT model; meanwhile, converting the triple labels into vector form; and separately marking the relations, head entities, and tail entities in the sentences;
model training: combining the text vector output by the BERT model with a sentence vector generated by an attention mechanism to obtain final vector representation of a sentence, and carrying out relationship classification through a sigmoid function to identify the relationship in the sentence; fusing the obtained relation characteristics with sentence characteristics for head and tail entity recognition;
and (4) decoding the result: and decoding the entity tags identified under different relation categories, and combining the entity tags with the relations to obtain entity relation triples existing in the sentences.
2. The method for extracting end-to-end entity relationship joint based on relationship decomposition as claimed in claim 1, wherein each label in data preprocessing comprises: the relation type contained in the sentence and the position of the entity in the sentence under the corresponding relation type; and generating two groups of sentence marking sequences according to each relationship type, wherein the two groups of sentence marking sequences respectively represent the positions of the head entity and the tail entity in the triple.
3. The end-to-end entity relationship joint extraction method based on relationship decomposition as claimed in claim 2, wherein each relationship type among the predefined relationship types is represented by the two labels 0 and 1: if the relation exists in the sentence, the entry at the corresponding relation's index is marked 1, otherwise 0; and the position of an entity under the corresponding relation type is represented by 0, 1, or 2 in two separate labeling sequences for head and tail entities, wherein 0 indicates that the word at the current position is not part of an entity, 1 indicates that the word is the start position of an entity, and 2 indicates that the word is the end position of an entity.
4. The method for extracting end-to-end entity relationship jointly based on relationship decomposition as claimed in claim 1, wherein the specific process of model training includes:
s21: inputting the text vector representation obtained in the data preprocessing stage into a BERT model, coding by adopting the BERT model based on a transformer structure, and learning the context information of each word in the sentence;
s22: carrying out global average pooling on word vectors output by the BERT to obtain sentence-level vector representation; introducing an attention mechanism to learn word expressions having key effects on the sentence classifiers, and merging the word expressions and sentence-level vector expressions obtained after global average pooling to obtain final vector expressions of the sentences;
s23: according to the final vector representation of the sentence, carrying out multi-relation classification through a sigmoid function, and identifying the relation contained in the sentence;
s24: after the relation types contained in the sentences are obtained, one relation is randomly selected, and vector representation is obtained according to the relation embedding; combining the specific relationship vector characteristics with the sentence vector representation based on words output by the BERT model, and identifying entities under specific relationships;
s25: and constructing sentence vector representation under a specific relation for the relation identified in each sentence, and carrying out entity identification on the sentence vector.
5. The end-to-end entity relationship joint extraction method based on relationship decomposition as claimed in claim 4, further comprising: for all training samples, maximizing the likelihood function of the samples, training the model through the back propagation algorithm, and updating the parameters in the model.
6. The end-to-end entity relationship joint extraction method based on relationship decomposition as claimed in claim 4, wherein in the training process, one relationship in a sentence and its corresponding triple are randomly selected for training; by increasing the number of training epochs, all triples in the training set are eventually included in the training.
7. The method of claim 4, wherein the relationship vector features are combined with the word-based sentence vector representation output by the BERT model using conditional layer normalization: the word-based sentence vector representation is layer-normalized with the relationship vector features as the condition.
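Conditional layer normalization as described in claim 7 can be sketched as follows; the weight shapes and the way the gain and bias are generated from the relation embedding are assumptions for illustration, not the patent's exact parameterization:

```python
import numpy as np

def conditional_layer_norm(h, rel_emb, W_gamma, W_beta, eps=1e-5):
    # Normalize each token vector over its hidden dimension, then apply a
    # gain and bias generated from the relation embedding (the condition).
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    normed = (h - mu) / np.sqrt(var + eps)
    gamma = rel_emb @ W_gamma   # relation-specific gain, shape (hidden,)
    beta = rel_emb @ W_beta     # relation-specific bias, shape (hidden,)
    return gamma * normed + beta
```

Conditioning the normalization parameters on the relation lets a single shared entity tagger behave differently for each relation type without duplicating the encoder.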
8. The end-to-end entity relationship joint extraction method based on relationship decomposition as claimed in claim 4, wherein the relationship embedding is initialized randomly according to the predefined relationship categories.
9. The end-to-end entity relationship joint extraction method based on relationship decomposition as claimed in claim 1, wherein at the result decoding stage, one of the following three matching methods is selected according to the numbers of identified head entities and tail entities: if there is exactly one head entity, it is matched with all tail entities; if there is exactly one tail entity, it is matched with all head entities; and if there is more than one head entity and more than one tail entity, the head and tail entities are matched according to the nearest-match principle.
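The three-way matching rule of claim 9 can be sketched as follows, representing each entity by its start-token index; the nearest-match tie-breaking is an assumption, since the patent does not spell out its exact rule:

```python
from typing import List, Tuple

def match_entities(heads: List[int], tails: List[int]) -> List[Tuple[int, int]]:
    # Case 1: a single head entity is paired with every tail entity.
    if len(heads) == 1:
        return [(heads[0], t) for t in tails]
    # Case 2: a single tail entity is paired with every head entity.
    if len(tails) == 1:
        return [(h, tails[0]) for h in heads]
    # Case 3: multiple heads and tails -> pair each head with its
    # nearest tail by token position (the "nearby matching principle").
    return [(h, min(tails, key=lambda t: abs(t - h))) for h in heads]
```

For example, `match_entities([2], [5, 9])` returns `[(2, 5), (2, 9)]`, while `match_entities([1, 8], [3, 10])` returns `[(1, 3), (8, 10)]`.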
10. An end-to-end entity relationship joint extraction system based on relationship decomposition, comprising:
a data preprocessing module, which performs word segmentation on the sentences and the labeled entity-relationship triples in the training set according to the dictionary of the BERT model and converts them into vector form;
a model training module, which inputs the vector corresponding to each word of a sentence in the training set into the BERT model, feeds the text vectors output by the BERT model into the neural network model for end-to-end entity relationship joint extraction based on relationship decomposition, and trains it through the back propagation algorithm to obtain a label prediction model;
a result decoding module, which inputs sentences requiring entity relationship extraction into the trained label prediction model, predicts the relationship types of the sentences and the label of each word in the sentence under the corresponding relationship, and obtains the entity relationship triples in the sentences from the relationship type labels and the entity labels under the corresponding relationships.
CN202210166252.6A 2022-02-21 2022-02-21 End-to-end entity relationship joint extraction method and system based on relationship decomposition Pending CN114564563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210166252.6A CN114564563A (en) 2022-02-21 2022-02-21 End-to-end entity relationship joint extraction method and system based on relationship decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210166252.6A CN114564563A (en) 2022-02-21 2022-02-21 End-to-end entity relationship joint extraction method and system based on relationship decomposition

Publications (1)

Publication Number Publication Date
CN114564563A true CN114564563A (en) 2022-05-31

Family

ID=81713412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210166252.6A Pending CN114564563A (en) 2022-02-21 2022-02-21 End-to-end entity relationship joint extraction method and system based on relationship decomposition

Country Status (1)

Country Link
CN (1) CN114564563A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168599A (en) * 2022-06-20 2022-10-11 北京百度网讯科技有限公司 Multi-triple extraction method, device, equipment, medium and product
CN115168599B (en) * 2022-06-20 2023-06-20 北京百度网讯科技有限公司 Multi-triplet extraction method, device, equipment, medium and product
CN114841151A (en) * 2022-07-04 2022-08-02 武汉纺织大学 Medical text entity relation joint extraction method based on decomposition-recombination strategy
CN115130466A (en) * 2022-09-02 2022-09-30 杭州火石数智科技有限公司 Classification and entity recognition combined extraction method, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN111190997B (en) Question-answering system implementation method using neural network and machine learning ordering algorithm
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
CN110969020A (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
CN112069811A (en) Electronic text event extraction method with enhanced multi-task interaction
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN111027595A (en) Double-stage semantic word vector generation method
CN110728151B (en) Information depth processing method and system based on visual characteristics
CN116127090B (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN111191031A (en) Entity relation classification method of unstructured text based on WordNet and IDF
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN114691864A (en) Text classification model training method and device and text classification method and device
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN114925205A (en) GCN-GRU text classification method based on comparative learning
CN114612921A (en) Form recognition method and device, electronic equipment and computer readable medium
CN114048314A (en) Natural language steganalysis method
CN111259106A (en) Relation extraction method combining neural network and feature calculation
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN115827871A (en) Internet enterprise classification method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination