CN114707494A - End-to-end entity link model training method, entity link method and device

End-to-end entity link model training method, entity link method and device

Info

Publication number
CN114707494A
Authority
CN
China
Prior art keywords: entity, model, vector representation, knowledge base, entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210154521.7A
Other languages
Chinese (zh)
Inventor
李劼
蒲仁杰
于艳华
丁琳萱
马昂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210154521.7A priority Critical patent/CN114707494A/en
Publication of CN114707494A publication Critical patent/CN114707494A/en
Pending legal-status Critical Current


Classifications

    • G06F40/247 Thesauruses; Synonyms (G06F40/20 Natural language analysis; G06F40/237 Lexical tools)
    • G06F16/3344 Query execution using natural language analysis (G06F16/33 Querying; G06F16/334 Query execution)
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G06F40/20 Natural language analysis; G06F40/205 Parsing)
    • G06F40/279 Recognition of textual entities (G06F40/20 Natural language analysis)
    • G06N3/084 Backpropagation, e.g. using gradient descent (G06N3/02 Neural networks; G06N3/08 Learning methods)


Abstract

The invention provides an end-to-end entity link model training method, an entity linking method and a device. An initial entity link model comprising a first BERT model, a second BERT model and a Global Pointer layer is constructed; mention recognition is performed by the first BERT model together with the Global Pointer layer, and entity disambiguation is performed by the first BERT model together with the second BERT model. During training, the loss functions of the mention recognition and entity disambiguation parts are combined, and the parameters of both parts of the initial entity link model are adjusted jointly through backpropagation, so that errors of the two parts are effectively propagated and corrected, the two parts reinforce each other, and the overall effect is improved. In the entity disambiguation process, a preset knowledge base is introduced as external knowledge, which greatly improves the disambiguation effect.

Description

End-to-end entity link model training method, entity link method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to an end-to-end entity link model training method, an entity link method and an entity link device.
Background
Due to the diversity of natural language expression, one word can have multiple meanings and multiple words can share the same meaning, and entity linking is an effective technique for resolving such ambiguity. There are two main approaches to entity linking. The first performs entity linking as a two-stage pipeline of entity recognition followed by entity disambiguation. The second uses a deep neural network to perform end-to-end entity linking. The first approach splits entity linking into two independent stages, where the result of the first stage is the input to the second; the second approach typically uses a bidirectional LSTM network (bidirectional long short-term memory network) to obtain all possible mentions and then computes their similarity to all candidate entities, taking a candidate as the prediction when its similarity exceeds a threshold. Both approaches have problems. The first splits the entity linking task into two independent stages and ignores the dependency between them, so errors from the first stage are passed to the second stage and cannot be corrected. The second relies on a bidirectional LSTM network, whose feature representation capability is limited compared with pre-trained models and which depends heavily on the quality of upstream work. A new entity linking method is therefore needed.
Disclosure of Invention
In view of this, embodiments of the present invention provide an end-to-end entity link model training method, an entity linking method and a device, so as to eliminate or mitigate one or more defects in the prior art, in particular the problem that the errors of the entity recognition and entity disambiguation stages are independent of each other and cannot be propagated back for correction, resulting in insufficient accuracy.
The technical scheme of the invention is as follows:
in one aspect, the present invention provides a method for training an end-to-end entity link model, including:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, each sample being a paragraph containing one sentence or several consecutive sentences, with all entities in the sample and their corresponding description information annotated as labels;
acquiring a preset knowledge base, in which description information of a plurality of known entities is recorded, the known entities including a plurality of polysemy (one word, several meanings) and synonymy (several words, one meaning) relationships;
acquiring an initial entity link model, wherein the initial entity link model comprises a first BERT (Bidirectional Encoder Representations from Transformers) model, a second BERT model and a Global Pointer layer; the first BERT model is connected to the Global Pointer layer and performs mention recognition, obtaining all predicted entities in the sentences or paragraphs of each sample; the first BERT model and the second BERT model are connected and perform entity disambiguation on all predicted entities one by one, wherein, for a given predicted entity, the full output of the first BERT model is taken as a first vector representation, the part of the first vector representation corresponding to the given predicted entity is taken as an entity vector representation, and the entity vector representation and the first vector representation are weighted and summed to obtain an overall vector representation; the preset knowledge base is searched for the given predicted entity to find one positive example with the same meaning and two negative examples with different meanings, and the second BERT model encodes the description information of the positive and negative examples into corresponding second vector representations; the overall vector representation is concatenated with the second vector representation of each positive and negative example in turn and fed into a fully connected layer, which scores the pair and judges whether the two are synonymous;
and training the initial entity link model using the training sample set, wherein all the predicted entities in each sample are disambiguated one by one against their positive and negative examples in the preset knowledge base, a joint loss function is computed and backpropagated, and the parameters of the first BERT model, the second BERT model and the Global Pointer layer are adjusted jointly to obtain a target entity link model.
In some embodiments, the preset knowledge base configures, for each known entity, candidate words having a polysemy or synonymy relationship with it.
In some embodiments, when training the initial entity link model using the training sample set, the learning rate is set to 2e-5 and the Adam algorithm is used for gradient descent.
In some embodiments, the joint loss function Loss is calculated as:
Loss = λ·loss_md + (1-λ)·loss_ed
where loss_md is the mention recognition loss of the initial entity link model, loss_ed is the entity disambiguation loss of the initial entity link model, and λ is a weight coefficient with 0 < λ < 1.
In some embodiments, loss_md and loss_ed both use cross-entropy loss functions, and λ = 0.1.
In some embodiments, the entity vector representation and the first vector representation are weighted and summed to obtain the overall vector representation, with a weight of 0.7 for the entity vector representation and 0.3 for the first vector representation.
On the other hand, the invention also provides an end-to-end entity linking method, which comprises the following steps:
acquiring a corpus to be processed and a preset knowledge base, in which description information of a plurality of known entities is recorded, the known entities including a plurality of polysemy and synonymy relationships;
inputting the corpus to be processed into a target entity link model obtained by the above end-to-end entity link model training method, wherein the first BERT model and the Global Pointer layer, connected in sequence in the target entity link model, perform mention recognition; the recognized entities are looked up one by one in the preset knowledge base to obtain, for each entity, several candidate words having a polysemous relationship with it; the first BERT model and the second BERT model in the target entity link model then disambiguate all the entities one by one, wherein, for a given entity, the full output of the first BERT model is taken as a first vector representation, the part of the first vector representation corresponding to the given entity is taken as an entity vector representation, and the entity vector representation and the first vector representation are weighted and summed to obtain an overall vector representation; the second BERT model embeds the description information of each candidate word of the given entity into a corresponding second vector representation; the overall vector representation is concatenated with the second vector representation of each candidate word in turn and fed into a fully connected layer, which scores the pair and judges whether the two are synonymous, thereby eliminating ambiguity;
and outputting the entities in the corpus to be processed recognized by the target entity link model, together with the description information corresponding to each entity in the preset knowledge base.
In some embodiments, the preset knowledge base encodes and marks separately the multiple description entries of each polysemous known entity, and the entity labels output by the target entity link model carry the codes of the corresponding description information.
In another aspect, the present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the method.
In another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The invention has the beneficial effects that:
In the end-to-end entity link model training method, entity linking method and device, an initial entity link model comprising a first BERT model, a second BERT model and a Global Pointer layer is constructed; mention recognition is performed by the first BERT model together with the Global Pointer layer, and entity disambiguation is performed by the first BERT model together with the second BERT model. During training, the loss functions of the mention recognition and entity disambiguation parts are combined, and the parameters of both parts of the initial entity link model are adjusted jointly through backpropagation, so that errors of the two parts are effectively propagated and corrected, the two parts reinforce each other, and the overall effect is improved. In the entity disambiguation process, a preset knowledge base is introduced as external knowledge, which greatly improves the disambiguation effect.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
Fig. 1 is a schematic structural diagram of the initial entity link model in the end-to-end entity link model training method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the initial entity link model training process in the end-to-end entity link model training method according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of the initial entity link model testing process in the end-to-end entity link model training method according to another embodiment of the present invention;
Fig. 4 is a schematic diagram of the initial entity link model testing process in the end-to-end entity link model training method according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary detail, only the structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, and other details not closely related to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It is also noted that, unless otherwise specified, the term "coupled" is used herein to refer not only to a direct connection, but also to an indirect connection via an intermediary.
It should be noted that entity linking maps certain character strings in a piece of text to the corresponding entities in a knowledge base. Because different entities may share the same name and the same entity may have different names, the mapping process requires disambiguation: for example, in the text "I am reading Harry Potter", "Harry Potter" should refer to "Harry Potter" (the book) rather than the "Harry Potter" film series. Much existing entity linking work assumes that the spans of entity names (commonly called mentions) are already identified and focuses mainly on disambiguating them against knowledge base entities. Some work performs both entity recognition and entity disambiguation, which makes entity linking an end-to-end task.
In the prior art, entity link models mostly adopt an independent two-stage method: mention recognition is performed first and entity disambiguation second, so the dependency between the two stages is lost and an irreversible error propagation problem arises, where errors from the mention recognition stage are passed to the entity disambiguation stage and cannot be corrected. The present invention performs end-to-end Chinese entity linking based on pre-trained models: by designing an end-to-end pre-trained model, the mention recognition and entity disambiguation stages are trained jointly, the connection and dependency between the two stages are fully exploited, and the results of both stages are optimized simultaneously, so that the information of the two stages is effectively fused. Experiments show that this end-to-end method improves the effect of both mention recognition and entity linking.
Specifically, the present invention provides an end-to-end entity link model training method which, referring to Fig. 1, comprises steps S101 to S104:
step S101: the method comprises the steps of obtaining a training sample set, wherein the training sample set comprises a plurality of samples, each sample is a paragraph containing one sentence or a plurality of continuous sentences, and all entities and corresponding description information in the samples are marked to serve as labels.
Step S102: the method comprises the steps of obtaining a preset knowledge base, wherein description information of a plurality of known entities is recorded in the preset knowledge base, and the known entities comprise a plurality of relations of word polysemy and synonymy of multiple words.
Step S103: obtaining an initial entity link model, wherein the initial entity link model comprises a first BERT model, a second BERT model and a Global Pointer layer; the first BERT model is connected with the Global Pointer layer and used for executing the nominal identification and obtaining all predicted entities in sentences or paragraphs in each sample; the first BERT model and the second BERT model are connected and used for carrying out entity disambiguation on all the prediction entities one by one, wherein for a specified prediction entity, all output by the first BERT model are used as a first vector representation, a part corresponding to the specified prediction entity in the first vector representation is used as an entity vector representation, and the entity vector representation and the first vector representation are subjected to weighted summation to obtain an overall vector representation; searching a preset knowledge base according to the specified prediction entity, finding a positive example with the same meaning as the specified prediction entity and two negative examples with different meanings, and respectively carrying out vector representation operation on the description information of the positive example and the negative example corresponding to the specified prediction entity by using a second BERT model to obtain corresponding second vector representation; and connecting the overall vector representation with the second vector representations of the positive example and the negative example corresponding to the specified prediction entity respectively, and inputting the overall vector representations into the full-connection layer for grading and judging whether the overall vector representations are synonymous or not.
Step S104: and training the initial entity link model by adopting a training sample set, wherein all the prediction entities in each sample are subjected to entity disambiguation with positive examples and negative examples in a preset knowledge base corresponding to each prediction entity one by one, a joint loss function is calculated and propagated in a reverse direction, and parameters of the first BERT model, the second BERT model and the Global Pointer layer are integrally adjusted to obtain a target entity link model.
In step S101, the training sample set is acquired; each sample is a manually annotated corpus. In some embodiments, an existing annotated corpus may be used directly to build the database, or samples may be constructed by re-annotating corpora of the target domain. Specifically, the named entities in each sample are labeled, and the labels should at least include the position and description information of each entity. Furthermore, the description information may be represented according to a preset rule, or directly linked to the corresponding known entity in the preset knowledge base and marked with its code.
In step S102, the preset knowledge base records structured information of a plurality of corpora and configures corresponding description information for the known entities in each corpus. Among the known entities in the preset knowledge base there exist polysemy and synonymy relations. In some embodiments, the preset knowledge base configures, for each known entity, candidate words having a polysemy or synonymy relationship with it; these candidates form a query subset that speeds up retrieval. In addition, the multiple description entries of the same known entity (one word with several meanings) are encoded and marked separately. The coded mark of each entity in a sample from step S101 should be consistent with the mark of the corresponding known entity in the preset knowledge base.
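By way of illustration only, the preset knowledge base described above can be organized as a mapping from surface forms to coded candidate entries. The following is a minimal Python sketch; the layout and all field names (code, description) are assumptions for illustration, not a structure prescribed by this patent.

```python
# A minimal sketch of the preset knowledge base layout described above.
# All field and variable names are illustrative assumptions.
knowledge_base = {
    # surface form -> list of candidate known entities (polysemy: one
    # word, several entries; each entry carries a unique code and a
    # textual description used later by the second BERT model)
    "哈利波特": [
        {"code": "E001", "description": "Fantasy novel series by J. K. Rowling."},
        {"code": "E002", "description": "Film series adapted from the novels."},
    ],
}

def candidates(mention: str):
    """Return the candidate entries (query subset) for a mention."""
    return knowledge_base.get(mention, [])
```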
In step S103, this embodiment constructs an initial entity link model that combines mention recognition and entity disambiguation, with the structure shown in Fig. 1. The initial entity link model comprises a first BERT model, a second BERT model and a Global Pointer layer. BERT stands for Bidirectional Encoder Representations from Transformers; the goal of the BERT model is to learn, from large-scale unlabeled corpora, semantic representations of text that carry rich semantic information. The Global Pointer layer performs named entity recognition (NER) using a global normalization idea and can recognize nested and non-nested entities without distinction. Accordingly, the corpora in the samples are fed to the first or second BERT model in the standard BERT input format, and the outputs follow the standard BERT output format.
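The following is a simplified Python sketch of the mention recognition tower (the first BERT model followed by a Global Pointer-style span scorer). It keeps only the core idea of scoring all (start, end) spans and omits details of the full Global Pointer layer such as rotary position embeddings; the class and model names are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MentionRecognizer(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", head_dim=64):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.q_proj = nn.Linear(hidden, head_dim)  # span-start representation
        self.k_proj = nn.Linear(hidden, head_dim)  # span-end representation

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        q, k = self.q_proj(h), self.k_proj(h)
        # span_scores[b, i, j]: score of the span from token i to token j;
        # spans with a positive score are predicted as entity mentions,
        # which handles nested and non-nested entities uniformly.
        span_scores = torch.einsum("bid,bjd->bij", q, k)
        return h, span_scores
```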
Specifically, referring to Fig. 2, in this embodiment the first BERT model is connected to the Global Pointer layer to perform the mention recognition task, and the first BERT model is shared by the mention recognition and entity disambiguation tasks. During training, after the mention recognition part of the initial entity link model produces the predicted entities, each predicted entity is looked up in the preset knowledge base to obtain one positive example and two negative examples. In the actual training process, each entity in a sample is marked with the code of the corresponding known entity and its meaning description in the preset knowledge base. Known entities with the same surface form and the same meaning as the predicted entity are positive examples; known entities with the same surface form but a different meaning are negative examples.
In the training of the entity disambiguation part, each predicted entity in each sample is paired with its one positive example and two negative examples, so each predicted entity in a sample yields three groups of training data.
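A hedged sketch of assembling these three training pairs per predicted entity, reusing the candidates lookup from the knowledge base sketch above; the gold-code matching rule and tuple format are assumptions for illustration.

```python
# Build (mention, description, label) pairs: one positive (label 1) and
# up to two negatives (label 0) per predicted mention, as described above.
import random

def build_pairs(mention: str, gold_code: str):
    entries = candidates(mention)
    pos = [e for e in entries if e["code"] == gold_code]   # same meaning
    negs = [e for e in entries if e["code"] != gold_code]  # different meaning
    pairs = [(mention, pos[0]["description"], 1)]
    for neg in random.sample(negs, min(2, len(negs))):
        pairs.append((mention, neg["description"], 0))
    return pairs
```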
In some embodiments, the entity vector representation and the first vector representation are weighted and summed to obtain the overall vector representation, with a weight of 0.7 for the entity vector representation and 0.3 for the first vector representation.
In step S104, the initial entity link model is trained on the training sample set. In each training iteration, the losses of the mention recognition and entity disambiguation parts of the initial entity link model are combined into a joint loss, which is backpropagated to adjust the model parameters of both parts.
In some embodiments, the joint loss function Loss is calculated as:
Loss = λ·loss_md + (1-λ)·loss_ed; (1)
where loss_md is the mention recognition loss of the initial entity link model, loss_ed is the entity disambiguation loss of the initial entity link model, and λ is a weight coefficient with 0 < λ < 1. In some embodiments, loss_md and loss_ed both use cross-entropy loss functions, and λ = 0.1.
In some embodiments, when training the initial entity link model with the training sample set, the learning rate is set to 2e-5 and the Adam algorithm is used for gradient descent.
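Putting these settings together, a minimal training-step sketch under the stated hyperparameters (λ = 0.1, Adam with learning rate 2e-5); the model passed to make_optimizer stands for the full two-tower network and is assumed to be defined elsewhere.

```python
import torch

def make_optimizer(model):
    # Adam with learning rate 2e-5, shared by both parts of the model.
    return torch.optim.Adam(model.parameters(), lr=2e-5)

def train_step(optimizer, loss_md, loss_ed, lam=0.1):
    loss = lam * loss_md + (1 - lam) * loss_ed  # joint loss, equation (1)
    optimizer.zero_grad()
    loss.backward()   # errors of both parts propagate and adjust jointly
    optimizer.step()
    return loss.item()
```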
With the model provided by this embodiment, the error propagation and missing-dependency problems of the two-stage approach are effectively resolved. The mention recognition model and the entity disambiguation model are trained together, so errors from the mention recognition stage can be optimized during training instead of propagating uncorrected to the entity disambiguation stage and degrading the overall effect. By exploiting the dependency between the two stages, the candidate entity descriptions and knowledge base information of the entity disambiguation model introduce external knowledge for the mention recognition model, which greatly improves the effect of the mention recognition stage. With this model, the F1 scores of both mention recognition and entity disambiguation are improved.
On the other hand, the invention also provides an end-to-end entity linking method, which comprises the following steps S201 to S203:
step S201: the method comprises the steps of obtaining linguistic data to be processed and a preset knowledge base, wherein description information of a plurality of known entities is recorded in the preset knowledge base, and the known entities comprise a plurality of relations of one word multiple meaning and multiple word synonymy.
Step S202: inputting linguistic data to be processed into a target entity link model obtained in the end-to-end entity link model training method in the steps S101 to S104, executing named recognition through a first BERT model and a Global Pointer layer which are sequentially connected in the target entity link model, inquiring a preset knowledge base one by one for an entity certificate obtained through recognition, obtaining a plurality of candidate words of which each entity has a word polysemous relation, and carrying out entity disambiguation on all the entities one by one through a first BERT model and a second BERT model in the target entity link model, wherein for a specified entity, the whole output by the first BERT model is used as a first vector to represent, the part corresponding to the specified entity in the first vector to represent is used as an entity vector to represent, and the entity vector representation and the first vector representation are subjected to weighted summation to obtain an overall vector representation; and the second BERT model carries out embedding operation on the description information of the candidate words corresponding to the specified entity respectively to obtain corresponding second vector representations, the whole vector representations are respectively connected with the second vector representations of the candidate words corresponding to the specified entity, and the whole vector representations are input into a full connection layer to be used for grading and judging whether the two representations are synonymous or not so as to eliminate ambiguity.
Step S203: and outputting a plurality of entities in the linguistic data to be processed obtained by the identification of the target entity link model and description information corresponding to each entity in a preset identification library.
Specifically, in step S201, the target entity link model trained in steps S101 to S104 must be used together with a preset knowledge base when performing mention recognition and entity disambiguation. Therefore, when performing entity linking, the corpus to be processed and the preset knowledge base are obtained first; the preset knowledge base used here should have the same form as the one used for training in step S102, although its content may differ. Specifically, the preset knowledge base used in step S201 may configure, for each known entity, multiple candidate words having polysemy or synonymy relations with it and mark them to improve retrieval efficiency. In some embodiments, the multiple description entries of each polysemous known entity in the preset knowledge base are encoded and marked separately, and the entity labels output by the target entity link model carry the codes of the corresponding description information.
In step S202, the corpus to be processed is input into the target entity link model obtained by the end-to-end entity link model training method of steps S101 to S104: the mention recognition part first extracts the entities in the corpus to be processed, and the entity disambiguation part then disambiguates each recognized entity. During disambiguation, the known entities in the preset knowledge base are used for the disambiguation processing.
In step S203, the entities recognized in the corpus to be processed are output by the target entity link model together with the disambiguated description information of each entity. Specifically, the link of each recognized entity to the corresponding entry in the preset knowledge base may be marked in coded form.
In another aspect, the present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the method.
In another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The invention is illustrated below with reference to specific examples:
This embodiment provides an end-to-end entity link model, whose working process comprises the following steps:
1) Mention recognition is performed on the input sentence: since the input is a whole sentence, the mentions in it must be recognized first, using BERT + Global Pointer. 2) Entity disambiguation is performed on the recognized mentions: the vector representation of the input sentence and the vector representation of the mention to be disambiguated are extracted and concatenated, in turn, with the vector representations of all candidate entity descriptions, and a fully connected layer predicts the final link result. 3) As the key technical point of the model, the mention recognition model and the entity disambiguation model form a two-tower structure: the mention recognition model performs mention recognition and is then reused, its vector representation of the input sentence participating in entity disambiguation, which realizes joint training and mutual optimization of the two models during training.
The mention recognition stage can be regarded as a named entity recognition process: a sentence of text is input and the predicted entities in it are output. The entities recognized in entity linking are called mentions and are ambiguous, so they usually need to be disambiguated by linking in a later stage. Illustratively, in the mention recognition model shown in Fig. 3, the text to be recognized (in the illustrated example, a Chinese sentence mentioning the actors Liu and Zhao and a song title) is passed through the BERT model and then the Global Pointer layer, which outputs the predicted entities (the two person names and the song title). Global Pointer performs named entity recognition (NER) using a global normalization idea and handles well the problem that nested entities in text are difficult to recognize.
In the entity disambiguation stage, also shown in Fig. 3, the mentions recognized in the mention recognition stage are traversed in turn as the objects to be disambiguated; one positive example and two negative examples are selected from the candidate entity set of each mention in the knowledge base as training samples, and the description texts of the selected entities are used as input to the right-hand BERT model.
The embeddings (token-level vector representations) output by the left-hand BERT model for the mention are weighted and summed with the sentence embedding of the left-hand BERT model's last layer to obtain the mention embedding; this mention embedding is concatenated with the sentence embedding of the right-hand BERT model, and a fully connected layer predicts whether the link is correct. The embedding of the left BERT model is the vector representation of the mention side, and the embedding of the right BERT model is the vector representation of the candidate entity description; concatenating the embedding corresponding to the mention introduces mention information, which experiments verify effectively improves the effect of the model.
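The scoring head just described can be sketched as follows; the 0.7/0.3 weights follow the text, while the mean pooling and tensor shapes are simplifying assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

class DisambiguationHead(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.fc = nn.Linear(2 * hidden, 1)  # score: is the link correct?

    def forward(self, left_hidden, span, right_sentence_emb):
        # left_hidden: [seq_len, hidden] output of the first (left) BERT
        # span: (start, end) token indices of the mention
        # right_sentence_emb: [hidden] sentence embedding of the second BERT
        mention_emb = left_hidden[span[0]:span[1] + 1].mean(dim=0)
        sentence_emb = left_hidden.mean(dim=0)  # simple mean pooling (assumed)
        overall = 0.7 * mention_emb + 0.3 * sentence_emb
        return self.fc(torch.cat([overall, right_sentence_emb], dim=-1))
```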
In the testing stage, shown in Fig. 4, all mentions recognized by the mention recognition module are linked through the disambiguation model; all candidate entities of each mention are scored during linking, and the entity with the highest score is selected as the final linked entity. The entity linking accuracy is then computed to obtain the test result.
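Under the same assumptions as the sketches above, test-time linking reduces to scoring every candidate description of a mention and keeping the highest-scoring entry; encode_description stands for the right-hand (second) BERT model's sentence embedding function and is assumed.

```python
def link_mention(head, left_hidden, span, mention, encode_description):
    scored = []
    for entry in candidates(mention):  # knowledge base lookup sketched earlier
        desc_emb = encode_description(entry["description"])
        score = head(left_hidden, span, desc_emb).item()
        scored.append((score, entry))
    # the entity with the highest score is the final linked entity
    return max(scored, key=lambda s: s[0])[1] if scored else None
```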
In this embodiment, for the training of the model, the dimension of the BERT word vectors is set to 768, the maximum sentence length is 256, the learning rate is set to 2e-5, and the weights of the mention embedding and the sentence embedding in the entity disambiguation stage are 0.7 and 0.3. The loss functions of the mention recognition model and the entity disambiguation model are both cross-entropy losses, and the joint loss function of the whole model is:
Loss = λ·loss_md + (1-λ)·loss_ed; (1)
where loss_md is the mention recognition loss of the initial entity link model, loss_ed is the entity disambiguation loss of the initial entity link model, and λ is a weight coefficient with 0 < λ < 1. In some embodiments, loss_md and loss_ed both use cross-entropy loss functions, and λ = 0.1.
The training data used in this embodiment is the CCKS2019 (2019 China Conference on Knowledge Graph and Semantic Computing) Chinese short-text entity linking task dataset, and the knowledge base is the Baidu-curated knowledge base provided with the dataset. The Adam algorithm is used for gradient descent.
The end-to-end model of this embodiment combines the mention recognition model and the entity disambiguation model and trains them simultaneously, making full use of the dependency between the two stages. The model has the following advantages. Because the two models are jointly trained, errors of the mention recognition model can be optimized and corrected during training, whereas in an independent two-stage pipeline mention recognition errors are passed to the entity disambiguation stage uncorrected. The disambiguation model introduces external knowledge base information, which on the one hand provides extra information that helps the mention recognition model better predict the mentions in the text, and on the other hand alleviates the boundary errors in mention recognition. In the end-to-end model, the many positive and negative samples generated in the mention recognition stage provide more training data for the entity disambiguation stage: more mentions reduce missed entities, and more data improves the effect and generalization of the disambiguation model. Verification shows that the end-to-end model improves the effect of both the mention recognition and entity disambiguation stages, fully demonstrating its effectiveness.
In summary, the end-to-end entity link model training method, entity linking method and device construct an initial entity link model comprising a first BERT model, a second BERT model and a Global Pointer layer, perform mention recognition based on the first BERT model and the Global Pointer layer, and perform entity disambiguation based on the first BERT model and the second BERT model. During training, the loss functions of the mention recognition and entity disambiguation parts are combined, and the parameters of both parts of the initial entity link model are adjusted jointly through backpropagation, so that errors of the two parts are effectively propagated and corrected, the two parts reinforce each other, and the overall effect is improved. In the entity disambiguation process, a preset knowledge base is introduced as external knowledge, which greatly improves the disambiguation effect.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are programs or code segments that perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments noted in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the above-mentioned order of the steps, that is, the steps may be executed in the order mentioned in the embodiments, may be executed in an order different from the order in the embodiments, or may be executed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An end-to-end entity link model training method is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, each sample being a paragraph containing one sentence or several consecutive sentences, with all entities in the sample and their corresponding description information annotated as labels;
acquiring a preset knowledge base, in which description information of a plurality of known entities is recorded, the known entities including a plurality of polysemy and synonymy relationships;
obtaining an initial entity link model, wherein the initial entity link model comprises a first BERT model, a second BERT model and a Global Pointer layer; the first BERT model is connected to the Global Pointer layer and performs mention recognition, obtaining all predicted entities in the sentences or paragraphs of each sample; the first BERT model and the second BERT model are connected and perform entity disambiguation on all the predicted entities one by one, wherein, for a given predicted entity, the full output of the first BERT model is taken as a first vector representation, the part of the first vector representation corresponding to the given predicted entity is taken as an entity vector representation, and the entity vector representation and the first vector representation are weighted and summed to obtain an overall vector representation; the preset knowledge base is searched for the given predicted entity to find one positive example with the same meaning and two negative examples with different meanings, and the second BERT model encodes the description information of the positive and negative examples corresponding to the given predicted entity into corresponding second vector representations; the overall vector representation is concatenated with the second vector representation of each positive and negative example in turn and fed into a fully connected layer, which scores the pair and judges whether the two are synonymous;
and training the initial entity link model using the training sample set, wherein all the predicted entities in each sample are disambiguated one by one against their positive and negative examples in the preset knowledge base, a joint loss function is computed and backpropagated, and the parameters of the first BERT model, the second BERT model and the Global Pointer layer are adjusted jointly to obtain a target entity link model.
2. The end-to-end entity link model training method according to claim 1, wherein the preset knowledge base configures, for the known entities, candidate words having a polysemy or synonymy relationship with them.
3. The end-to-end entity link model training method according to claim 1, wherein, in training the initial entity link model using the training sample set, the learning rate is set to 2e-5 and the Adam algorithm is used for gradient descent.
4. The end-to-end entity link model training method according to claim 1, wherein the joint loss function Loss is calculated as:
Loss = λ·loss_md + (1-λ)·loss_ed
where loss_md is the mention recognition loss of the initial entity link model, loss_ed is the entity disambiguation loss of the initial entity link model, and λ is a weight coefficient with 0 < λ < 1.
5. The end-to-end entity link model training method according to claim 4, wherein loss_md and loss_ed both use cross-entropy loss functions, and λ = 0.1.
6. The end-to-end entity link model training method according to claim 1, wherein the entity vector representation and the first vector representation are weighted and summed to obtain the overall vector representation, with a weight of 0.7 for the entity vector representation and 0.3 for the first vector representation.
7. An end-to-end entity linking method, comprising:
acquiring a corpus to be processed and a preset knowledge base, in which description information of a plurality of known entities is recorded, the known entities including a plurality of polysemy and synonymy relationships;
inputting the corpus to be processed into a target entity link model obtained by the end-to-end entity link model training method according to any one of claims 1 to 6, wherein the first BERT model and the Global Pointer layer, connected in sequence in the target entity link model, perform mention recognition; the recognized entities are looked up one by one in the preset knowledge base to obtain, for each entity, a plurality of candidate words having a polysemous relationship with it; the first BERT model and the second BERT model in the target entity link model perform entity disambiguation on all the entities one by one, wherein, for a given entity, the full output of the first BERT model is taken as a first vector representation, the part of the first vector representation corresponding to the given entity is taken as an entity vector representation, and the entity vector representation and the first vector representation are weighted and summed to obtain an overall vector representation; the second BERT model embeds the description information of each candidate word of the given entity into a corresponding second vector representation; the overall vector representation is concatenated with the second vector representation of each candidate word in turn and fed into a fully connected layer, which scores the pair and judges whether the two are synonymous, thereby eliminating ambiguity;
and outputting the entities in the corpus to be processed recognized by the target entity link model, together with the description information corresponding to each entity in the preset knowledge base.
8. The end-to-end entity linking method according to claim 7, wherein the preset knowledge base encodes and marks separately the multiple description entries of each polysemous known entity, and the entity labels output by the target entity link model carry the codes of the corresponding description information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202210154521.7A 2022-02-21 2022-02-21 End-to-end entity link model training method, entity link method and device Pending CN114707494A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210154521.7A CN114707494A (en) 2022-02-21 2022-02-21 End-to-end entity link model training method, entity link method and device

Publications (1)

Publication Number Publication Date
CN114707494A true CN114707494A (en) 2022-07-05

Family

ID=82166192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210154521.7A Pending CN114707494A (en) 2022-02-21 2022-02-21 End-to-end entity link model training method, entity link method and device

Country Status (1)

Country Link
CN (1) CN114707494A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329755A (en) * 2022-08-18 2022-11-11 腾讯科技(深圳)有限公司 Entity link model processing method and device and entity link processing method and device
CN115329755B (en) * 2022-08-18 2023-10-31 腾讯科技(深圳)有限公司 Entity link model processing method and device and entity link processing method and device
CN116306925A (en) * 2023-03-14 2023-06-23 中国人民解放军总医院 Method and system for generating end-to-end entity link
CN116306925B (en) * 2023-03-14 2024-05-03 中国人民解放军总医院 Method and system for generating end-to-end entity link


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination