CN116663563A - Cross-domain entity linking method, device and medium based on multitasking negative sampling - Google Patents

Cross-domain entity linking method, device and medium based on multitasking negative sampling

Info

Publication number
CN116663563A
CN116663563A (application CN202310931885.6A)
Authority
CN
China
Prior art keywords
entity
task
training
sharing
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310931885.6A
Other languages
Chinese (zh)
Other versions
CN116663563B (en)
Inventor
徐童
陈恩红
陈超
吴世伟
许德容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310931885.6A
Publication of CN116663563A
Application granted
Publication of CN116663563B
Active legal-status Current
Anticipated expiration legal-status

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a cross-domain entity linking method, device and medium based on multi-task negative sampling. The method is divided into an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The invention uses multi-task learning together with negative sampling to transfer the entity-recognition capability learned in the training domains to different test domains, thereby achieving optimal generalization performance for cross-domain entity linking.

Description

Cross-domain entity linking method, device and medium based on multitasking negative sampling
Technical Field
The invention relates to the field of entity linking in natural language processing and knowledge graphs, and in particular to a cross-domain entity linking method, device and medium based on a multi-task learning and negative sampling strategy.
Background
Entity linking aims to link ambiguous entity mentions to the corresponding entities in an existing knowledge base. The ability to align such coarsely described mentions with precisely described entities in the knowledge base underpins many natural language processing tasks such as knowledge-based question answering, information extraction and text analysis. To bring this capability closer to the heterogeneous nature of data in real-world scenarios, the more challenging task of cross-domain entity linking has been proposed: models are trained on entity-mention pairs from multiple training domains to identify the true entity, and the domain generalization of entity linking is then evaluated on several entirely different test domains.
Existing cross-domain entity linking methods rely on the complementary representations of multi-task learning and on negative sampling to learn to identify the true entity, but they face several challenges: 1) in the complementary representations of multi-task learning, the fusion and interaction between tasks is often insufficient; 2) negative sampling improves the ability to identify the true entity, but it also introduces a large amount of irrelevant data and thus a huge computational cost. It is therefore difficult to obtain satisfactory results with the prior art.
To address these problems, the invention combines the characteristics of multi-task learning and negative sampling, designs a cross-domain entity linking framework based on multi-task learning and anchor sampling, and conducts extensive experiments on widely used cross-domain entity linking datasets, achieving better results on objective evaluation metrics and surpassing the previous state-of-the-art model.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art by providing a cross-domain entity linking method, device and medium based on multi-task negative sampling, which use multi-task learning and negative sampling to transfer the entity-recognition capability learned in the training domains to different test domains, thereby achieving optimal generalization performance for cross-domain entity linking.
To achieve this aim, the invention adopts the following technical solution:
In a first aspect, the invention provides a cross-domain entity linking method based on multi-task negative sampling. The method is divided into an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The anchor sampling module performs anchor sampling on the training data: it randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training. The bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact. The high-level local sharing module uses the entity and mention representations output by the bottom-level global sharing module to generate finer-grained, deeper text representations shared to different degrees across tasks. The multi-task learning model parameter training stage trains a type prediction model on the training data of the auxiliary task to generate the multi-task labels, and trains the parameters of the multi-task learning model with the paired entities and entity mentions.
Furthermore, the multi-task negative sampling-based cross-domain entity linking method specifically comprises the following steps:
S1, in the auxiliary task stage, preprocessing the input text data for entity type prediction, concatenating the entity mention with its context as the input for auxiliary task training, and training an entity type prediction model; after training, feeding in the entity linking data and generating the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning;
S2, in the multi-task learning model construction stage, performing multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, training the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and updating the model parameters with a stochastic gradient descent algorithm.
Furthermore, in step S1, an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
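A minimal PyTorch sketch of one way step S1 could be realized is given below. The class name, the number of soft prompts, and the way the prompts are prepended as learnable embeddings are illustrative assumptions and not the authors' implementation; only "BERT with soft prompts followed by a classifier" comes from the text.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TypePredictor(nn.Module):
    """Auxiliary task of step S1: BERT with soft prompts + linear classifier over entity types (sketch)."""
    def __init__(self, num_types: int, num_prompts: int = 10, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.num_prompts = num_prompts
        # Soft prompts: trainable vectors prepended to the word embeddings of "[mention] + [context]".
        self.prompts = nn.Parameter(torch.randn(num_prompts, hidden) * 0.02)
        self.classifier = nn.Linear(hidden, num_types)

    def forward(self, input_ids, attention_mask):
        word_emb = self.bert.embeddings.word_embeddings(input_ids)            # (B, L, H)
        prompts = self.prompts.unsqueeze(0).expand(word_emb.size(0), -1, -1)  # (B, P, H)
        inputs_embeds = torch.cat([prompts, word_emb], dim=1)                 # prepend the soft prompts
        prompt_mask = attention_mask.new_ones(attention_mask.size(0), self.num_prompts)
        full_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.bert(inputs_embeds=inputs_embeds, attention_mask=full_mask)
        cls_repr = out.last_hidden_state[:, self.num_prompts]                 # [CLS] position, shifted by the prompts
        return self.classifier(cls_repr)                                      # logits over entity types
```

After training against the type labels, the model would be run over the entity linking data to produce the type label of every entity, as described above.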
Still further, each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs.
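A sketch of this in-batch negative scheme follows; the dual-encoder dot-product score and the function name are assumed for illustration, while the "own pair is positive, all other in-batch pairs are negatives" rule comes from the text.

```python
import torch
import torch.nn.functional as F

def in_batch_link_loss(mention_vecs: torch.Tensor, entity_vecs: torch.Tensor) -> torch.Tensor:
    """mention_vecs, entity_vecs: (B, H) representations of the B anchor pairs in one batch.
    Pair i is its own positive; the other B-1 entities in the batch act as its negatives."""
    scores = mention_vecs @ entity_vecs.t()                       # (B, B) pairwise score matrix
    targets = torch.arange(scores.size(0), device=scores.device)  # the diagonal holds the positives
    return F.cross_entropy(scores, targets)                       # softmax over the in-batch candidates
```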
Furthermore, the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module, and obtains different local fine-grained information representations from the bottom-level shared representation. To model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are then combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective (a code sketch of steps (2) and (3) is given after this list).
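A minimal PyTorch sketch of steps (2) and (3) is shown below. The published text specifies only the overall structure (BERT as the bottom-level encoder, three single-layer Transformer extractors, per-task gates of a linear layer plus normalization, concatenation with the bottom-level feature); the tensor shapes, the use of `nn.TransformerEncoderLayer`, the softmax normalization in the gate and the pooling at the first token are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskSharing(nn.Module):
    """Bottom-level global sharing (BERT) + high-level local sharing (3 extractors, per-task gates) - sketch."""
    def __init__(self, num_tasks: int = 2, num_extractors: int = 3, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        h = self.bert.config.hidden_size
        # Three single-layer Transformer encoders as fine-grained extractors.
        self.extractors = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=h, nhead=8, batch_first=True) for _ in range(num_extractors)]
        )
        # One gating network per task: linear layer + normalization (softmax assumed here).
        self.gates = nn.ModuleList([nn.Linear(h, num_extractors) for _ in range(num_tasks)])

    def forward(self, input_ids, attention_mask):
        # Bottom-level globally shared representation of "[mention; context]" or "[entity name; description]".
        seq = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state  # (B, L, H)
        shared = seq[:, 0]                                                   # pooled bottom-level feature (B, H)
        # Local fine-grained representations from the three extractors (pooled at the first position).
        locals_ = torch.stack([ext(seq)[:, 0] for ext in self.extractors], dim=1)              # (B, E, H)
        task_feats = []
        for gate in self.gates:
            weights = torch.softmax(gate(shared), dim=-1).unsqueeze(-1)      # (B, E, 1) task-specific weights
            high = (weights * locals_).sum(dim=1)                            # weighted sum of local features
            task_feats.append(torch.cat([high, shared], dim=-1))             # concat with bottom shared feature
        return task_feats   # one final high-level shared feature per task, each (B, 2H)
```

Each element of the returned list would then be routed to its own optimization objective (entity linking or type prediction), as the text describes.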
Further, for the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories; this loss is computed separately for the mention-side and the entity-side representations of an entity-mention pair, yielding the optimization result for the type task. For the main entity linking task, a scoring function is computed from the final high-level shared features of the mention and the candidate entity and optimized with a cross-entropy loss over the samples of each batch. Finally, the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks (a sketch of this joint objective follows).
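A sketch of the joint objective under the assumptions above; the dot-product scoring, the single balancing weight `lam` and the multi-hot label format are illustrative choices, since the exact formulas are not recoverable from the published text.

```python
import torch
import torch.nn.functional as F

def joint_loss(m_feat, e_feat, type_logits_m, type_logits_e, type_labels, lam: float = 0.1):
    """m_feat/e_feat: (B, D) final high-level shared features of mentions / entities.
    type_logits_*: (B, C) classifier outputs for the auxiliary type prediction task.
    type_labels: (B, C) multi-hot (or one-hot) entity type labels."""
    # Main task: entity linking with in-batch candidates and a cross-entropy loss.
    scores = m_feat @ e_feat.t()
    targets = torch.arange(scores.size(0), device=scores.device)
    link_loss = F.cross_entropy(scores, targets)
    # Auxiliary task: binary cross-entropy on both the mention-side and the entity-side predictions.
    type_loss = F.binary_cross_entropy_with_logits(type_logits_m, type_labels.float()) \
              + F.binary_cross_entropy_with_logits(type_logits_e, type_labels.float())
    # Weighted combination balancing task importance, convergence speed and numerical scale.
    return link_loss + lam * type_loss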
Furthermore, during training of the multi-task learning model, the multi-task joint loss, consisting of the cross-entropy loss function and the binary cross-entropy loss function, is optimized with a stochastic gradient descent algorithm; the optimizer is a stochastic gradient descent optimizer and the parameters are updated by backpropagation.
Still further, the stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
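With those hyperparameters, the optimizer setup could look as follows. This is a self-contained stand-in: the linear `model`, the random tensors and the plain `torch.optim.SGD` choice are assumptions where the text only says "stochastic gradient descent"; the real multi-task framework, anchor-set batches and joint loss replace them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(768, 768)                                    # placeholder for the multi-task framework
optimizer = torch.optim.SGD(model.parameters(), lr=2e-5)       # initial learning rate 0.00002

for step in range(3):                                          # placeholder for the epoch/batch loop
    mention_vecs = torch.randn(128, 768)                       # batch size 128 (anchor sets)
    entity_vecs = torch.randn(128, 768)
    scores = model(mention_vecs) @ model(entity_vecs).t()      # in-batch scoring as sketched earlier
    targets = torch.arange(scores.size(0))
    loss = F.cross_entropy(scores, targets)                    # the joint loss would add the weighted type loss
    optimizer.zero_grad()
    loss.backward()                                            # backpropagation
    optimizer.step()                                           # stochastic gradient descent update
```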
In a second aspect, the present invention provides a cross-domain entity linking apparatus, including a memory and a processor, where the memory stores computer executable instructions, and the processor is configured to execute the computer executable instructions, and the computer executable instructions implement the multi-task negative sampling-based cross-domain entity linking method when executed by the processor.
In a third aspect, the present invention provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the multi-tasking negative sampling based cross-domain entity linking method.
Compared with the prior art, the invention has the beneficial effects that:
the main framework of the multi-task learning model is divided into an anchor point sampling module, a bottom global sharing module and a high-level local sharing module, wherein the anchor point sampling module improves the capability of identifying real entities by utilizing a negative sampling mode before multi-task learning of training data, and improves the cross-domain capability by utilizing the diversity of conditional data distribution. The underlying global sharing module acquires the entity and text characterization mentioned by the entity by using a text encoder, and aggregates the acquired characterization to acquire global features of the entity and the entity mention fused interaction at the underlying layer respectively. The high-level local sharing module generates text representations which are finer-grained and deeper and can be used for different-degree task interaction sharing for the output entities and the representations of the entity references. The invention fully utilizes the complementary representation of multi-task learning, simultaneously fully utilizes the information interaction with different thickness granularities and the information interaction sharing among different tasks, and considers the balance of the capability of identifying the real entity and the calculation cost brought by a negative sampling mode, thereby obtaining better effect on the inter-domain entity link.
The invention can train the type prediction model by using the training data of the auxiliary task, generate the multi-task label, and train the parameters in the multi-task frame by using the paired entities and entity mention in the training field. When model parameter training is completed, entity link detection can be performed in completely different test fields, so that cross-field performance of the framework is verified. The invention utilizes a multi-task learning and negative sampling mode to migrate the capability of identifying the entity obtained in the training field to different testing fields, thereby obtaining the optimal generalization performance of cross-field entity link. The invention can fully utilize the existing data to learn out the cross-domain entity link model with good effect, and achieves the best result of the cross-domain entity link.
Drawings
Fig. 1 is a schematic diagram of auxiliary task label generation in embodiment 1.
Fig. 2 is a schematic diagram of the main framework of the multi-task learning model in embodiment 1.
Fig. 3 is a diagram comparing the anchor sampling module of embodiment 1 with a conventional method.
Detailed Description
Example 1:
This embodiment discloses a cross-domain entity linking method based on multi-task negative sampling, comprising an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The anchor sampling module performs anchor sampling on the training data: it randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training; each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs. The bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact. The high-level local sharing module generates, from the entity and mention representations output by the bottom-level global sharing module, finer-grained, deeper text representations shared to different degrees across tasks. In the multi-task learning model parameter training stage, a type prediction model is trained on the training data of the auxiliary task to generate the multi-task labels, and the parameters of the multi-task learning model are trained with the paired entities and entity mentions.
Referring to fig. 1 to 3, in this embodiment, the cross-domain entity linking method specifically includes the following steps:
S1, in the auxiliary task stage, preprocess the input text data for entity type prediction, concatenate the entity mention with its context as the input for auxiliary task training, and train an entity type prediction model; after training, feed in the entity linking data and generate the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning. In this embodiment, an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
S2, in the multi-task learning model construction stage, perform multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, train the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and update the model parameters with a stochastic gradient descent algorithm. During training, the multi-task joint loss, consisting of the cross-entropy loss and the binary cross-entropy loss, is optimized with stochastic gradient descent and the parameters are updated by backpropagation. The stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
In this embodiment, the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module and obtains different local fine-grained information representations from the bottom-level shared representation. To model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective.
For the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories, computed separately for the mention-side and the entity-side representations of an entity-mention pair, to obtain the optimization result for the type task. For the main entity linking task, a scoring function is computed from the final high-level shared features and optimized with a cross-entropy loss over the samples of each batch. Finally, the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks.
Example 2:
This embodiment discloses a cross-domain entity linking device comprising a memory and a processor, wherein the memory stores computer executable instructions and the processor is configured to run them; when executed by the processor, the computer executable instructions implement the multi-task negative sampling-based cross-domain entity linking method disclosed in embodiment 1.
Example 3:
This embodiment discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the program implements the multi-task negative sampling-based cross-domain entity linking method disclosed in embodiment 1.

Claims (10)

1. A cross-domain entity linking method based on multi-task negative sampling, characterized in that the method comprises an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage, wherein the auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage; the main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module; the anchor sampling module performs anchor sampling on the training data, randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training; the bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact; the high-level local sharing module generates, from the entity and mention representations output by the bottom-level global sharing module, finer-grained, deeper text representations shared to different degrees across tasks; and the multi-task learning model parameter training stage trains a type prediction model on the training data of the auxiliary task to generate the multi-task labels, and trains the parameters of the multi-task learning model with the paired entities and entity mentions.
2. The multi-task negative sampling-based cross-domain entity linking method according to claim 1, comprising the specific steps of:
S1, in the auxiliary task stage, preprocessing the input text data for entity type prediction, concatenating the entity mention with its context as the input for auxiliary task training, and training an entity type prediction model; after training, feeding in the entity linking data and generating the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning;
S2, in the multi-task learning model construction stage, performing multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, training the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and updating the model parameters with a stochastic gradient descent algorithm.
3. The multi-task negative sampling-based cross-domain entity linking method according to claim 2, wherein in step S1 an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
4. The multi-task negative sampling-based cross-domain entity linking method according to claim 3, wherein each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs.
5. The multi-task negative sampling-based cross-domain entity linking method according to claim 3, wherein the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module and obtains different local fine-grained information representations from the bottom-level shared representation; to model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective.
6. The multi-task negative sampling-based cross-domain entity linking method according to claim 5, wherein, for the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories, computed separately for the mention-side and the entity-side representations of an entity-mention pair, to obtain the optimization result for the type task; for the main entity linking task, a scoring function is computed from the final high-level shared features and optimized with a cross-entropy loss over the samples of each batch; and finally the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks.
7. The multi-task negative sampling-based cross-domain entity linking method according to claim 6, wherein, during training of the multi-task learning model, the multi-task joint loss function, consisting of a cross-entropy loss function and a binary cross-entropy loss function, is optimized with a stochastic gradient descent algorithm, the optimizer being a stochastic gradient descent optimizer with parameters updated by backpropagation.
8. The multi-task negative sampling-based cross-domain entity linking method according to claim 7, wherein the stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
9. A cross-domain entity linking apparatus comprising a memory and a processor, the memory storing computer executable instructions and the processor configured to execute the computer executable instructions, wherein the computer executable instructions, when executed by the processor, implement the multi-task negative sampling-based cross-domain entity linking method of any of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the multi-tasking negative sampling based cross-domain entity linking method of any of claims 1 to 8.
CN202310931885.6A 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling Active CN116663563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310931885.6A CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310931885.6A CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Publications (2)

Publication Number Publication Date
CN116663563A true CN116663563A (en) 2023-08-29
CN116663563B CN116663563B (en) 2023-11-17

Family

ID=87712107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310931885.6A Active CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Country Status (1)

Country Link
CN (1) CN116663563B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032585A (en) * 2021-05-31 2021-06-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Document-level entity relation extraction method based on document structure and external knowledge
US20220358292A1 (en) * 2021-07-20 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for recognizing entity, electronic device and storage medium
US20230229859A1 (en) * 2022-01-14 2023-07-20 International Business Machines Corporation Zero-shot entity linking based on symbolic information

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032585A (en) * 2021-05-31 2021-06-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Document-level entity relation extraction method based on document structure and external knowledge
US20220358292A1 (en) * 2021-07-20 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for recognizing entity, electronic device and storage medium
US20230229859A1 (en) * 2022-01-14 2023-07-20 International Business Machines Corporation Zero-shot entity linking based on symbolic information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIAO CHENG et al.: "KB-QA based on multi-task learning and negative sample generation", INFORMATION SCIENCES *
SHUANGLI LI et al.: "Multi-Temporal Relationship Inference in Urban Areas", MACHINE LEARNING *

Also Published As

Publication number Publication date
CN116663563B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US20220129731A1 (en) Method and apparatus for training image recognition model, and method and apparatus for recognizing image
CN109960810B (en) Entity alignment method and device
CN111339255B (en) Target emotion analysis method, model training method, medium, and device
Qianna Evaluation model of classroom teaching quality based on improved RVM algorithm and knowledge recommendation
Penha et al. Curriculum learning strategies for IR: An empirical study on conversation response ranking
CN111666416A (en) Method and apparatus for generating semantic matching model
CN115131698B (en) Video attribute determining method, device, equipment and storage medium
CN115827954B (en) Dynamic weighted cross-modal fusion network retrieval method, system and electronic equipment
CN111400473A (en) Method and device for training intention recognition model, storage medium and electronic equipment
CN114820871A (en) Font generation method, model training method, device, equipment and medium
CN113806487A (en) Semantic search method, device, equipment and storage medium based on neural network
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN111737438B (en) Data processing method and device based on text similarity and electronic equipment
CN116776855A (en) LLaMA model-based method, device and equipment for solving autonomous learning of vocational education machine
CN116663563B (en) Cross-domain entity linking method, device and medium based on multitasking negative sampling
CN116452895A (en) Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
Kumar et al. Relevance of data mining techniques in edification sector
CN112598662B (en) Image aesthetic description generation method based on hidden information learning
CN113221577A (en) Education text knowledge induction method, system, equipment and readable storage medium
Geng et al. FEAIS: Facial Emotion Recognition Enabled Education Aids IoT System for Online Learning
Wang et al. Self-paced knowledge distillation for real-time image guided depth completion
Wang et al. Multi‐Task and Attention Collaborative Network for Facial Emotion Recognition
CN114328797B (en) Content search method, device, electronic apparatus, storage medium, and program product
CN116030526B (en) Emotion recognition method, system and storage medium based on multitask deep learning
Shen Analysis and Research on the Characteristics of Modern English Classroom Learners’ Concentration Based on Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant