CN116663563A - Cross-domain entity linking method, device and medium based on multitasking negative sampling - Google Patents

Cross-domain entity linking method, device and medium based on multitasking negative sampling

Info

Publication number
CN116663563A
CN116663563A (application CN202310931885.6A)
Authority
CN
China
Prior art keywords
entity
task
training
sharing
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310931885.6A
Other languages
Chinese (zh)
Other versions
CN116663563B (en)
Inventor
徐童
陈恩红
陈超
吴世伟
许德容
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202310931885.6A
Publication of CN116663563A
Application granted
Publication of CN116663563B
Active legal-status Current
Anticipated expiration legal-status

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a cross-domain entity linking method, device and medium based on multi-task negative sampling. The method is divided into an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The invention uses multi-task learning together with negative sampling to transfer the entity-recognition capability learned in the training domains to different test domains, thereby achieving optimal generalization performance for cross-domain entity linking.

Description

Cross-domain entity linking method, device and medium based on multitasking negative sampling
Technical Field
The invention relates to the field of entity linking in natural language processing and knowledge graphs, and in particular to a cross-domain entity linking method, device and medium based on a multi-task learning and negative sampling strategy.
Background
Entity linking aims to link ambiguous entity mentions to the corresponding entities in an existing knowledge base. The ability to align such coarsely described mentions with precisely described entities in the knowledge base underpins many natural language processing tasks such as knowledge-based question answering, information extraction and text analysis. To bring this capability closer to the heterogeneous nature of data in real-world scenarios, the more challenging task of cross-domain entity linking has been proposed: models are trained on entity-mention pairs from multiple training domains to identify the true entity, and the domain generalization of entity linking is then evaluated on several entirely different test domains.
Existing cross-domain entity linking methods rely on the complementary representations of multi-task learning and on negative sampling to learn to identify the true entity, but they face several challenges: 1) in the complementary representations of multi-task learning, the fusion and interaction between tasks is often insufficient; 2) negative sampling improves the ability to identify the true entity, but it also introduces a large amount of irrelevant data and thus a huge computational cost. It is therefore difficult to obtain satisfactory results with the prior art.
To address these problems, the invention combines the characteristics of multi-task learning and negative sampling, designs a cross-domain entity linking framework based on multi-task learning and anchor sampling, and conducts extensive experiments on widely used cross-domain entity linking datasets, achieving better results on objective evaluation metrics and surpassing the previous state-of-the-art model.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art by providing a cross-domain entity linking method, device and medium based on multi-task negative sampling, which use multi-task learning and negative sampling to transfer the entity-recognition capability learned in the training domains to different test domains, thereby achieving optimal generalization performance for cross-domain entity linking.
To achieve this aim, the invention adopts the following technical solution:
In a first aspect, the invention provides a cross-domain entity linking method based on multi-task negative sampling. The method is divided into an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The anchor sampling module performs anchor sampling on the training data: it randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training. The bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact. The high-level local sharing module uses the entity and mention representations output by the bottom-level global sharing module to generate finer-grained, deeper text representations shared to different degrees across tasks. The multi-task learning model parameter training stage trains a type prediction model on the training data of the auxiliary task to generate the multi-task labels, and trains the parameters of the multi-task learning model with the paired entities and entity mentions.
Furthermore, the multi-task negative sampling-based cross-domain entity linking method specifically comprises the following steps:
S1, in the auxiliary task stage, preprocessing the input text data for entity type prediction, concatenating the entity mention with its context as the input for auxiliary task training, and training an entity type prediction model; after training, feeding in the entity linking data and generating the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning;
S2, in the multi-task learning model construction stage, performing multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, training the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and updating the model parameters with a stochastic gradient descent algorithm.
Furthermore, in step S1, an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
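A minimal PyTorch sketch of one way step S1 could be realized is given below. The class name, the number of soft prompts, and the way the prompts are prepended as learnable embeddings are illustrative assumptions and not the authors' implementation; only "BERT with soft prompts followed by a classifier" comes from the text.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TypePredictor(nn.Module):
    """Auxiliary task of step S1: BERT with soft prompts + linear classifier over entity types (sketch)."""
    def __init__(self, num_types: int, num_prompts: int = 10, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.num_prompts = num_prompts
        # Soft prompts: trainable vectors prepended to the word embeddings of "[mention] + [context]".
        self.prompts = nn.Parameter(torch.randn(num_prompts, hidden) * 0.02)
        self.classifier = nn.Linear(hidden, num_types)

    def forward(self, input_ids, attention_mask):
        word_emb = self.bert.embeddings.word_embeddings(input_ids)            # (B, L, H)
        prompts = self.prompts.unsqueeze(0).expand(word_emb.size(0), -1, -1)  # (B, P, H)
        inputs_embeds = torch.cat([prompts, word_emb], dim=1)                 # prepend the soft prompts
        prompt_mask = attention_mask.new_ones(attention_mask.size(0), self.num_prompts)
        full_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.bert(inputs_embeds=inputs_embeds, attention_mask=full_mask)
        cls_repr = out.last_hidden_state[:, self.num_prompts]                 # [CLS] position, shifted by the prompts
        return self.classifier(cls_repr)                                      # logits over entity types
```

After training against the type labels, the model would be run over the entity linking data to produce the type label of every entity, as described above.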
Still further, each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs.
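A sketch of this in-batch negative scheme follows; the dual-encoder dot-product score and the function name are assumed for illustration, while the "own pair is positive, all other in-batch pairs are negatives" rule comes from the text.

```python
import torch
import torch.nn.functional as F

def in_batch_link_loss(mention_vecs: torch.Tensor, entity_vecs: torch.Tensor) -> torch.Tensor:
    """mention_vecs, entity_vecs: (B, H) representations of the B anchor pairs in one batch.
    Pair i is its own positive; the other B-1 entities in the batch act as its negatives."""
    scores = mention_vecs @ entity_vecs.t()                       # (B, B) pairwise score matrix
    targets = torch.arange(scores.size(0), device=scores.device)  # the diagonal holds the positives
    return F.cross_entropy(scores, targets)                       # softmax over the in-batch candidates
```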
Furthermore, the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module, and obtains different local fine-grained information representations from the bottom-level shared representation. To model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are then combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective (a code sketch of steps (2) and (3) is given after this list).
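A minimal PyTorch sketch of steps (2) and (3) is shown below. The published text specifies only the overall structure (BERT as the bottom-level encoder, three single-layer Transformer extractors, per-task gates of a linear layer plus normalization, concatenation with the bottom-level feature); the tensor shapes, the use of `nn.TransformerEncoderLayer`, the softmax normalization in the gate and the pooling at the first token are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiTaskSharing(nn.Module):
    """Bottom-level global sharing (BERT) + high-level local sharing (3 extractors, per-task gates) - sketch."""
    def __init__(self, num_tasks: int = 2, num_extractors: int = 3, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        h = self.bert.config.hidden_size
        # Three single-layer Transformer encoders as fine-grained extractors.
        self.extractors = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=h, nhead=8, batch_first=True) for _ in range(num_extractors)]
        )
        # One gating network per task: linear layer + normalization (softmax assumed here).
        self.gates = nn.ModuleList([nn.Linear(h, num_extractors) for _ in range(num_tasks)])

    def forward(self, input_ids, attention_mask):
        # Bottom-level globally shared representation of "[mention; context]" or "[entity name; description]".
        seq = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state  # (B, L, H)
        shared = seq[:, 0]                                                   # pooled bottom-level feature (B, H)
        # Local fine-grained representations from the three extractors (pooled at the first position).
        locals_ = torch.stack([ext(seq)[:, 0] for ext in self.extractors], dim=1)              # (B, E, H)
        task_feats = []
        for gate in self.gates:
            weights = torch.softmax(gate(shared), dim=-1).unsqueeze(-1)      # (B, E, 1) task-specific weights
            high = (weights * locals_).sum(dim=1)                            # weighted sum of local features
            task_feats.append(torch.cat([high, shared], dim=-1))             # concat with bottom shared feature
        return task_feats   # one final high-level shared feature per task, each (B, 2H)
```

Each element of the returned list would then be routed to its own optimization objective (entity linking or type prediction), as the text describes.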
Further, for the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories; this loss is computed separately for the mention-side and the entity-side representations of an entity-mention pair, yielding the optimization result for the type task. For the main entity linking task, a scoring function is computed from the final high-level shared features of the mention and the candidate entity and optimized with a cross-entropy loss over the samples of each batch. Finally, the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks (a sketch of this joint objective follows).
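A sketch of the joint objective under the assumptions above; the dot-product scoring, the single balancing weight `lam` and the multi-hot label format are illustrative choices, since the exact formulas are not recoverable from the published text.

```python
import torch
import torch.nn.functional as F

def joint_loss(m_feat, e_feat, type_logits_m, type_logits_e, type_labels, lam: float = 0.1):
    """m_feat/e_feat: (B, D) final high-level shared features of mentions / entities.
    type_logits_*: (B, C) classifier outputs for the auxiliary type prediction task.
    type_labels: (B, C) multi-hot (or one-hot) entity type labels."""
    # Main task: entity linking with in-batch candidates and a cross-entropy loss.
    scores = m_feat @ e_feat.t()
    targets = torch.arange(scores.size(0), device=scores.device)
    link_loss = F.cross_entropy(scores, targets)
    # Auxiliary task: binary cross-entropy on both the mention-side and the entity-side predictions.
    type_loss = F.binary_cross_entropy_with_logits(type_logits_m, type_labels.float()) \
              + F.binary_cross_entropy_with_logits(type_logits_e, type_labels.float())
    # Weighted combination balancing task importance, convergence speed and numerical scale.
    return link_loss + lam * type_loss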
Furthermore, during training of the multi-task learning model, the multi-task joint loss, consisting of the cross-entropy loss function and the binary cross-entropy loss function, is optimized with a stochastic gradient descent algorithm; the optimizer is a stochastic gradient descent optimizer and the parameters are updated by backpropagation.
Still further, the stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
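With those hyperparameters, the optimizer setup could look as follows. This is a self-contained stand-in: the linear `model`, the random tensors and the plain `torch.optim.SGD` choice are assumptions where the text only says "stochastic gradient descent"; the real multi-task framework, anchor-set batches and joint loss replace them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(768, 768)                                    # placeholder for the multi-task framework
optimizer = torch.optim.SGD(model.parameters(), lr=2e-5)       # initial learning rate 0.00002

for step in range(3):                                          # placeholder for the epoch/batch loop
    mention_vecs = torch.randn(128, 768)                       # batch size 128 (anchor sets)
    entity_vecs = torch.randn(128, 768)
    scores = model(mention_vecs) @ model(entity_vecs).t()      # in-batch scoring as sketched earlier
    targets = torch.arange(scores.size(0))
    loss = F.cross_entropy(scores, targets)                    # the joint loss would add the weighted type loss
    optimizer.zero_grad()
    loss.backward()                                            # backpropagation
    optimizer.step()                                           # stochastic gradient descent update
```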
In a second aspect, the present invention provides a cross-domain entity linking apparatus, including a memory and a processor, where the memory stores computer executable instructions, and the processor is configured to execute the computer executable instructions, and the computer executable instructions implement the multi-task negative sampling-based cross-domain entity linking method when executed by the processor.
In a third aspect, the present invention provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the multi-tasking negative sampling based cross-domain entity linking method.
Compared with the prior art, the invention has the beneficial effects that:
the main framework of the multi-task learning model is divided into an anchor point sampling module, a bottom global sharing module and a high-level local sharing module, wherein the anchor point sampling module improves the capability of identifying real entities by utilizing a negative sampling mode before multi-task learning of training data, and improves the cross-domain capability by utilizing the diversity of conditional data distribution. The underlying global sharing module acquires the entity and text characterization mentioned by the entity by using a text encoder, and aggregates the acquired characterization to acquire global features of the entity and the entity mention fused interaction at the underlying layer respectively. The high-level local sharing module generates text representations which are finer-grained and deeper and can be used for different-degree task interaction sharing for the output entities and the representations of the entity references. The invention fully utilizes the complementary representation of multi-task learning, simultaneously fully utilizes the information interaction with different thickness granularities and the information interaction sharing among different tasks, and considers the balance of the capability of identifying the real entity and the calculation cost brought by a negative sampling mode, thereby obtaining better effect on the inter-domain entity link.
The invention can train the type prediction model by using the training data of the auxiliary task, generate the multi-task label, and train the parameters in the multi-task frame by using the paired entities and entity mention in the training field. When model parameter training is completed, entity link detection can be performed in completely different test fields, so that cross-field performance of the framework is verified. The invention utilizes a multi-task learning and negative sampling mode to migrate the capability of identifying the entity obtained in the training field to different testing fields, thereby obtaining the optimal generalization performance of cross-field entity link. The invention can fully utilize the existing data to learn out the cross-domain entity link model with good effect, and achieves the best result of the cross-domain entity link.
Drawings
Fig. 1 is a schematic diagram of auxiliary task label generation in embodiment 1.
Fig. 2 is a schematic diagram of the main framework of the multi-task learning model in embodiment 1.
Fig. 3 is a diagram comparing the anchor sampling module of embodiment 1 with a conventional method.
Detailed Description
Example 1:
This embodiment discloses a cross-domain entity linking method based on multi-task negative sampling, comprising an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage. The auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage. The main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module. The anchor sampling module performs anchor sampling on the training data: it randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training; each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs. The bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact. The high-level local sharing module generates, from the entity and mention representations output by the bottom-level global sharing module, finer-grained, deeper text representations shared to different degrees across tasks. In the multi-task learning model parameter training stage, a type prediction model is trained on the training data of the auxiliary task to generate the multi-task labels, and the parameters of the multi-task learning model are trained with the paired entities and entity mentions.
Referring to fig. 1 to 3, in this embodiment, the cross-domain entity linking method specifically includes the following steps:
S1, in the auxiliary task stage, preprocess the input text data for entity type prediction, concatenate the entity mention with its context as the input for auxiliary task training, and train an entity type prediction model; after training, feed in the entity linking data and generate the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning. In this embodiment, an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
S2, in the multi-task learning model construction stage, perform multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, train the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and update the model parameters with a stochastic gradient descent algorithm. During training, the multi-task joint loss, consisting of the cross-entropy loss and the binary cross-entropy loss, is optimized with stochastic gradient descent and the parameters are updated by backpropagation. The stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
In this embodiment, the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module and obtains different local fine-grained information representations from the bottom-level shared representation. To model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective.
For the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories, computed separately for the mention-side and the entity-side representations of an entity-mention pair, to obtain the optimization result for the type task. For the main entity linking task, a scoring function is computed from the final high-level shared features and optimized with a cross-entropy loss over the samples of each batch. Finally, the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks.
Example 2:
This embodiment discloses a cross-domain entity linking device comprising a memory and a processor, wherein the memory stores computer executable instructions and the processor is configured to run them; when executed by the processor, the computer executable instructions implement the multi-task negative sampling-based cross-domain entity linking method disclosed in embodiment 1.
Example 3:
This embodiment discloses a computer readable storage medium on which a computer program is stored; when executed by a processor, the program implements the multi-task negative sampling-based cross-domain entity linking method disclosed in embodiment 1.

Claims (10)

1. A cross-domain entity linking method based on multi-task negative sampling, characterized in that the method comprises an auxiliary task stage, a multi-task learning model construction stage and a multi-task learning model parameter training stage, wherein the auxiliary task stage selects entity type prediction as the auxiliary task and generates the labels required by the auxiliary task in the multi-task learning stage; the main framework of the multi-task learning model is divided into an anchor sampling module, a bottom-level global sharing module and a high-level local sharing module; the anchor sampling module performs anchor sampling on the training data, randomly selects entity-mention pairs of the same entity type, binds the entity type to the entity-mention pairs to obtain an anchor set, and feeds the training data, organized in units of anchor sets, to the bottom-level global sharing module for training; the bottom-level global sharing module obtains text representations of the entity and of the entity mention with a text encoder and aggregates them to obtain bottom-level global features in which entity and mention information interact; the high-level local sharing module generates, from the entity and mention representations output by the bottom-level global sharing module, finer-grained, deeper text representations shared to different degrees across tasks; and the multi-task learning model parameter training stage trains a type prediction model on the training data of the auxiliary task to generate the multi-task labels, and trains the parameters of the multi-task learning model with the paired entities and entity mentions.
2. The multi-task negative sampling-based cross-domain entity linking method according to claim 1, comprising the specific steps of:
S1, in the auxiliary task stage, preprocessing the input text data for entity type prediction, concatenating the entity mention with its context as the input for auxiliary task training, and training an entity type prediction model; after training, feeding in the entity linking data and generating the corresponding entity type as the label of the entity type prediction task in the subsequent multi-task learning;
S2, in the multi-task learning model construction stage, performing multi-task learning with entity linking as the main task and entity type prediction as the auxiliary task, comprising the following steps in order:
(1) Before the training data enter multi-task learning, the anchor sampling module selects a prior condition as the criterion and groups entities that satisfy the same condition into one class; entity-mention pairs from the original training data are then randomly selected as negative sample pairs according to the chosen prior condition, and these negative pairs are statically bound to the original entity-mention pair; the resulting data form the input of the bottom-level global sharing module;
(2) Given an entity and an entity mention, the bottom-level global sharing module encodes their text descriptions at multiple granularities and levels with the multi-layer text encoder of a pre-trained model, obtains task-fused global text features through an attention mechanism, fuses the preferences of the individual tasks for globally shared and task-specific information, and passes the result on as the input of the high-level local sharing module;
(3) The high-level local sharing module uses several fine-grained information extractors to further process the bottom-level features of the entities and entity mentions, yielding multi-task fusion features that are shared to different degrees or task-specific; it then feeds the global text features output by the bottom-level global sharing module into a gating network for each task, so that each task's preference for sharing versus task-specific information is obtained and deep fusion between the bottom and high levels is realized; finally, these preferences are aggregated with the representations obtained by the corresponding fine-grained extractors to obtain high-level fine-grained task-sharing representations;
S3, in the multi-task learning model parameter training stage, training the model with a cross-entropy loss for the entity linking task and a binary cross-entropy loss for the entity type prediction task, and updating the model parameters with a stochastic gradient descent algorithm.
3. The multi-task negative sampling-based cross-domain entity linking method according to claim 2, wherein in step S1 an entity mention is concatenated with its context information and fed into a pre-trained encoder; the encoder is the pre-trained model BERT with soft prompts, its output is passed through a classifier, and the loss is computed against the labels to obtain the trained type generation model; the entities in the entity linking data are then fed in to obtain type labels for all entities.
4. The multi-task negative sampling-based cross-domain entity linking method according to claim 3, wherein each sample pair in the anchor set is a positive pair with respect to itself, and when the loss is computed, all other sample pairs in the same batch serve as its negative pairs.
5. The multi-task negative sampling-based cross-domain entity linking method according to claim 3, wherein the multi-task learning model construction stage is implemented in the following specific steps:
(1) For an entity-mention pair, the anchor sampling module first selects a prior criterion, namely the entity type or the entity domain; from the set of entity-mention pairs that share the same entity type or entity domain, it randomly selects a fixed number of pairs as anchor samples, which form an anchor set; the anchor set is then used as the basic unit of the training data fed into the bottom-level global sharing module for training;
(2) For an entity mention in the anchor set, the bottom-level global sharing module concatenates the entity mention with its context, and concatenates the entity name with its description, to obtain the input sequences; these are fed separately into the pre-trained model BERT for the mention and for the entity to obtain serialized bottom-level globally shared representations of the text semantics, which serve as the input of the deep sharing interaction of the high-level local sharing module;
(3) The high-level local sharing module uses three single-layer Transformer encoders as extractors on the shared representations obtained from the bottom-level global sharing module and obtains different local fine-grained information representations from the bottom-level shared representation; to model the sharing interaction of the different tasks on local information, the bottom-level shared information is fed into several gating networks, each consisting of a linear layer and a normalization layer, to obtain each task's weights over the different local fine-grained representations; the corresponding weights and features are combined by element-wise multiplication and summation to obtain the high-level shared feature of each task; the high-level shared feature of each task is concatenated with the bottom-level shared feature to give the final high-level shared feature, which is then passed to the respective optimization objective.
6. The multi-task negative sampling-based cross-domain entity linking method according to claim 5, wherein, for the entity type prediction auxiliary task, the final high-level shared feature is fed into a classifier and compared against the type labels with a binary cross-entropy loss over the entity categories, computed separately for the mention-side and the entity-side representations of an entity-mention pair, to obtain the optimization result for the type task; for the main entity linking task, a scoring function is computed from the final high-level shared features and optimized with a cross-entropy loss over the samples of each batch; and finally the loss functions of the auxiliary task and the main task are combined into a joint training objective, in which a weighting coefficient on the two task losses measures task importance and balances the different convergence speeds and numerical scales of the tasks.
7. The multi-task negative sampling-based cross-domain entity linking method according to claim 6, wherein, during training of the multi-task learning model, the multi-task joint loss function, consisting of a cross-entropy loss function and a binary cross-entropy loss function, is optimized with a stochastic gradient descent algorithm, the optimizer being a stochastic gradient descent optimizer with parameters updated by backpropagation.
8. The multi-task negative sampling-based cross-domain entity linking method according to claim 7, wherein the stochastic gradient descent optimizer uses a batch size of 128, an initial learning rate of 0.00002, and 3 negative samples.
9. A cross-domain entity linking apparatus comprising a memory and a processor, the memory storing computer executable instructions and the processor configured to execute the computer executable instructions, wherein the computer executable instructions, when executed by the processor, implement the multi-task negative sampling-based cross-domain entity linking method of any of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the multi-tasking negative sampling based cross-domain entity linking method of any of claims 1 to 8.
CN202310931885.6A 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling Active CN116663563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310931885.6A CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310931885.6A CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Publications (2)

Publication Number Publication Date
CN116663563A true CN116663563A (en) 2023-08-29
CN116663563B CN116663563B (en) 2023-11-17

Family

ID=87712107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310931885.6A Active CN116663563B (en) 2023-07-27 2023-07-27 Cross-domain entity linking method, device and medium based on multitasking negative sampling

Country Status (1)

Country Link
CN (1) CN116663563B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032585A (en) * 2021-05-31 2021-06-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Document-level entity relation extraction method based on document structure and external knowledge
US20220358292A1 (en) * 2021-07-20 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for recognizing entity, electronic device and storage medium
US20230229859A1 (en) * 2022-01-14 2023-07-20 International Business Machines Corporation Zero-shot entity linking based on symbolic information

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032585A (en) * 2021-05-31 2021-06-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Document-level entity relation extraction method based on document structure and external knowledge
US20220358292A1 (en) * 2021-07-20 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for recognizing entity, electronic device and storage medium
US20230229859A1 (en) * 2022-01-14 2023-07-20 International Business Machines Corporation Zero-shot entity linking based on symbolic information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIAO CHENG et al.: "KB-QA based on multi-task learning and negative sample generation", INFORMATION SCIENCES *
SHUANGLI LI et al.: "Multi-Temporal Relationship Inference in Urban Areas", MACHINE LEARNING *

Also Published As

Publication number Publication date
CN116663563B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US20220129731A1 (en) Method and apparatus for training image recognition model, and method and apparatus for recognizing image
CN109960810B (en) Entity alignment method and device
CN111339255B (en) Target emotion analysis method, model training method, medium, and device
Qianna Evaluation model of classroom teaching quality based on improved RVM algorithm and knowledge recommendation
Penha et al. Curriculum learning strategies for IR: An empirical study on conversation response ranking
CN111666416A (en) Method and apparatus for generating semantic matching model
CN115131698B (en) Video attribute determining method, device, equipment and storage medium
CN115827954B (en) Dynamic weighted cross-modal fusion network retrieval method, system and electronic equipment
CN111400473A (en) Method and device for training intention recognition model, storage medium and electronic equipment
CN114820871A (en) Font generation method, model training method, device, equipment and medium
CN113806487A (en) Semantic search method, device, equipment and storage medium based on neural network
CN116956116A (en) Text processing method and device, storage medium and electronic equipment
CN111737438B (en) Data processing method and device based on text similarity and electronic equipment
CN116776855A (en) LLaMA model-based method, device and equipment for solving autonomous learning of vocational education machine
CN116663563B (en) Cross-domain entity linking method, device and medium based on multitasking negative sampling
CN116452895A (en) Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
Kumar et al. Relevance of data mining techniques in edification sector
CN112598662B (en) Image aesthetic description generation method based on hidden information learning
CN113221577A (en) Education text knowledge induction method, system, equipment and readable storage medium
Geng et al. FEAIS: Facial Emotion Recognition Enabled Education Aids IoT System for Online Learning
Wang et al. Self-paced knowledge distillation for real-time image guided depth completion
Wang et al. Multi‐Task and Attention Collaborative Network for Facial Emotion Recognition
CN114328797B (en) Content search method, device, electronic apparatus, storage medium, and program product
CN116030526B (en) Emotion recognition method, system and storage medium based on multitask deep learning
Shen Analysis and Research on the Characteristics of Modern English Classroom Learners’ Concentration Based on Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant