CN111613341B - Entity linking method and device based on semantic components

Info

Publication number: CN111613341B
Application number: CN202010443446.7A
Authority: CN
Inventors: 史亚飞
Original assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2024-02-02
Anticipated expiration: 2040-05-22
Also published as: CN111613341A

Abstract

The invention provides an entity linking method and device based on semantic components, which relate to the technical field of computers and comprise the steps of acquiring an entity to be linked from a medical data set; determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked; determining the semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model; and linking the entity to be linked to a standard entity with the highest semantic similarity score in the candidate set. Therefore, the semantic component information can be added into the entity link process, so that the semantic enhancement model can learn more information, and the entity link accuracy is higher.

Description

Entity linking method and device based on semantic components

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a semantic component-based entity linking method and apparatus.

Background

When processing large volumes of clinical medical records, the same entity is often expressed in many different ways because of differences between regions, hospitals, doctors, standards and the like. The data can be effectively counted and computed only if mentions of the same entity are accurately identified and mapped into a limited entity space. Medical term entity linking is therefore an essential part of the data processing process.

The existing entity linking method generally reduces the number of candidates through algorithms such as classification, and then obtains the closest candidate through similarity calculation. As a core algorithm of the existing entity link system, similarity calculation generally models object features, converts the features into vectors, and measures the similarity by calculating the distance between the vectors.

Existing entity linking methods generally require a large labeled corpus, and professional medical knowledge is difficult to incorporate into the features used for calculation. In addition, entity linking methods based on similarity calculation handle candidates that differ widely well, but generally struggle when the candidates are close to one another. Algorithms based on neural networks in particular cannot make good use of medical knowledge, and their calculation process cannot be explained. Therefore, in big data processing for the medical field, there is a need for a medical term entity linking method that solves the above-mentioned problems.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a semantic component-based entity linking method and device, so as to solve the technical problem of low entity linking accuracy in the prior art.

In a first aspect, an embodiment of the present invention provides a semantic component-based entity linking method, including:

acquiring an entity to be linked from a medical data set;

determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;

determining the semantic similarity scores between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bi-directional encoder BERT, and the second coding layer is used for coding based on semantic component information;

and linking the entity to be linked to a standard entity with the highest semantic similarity score in the candidate set.

In an alternative embodiment, the step of determining the semantic similarity score between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model includes:

determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model;

and determining the semantic similarity scores of the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of the standard entities in the candidate set.

In an alternative embodiment, the step of determining, by means of a pre-trained semantic enhancement model, the semantic enhancement coding of each standard entity in the candidate set and the semantic enhancement coding of the entity to be linked comprises:

selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement coding of each current entity is determined:

predicting the current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of the training text;

determining, by the second encoding layer, a second encoding based on the tag sequence;

and combining the first code and the second code to obtain the semantic enhancement code.

In an alternative embodiment, the step of determining, by the second encoding layer, a second encoding based on the tag sequence comprises:

based on a plurality of tag sequences, establishing a corresponding relation between the tag and the ID;

converting the tag sequence into an ID sequence based on the correspondence;

converting the ID sequence to one-hot encoding;

padding the one-hot code based on the code size of the coding layer;

and taking the padded one-hot code as the second code.

In an alternative embodiment, the labeling layer includes a BiLSTM layer and a CRF layer.

In an alternative embodiment, the method further comprises:

training an initial semantic component analysis model based on a predetermined training sample to obtain a pre-trained semantic component analysis model, wherein the initial semantic component analysis model comprises a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.

In alternative embodiments, the entity to be linked includes one or more of disease terms, surgical terms, symptom terms, pharmaceutical terms, examination terms.

In a second aspect, an embodiment of the present invention provides a semantic component-based entity linking apparatus, including:

the acquisition module is used for acquiring the entity to be linked from the medical data set;

the determining module is used for determining a standard entity candidate set of the entity to be linked in the medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;

the scoring module is used for determining the semantic similarity score between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bi-directional encoder BERT, and the second coding layer is used for coding based on semantic component information;

and the link module is used for linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.

In a third aspect, an embodiment of the present invention provides a computer, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the method according to any one of the foregoing embodiments when executing the computer program.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any of the preceding embodiments.

The invention provides an entity linking method and device based on semantic components. The method comprises: acquiring an entity to be linked from a medical data set; determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked; determining the semantic similarity scores between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model; and linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set. In this way, semantic component information is added to the entity linking process, so that the semantic enhancement model can learn more information and the entity linking accuracy is higher.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of an entity linking method based on semantic components according to an embodiment of the present application;

FIG. 2 is an example of a semantic component based entity linking method provided by embodiments of the present application;

FIG. 3 is another example of a semantic component based entity linking method provided by embodiments of the present application;

FIG. 4 is an example of labeling in a semantic component based entity linking method provided by embodiments of the present application;

fig. 5 is a schematic structural diagram of an entity linking device based on semantic components according to an embodiment of the present application;

fig. 6 is a schematic diagram of a computer structure according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Fig. 1 is a schematic flow chart of an entity linking method based on semantic components according to an embodiment of the present invention. The method may be applied to a computer, as shown in fig. 1, and may include:

S110, acquiring an entity to be linked from the medical data set;

The medical data set mainly refers to text data that is generated during medical activities and requires entity linking; it may be medical activity record text such as medical records, medical orders, nursing documents and examination reports. The entity to be linked mainly refers to a medical term that has different expression modes, and may be one or more of disease terms, operation terms, symptom terms, medicine terms and examination terms.

S120, determining a standard entity candidate set of the entity to be linked in the medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities most similar to the entity to be linked;

According to the features of the entity to be linked, preliminary screening is performed in the medical knowledge graph to obtain a plurality of standard entities most similar to the entity to be linked; the number of standard entities can be set according to actual needs. These closest standard entities together form the candidate set. The feature of the entity to be linked can be any feature that identifies and distinguishes it, such as its own structural, part-of-speech or semantic features, or its contextual features in the medical text.

For example, the entity to be linked may be segmented into one or more word segmentation units, using a word segmentation tool such as jieba. The candidate set can be obtained by screening on N-gram features in the medical knowledge graph; this screening effectively reduces the answer space of the subsequent model and thereby improves the efficiency of the entity linking flow. For example, 5-order N-gram features may first be extracted, and an inverted index between the features and the standard entities in the medical knowledge graph may be established, where the more candidates a feature covers, the lower its weight. At search time, the weights of all features of the entity to be linked are summed to obtain the score for the phrase to be normalized, which is used as the search similarity score in the final calculation.
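The screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes character n-grams of length 1 to 5 as the "5-order" features and an IDF-style weight `log(1 + total / df)`, neither of which is specified in the text. All function names are hypothetical.

```python
import math
from collections import defaultdict


def char_ngrams(text, max_n=5):
    """Return the set of character n-grams of length 1..max_n (assumed feature set)."""
    return {text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)}


def build_index(standard_entities, max_n=5):
    """Inverted index: n-gram feature -> set of standard entities containing it."""
    index = defaultdict(set)
    for entity in standard_entities:
        for gram in char_ngrams(entity, max_n):
            index[gram].add(entity)
    return index


def feature_weight(index, gram, total):
    """Assumed weighting: the more entities a feature covers, the lower its weight."""
    return math.log(1 + total / len(index[gram]))


def retrieve(mention, index, total, top_k=3, max_n=5):
    """Sum the weights of the mention's features per candidate and keep the top_k."""
    scores = defaultdict(float)
    for gram in char_ngrams(mention, max_n):
        if gram in index:
            weight = feature_weight(index, gram, total)
            for entity in index[gram]:
                scores[entity] += weight
    # Sort by descending score, then by name for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    standard = ["cup ear correction", "cup ear reconstruction", "knee replacement"]
    index = build_index(standard)
    for entity, score in retrieve("right cup ear correction", index, len(standard)):
        print(entity, round(score, 3))
```

The returned scores play the role of the search similarity score; only the top-ranked entities would be passed on to the semantic enhancement model.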

S130, determining semantic similarity scores between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bi-directional encoder BERT, and the second coding layer is used for coding based on semantic component information.

Each standard entity and the entity to be linked in the candidate set can be input into a pre-trained semantic enhancement model to obtain semantic enhancement codes, and then a semantic similarity score is determined based on the similarity between the semantic enhancement codes. For example, the semantic similarity is determined based on cosine distances between semantic enhancement encodings of different entities.

And S140, linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.

For example, the semantic similarity scores in the candidate set determined in step S130 may be ranked, and the first ranked standard entity may be determined to be linked with the entity to be linked.
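Steps S130 and S140 can be illustrated with a small sketch, assuming each entity has already been reduced to a single fixed-length semantic enhancement vector (how the per-token codes are pooled into one vector is not stated in the text and is left out here). The vectors and names below are toy values chosen for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entity(mention_vec, candidate_vecs):
    """Score every candidate against the mention and return the best one plus the ranking."""
    if not candidate_vecs:
        raise ValueError("candidate set is empty")
    scored = [(name, cosine_similarity(mention_vec, vec))
              for name, vec in candidate_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[0][0], scored


if __name__ == "__main__":
    # Hypothetical 3-dimensional encodings standing in for semantic enhancement codes.
    mention = [1.0, 0.0, 1.0]
    candidates = {"standard entity A": [1.0, 0.0, 0.9],
                  "standard entity B": [0.0, 1.0, 0.0]}
    best, ranking = link_entity(mention, candidates)
    print(best, ranking)
```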

According to the embodiment of the invention, the semantic component information can be added into the entity linking process, so that the semantic enhancement model can learn more information, and the accuracy rate of entity linking is higher.

In some embodiments, as shown in fig. 2, the step S130 may be specifically implemented by the following steps:

S210, determining semantic enhancement codes of all standard entities in the candidate set and the semantic enhancement code of the entity to be linked through a pre-trained semantic enhancement model;

S220, determining semantic similarity scores between each standard entity in the candidate set and the entity to be linked according to the semantic enhancement code of the entity to be linked and the semantic enhancement codes of the standard entities in the candidate set.

In some embodiments, as shown in fig. 3, the step S210 may be specifically implemented by the following steps:

S310, selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement code of each current entity is determined:

S320, predicting the current entity based on a pre-trained semantic component analysis model, and determining a label sequence corresponding to the current entity.

The semantic component analysis model comprises a first coding layer and a labeling layer, wherein the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the label sequence based on the first code, and the label is used for indicating semantic components of training texts;

S330, determining a second code based on the tag sequence through the second coding layer;

S340, combining the first code and the second code to obtain the semantic enhancement code.

In some embodiments, the step S330 may specifically include the following steps:

step 1), based on a plurality of tag sequences, establishing a corresponding relation between a tag and an ID;

step 2), converting the tag sequence into an ID sequence based on the corresponding relation;

step 3), converting the ID sequence into one-hot coding;

step 4), padding the one-hot code based on the code size of the coding layer;

step 5), taking the padded one-hot code as the second code.
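Steps 1) to 5) can be sketched as below. This is an illustrative reading rather than the definitive procedure: IDs are assigned in order of first occurrence, and the zeros are appended on the right of each one-hot vector. The text only says the code is padded to the coding layer's size, so both choices are assumptions, as are the function names.

```python
def build_tag_vocab(tag_sequences):
    """Step 1: map each distinct tag to an ID (assumed: order of first occurrence)."""
    vocab = {}
    for sequence in tag_sequences:
        for tag in sequence:
            if tag not in vocab:
                vocab[tag] = len(vocab)
    return vocab


def tags_to_ids(tags, vocab):
    """Step 2: convert a tag sequence into an ID sequence."""
    return [vocab[tag] for tag in tags]


def one_hot(index, num_tags):
    """Step 3: one-hot vector of length num_tags with a 1 at position index."""
    vector = [0] * num_tags
    vector[index] = 1
    return vector


def pad(vector, hidden_size):
    """Step 4: pad with zeros up to the encoder's code size (assumed: on the right)."""
    if len(vector) > hidden_size:
        raise ValueError("one-hot code is larger than the encoder code size")
    return vector + [0] * (hidden_size - len(vector))


def second_encoding(tags, vocab, hidden_size=768):
    """Step 5: the padded one-hot codes form the second code, one vector per tag."""
    num_tags = len(vocab)
    return [pad(one_hot(i, num_tags), hidden_size) for i in tags_to_ids(tags, vocab)]


if __name__ == "__main__":
    tags = ["[CLS]", "B-side", "I-side", "[SEP]"]
    vocab = build_tag_vocab([tags])
    codes = second_encoding(tags, vocab, hidden_size=768)
    print(vocab, len(codes), len(codes[0]))
```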

As an example, as shown in fig. 4, the semantic component analysis model may be a BERT+BiLSTM+CRF model. The labeling layer comprises a BiLSTM layer and a CRF layer, and the first coding layer is a BERT layer.

The invention is further described below with reference to the "right cup ear correction" surgical entity.

Step a), a BERT+BiLSTM+CRF model is used to analyze the semantic components of the "right cup ear correction" surgical entity, which are parsed as "side: right; body part: cup-shaped ear; surgical procedure: correction". Using BIO-based labels, the semantic label sequence corresponding to the characters of the entity is "B-side, I-side, B-body part, I-body part, I-body part, B-shhi, I-shhi, I-shhi". The separators [CLS] and [SEP] can be added before and after the sequence, respectively.
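The BIO labeling in step a) can be illustrated with a short sketch. It takes the parsed components as (label, character count) pairs; the counts 2, 3 and 3 are an assumption inferred from the ID sequence given in step b-2), which contains two "I-body part" IDs. The function name is hypothetical, and the sketch only formats labels. It does not perform the BERT+BiLSTM+CRF prediction itself.

```python
def bio_tags(components):
    """Expand (label, length) spans into per-character BIO tags wrapped in [CLS]/[SEP]."""
    tags = []
    for label, length in components:
        if length < 1:
            raise ValueError("each component must cover at least one character")
        tags.append(f"B-{label}")                      # first character of the span
        tags.extend([f"I-{label}"] * (length - 1))     # remaining characters
    return ["[CLS]"] + tags + ["[SEP]"]


if __name__ == "__main__":
    # Assumed span lengths for side / body part / surgical procedure ("shhi").
    print(bio_tags([("side", 2), ("body part", 3), ("shhi", 3)]))
```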

Step b), performing one-hot coding on the tag sequence (second code), and adding the one-hot code to the code of the BERT (first code) to generate the semantic enhancement code. Step b) can be realized by the following steps:

Step b-1), based on the labels [O, B-side, I-side, B-body part, I-body part, B-shhi, I-shhi, [CLS], [SEP]] included in the label sequences, establishing a corresponding relation with the IDs [0,1,2,3,4,5,6,7,8].

Step b-2), replacing each tag in the tag sequence with its ID to obtain the ID sequence [7,1,2,3,4,4,5,6,6,8].

Step b-3), converting the IDs in the ID sequence into a one-hot coding sequence to obtain [ [000000010], [010000000], [001000000], [000100000], [000010000], [000010000], [000001000], [000000100], [000000100], [000000001] ].

Step b-4), since the code size of BERT is larger than the size of the one-hot code, every one-hot code in the one-hot code sequence needs to be padded with "0". If the BERT model code size is 768 and the one-hot code size is 9, then 768-9=759 "0"s need to be padded to obtain the label one-hot code.

Step b-5), adding the label one-hot code to the token code, position code and segment code of the BERT itself to form the semantic enhancement code. The semantic enhancement code $e_{\text{enhance}}$ is given by formula (1):

$$e_{\text{enhance}} = e_{\text{token}} + e_{\text{position}} + e_{\text{segment}} + e_{\text{label}} \tag{1}$$
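The element-wise sum in formula (1) can be sketched as follows. The example uses a toy code size of 4 instead of 768 and made-up embedding values; in practice the token, position and segment codes would come from the BERT embedding layer. The function names are hypothetical.

```python
def add_vectors(*vectors):
    """Element-wise sum of equal-length vectors."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all codes must have the same size")
    return [sum(values) for values in zip(*vectors)]


def enhance(e_token, e_position, e_segment, e_label):
    """Apply formula (1) per position: token + position + segment + label code."""
    return [add_vectors(t, p, s, l)
            for t, p, s, l in zip(e_token, e_position, e_segment, e_label)]


if __name__ == "__main__":
    # One input position, toy code size 4; e_label is a padded one-hot label code.
    e_token = [[0.5, 0.25, 0.0, 1.0]]
    e_position = [[0.125, 0.125, 0.125, 0.125]]
    e_segment = [[0.0, 0.0, 0.0, 0.0]]
    e_label = [[0, 1, 0, 0]]
    print(enhance(e_token, e_position, e_segment, e_label))
```

Because the label code is a padded one-hot vector, the sum changes exactly one dimension of each position's BERT code, which is how the semantic component label is injected.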

In addition, in the embodiment of the invention, the initial semantic component analysis model can be trained based on a predetermined training sample to obtain a pre-trained semantic component analysis model, wherein the initial semantic component analysis model comprises a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.

Fig. 5 is a schematic structural diagram of an entity linking device based on semantic components according to an embodiment of the present invention. As shown in fig. 5, the apparatus may include:

an obtaining module 501, configured to obtain an entity to be linked from a medical data set;

a determining module 502, configured to determine a standard entity candidate set of the entity to be linked in the medical knowledge graph, where the standard entity candidate set includes a plurality of standard entities most similar to the entity to be linked;

a scoring module 503, configured to determine, through a pre-trained semantic enhancement model, a semantic similarity score between each standard entity in the candidate set and the entity to be linked, where the semantic enhancement model includes a first coding layer and a second coding layer, the first coding layer is configured to encode based on a bi-directional encoder BERT, and the second coding layer is configured to encode based on semantic component information;

a linking module 504, configured to link the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.

In some embodiments, scoring module 503 is specifically configured to:

and determining semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set.

In some embodiments, scoring module 503 is specifically configured to:

predicting a current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of a training text;

In some embodiments, scoring module 503 is specifically configured to:

converting the tag sequence into an ID sequence based on the correspondence;

converting the ID sequence into a one-hot code;

padding the one-hot code based on the code size of the coding layer;

and taking the padded one-hot code as the second code.

In some embodiments, the labeling layers include a BiLSTM layer and a CRF layer.

In some embodiments, the apparatus further includes a training module, configured to train an initial semantic component analysis model based on a predetermined training sample, to obtain a pre-trained semantic component analysis model, where the initial semantic component analysis model includes a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.

In some embodiments, the entity to be linked includes one or more of a disease term, a surgical term, a symptom term, a medication term, an examination term.

The entity linking device based on the semantic components provided by the embodiment of the application has the same technical characteristics as the entity linking method based on the semantic components provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.

As shown in fig. 6, a computer 700 provided in an embodiment of the present application includes: a processor 701, a memory 702 and a bus, the memory 702 storing machine readable instructions executable by the processor 701, the processor 701 and the memory 702 communicating via the bus when the computer 700 is running, the processor 701 executing machine readable instructions to perform the steps of the semantic component based entity linking method as described above.

In particular, the memory 702 and the processor 701 can be general-purpose memories and processors, which are not limited herein, and the entity linking method based on semantic components can be performed when the processor 701 runs a computer program stored in the memory 702.

Corresponding to the above-mentioned entity linking method based on semantic components, the embodiments of the present application further provide a computer readable storage medium storing machine executable instructions that, when invoked and executed by a processor, cause the processor to execute the steps of the above-mentioned entity linking method based on semantic components.

The entity linking device based on the semantic component provided by the embodiment of the application can be specific hardware on the device or software or firmware installed on the device. The device provided in the embodiments of the present application has the same implementation principle and technical effects as those of the foregoing method embodiments, and for a brief description, reference may be made to corresponding matters in the foregoing method embodiments where the device embodiment section is not mentioned. It will be clear to those skilled in the art that, for convenience and brevity, the specific operation of the system, apparatus and unit described above may refer to the corresponding process in the above method embodiment, which is not described in detail herein.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit its scope. Although the present application is described in detail with reference to the foregoing examples, those skilled in the art will understand that anyone skilled in the art may, within the technical scope disclosed by the present application, modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be encompassed within the scope of this application.

Claims

1. A semantic component-based entity linking method, comprising:

acquiring an entity to be linked from a medical data set;

determining semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the second coding layer is used for coding based on semantic component information, and the first coding layer is used for coding based on the bidirectional encoder BERT;

the step of determining semantic similarity scores of each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model comprises the following steps:

determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model; determining the semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set;

the step of determining the semantic enhancement codes of the standard entities and the semantic enhancement codes of the entities to be linked in the candidate set through a pre-trained semantic enhancement model comprises the following steps:

predicting the current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of training text; determining, by the second encoding layer, a second encoding based on the tag sequence; combining the first code and the second code to obtain the semantic enhancement code;
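As an illustration only, the combination of the first code and the second code, followed by scoring and linking, can be sketched as below. The claim does not state how the two codes are combined or how the similarity score is computed, so per-token concatenation, mean pooling, and cosine similarity are assumptions made for this sketch, and the function names `combine`, `mean_pool`, `cosine`, and `link` are hypothetical. The selection of the highest-scoring standard entity follows the abstract.

```python
import math
from typing import Dict, List, Sequence

Code = List[List[float]]  # one row of features per token


def combine(first_code: Sequence[Sequence[float]],
            second_code: Sequence[Sequence[float]]) -> Code:
    """Combine the two codes token by token (assumption: concatenation)."""
    if len(first_code) != len(second_code):
        raise ValueError("both codes must cover the same number of tokens")
    return [list(f) + list(s) for f, s in zip(first_code, second_code)]


def mean_pool(code: Code) -> List[float]:
    """Average the token rows into one entity vector (assumption)."""
    n = len(code)
    return [sum(col) / n for col in zip(*code)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link(query_code: Code, candidate_codes: Dict[str, Code]) -> str:
    """Return the candidate whose enhanced code scores highest."""
    q = mean_pool(query_code)
    scores = {name: cosine(q, mean_pool(c))
              for name, c in candidate_codes.items()}
    return max(scores, key=scores.get)
```

In practice the first code would come from the BERT-based first coding layer and the second code from the tag sequence; the sketch accepts any numeric rows so that the data flow can be followed without a trained model.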

2. The method of claim 1, wherein determining, by the second encoding layer, a second encoding based on the tag sequence comprises:

converting the tag sequence into an ID sequence based on the correspondence;

converting the ID sequence to one-hot encoding;

padding the one-hot encoding based on the encoding size of the coding layer;

and taking the padded one-hot encoding as the second encoding.
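The steps of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the tag set and the tag-to-ID mapping `TAG_TO_ID` are hypothetical (the claim does not list them), zero padding on the right is assumed as the padding scheme, and `code_size` stands in for the encoding size of the coding layer.

```python
from typing import Dict, List

# Hypothetical tag-to-ID correspondence; the actual tag set is not given.
TAG_TO_ID: Dict[str, int] = {
    "O": 0, "B-body": 1, "I-body": 2, "B-core": 3, "I-core": 4,
}


def tags_to_second_code(tags: List[str], code_size: int) -> List[List[int]]:
    """Map tags to IDs, one-hot encode each ID, and pad to `code_size`."""
    if code_size < len(TAG_TO_ID):
        raise ValueError("code_size must be at least the number of tags")
    ids = [TAG_TO_ID[tag] for tag in tags]   # tag sequence -> ID sequence
    rows = []
    for tag_id in ids:
        row = [0] * code_size                # zero padding (assumed scheme)
        row[tag_id] = 1                      # one-hot position for this tag
        rows.append(row)
    return rows
```

For example, `tags_to_second_code(["B-body", "I-body", "B-core"], 8)` yields three rows of width 8, each with a single 1 at the index of its tag ID.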

3. The method of claim 1, wherein the labeling layer comprises a BiLSTM layer and a CRF layer.

4. The method as recited in claim 1, further comprising:

5. The method of claim 1, wherein the entity to be linked comprises one or more of a disease term, a surgical term, a symptom term, a pharmaceutical term, and an inspection term.

6. An entity linking apparatus based on semantic components, comprising:

the scoring module is used for determining semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the second coding layer is used for coding based on semantic component information, and the first coding layer is used for coding based on the bidirectional encoder BERT;

the scoring module is further used for determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model; determining the semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set;

the scoring module is further configured to select a current entity from among the entity to be linked and the standard entities in the candidate set, and to perform the following steps for each current entity until the semantic enhancement code of each current entity is determined:

7. A computer comprising a memory and a processor, the memory having stored therein a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of the preceding claims 1 to 5.

8. A computer readable storage medium storing machine executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any one of claims 1 to 5.