CN111613341B - Entity linking method and device based on semantic components - Google Patents

Info

Publication number
CN111613341B
Authority
CN
China
Prior art keywords
entity
semantic
linked
candidate set
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010443446.7A
Other languages
Chinese (zh)
Other versions
CN111613341A (en
Inventor
史亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010443446.7A priority Critical patent/CN111613341B/en
Publication of CN111613341A publication Critical patent/CN111613341A/en
Application granted granted Critical
Publication of CN111613341B publication Critical patent/CN111613341B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity linking method and device based on semantic components, which relate to the technical field of computers and comprise the steps of acquiring an entity to be linked from a medical data set; determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked; determining the semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model; and linking the entity to be linked to a standard entity with the highest semantic similarity score in the candidate set. Therefore, the semantic component information can be added into the entity link process, so that the semantic enhancement model can learn more information, and the entity link accuracy is higher.

Description

Entity linking method and device based on semantic components
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a semantic component-based entity linking method and apparatus.
Background
In processing clinical medical record big data, the same entity is often expressed in many different ways because of differences between regions, hospitals, doctors, standards and the like. The data can be effectively counted and computed only when these different expressions are accurately identified as the same entity and mapped onto a limited entity space. Medical term entity linking is therefore an essential part of the data processing process.
The existing entity linking method generally reduces the number of candidates through algorithms such as classification, and then obtains the closest candidate through similarity calculation. As a core algorithm of the existing entity link system, similarity calculation generally models object features, converts the features into vectors, and measures the similarity by calculating the distance between the vectors.
Existing entity linking methods generally require a large labeled corpus, and professional medical knowledge is difficult to incorporate into the features used for calculation. In addition, entity linking methods based on similarity calculation handle cases where the candidates differ widely, but generally struggle when the candidates are close to one another. Neural network-based algorithms in particular cannot make good use of medical knowledge, and their calculation process cannot be explained. Therefore, big data processing in the medical field needs a medical term entity linking method that solves the above problems.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a semantic component-based entity linking method and device, so as to solve the technical problem of low entity linking accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a semantic component-based entity linking method, including:
acquiring an entity to be linked from a medical data set;
determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;
determining, through a pre-trained semantic enhancement model, the semantic similarity score between each standard entity in the candidate set and the entity to be linked, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bidirectional encoder, BERT, and the second coding layer is used for coding based on semantic component information;
and linking the entity to be linked to a standard entity with the highest semantic similarity score in the candidate set.
In an alternative embodiment, the step of determining the semantic similarity score between each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model includes:
determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model;
and determining the semantic similarity scores of the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of the standard entities in the candidate set.
In an alternative embodiment, the step of determining, by means of a pre-trained semantic enhancement model, the semantic enhancement coding of each standard entity in the candidate set and the semantic enhancement coding of the entity to be linked comprises:
selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement coding of each current entity is determined:
predicting the current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of the training text;
determining, by the second encoding layer, a second encoding based on the tag sequence;
and combining the first code and the second code to obtain the semantic enhancement code.
In an alternative embodiment, the step of determining, by the second encoding layer, a second encoding based on the tag sequence comprises:
based on a plurality of tag sequences, establishing a corresponding relation between the tag and the ID;
converting the tag sequence into an ID sequence based on the correspondence;
converting the ID sequence into a one-hot code;
padding the one-hot code based on the code size of the coding layer;
and taking the padded one-hot code as the second code.
In an alternative embodiment, the labeling layer includes a BiLSTM layer and a CRF layer.
In an alternative embodiment, the method further comprises:
training an initial semantic component analysis model based on a predetermined training sample to obtain a pre-trained semantic component analysis model, wherein the initial semantic component analysis model comprises a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.
In alternative embodiments, the entity to be linked includes one or more of disease terms, surgical terms, symptom terms, pharmaceutical terms, examination terms.
In a second aspect, an embodiment of the present invention provides a semantic component-based entity linking apparatus, including:
the acquisition module is used for acquiring the entity to be linked from the medical data set;
the determining module is used for determining a standard entity candidate set of the entity to be linked in the medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;
the scoring module is used for determining, through a pre-trained semantic enhancement model, the semantic similarity score between each standard entity in the candidate set and the entity to be linked, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bidirectional encoder, BERT, and the second coding layer is used for coding based on semantic component information;
and the link module is used for linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.
In a third aspect, an embodiment of the present invention provides a computer, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the method according to any one of the foregoing embodiments when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any of the preceding embodiments.
The invention provides an entity linking method and device based on semantic components. The method comprises: acquiring an entity to be linked from a medical data set; determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked; determining, through a pre-trained semantic enhancement model, the semantic similarity score between each standard entity in the candidate set and the entity to be linked; and linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set. In this way, semantic component information is added to the entity linking process, so that the semantic enhancement model can learn more information and the entity linking accuracy is higher.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an entity linking method based on semantic components according to an embodiment of the present application;
FIG. 2 is an example of a semantic component based entity linking method provided by embodiments of the present application;
FIG. 3 is another example of a semantic component based entity linking method provided by embodiments of the present application;
FIG. 4 is an example of labeling in a semantic component based entity linking method provided by embodiments of the present application;
FIG. 5 is a schematic structural diagram of an entity linking device based on semantic components according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computer structure according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic flow chart of an entity linking method based on semantic components according to an embodiment of the present invention. The method may be applied to a computer, as shown in fig. 1, and may include:
S110, acquiring an entity to be linked from the medical data set;
The medical data set mainly refers to text data that is generated in the course of medical activities and needs entity linking; it can be medical activity record text such as medical records, medical orders, nursing documents, examination reports and the like. The entity to be linked mainly refers to medical terms with different expression modes, and can be one or more of disease terms, surgical terms, symptom terms, medication terms and examination terms.
S120, determining a standard entity candidate set of the entity to be linked in the medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities most similar to the entity to be linked;
According to the features of the entity to be linked, preliminary screening is performed in the medical knowledge graph to obtain a plurality of standard entities that are most similar to the entity to be linked, where the number of standard entities can be determined according to actual needs. These closest standard entities together form the candidate set. The features of the entity to be linked can be any features that identify and distinguish it, such as its own structural features, part-of-speech features, semantic features, and contextual features in the medical text.
For example, the entity to be linked may be segmented to obtain one or more word segmentation units. Word segmentation may be implemented with a word segmentation tool such as jieba. The candidate set can be obtained by screening on N-gram features in the medical knowledge graph; this screening effectively reduces the answer space of the subsequent model and so improves the efficiency of the entity linking flow. For example, 5-order N-gram features of the entities in the medical knowledge graph may first be extracted, and an inverted index from the features to the standard entities established, where the more candidates a feature covers, the lower its weight. At search time, the weights of all features of the entity to be linked are summed to obtain its score against each candidate, and this score is used as the search similarity score in the final calculation.
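The screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes character N-grams of orders 1 to 5, and the weight 1 / (number of standard entities containing the feature) is an assumed weighting that merely satisfies "the more candidates, the lower the weight". The function names and the English sample terms are hypothetical.

```python
from collections import defaultdict


def char_ngrams(text, max_n=5):
    """Return the set of character n-grams of orders 1..max_n."""
    return {
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    }


def build_index(standard_entities, max_n=5):
    """Inverted index: n-gram feature -> set of standard entities containing it."""
    index = defaultdict(set)
    for entity in standard_entities:
        for gram in char_ngrams(entity, max_n):
            index[gram].add(entity)
    return index


def retrieve(mention, index, top_k=3, max_n=5):
    """Score each standard entity by the summed weights of the features it shares
    with the mention, and return the top_k (entity, score) pairs."""
    scores = defaultdict(float)
    for gram in char_ngrams(mention, max_n):
        postings = index.get(gram)
        if not postings:
            continue
        # Assumed weighting: a feature shared by many entities counts less.
        weight = 1.0 / len(postings)
        for entity in postings:
            scores[entity] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical standard entities and a misspelled mention.
standard = ["acute appendicitis", "chronic appendicitis", "acute gastritis"]
index = build_index(standard)
candidates = retrieve("acute apendicitis", index, top_k=2)
```

Here `candidates` holds the two best-scoring standard entities together with their search similarity scores; only these would be passed on to the semantic enhancement model.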
S130, determining, through a pre-trained semantic enhancement model, the semantic similarity score between each standard entity in the candidate set and the entity to be linked, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the first coding layer is used for coding based on a bidirectional encoder, BERT, and the second coding layer is used for coding based on semantic component information.
Each standard entity and the entity to be linked in the candidate set can be input into a pre-trained semantic enhancement model to obtain semantic enhancement codes, and then a semantic similarity score is determined based on the similarity between the semantic enhancement codes. For example, the semantic similarity is determined based on cosine distances between semantic enhancement encodings of different entities.
S140, linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.
For example, the semantic similarity scores in the candidate set determined in step S130 may be ranked, and the first ranked standard entity may be determined to be linked with the entity to be linked.
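Steps S130 and S140 can be illustrated with a small sketch. It assumes that each entity has already been reduced to a single fixed-length vector (how the per-token semantic enhancement codes are pooled is not specified in the text), and the vectors and entity names below are made-up placeholders.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link(mention_vec, candidate_vecs):
    """Return the best-scoring standard entity and all similarity scores."""
    scores = {
        name: cosine_similarity(mention_vec, vec)
        for name, vec in candidate_vecs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder encodings; real ones would come from the semantic enhancement model.
mention_vec = [0.9, 0.1, 0.3]
candidate_vecs = {
    "standard_entity_a": [1.0, 0.0, 0.2],
    "standard_entity_b": [0.0, 1.0, 0.1],
}
best, scores = link(mention_vec, candidate_vecs)
```

Taking the maximum is equivalent to ranking the candidates by score and keeping the first one.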
According to the embodiment of the invention, the semantic component information can be added into the entity linking process, so that the semantic enhancement model can learn more information, and the accuracy rate of entity linking is higher.
In some embodiments, as shown in fig. 2, the step S130 may be specifically implemented by the following steps:
S210, determining semantic enhancement codes of all standard entities in the candidate set and the semantic enhancement code of the entity to be linked through a pre-trained semantic enhancement model;
S220, determining the semantic similarity score between each standard entity in the candidate set and the entity to be linked according to the semantic enhancement code of the entity to be linked and the semantic enhancement codes of the standard entities in the candidate set.
In some embodiments, as shown in fig. 3, the step S210 may be specifically implemented by the following steps:
S310, selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement code of each current entity is determined:
S320, predicting the current entity based on a pre-trained semantic component analysis model, and determining a label sequence corresponding to the current entity.
The semantic component analysis model comprises a first coding layer and a labeling layer, wherein the first coding layer is used for generating a first code based on a current entity, the labeling layer is used for generating a label sequence based on the first code, and the label is used for indicating semantic components of training texts;
S330, determining a second code based on the tag sequence through the second coding layer;
S340, combining the first code and the second code to obtain the semantic enhancement code.
In some embodiments, the step S330 may specifically include the following steps:
Step 1), based on a plurality of tag sequences, establishing a correspondence between tags and IDs;
Step 2), converting the tag sequence into an ID sequence based on the correspondence;
Step 3), converting the ID sequence into a one-hot code;
Step 4), padding the one-hot code based on the code size of the coding layer;
Step 5), taking the padded one-hot code as the second code.
As an example, as shown in FIG. 4, the semantic component analysis model may be a BERT+BiLSTM+CRF model. The labeling layer comprises a BiLSTM layer and a CRF layer, and the first coding layer is a BERT layer.
The invention is further described below using the surgical entity "right cup-shaped ear correction" as an example.
Step a), the BERT+BiLSTM+CRF model is used to analyze the semantic components of the entity "right cup-shaped ear correction", giving "orientation: right side; body part: cup-shaped ear; surgical procedure: correction". With BIO-based labels, the semantic label sequence corresponding to the characters of the entity is "B-side, I-side, B-body part, I-body part, I-body part, B-shhi, I-shhi, I-shhi". The separators [CLS] and [SEP] can be added before and after the sequence, respectively.
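To make the BIO labels concrete, the sketch below groups a per-character tag sequence back into semantic components. It is an illustration only: the tokens `c1` to `c8` are hypothetical stand-ins for the characters of the entity, and it assumes one tag per character, eight in total, matching the ID sequence given in step b-2.

```python
def decode_bio(tokens, tags):
    """Group tokens into (component_type, tokens) spans from a BIO tag sequence."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open span.
            if current_type is not None:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, current_tokens))
    return spans


# Hypothetical character tokens; the tag names are those used in the example.
tokens = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
tags = [
    "B-side", "I-side",
    "B-body part", "I-body part", "I-body part",
    "B-shhi", "I-shhi", "I-shhi",
]
components = decode_bio(tokens, tags)
```

With these inputs `components` contains three spans, one each for the `side`, `body part` and `shhi` components.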
Step b), the tag sequence is one-hot coded (the second code) and added to the code of the BERT (the first code) to generate the semantic enhancement code. Step b) can be implemented by the following steps:
Step b-1), based on the labels [O, B-side, I-side, B-body part, I-body part, B-shhi, I-shhi, [CLS], [SEP]] included in the label sequences, a correspondence with the IDs [0,1,2,3,4,5,6,7,8] is established.
Step b-2), each tag in the tag sequence is replaced with its ID to obtain the ID sequence [7,1,2,3,4,4,5,6,6,8].
Step b-3), each ID in the ID sequence is converted into a one-hot code, giving the one-hot code sequence [[000000010], [010000000], [001000000], [000100000], [000010000], [000010000], [000001000], [000000100], [000000100], [000000001]].
Step b-4), since the code size of BERT is larger than the size of the one-hot code, every one-hot code in the one-hot code sequence needs to be padded with "0". If the BERT code size is 768 and the one-hot code size is 9, then 768 - 9 = 759 zeros need to be padded to obtain the label one-hot code.
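Steps b-1) to b-4) can be reproduced with a few lines of code. This is a sketch under the assumptions of the example (nine labels, a code size of 768); the function names are hypothetical, and appending the zeros at the end of each one-hot code is an assumption, since the text does not say where the padding goes.

```python
LABELS = [
    "O", "B-side", "I-side", "B-body part", "I-body part",
    "B-shhi", "I-shhi", "[CLS]", "[SEP]",
]
HIDDEN_SIZE = 768  # BERT code size used in the example


def build_label_ids(labels):
    """Step b-1: map each label to an integer ID in list order."""
    return {label: idx for idx, label in enumerate(labels)}


def one_hot(index, size):
    """Step b-3: one-hot vector with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_tags(tags, label_to_id, hidden_size):
    """Steps b-2 to b-4: tags -> IDs -> one-hot codes padded with zeros."""
    num_labels = len(label_to_id)
    padding = [0] * (hidden_size - num_labels)  # assumed: zeros appended at the end
    return [one_hot(label_to_id[tag], num_labels) + padding for tag in tags]


tags = [
    "[CLS]", "B-side", "I-side", "B-body part", "I-body part",
    "I-body part", "B-shhi", "I-shhi", "I-shhi", "[SEP]",
]
label_to_id = build_label_ids(LABELS)
id_sequence = [label_to_id[tag] for tag in tags]
label_codes = encode_tags(tags, label_to_id, HIDDEN_SIZE)
```

Each entry of `label_codes` has 768 elements, of which exactly one is 1, so it can be added element-wise to a BERT code of the same size.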
Step b-5), the label one-hot code is added to BERT's own token code, position code and segment code to give the semantic enhancement code. The semantic enhancement code $e_{enhance}$ is given by formula (1):
$$e_{enhance} = e_{token} + e_{position} + e_{segment} + e_{label} \tag{1}$$
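The element-wise sum in formula (1) can be sketched as below. The example uses a toy code size of 4 and made-up values instead of real BERT codes of size 768; `add_vectors` and `enhance` are hypothetical helper names.

```python
def add_vectors(*vectors):
    """Element-wise sum of equally sized vectors."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all codes must have the same size")
    return [sum(values) for values in zip(*vectors)]


def enhance(token_codes, position_codes, segment_codes, label_codes):
    """Apply formula (1) to every position of the sequence."""
    return [
        add_vectors(t, p, s, l)
        for t, p, s, l in zip(token_codes, position_codes, segment_codes, label_codes)
    ]


# Toy codes for a two-token sequence with code size 4 (values are made up).
token_codes = [[0.5, -0.25, 0.0, 1.0], [0.25, 0.5, -0.5, 0.0]]
position_codes = [[0.0, 0.25, 0.0, 0.0], [0.25, 0.0, 0.0, 0.5]]
segment_codes = [[0.0, 0.0, 0.5, 0.0], [0.0, 0.0, 0.5, 0.0]]
label_codes = [[0, 1, 0, 0], [0, 0, 1, 0]]  # padded label one-hot codes

enhanced = enhance(token_codes, position_codes, segment_codes, label_codes)
```

Because the label code is a padded one-hot vector, it shifts exactly one dimension of each position's code by 1 and leaves the others unchanged.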
In addition, in the embodiment of the invention, the initial semantic component analysis model can be trained based on a predetermined training sample to obtain a pre-trained semantic component analysis model, wherein the initial semantic component analysis model comprises a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.
Fig. 5 is a schematic structural diagram of an entity linking device based on semantic components according to an embodiment of the present invention. As shown in fig. 5, the apparatus may include:
an obtaining module 501, configured to obtain an entity to be linked from a medical data set;
a determining module 502, configured to determine a standard entity candidate set of the entity to be linked in the medical knowledge graph, where the standard entity candidate set includes a plurality of standard entities most similar to the entity to be linked;
a scoring module 503, configured to determine, through a pre-trained semantic enhancement model, a semantic similarity score between each standard entity in the candidate set and the entity to be linked, where the semantic enhancement model includes a first coding layer and a second coding layer, the first coding layer is configured to encode based on a bidirectional encoder, BERT, and the second coding layer is configured to encode based on semantic component information;
a linking module 504, configured to link the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.
In some embodiments, scoring module 503 is specifically configured to:
determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model;
and determining semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set.
In some embodiments, scoring module 503 is specifically configured to:
selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement coding of each current entity is determined:
predicting a current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of a training text;
determining, by the second encoding layer, a second encoding based on the tag sequence;
and combining the first code and the second code to obtain the semantic enhancement code.
In some embodiments, scoring module 503 is specifically configured to:
based on a plurality of tag sequences, establishing a corresponding relation between the tag and the ID;
converting the tag sequence into an ID sequence based on the correspondence;
converting the ID sequence into a one-hot code;
padding the one-hot code based on the code size of the coding layer;
and taking the filled one-hot code as a second code.
In some embodiments, the labeling layers include a BiLSTM layer and a CRF layer.
In some embodiments, the apparatus further includes a training module, configured to train an initial semantic component analysis model based on a predetermined training sample, to obtain a pre-trained semantic component analysis model, where the initial semantic component analysis model includes a first coding layer and an initial labeling layer of the labeling layer, and the first coding layer is pre-trained.
In some embodiments, the entity to be linked includes one or more of a disease term, a surgical term, a symptom term, a medication term, an examination term.
The entity linking device based on the semantic components provided by the embodiment of the application has the same technical characteristics as the entity linking method based on the semantic components provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
As shown in fig. 6, a computer 700 provided in an embodiment of the present application includes: a processor 701, a memory 702 and a bus, the memory 702 storing machine readable instructions executable by the processor 701, the processor 701 and the memory 702 communicating via the bus when the computer 700 is running, the processor 701 executing machine readable instructions to perform the steps of the semantic component based entity linking method as described above.
In particular, the memory 702 and the processor 701 can be general-purpose memories and processors, which are not limited herein, and the entity linking method based on semantic components can be performed when the processor 701 runs a computer program stored in the memory 702.
Corresponding to the above-mentioned entity linking method based on semantic components, the embodiments of the present application further provide a computer readable storage medium storing machine executable instructions that, when invoked and executed by a processor, cause the processor to execute the steps of the above-mentioned entity linking method based on semantic components.
The entity linking device based on semantic components provided by the embodiments of the present application may be specific hardware on a device, or software or firmware installed on a device. The device has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the device embodiments do not mention a point, refer to the corresponding content in the foregoing method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity, the specific operation of the system, apparatus, and units described above may be found in the corresponding processes of the foregoing method embodiments and is not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In the several embodiments provided in this application, it should also be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the entity linking method of the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It should be noted that like reference numerals and letters denote like items in the figures, so once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Furthermore, the terms "first," "second," "third," and so on are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit its scope. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art will understand that, within the technical scope disclosed in the present application, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be encompassed within the scope of protection of this application.

Claims (8)

1. A semantic component-based entity linking method, comprising:
acquiring an entity to be linked from a medical data set;
determining a standard entity candidate set of the entity to be linked in a medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;
determining semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the second coding layer is used for coding based on semantic component information, and the first coding layer is used for coding based on the bidirectional encoder BERT;
the step of determining semantic similarity scores of each standard entity in the candidate set and the entity to be linked through a pre-trained semantic enhancement model comprises the following steps:
determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model; determining the semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set;
the step of determining the semantic enhancement codes of the standard entities and the semantic enhancement codes of the entities to be linked in the candidate set through a pre-trained semantic enhancement model comprises the following steps:
selecting current entities from the entity to be linked and the standard entity candidate set, and executing the following steps for each current entity until the semantic enhancement coding of each current entity is determined:
predicting the current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of training text; determining, by the second encoding layer, a second encoding based on the tag sequence; combining the first code and the second code to obtain the semantic enhancement code;
and linking the entity to be linked to a standard entity with the highest semantic similarity score in the candidate set.
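The scoring and linking steps of claim 1 can be illustrated with a short Python sketch. It is not the patented implementation: the claim does not say how the first and second codes are combined or how the similarity score is computed, so concatenation and cosine similarity are assumptions, and the function names, entity names, and toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def semantic_enhancement_code(first_code: Sequence[float],
                              second_code: Sequence[float]) -> List[float]:
    """Combine the first (BERT-based) and second (tag-based) codes.
    Concatenation is an assumption; the claim only says 'combining'."""
    return list(first_code) + list(second_code)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure between two semantic enhancement codes."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_entity(mention_code: Sequence[float],
                candidate_codes: Dict[str, List[float]]) -> str:
    """Return the standard entity with the highest similarity score."""
    return max(candidate_codes,
               key=lambda name: cosine_similarity(mention_code, candidate_codes[name]))


# Toy vectors standing in for real model outputs (hypothetical values).
mention = semantic_enhancement_code([0.9, 0.1], [1.0, 0.0])
candidates = {
    "standard entity A": semantic_enhancement_code([0.8, 0.2], [1.0, 0.0]),
    "standard entity B": semantic_enhancement_code([0.1, 0.9], [0.0, 1.0]),
}
linked = link_entity(mention, candidates)
```

In this toy setup the entity to be linked ends up linked to "standard entity A", because both its first and second codes are close to those of the mention.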
2. The method of claim 1, wherein determining, by the second encoding layer, a second encoding based on the tag sequence comprises:
based on a plurality of tag sequences, establishing a corresponding relation between the tag and the ID;
converting the tag sequence into an ID sequence based on the correspondence;
converting the ID sequence to one-hot encoding;
padding the one-hot encoding based on the code size of the coding layer;
and taking the padded one-hot encoding as the second code.
3. The method of claim 1, wherein the labeling layer comprises a BiLSTM layer and a CRF layer.
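The steps of claim 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the tag names, the function names, reserving ID 0 for padding, and padding along the sequence length up to `max_len` are all hypothetical choices; the claim itself only states that padding is based on the code size of the coding layer.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: ID 0 is reserved for padding positions


def build_tag_vocab(tag_sequences: List[List[str]]) -> Dict[str, int]:
    """Establish the tag -> ID correspondence from a collection of tag sequences."""
    vocab: Dict[str, int] = {}
    for seq in tag_sequences:
        for tag in seq:
            if tag not in vocab:
                vocab[tag] = len(vocab) + 1  # IDs start at 1; 0 is padding
    return vocab


def encode_tags(tags: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Convert one tag sequence into a padded one-hot matrix
    with max_len rows and len(vocab) + 1 columns."""
    ids = [vocab[t] for t in tags][:max_len]   # tag sequence -> ID sequence
    ids += [PAD_ID] * (max_len - len(ids))     # pad to the assumed code size
    width = len(vocab) + 1
    return [[1 if col == i else 0 for col in range(width)] for i in ids]


# Hypothetical tag sequences produced by the labeling layer.
sequences = [["B-site", "I-site", "B-disease"], ["B-disease", "I-disease"]]
vocab = build_tag_vocab(sequences)
second_code = encode_tags(sequences[1], vocab, max_len=4)
```

Here `second_code` has four rows: two one-hot rows for the real tags followed by two rows that mark padding positions.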
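As background for how a CRF layer on top of a BiLSTM typically produces a tag sequence, the following Python sketch shows Viterbi decoding over per-position emission scores and tag-to-tag transition scores. The patent does not describe the decoding procedure, so the use of Viterbi, the function name, and the score layout are assumptions made purely for illustration.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag ID sequence.

    emissions[t][j]: score of tag j at position t (e.g. BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j (CRF parameters).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score of a path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag for each position/tag
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

The returned IDs would then be mapped back to tag names to form the tag sequence used by the second coding layer.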
4. The method as recited in claim 1, further comprising:
training an initial semantic component analysis model based on predetermined training samples to obtain the pre-trained semantic component analysis model, wherein the initial semantic component analysis model comprises the first coding layer and an initial labeling layer corresponding to the labeling layer, and the first coding layer is pre-trained.
5. The method of claim 1, wherein the entity to be linked comprises one or more of a disease term, a surgical term, a symptom term, a pharmaceutical term, and an inspection term.
6. An entity linking apparatus based on semantic components, comprising:
the acquisition module is used for acquiring the entity to be linked from the medical data set;
the determining module is used for determining a standard entity candidate set of the entity to be linked in the medical knowledge graph, wherein the standard entity candidate set comprises a plurality of standard entities which are most similar to the entity to be linked;
the scoring module is used for determining semantic similarity scores of all standard entities in the candidate set and the entity to be linked through a pre-trained semantic enhancement model, wherein the semantic enhancement model comprises a first coding layer and a second coding layer, the second coding layer is used for coding based on semantic component information, and the first coding layer is used for coding based on the bidirectional encoder BERT;
the scoring module is further used for determining semantic enhancement codes of all standard entities in the candidate set and semantic enhancement codes of entities to be linked through a pre-trained semantic enhancement model; determining the semantic similarity scores of all the standard entities in the candidate set and the entity to be linked according to the semantic enhancement codes of the entity to be linked and the semantic enhancement codes of all the standard entities in the candidate set;
the scoring module is further configured to select current entities from the entity to be linked and the standard entity candidate set, and perform the following steps for each current entity until the semantic enhancement code of each current entity is determined:
predicting the current entity based on a pre-trained semantic component analysis model, and determining a tag sequence corresponding to the current entity, wherein the semantic component analysis model comprises a first coding layer and a labeling layer, the first coding layer is used for generating a first code based on the current entity, the labeling layer is used for generating the tag sequence based on the first code, and the tag is used for indicating semantic components of training text; determining, by the second encoding layer, a second encoding based on the tag sequence; combining the first code and the second code to obtain the semantic enhancement code;
and the link module is used for linking the entity to be linked to the standard entity with the highest semantic similarity score in the candidate set.
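The determining module's job of picking the standard entities most similar to the entity to be linked can be sketched in Python as a top-k retrieval. The claims do not specify the similarity measure used to build the candidate set, so character-bigram Jaccard similarity, the function names, the value of `k`, and the example entity names are all assumptions for illustration.

```python
from typing import List, Set


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of character n-grams; short strings fall back to the whole string."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (assumed surface-similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_set(mention: str, standard_entities: List[str], k: int = 3) -> List[str]:
    """Return the k standard entities most similar to the mention."""
    grams = char_ngrams(mention)
    ranked = sorted(standard_entities,
                    key=lambda entity: jaccard(grams, char_ngrams(entity)),
                    reverse=True)
    return ranked[:k]


# Hypothetical knowledge-graph entries.
knowledge_graph = ["chronic gastritis", "acute gastroenteritis", "fracture of femur"]
candidates = candidate_set("acute gastritis", knowledge_graph, k=2)
```

The scoring module would then only need to compute semantic enhancement codes for the entries in `candidates` instead of for every standard entity in the knowledge graph.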
7. A computer comprising a memory, a processor, the memory having stored therein a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any of the preceding claims 1 to 5.
8. A computer readable storage medium storing machine executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443446.7A CN111613341B (en) 2020-05-22 2020-05-22 Entity linking method and device based on semantic components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010443446.7A CN111613341B (en) 2020-05-22 2020-05-22 Entity linking method and device based on semantic components

Publications (2)

Publication Number Publication Date
CN111613341A CN111613341A (en) 2020-09-01
CN111613341B true CN111613341B (en) 2024-02-02

Family

ID=72198414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010443446.7A Active CN111613341B (en) 2020-05-22 2020-05-22 Entity linking method and device based on semantic components

Country Status (1)

Country Link
CN (1) CN111613341B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112185574A (en) * 2020-09-28 2021-01-05 云知声智能科技股份有限公司 Method, device, equipment and storage medium for remote medical entity link
CN112231449A (en) * 2020-12-10 2021-01-15 杭州识度科技有限公司 Vertical field entity chain finger system based on multi-path recall
CN112632910A (en) * 2020-12-21 2021-04-09 北京惠及智医科技有限公司 Operation encoding method, electronic device and storage device
CN112905917B (en) * 2021-02-09 2023-07-25 北京百度网讯科技有限公司 Inner chain generation method, model training method, related device and electronic equipment
CN113836912A (en) * 2021-09-08 2021-12-24 上海蜜度信息技术有限公司 Method, system and device for sequence labeling word segmentation of language model and word stock correction
CN114386422B (en) * 2022-01-14 2023-09-15 淮安市创新创业科技服务中心 Intelligent auxiliary decision-making method and device based on enterprise pollution public opinion extraction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388793A (en) * 2017-08-03 2019-02-26 阿里巴巴集团控股有限公司 Entity mask method, intension recognizing method and corresponding intrument, computer storage medium
CN109522551A (en) * 2018-11-09 2019-03-26 天津新开心生活科技有限公司 Entity link method, apparatus, storage medium and electronic equipment
CN110781254A (en) * 2020-01-02 2020-02-11 四川大学 Automatic case knowledge graph construction method, system, equipment and medium
CN111160041A (en) * 2019-12-30 2020-05-15 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080109250A1 (en) * 2006-11-03 2008-05-08 Craig Allan Walker System and method for creating and rendering DICOM structured clinical reporting via the internet

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
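Ding Long et al. "Domain entity recognition based on a pre-trained BERT character embedding model". Technology Intelligence Engineering (情报工程). 2019, Vol. 5 (No. 6), main text, Sections 2-3. *
"Research on Chinese terminology extraction based on a BERT-embedding BiLSTM-CRF model"; Wu Jun et al.; Journal of the China Society for Scientific and Technical Information (情报学报) (No. 04); full text *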

Also Published As

Publication number Publication date
CN111613341A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN111613341B (en) Entity linking method and device based on semantic components
US10929420B2 (en) Structured report data from a medical text report
US20140351228A1 (en) Dialog system, redundant message removal method and redundant message removal program
CN111368094A (en) Entity knowledge map establishing method, attribute information acquiring method, outpatient triage method and device
US11507746B2 (en) Method and apparatus for generating context information
CN113326380B (en) Equipment measurement data processing method, system and terminal based on deep neural network
CN110020005B (en) Method for matching main complaints in medical records with symptoms in current medical history
CN112307337B (en) Associated recommendation method and device based on tag knowledge graph and computer equipment
CN111104800B (en) Entity identification method, entity identification device, entity identification equipment, storage medium and program product
CN114996388A (en) Intelligent matching method and system for diagnosis name standardization
CN111435410A (en) Relationship extraction method and device for medical texts
CN112732863B (en) Standardized segmentation method for electronic medical records
CN112687328B (en) Method, apparatus and medium for determining phenotypic information of clinical descriptive information
CN112151187B (en) Information query method, device, computer equipment and storage medium
WO2014130287A1 (en) Method and system for propagating labels to patient encounter data
CN113096756A (en) Disease evolution classification method and device, electronic equipment and storage medium
CN112749277A (en) Medical data processing method and device and storage medium
CN111507109A (en) Named entity identification method and device of electronic medical record
CN112101034B (en) Method and device for judging attribute of medical entity and related product
CN114218378A (en) Content pushing method, device, equipment and medium based on knowledge graph
CN114068028A (en) Medical inquiry data processing method and device, readable storage medium and electronic equipment
CN113705692A (en) Emotion classification method and device based on artificial intelligence, electronic equipment and medium
CN117766137B (en) Medical diagnosis result determining method and device based on reinforcement learning
CN117766137A (en) medical diagnosis result determining method and device based on reinforcement learning
CN117437422A (en) Medical image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant