CN112185574A - Method, device, equipment and storage medium for remote medical entity link - Google Patents


Publication number
CN112185574A
CN112185574A (application CN202011045780.3A)
Authority
CN
China
Prior art keywords: entity, feature representation, candidate, standard, determining
Prior art date
Legal status
Pending
Application number
CN202011045780.3A
Other languages
Chinese (zh)
Inventor
史亚飞
Current Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202011045780.3A
Publication of CN112185574A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G06F40/30 - Semantic analysis
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention relates to a method, a device, equipment and a storage medium for remote medical entity linking, applied in the field of computer technology. The method comprises the following steps: acquiring a word embedding vector for each word in the text of the entity to be linked; determining a first feature representation of the non-standard entity in the text according to the word embedding vectors; determining a set of candidate entities in a medical knowledge graph whose similarity to the non-standard entity reaches a preset similarity value; acquiring a second feature representation in the medical knowledge graph for each candidate entity in the set; calculating a difference score between the first feature representation and each second feature representation, and determining a standard entity among the candidate entities according to the difference scores; and linking the non-standard entity to the standard entity.

Description

Method, device, equipment and storage medium for remote medical entity linking
Technical Field
The invention relates to the technical field of machine learning, in particular to a method, a device, equipment and a storage medium for remote medical entity linking.
Background
Standard medical entities in the real world often appear in irregular forms; for example, the symptom "cramp" may be expressed in a medical history as "sudden whole-body cramping". Non-standard entities need to be linked to standard entities in order to normalize medical terms.
In the prior art, the Chinese BERT language model released by Google is used: parameters are set in the fine-tuning stage of the model, and the output of the penultimate layer is taken as the entity word vector. Cosine distances between the obtained entity word vectors, i.e. semantic similarities, are then calculated, and finally entity alignment is performed by applying a threshold to the semantic similarity.
Such similarity-based entity linking methods are difficult to apply when candidate entities are close to one another, so the entity linking results are often unsatisfactory.
Disclosure of Invention
In view of the foregoing, the present invention provides a method, apparatus, device and storage medium for remote medical entity linking, which at least to some extent overcomes the problems in the related art.
Based on the above object, the present invention provides a method for linking remote medical entities, comprising:
acquiring word embedding vectors of all words in the text of the entity to be linked;
determining a first feature representation of an unnormalized entity in the text according to each word embedding vector;
determining a candidate entity set of which the similarity with the non-standard entity in the medical knowledge graph reaches a preset similarity value;
acquiring a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph;
calculating difference scores between the first feature representation and each second feature representation, and determining a standard entity in candidate entities corresponding to the second feature representation according to each difference score;
linking the non-canonical entity with the standard entity.
Further, in the above method for linking a remote medical entity, the obtaining word embedding vectors of words in a text of an entity to be linked includes:
and inputting the text of the entity to be linked into a pre-trained BERT model to obtain the word embedding vectors.
Further, in the above method for remote medical entity linking, the determining a first feature representation of a non-standard entity in the text according to each word embedding vector includes:
inputting the word embedding vectors into a pre-trained BiLSTM model;
determining the non-standard entity in the text;
and outputting the hidden states of the word preceding the non-standard entity and of the last word of the non-standard entity as the first feature representation.
Further, in the above method for linking remote medical entities, the determining a set of candidate entities in the medical knowledge-graph whose similarity to the non-canonical entity reaches a preset similarity value includes:
and based on the BM25 algorithm, searching a candidate entity set of which the similarity with the non-standard entity in the medical knowledge graph reaches a preset similarity value.
Further, in the above method for linking remote medical entities, the obtaining a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph includes:
calculating a third feature representation of each entity in the medical knowledge-graph based on a GNN network model;
and determining a second feature representation belonging to the candidate entity in the third feature representation.
Further, in the above method for linking remote medical entities, calculating difference scores between the first feature representation and each of the second feature representations, and determining a standard entity in candidate entities corresponding to the second feature representation according to each of the difference scores includes:
inputting the first feature representation and each second feature representation into a pre-trained standard entity prediction model;
and calculating the difference scores through the standard entity prediction model, and determining a standard entity in the candidate entities corresponding to the second feature representation according to each difference score.
Further, in the above method for remote medical entity linking, the training process of the pre-trained standard entity prediction model includes:
acquiring a training set, wherein the training set comprises first sample feature representations of N non-standard entity samples and second sample feature representations of candidate sample entities corresponding to the non-standard samples;
inputting first and second sample feature representations in the training set into an initial feedforward neural network model to compute a sample difference score between the first sample feature representation and each of the second sample feature representations through the initial feedforward neural network model;
calculating a loss function according to the sample difference score;
if the loss function is larger than or equal to a preset threshold value, adjusting the weight parameter of a hidden layer in the initial feedforward neural network model, and executing the step of inputting the first sample characteristic representation and the second sample characteristic representation in the training set into the initial feedforward neural network model again until the loss function is smaller than the preset threshold value;
and taking the initial feedforward neural network model when the loss function is smaller than a preset threshold value as the standard entity prediction model.
The invention also provides a device for remote medical entity linking, comprising:
the first acquisition module is used for acquiring word embedded vectors of all words in the text of the entity to be linked;
the first determining module is used for determining a first feature representation of an unnormalized entity in the text according to each word embedding vector;
the second determination module is used for determining a candidate entity set of which the similarity with the non-standard entity in the medical knowledge graph reaches a preset similarity value;
a second obtaining module, configured to obtain a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph;
the calculation module is used for calculating difference scores between the first feature representation and each second feature representation and determining a standard entity in candidate entities corresponding to the second feature representation according to each difference score;
and the entity linking module is used for linking the non-standard entity with the standard entity.
The present invention also provides a remote medical entity linking device, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program;
the processor is configured to invoke and execute the computer program in the memory to perform a method of telemedicine entity linking as described in any of the above.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method of remote medical entity linking as described in any one of the above.
From the above, the method, device, equipment and storage medium for remote medical entity linking provided by the invention acquire a word embedding vector for each word in the text of the entity to be linked; determine a first feature representation of the non-standard entity in the text according to the word embedding vectors; determine a set of candidate entities in the medical knowledge graph whose similarity to the non-standard entity reaches a preset similarity value; acquire a second feature representation in the medical knowledge graph for each candidate entity in the set; calculate a difference score between the first feature representation and each second feature representation, and determine a standard entity among the candidate entities according to the difference scores; and link the non-standard entity to the standard entity. Instead of computing semantic similarity between different entities, difference scores between the candidate entities and the non-standard entity are computed from feature representations to determine the standard entity, making the entity linking results more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for remote medical entity linking according to an embodiment of the present invention;
FIG. 2 is a flow diagram illustrating the FNN training process in a method for remote medical entity linking, according to an embodiment of the invention;
FIG. 3 is a block diagram of a remote medical entity linking apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a remote medical entity linking device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Fig. 1 is a flowchart illustrating a method for linking remote medical entities according to an embodiment of the present invention. As shown in fig. 1, the present embodiment provides a method for linking a remote medical entity, including:
101. and acquiring word embedded vectors of all words in the text of the entity to be linked.
In a specific implementation process, the text of the entity to be linked may be text data generated during a medical activity and required to be linked to the entity, for example, a medical activity record text such as a medical record, a medical order, a nursing document, an examination report, and the like. The entities to be linked mainly refer to medical terms with different expression patterns, and may be one or more of disease terms, surgical terms, symptom terms, pharmaceutical terms, and examination terms.
The word embedding vectors can be obtained by inputting the text of the entity to be linked into a pre-trained BERT model. The BERT model splits the text of the entity to be linked into individual words and assigns each word a vector, thereby obtaining the word embedding vectors.
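The patent gives no code for this step; the toy sketch below mimics the character-level tokenisation a Chinese BERT model performs, with a random embedding table standing in for the pre-trained model. The sample text ("sudden whole-body cramping") and the 8-dimensional embedding size are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the BERT step: split the text character by character
# (as a Chinese BERT tokeniser does) and look each character up in an
# embedding table.  A real system would call a pre-trained BERT model
# instead of this random table.
rng = np.random.default_rng(0)
text = "突发全身抽筋"                              # "sudden whole-body cramping"
chars = list(text)                                # one token per character
vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
emb_table = rng.normal(size=(len(vocab), 8))      # stand-in for learned weights
word_embeddings = np.stack([emb_table[vocab[c]] for c in chars])  # (l, 8)
```

The result is one vector per character, i.e. the sequence of word embedding vectors fed to the next step.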
Further, the word embedding vectors are represented as:

c = (w_1, w_2, ..., w_l)

An entity with a non-standard expression can be represented as:

m = (w_h, ..., w_t), 1 <= h <= t <= l

where l represents the number of words in the text of the entity to be linked, h the index of the first word of the entity, and t the index of its last word.
102. Determining a first feature representation of an unnormalized entity in the text based on each of the word embedding vectors.
In a specific implementation process, the text of the entity to be linked contains non-standard entities. For example, the text may read "sudden whole-body cramping at work, with clear consciousness", where the standard expression of "sudden whole-body cramping" should be "cramp"; the non-standard entity therefore needs to be linked to the standard entity in the medical knowledge graph.
Specifically, the first characteristic representation of the non-canonical entity may be obtained by:
inputting the word embedding vectors into a pre-trained BiLSTM model;
determining the non-standard entity in the text;
and outputting the hidden states of the word preceding the non-standard entity and of the last word of the non-standard entity as the first feature representation.
Further, the first feature representation of the non-standard entity is:

m_rep = [f_{h-1}, b_{h-1}, f_t, b_t]

where f denotes a forward LSTM output and b denotes a backward LSTM output.
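As an illustration of how the first feature representation could be assembled from BiLSTM outputs (consistent with the f_{h-1}, b_{h-1}, f_t, b_t terms used in the scoring formula later in the description), the sketch below uses random arrays as stand-ins for the forward and backward hidden-state sequences; the span indices and dimensions are toy values, not from the patent.

```python
import numpy as np

# Stand-ins for BiLSTM outputs: in a real system f and b would be the
# forward and backward hidden-state sequences of a trained BiLSTM run
# over the word embeddings; here they are random toy arrays.
rng = np.random.default_rng(0)
l, d = 6, 4                       # sentence length and hidden size (toy values)
f = rng.normal(size=(l, d))       # forward LSTM outputs f_1 .. f_l
b = rng.normal(size=(l, d))       # backward LSTM outputs b_1 .. b_l

h, t = 2, 4                       # mention span w_h .. w_t, 1-indexed as in the text
# First feature representation: states at the span boundaries,
# [f_{h-1}, b_{h-1}, f_t, b_t] (0-indexed array access below).
m_rep = np.concatenate([f[h - 2], b[h - 2], f[t - 1], b[t - 1]])
```

The concatenation yields a single vector of length 4*d characterising the mention and its immediate context.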
103. And determining a candidate entity set of which the similarity with the non-standard entity in the medical knowledge graph reaches a preset similarity value.
In one implementation, the BM25 algorithm may be used to retrieve the k entity candidates that are most similar to the irregular entity.
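The patent names BM25 but gives no implementation; the following is a minimal self-contained sketch of BM25-based candidate retrieval. The entity names, whitespace tokenisation, and the parameters k1 and b are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised doc (candidate entity name) vs the query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))   # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Hypothetical candidate entities from a knowledge graph, whitespace-tokenised.
entities = ["cramp", "whole body cramp", "fever", "muscle strain"]
docs = [e.split() for e in entities]
query = "sudden whole body cramp".split()
scores = bm25_scores(query, docs)
top_k = sorted(range(len(entities)), key=lambda i: -scores[i])[:2]   # keep top k=2
```

The indices in top_k identify the candidate entity set passed on to step 104; a threshold on the score could be used instead of a fixed k.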
104. Obtaining a second feature representation of each candidate entity in the set of candidate entities in the medical knowledge-graph.
In one implementation, a third feature representation of each entity in the medical knowledge graph may be computed based on a GNN (graph neural network) model. The third feature representations are:

E = (e_1, e_2, ..., e_n)

where n represents the number of entities in the medical knowledge graph.
After the set of candidate entities is determined in step 103, a second feature representation of the candidate entities in the set of candidate entities may be looked up in the medical knowledge-graph.
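The patent does not specify the GNN architecture. As one plausible sketch, a mean-aggregation message-passing layer (applied twice here) produces the third feature representations for all entities, from which the rows of the retrieved candidates are selected as second feature representations. The toy graph, random weights, and dimensions are assumptions.

```python
import numpy as np

def gnn_layer(X, A, W):
    """One message-passing step: mean over neighbours (with self-loop), project, ReLU."""
    A_hat = A + np.eye(A.shape[0])                        # add self-loops
    H = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X    # mean aggregation
    return np.maximum(H @ W, 0.0)

rng = np.random.default_rng(0)
n, d = 5, 8                                   # toy graph: 5 entities, 8-dim features
X = rng.normal(size=(n, d))                   # initial entity features
A = np.zeros((n, n))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0   # a few illustrative relations
W = rng.normal(size=(d, d))

E = gnn_layer(gnn_layer(X, A, W), A, W)       # third feature representations e_1 .. e_n
candidate_ids = [0, 2]                        # indices returned by candidate retrieval
second_feats = E[candidate_ids]               # second feature representations
```

Because the third feature representations cover every entity, the second feature representations are obtained by a simple lookup once the candidate set is known.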
105. And calculating difference scores between the first feature representation and each second feature representation, and determining a standard entity in candidate entities corresponding to the second feature representation according to each difference score.
In a specific implementation process, the standard entity among the candidate entities can be determined through a pre-trained standard entity prediction model. Specifically, the first feature representation and each second feature representation are input into the standard entity prediction model, which calculates a difference score between the first feature representation and each second feature representation; the standard entity is then determined from the difference scores.
Specifically, the training process of the standard entity prediction model may be:
201. obtaining a training set, wherein the training set comprises first sample feature representations of N non-standard entity samples and second sample feature representations of candidate sample entities corresponding to the non-standard samples.
In a specific implementation process, the first sample feature representations of the N non-standard entity samples are obtained in the same manner as the first feature representation, which is not repeated here. Similarly, the second sample feature representations of the candidate sample entities corresponding to each non-standard sample may be obtained in the manner of the second feature representation.
202. Inputting a first sample feature representation and a second sample feature representation in the training set into an initial feedforward neural network model to compute a sample difference score between the first sample feature representation and each of the second sample feature representations through the initial feedforward neural network model.
In one specific implementation, the initial feedforward neural network model is a feedforward neural network (FNN) with one hidden layer.
The sample difference score between the first sample feature representation and each of the second sample feature representations can be specifically obtained by the following formula:
score(m, c, e) = FFN([e, f_{h-1}, b_{h-1}, f_t, b_t])

where (m, c) denotes a non-standard entity together with its text, and e denotes a candidate entity from the medical knowledge graph.
203. Calculating a loss function from the sample difference scores.
In one specific implementation, the FNN may employ hinge loss as the loss function. Specifically, the loss can be obtained by the following formulas:

loss(m, c) = max(0, max_{e in E-} score(m, c, e) + γ - score(m, c, e+))

L = Σ_{(m, c) in D} loss(m, c)

where γ is the margin, E- is the set of incorrect linking entities, e+ is the correct linking entity, and D is the training set.
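A numeric sketch of this hinge loss; the margin of 1.0 and the score values are illustrative, not taken from the patent.

```python
def hinge_loss(pos_score, neg_scores, margin=1.0):
    """loss(m, c) = max(0, max over E- of score + margin - score of e+)."""
    return max(0.0, max(neg_scores) + margin - pos_score)

# Illustrative scores: the correct entity outscores the best negative by 0.6,
# which is less than the margin 1.0, so a positive loss remains.
loss = hinge_loss(pos_score=2.0, neg_scores=[0.5, 1.4], margin=1.0)
```

The loss is zero only once the correct entity's score exceeds every incorrect candidate's score by at least the margin.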
204. And judging whether the loss function is smaller than a preset threshold value, if not, executing 205, and if so, executing 206.
205. The weight parameters of the hidden layer in the initial feedforward neural network model are adjusted, and step 202 is executed again.
In a specific implementation process, when the loss function is large, it indicates that the prediction result has a large difference and low accuracy, so that it is necessary to adjust the weight parameters and input the training samples again to continue the FNN training.
The Adam optimizer may be used.
206. And taking the initial feedforward neural network model when the loss function is smaller than a preset threshold value as the standard entity prediction model.
In a specific implementation process, minimizing the loss function makes the predicted standard entity more accurate.
It should be noted that the method of the embodiment of the present invention may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In the case of such a distributed scenario, one device of the multiple devices may only perform one or more steps of the method according to the embodiment of the present invention, and the multiple devices interact with each other to complete the method.
Fig. 3 is a schematic structural diagram of a remote medical entity linking apparatus according to an embodiment of the present invention. Referring to fig. 3, an embodiment of the present application provides a remote medical entity linking apparatus, including:
the first obtaining module 30 is configured to obtain a word embedding vector of each word in a text of an entity to be linked;
a first determining module 31, configured to determine, according to each word embedding vector, a first feature representation of an unnormalized entity in the text;
a second determining module 32, configured to determine a candidate entity set in the medical knowledge graph, where a similarity between the candidate entity set and the non-canonical entity reaches a preset similarity value;
a second obtaining module 33, configured to obtain a second feature representation of each candidate entity in the candidate entity set in the medical knowledge-graph;
a calculating module 34, configured to calculate difference scores between the first feature representation and each of the second feature representations, and determine, according to each of the difference scores, a standard entity in candidate entities corresponding to the second feature representation;
an entity linking module 35, configured to link the non-canonical entity with the standard entity.
In a specific implementation process, the first obtaining module is specifically configured to input the text of the entity to be linked into a pre-trained bert model to obtain the word embedding vector.
In a specific implementation process, the first determining module is specifically configured to input the word embedding vectors into a pre-trained BiLSTM model;
determine the non-standard entity in the text;
and output the hidden states of the word preceding the non-standard entity and of the last word of the non-standard entity as the first feature representation.
Further, the second determining module is specifically configured to retrieve, based on the BM25 algorithm, a candidate entity set in the medical knowledge graph, where a similarity between the candidate entity set and the non-canonical entity reaches a preset similarity value.
In a specific implementation process, the second obtaining module is specifically configured to calculate a third feature representation of each entity in the medical knowledge graph based on a GNN network model;
and determining a second feature representation belonging to the candidate entity in the third feature representation.
Further, the calculation module is specifically configured to input the first feature representation and each of the second feature representations into a standard entity prediction model trained in advance;
and calculating the difference scores through the standard entity prediction model, and determining a standard entity in the candidate entities corresponding to the second feature representation according to each difference score.
In one embodiment, the training process of the pre-trained standard entity prediction model includes:
acquiring a training set, wherein the training set comprises first sample feature representations of N non-standard entity samples and second sample feature representations of candidate sample entities corresponding to the non-standard samples;
inputting first and second sample feature representations in the training set into an initial feedforward neural network model to compute a sample difference score between the first sample feature representation and each of the second sample feature representations through the initial feedforward neural network model;
calculating a loss function according to the sample difference score;
if the loss function is larger than or equal to a preset threshold value, adjusting the weight parameter of a hidden layer in the initial feedforward neural network model, and executing the step of inputting the first sample characteristic representation and the second sample characteristic representation in the training set into the initial feedforward neural network model again until the loss function is smaller than the preset threshold value;
and taking the initial feedforward neural network model when the loss function is smaller than a preset threshold value as the standard entity prediction model.
For a specific implementation of this embodiment, reference may be made to the method for remote medical entity linking and the related descriptions in the method embodiments described in the foregoing embodiments, and details are not described herein again.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 4 is a schematic structural diagram of an embodiment of a remote medical entity linking device of the present invention. As shown in fig. 4, the device of this embodiment may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The present invention also provides a storage medium storing computer instructions for causing a computer to execute the method of remote medical entity linking of the above-described embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the invention, features of the above embodiments, or of different embodiments, may be combined, steps may be performed in any order, and many other variations of the different aspects of the invention exist as described above; for brevity, they are not provided in detail.
In addition, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and in view of the fact that the specifics of implementing such block diagram devices depend highly on the platform within which the invention is to be implemented (i.e., such specifics should be well within the purview of one skilled in the art). Where specific details (e.g., circuits) are set forth to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative rather than restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method of telemedicine entity linking, comprising:
acquiring word embedding vectors of all words in the text of the entity to be linked;
determining a first feature representation of a non-standard entity in the text according to each word embedding vector;
determining a candidate entity set in the medical knowledge graph whose similarity with the non-standard entity reaches a preset similarity value;
acquiring a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph;
calculating difference scores between the first feature representation and each second feature representation, and determining a standard entity among the candidate entities corresponding to the second feature representations according to the difference scores;
linking the non-standard entity with the standard entity.
2. The method of claim 1, wherein obtaining the word embedding vectors of each word in the text of the entity to be linked comprises:
inputting the text of the entity to be linked into a pre-trained BERT model to obtain the word embedding vectors.
3. The method of claim 1, wherein determining a first feature representation of a non-standard entity in the text based on each of the word embedding vectors comprises:
inputting the word embedding vectors into a pre-trained BiLSTM model;
determining a non-standard entity in the text;
taking the BiLSTM outputs for the word preceding the non-standard entity and for the last word of the non-standard entity as the first feature representation.
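As a non-authoritative sketch of one plausible reading of claim 3, the first feature representation can be assembled from precomputed BiLSTM outputs by concatenating the vectors at the entity's boundaries. The array shapes, the random stand-in hidden states, and the span indices below are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def first_feature(hidden_states, span_start, span_end):
    """Concatenate the BiLSTM output of the word preceding the entity
    with the output of the entity's last word (one reading of claim 3)."""
    prev_vec = hidden_states[span_start - 1]   # word before the entity span
    last_vec = hidden_states[span_end]         # last word inside the span
    return np.concatenate([prev_vec, last_vec])

# 6 words, 8-dimensional stand-ins for real BiLSTM outputs
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 8))
feat = first_feature(H, span_start=2, span_end=4)  # entity covers words 2..4
```

In a real pipeline `H` would come from running the BERT word embedding vectors of claim 2 through the BiLSTM; here it is random data so the feature-assembly step can be shown in isolation.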
4. The method of claim 1, wherein determining the candidate entity set in the medical knowledge graph whose similarity with the non-standard entity reaches a preset similarity value comprises:
retrieving, based on the BM25 algorithm, a candidate entity set in the medical knowledge graph whose similarity with the non-standard entity reaches the preset similarity value.
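To illustrate the candidate retrieval step of claim 4, a minimal BM25 scorer can be sketched in pure Python. The entity names, the character-level tokenization, and the similarity threshold are all hypothetical stand-ins, not values from the patent:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    # document frequency of each query token
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Hypothetical knowledge-graph entity names, tokenized into characters
entities = ["acute gastritis", "chronic gastritis", "acute appendicitis"]
docs = [list(e) for e in entities]
query = list("acute gastr.")           # a non-standard mention
scores = bm25_scores(query, docs)
threshold = 0.0                        # hypothetical preset similarity value
candidates = [e for e, s in zip(entities, scores) if s > threshold]
```

A production system would more likely use an existing BM25 implementation over a proper tokenizer; this toy version only shows how a preset score threshold carves a candidate set out of the knowledge graph's entity names.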
5. The method of claim 1, wherein obtaining a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph comprises:
calculating a third feature representation of each entity in the medical knowledge graph based on a graph neural network (GNN) model;
determining, among the third feature representations, the second feature representations belonging to the candidate entities.
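The two steps of claim 5 can be sketched with a single mean-aggregation message-passing layer over a toy adjacency matrix: every entity in the graph gets a third feature representation, and the candidates' rows are then selected as second feature representations. The graph, dimensions, weights, and candidate indices are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy medical knowledge graph: 4 entities, undirected edges
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # normalize: mean over neighbors
H = rng.standard_normal((4, 8))             # initial entity features
W = rng.standard_normal((8, 8)) * 0.1       # layer weights

# third feature representation of every entity in the graph
third = np.tanh(D_inv @ A_hat @ H @ W)

candidate_ids = [0, 2]                      # hypothetical candidate set indices
second = third[candidate_ids]               # second feature representations
```

A real GNN would stack several such layers and train `W`; the point here is only that graph-level features are computed once for all entities and then indexed by the candidate set.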
6. The method of claim 1, wherein calculating difference scores between the first feature representation and each of the second feature representations, and determining a standard entity among the candidate entities corresponding to the second feature representations according to the difference scores, comprises:
inputting the first feature representation and each second feature representation into a pre-trained standard entity prediction model;
calculating the difference scores through the standard entity prediction model, and determining a standard entity among the candidate entities corresponding to the second feature representations according to the difference scores.
7. The method of claim 6, wherein the training process of the pre-trained standard entity prediction model comprises:
acquiring a training set, wherein the training set comprises first sample feature representations of N non-standard entity samples and second sample feature representations of the candidate sample entities corresponding to the non-standard entity samples;
inputting the first and second sample feature representations in the training set into an initial feedforward neural network model, to compute through the model a sample difference score between each first sample feature representation and each of the corresponding second sample feature representations;
calculating a loss function according to the sample difference score;
if the loss function is greater than or equal to a preset threshold value, adjusting the weight parameters of the hidden layer in the initial feedforward neural network model, and executing again the step of inputting the first and second sample feature representations in the training set into the initial feedforward neural network model, until the loss function is smaller than the preset threshold value;
taking the initial feedforward neural network model obtained when the loss function is smaller than the preset threshold value as the standard entity prediction model.
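The training procedure of claim 7 — score, compute a loss, adjust the hidden-layer weights if the loss is at or above a preset threshold, repeat — can be sketched with NumPy. The dimensions, learning rate, threshold, and the synthetic targets standing in for annotated sample difference scores are all illustrative assumptions, and a step cap is added for safety:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: N pairs of (first, second) sample feature
# representations, concatenated into one input vector per pair, with a
# synthetic target standing in for the annotated sample difference score.
N, dim, hidden = 32, 8, 16
X = rng.standard_normal((N, 2 * dim))
y = 0.5 + 0.3 * np.tanh(2.0 * X[:, 0])      # synthetic target scores

# Initial feedforward neural network model: one ReLU hidden layer
W1 = rng.standard_normal((2 * dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal(hidden) * 0.1
b2 = 0.0

lr, threshold = 0.05, 0.05                  # illustrative hyperparameters
for step in range(50000):                   # safety cap on the claim-7 loop
    h = np.maximum(0.0, X @ W1 + b1)        # hidden-layer activations
    scores = h @ w2 + b2                    # sample difference scores
    loss = np.mean((scores - y) ** 2)       # loss function of the claim
    if loss < threshold:                    # stop once below the preset value
        break
    # otherwise adjust the hidden-layer weight parameters and repeat
    g = 2.0 * (scores - y) / N
    w2 -= lr * h.T @ g
    b2 -= lr * g.sum()
    gh = np.outer(g, w2) * (h > 0)
    W1 -= lr * X.T @ gh
    b1 -= lr * gh.sum(axis=0)
```

The patent does not specify the loss function or optimizer; mean squared error with full-batch gradient descent is used here purely to make the stop-when-below-threshold loop concrete.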
8. An apparatus for telemedicine entity linking, comprising:
a first acquisition module, used for acquiring the word embedding vectors of each word in the text of the entity to be linked;
a first determining module, used for determining a first feature representation of a non-standard entity in the text according to each word embedding vector;
a second determining module, used for determining a candidate entity set in the medical knowledge graph whose similarity with the non-standard entity reaches a preset similarity value;
a second obtaining module, configured to obtain a second feature representation of each candidate entity in the candidate entity set in the medical knowledge graph;
a calculation module, used for calculating difference scores between the first feature representation and each second feature representation, and determining a standard entity among the candidate entities corresponding to the second feature representations according to the difference scores;
an entity linking module, used for linking the non-standard entity with the standard entity.
9. A telemedicine entity linked device, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program;
the processor is configured to invoke and execute the computer program in the memory to perform the method of telemedicine entity linking of any of claims 1-7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the telemedicine entity linking method of any one of claims 1-7.
CN202011045780.3A 2020-09-28 2020-09-28 Method, device, equipment and storage medium for remote medical entity link Pending CN112185574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011045780.3A CN112185574A (en) 2020-09-28 2020-09-28 Method, device, equipment and storage medium for remote medical entity link

Publications (1)

Publication Number Publication Date
CN112185574A true CN112185574A (en) 2021-01-05

Family

ID=73945656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011045780.3A Pending CN112185574A (en) 2020-09-28 2020-09-28 Method, device, equipment and storage medium for remote medical entity link

Country Status (1)

Country Link
CN (1) CN112185574A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140095205A1 (en) * 2012-09-28 2014-04-03 Siemens Medical Solutions Usa, Inc. Automated mapping of service codes in healthcare systems
CN109241294A (en) * 2018-08-29 2019-01-18 国信优易数据有限公司 A kind of entity link method and device
CN109522551A (en) * 2018-11-09 2019-03-26 天津新开心生活科技有限公司 Entity link method, apparatus, storage medium and electronic equipment
CN110991187A (en) * 2019-12-05 2020-04-10 北京奇艺世纪科技有限公司 Entity linking method, device, electronic equipment and medium
CN111159485A (en) * 2019-12-30 2020-05-15 科大讯飞(苏州)科技有限公司 Tail entity linking method, device, server and storage medium
CN111613341A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Entity linking method and device based on semantic components

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Shiwen: "Research on Synonymy Relation Analysis and Concept Normalization of Medical Phenotype Entities", China Master's Theses Full-text Database, Medicine and Health Sciences *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463914A (en) * 2021-02-01 2021-03-09 中国人民解放军国防科技大学 Entity linking method, device and storage medium for internet service
CN113657086A (en) * 2021-08-09 2021-11-16 腾讯科技(深圳)有限公司 Word processing method, device, equipment and storage medium
CN113657086B (en) * 2021-08-09 2023-08-15 腾讯科技(深圳)有限公司 Word processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9779356B2 (en) Method of machine learning classes of search queries
US10832685B2 (en) Speech processing device, speech processing method, and computer program product
AU2019200085B2 (en) Incremental learning of pointwise mutual information (pmi) word-vector embedding for text/language modeling
TWI444844B (en) Simulation parameter correction technique
JP7257585B2 (en) Methods for Multimodal Search and Clustering Using Deep CCA and Active Pairwise Queries
JP6958723B2 (en) Signal processing systems, signal processing equipment, signal processing methods, and programs
CN111079944B (en) Transfer learning model interpretation realization method and device, electronic equipment and storage medium
CN110597878A (en) Cross-modal retrieval method, device, equipment and medium for multi-modal data
US20170293861A1 (en) Machine-learning system and method for identifying same person in genealogical databases
CN112185574A (en) Method, device, equipment and storage medium for remote medical entity link
CN109947971B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN111078842A (en) Method, device, server and storage medium for determining query result
JP2019144639A (en) Method for training model outputting vector indicating tag set corresponding to image
CN111881764A (en) Target detection method and device, electronic equipment and storage medium
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN110956131B (en) Single-target tracking method, device and system
KR102192461B1 (en) Apparatus and method for learning neural network capable of modeling uncerrainty
US20220309321A1 (en) Quantization method, quantization device, and recording medium
CN115470190A (en) Multi-storage-pool data classification storage method and system and electronic equipment
JP2022185799A (en) Information processing program, information processing method and information processing device
WO2013105404A1 (en) Reliability calculation device, reliability calculation method, and computer-readable recording medium
JP2019086473A (en) Learning program, detection program, learning method, detection method, learning device, and detection device
US20210232947A1 (en) Signal processing device, signal processing method, and computer program product
US20210182696A1 (en) Prediction of objective variable using models based on relevance of each model
CN117271803B (en) Training method, device, equipment and storage medium for knowledge graph completion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination