CN111415748B - Entity linking method and device - Google Patents


Info

Publication number
CN111415748B
Authority
CN
China
Prior art keywords
medical
current
word vector
linked
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010099197.4A
Other languages
Chinese (zh)
Other versions
CN111415748A (en
Inventor
史亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010099197.4A priority Critical patent/CN111415748B/en
Publication of CN111415748A publication Critical patent/CN111415748A/en
Application granted granted Critical
Publication of CN111415748B publication Critical patent/CN111415748B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses an entity linking method and device. The method comprises the following steps: acquiring a current medical text, and determining a medical term to be linked from the current medical text; obtaining a current word vector based on the medical term to be linked; comparing the similarity of the current word vector and a preset word vector, and outputting the comparison similarity; determining a current medical entity of the medical term to be linked according to the comparison similarity; and linking the medical term to be linked with the current medical entity. Compared with the semantic components used in the prior art, word vectors are more diverse, so the analysis result is not limited to a single type, and the most suitable of several candidate results is selected as the current medical entity. This avoids the situation in which the semantic components that the CRF identifies for the entity to be linked are too limited in variety, so that parsing accuracy is too low and the medical entity is either not obtained or obtained incorrectly, and accuracy is thereby improved.

Description

Entity linking method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for entity linking.
Background
In the processing of big data from clinical medical records, differences between regions, hospitals, doctors, standards and the like mean that the same entity is often expressed in many different ways. The data can be counted and computed effectively only if expressions of the same entity are accurately identified and mapped to a limited entity space. Medical term entity linking is therefore an essential part of the data processing pipeline.
At present, existing entity linking methods generally obtain candidates through an N-gram algorithm, use a CRF to identify the semantic components of the entity to be linked and of the candidate standard terms to be matched, and finally obtain the standard term with the highest similarity by means of the synonym relations between semantic components in a knowledge graph. However, this approach has the following disadvantage: the semantic components that the CRF identifies for the entity to be linked are too limited in variety, so parsing accuracy is too low and the medical entity is either not obtained or obtained incorrectly.
Disclosure of Invention
To address the above problem, the method determines the medical term to be linked from the current medical text, obtains the current word vector of that term, compares the current word vector with a preset word vector, and thereby determines the current medical entity of the medical term to be linked and links the two.
An entity linking method, comprising the steps of:
acquiring a current medical text, and determining medical terms to be linked from the current medical text;
obtaining a current word vector based on the medical terms to be linked;
comparing the similarity of the current word vector and a preset word vector, and outputting comparison similarity;
determining the current medical entity of the medical term to be linked according to the comparison similarity;
and linking the medical term to be linked with the current medical entity.
Preferably, the obtaining the current medical text, determining the medical term to be linked from the current medical text, includes:
extracting all first medical terms from the current medical text;
inputting the first medical term into a preset knowledge graph for retrieval;
and determining the medical terms to be linked through retrieval.
Preferably, the obtaining the current word vector based on the medical terms to be linked includes:
preprocessing the medical terms to be linked, and converting English components in the medical terms to be linked into corresponding Chinese;
calculating the label score of each Chinese character in the medical term to be linked by using the following formula:
$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, and $A_{y_i, y_{i+1}}$ represents the probability of a transition from label $y_i$ to label $y_{i+1}$;
selecting the output sequence with the highest score as the current label of the medical term to be linked;
extracting n first semantic components of the current tag;
training word vectors of each first semantic component in the n first semantic components by using a preset model;
and determining the word vector of each semantic component as the current word vector.
Preferably, the comparing the similarity of the current word vector and the preset word vector, and outputting the comparison similarity, includes:
determining medical concepts corresponding to the medical terms to be linked;
retrieving all second medical terms related to the medical concept from the preset knowledge graph;
extracting a second semantic component in the second medical term;
training word vectors corresponding to the second semantic components by using the preset model;
determining a word vector corresponding to the second semantic component as a preset word vector;
calculating the similarity between the current word vector and the preset word vector by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity between the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n components of the current word vector, and $b_1, b_2, \ldots, b_n$ are the n components of the preset word vector.
Preferably, the determining the current medical entity of the medical term to be linked according to the comparison similarity includes:
confirming whether the similarity is larger than or equal to a preset threshold value;
if yes, confirming whether the similarity is one hundred percent;
if yes, determining a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
otherwise, judging whether the current word vector and the preset word vector meet preset conditions or not:
if yes, determining a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
otherwise, prompting that there is no matching current medical entity.
An entity linking apparatus, the apparatus comprising:
the acquisition module is used for acquiring a current medical text and determining medical terms to be linked from the current medical text;
the obtaining module is used for obtaining a current word vector based on the medical terms to be linked;
the comparison module is used for comparing the similarity of the current word vector and the preset word vector and outputting comparison similarity;
the determining module is used for determining the current medical entity of the medical term to be linked according to the comparison similarity;
and the link module is used for linking the medical term to be linked with the current medical entity.
Preferably, the acquiring module includes:
a first extraction sub-module for extracting all first medical terms from the current medical text;
the first retrieval submodule is used for inputting the first medical term into a preset knowledge graph for retrieval;
a first determining sub-module for determining the medical term to be linked by retrieving.
Preferably, the obtaining module includes:
the preprocessing sub-module is used for preprocessing the medical terms to be linked and converting English components in the medical terms to be linked into corresponding Chinese;
a first calculation sub-module, configured to calculate a label score of each Chinese character in the medical term to be linked by using the following formula:
$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, and $A_{y_i, y_{i+1}}$ represents the probability of a transition from label $y_i$ to label $y_{i+1}$;
the selecting submodule is used for selecting the output sequence with the highest score as the current label of the medical term to be linked;
the second extraction submodule is used for extracting n first semantic components of the current tag;
the first training submodule is used for training word vectors of each first semantic component in the n first semantic components by using a preset model;
and the second determining submodule is used for determining the word vector of each semantic component as the current word vector.
Preferably, the comparing module includes:
a third determining submodule, configured to determine a medical concept corresponding to the medical term to be linked;
the second retrieval sub-module is used for retrieving all second medical terms related to the medical concept from the preset knowledge graph;
a third extraction sub-module for extracting a second semantic component in the second medical term;
the second training submodule is used for training word vectors corresponding to the second semantic components by using the preset model;
a third determining submodule, configured to determine a word vector corresponding to the second semantic component as a preset word vector;
the second computing sub-module is used for computing the similarity between the current word vector and the preset word vector by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity between the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n components of the current word vector, and $b_1, b_2, \ldots, b_n$ are the n components of the preset word vector.
Preferably, the determining module includes:
the first confirming sub-module is used for confirming whether the similarity is larger than or equal to a preset threshold value;
the second confirming sub-module is used for confirming whether the similarity is one hundred percent when the first confirming sub-module confirms that the similarity is larger than or equal to the preset threshold value;
a fourth determining submodule, configured to determine, according to a preset medical entity corresponding to the preset word vector, a current medical entity corresponding to the current word vector if the second confirming submodule confirms that the similarity is one hundred percent;
the judging sub-module is used for judging whether the current word vector and the preset word vector meet preset conditions when the second confirming sub-module confirms that the similarity is not one hundred percent:
if yes, the fourth determining submodule determines a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
and the prompting submodule is used for prompting that the current medical entity is not matched if the judging submodule judges that the current word vector and the preset word vector do not meet the preset condition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification. Together with the embodiments of the invention, they serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a workflow diagram of an entity linking method provided by the present invention;
FIG. 2 is another workflow diagram of an entity linking method provided by the present invention;
FIG. 3 is a block diagram of an entity linking device according to the present invention;
FIG. 4 is another block diagram of an entity linking device according to the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In the processing of big data from clinical medical records, differences between regions, hospitals, doctors, standards and the like mean that the same entity is often expressed in many different ways. The data can be counted and computed effectively only if expressions of the same entity are accurately identified and mapped to a limited entity space. Medical term entity linking is therefore an essential part of the data processing pipeline.
At present, existing entity linking methods generally obtain candidates through an N-gram algorithm, use a CRF to identify the semantic components of the entity to be linked and of the candidate standard terms to be matched, and finally obtain the standard term with the highest similarity by means of the synonym relations between semantic components in a knowledge graph. However, this approach has the following disadvantage: the semantic components that the CRF identifies for the entity to be linked are too limited in variety, so parsing accuracy is too low and the medical entity is either not obtained or obtained incorrectly. To solve this problem, the present embodiment discloses a method that determines the medical term to be linked from an acquired current medical text, obtains the current word vector of that term, compares it with a preset word vector, and thereby determines the current medical entity of the term and links the two.
An entity linking method, as shown in fig. 1, comprises the following steps:
step S101, acquiring a current medical text, and determining medical terms to be linked from the current medical text;
step S102, obtaining a current word vector based on medical terms to be linked;
step S103, comparing the similarity of the current word vector and the preset word vector, and outputting the comparison similarity;
step S104, determining the current medical entity of the medical term to be linked according to the comparison similarity;
step S105, linking the medical term to be linked with the current medical entity;
In this embodiment, the current medical entity refers to the field to which the medical term to be linked belongs; common medical entities include medical history features, hospitalization conditions, treatment items, diagnostic records, and the like. Linking the medical term to be linked with the current medical entity means classifying the term under that entity, so that the term can be displayed simply by clicking the current medical entity on the platform.
The working principle of the technical scheme is as follows: firstly, acquiring a current medical text, determining medical terms to be linked from the current medical text, then acquiring a current word vector according to the medical terms to be linked, comparing the similarity of the current word vector and a preset word vector, determining a current medical entity of the medical terms to be linked according to the compared similarity, and finally linking the medical terms to be linked with the current medical entity.
The beneficial effects of the technical scheme are as follows: the current medical entity is determined by comparing the word vector of the medical term to be linked with the preset word vector. Compared with the semantic components used in the prior art, word vectors are more diverse, so the analysis result is not limited to a single type, and the most suitable of several candidate results is selected as the current medical entity. This avoids the prior-art situation in which the semantic components that the CRF identifies for the entity to be linked are too limited in variety, so that parsing accuracy is too low and the medical entity is either not obtained or obtained incorrectly, and accuracy is thereby improved.
In one embodiment, as shown in fig. 2, obtaining a current medical text, determining medical terms to be linked from the current medical text, includes:
step S201, extracting all first medical terms from the current medical text;
step S202, inputting a first medical term into a preset knowledge graph for retrieval;
step S203, medical terms to be linked are determined through retrieval.
The beneficial effects of the technical scheme are as follows: the medical terms to be linked are found without anyone having to read through historical medical texts one by one, which reduces the workload of medical staff, and retrieval in the knowledge graph is more accurate than manual lookup and elimination.
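Steps S201 to S203 can be illustrated with a minimal sketch. It assumes that "retrieval" is an exact lookup against a set of knowledge-graph term names and that a term not found there is a term to be linked; the function name `terms_to_link` and the sample terms are hypothetical and only serve the illustration.

```python
from typing import Iterable, List, Set


def terms_to_link(extracted_terms: Iterable[str], kg_terms: Set[str]) -> List[str]:
    """Return the extracted terms that are absent from the knowledge graph.

    Assumption: the knowledge graph is reduced to a set of known term names,
    and retrieval is an exact-match lookup. Order is kept, duplicates dropped.
    """
    seen: Set[str] = set()
    pending: List[str] = []
    for term in extracted_terms:
        if term not in kg_terms and term not in seen:
            seen.add(term)
            pending.append(term)
    return pending


# Hypothetical data: only the term missing from the graph needs linking.
extracted = ["eyelid margin incision", "eyelid incision", "eyelid incision"]
known = {"eyelid margin incision"}
print(terms_to_link(extracted, known))
```

A real system would likely normalize terms (case, whitespace, full-width characters) before the lookup; that step is left out here.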
In one embodiment, obtaining the current word vector based on the medical terms to be linked includes:
preprocessing the medical terms to be linked, and converting English components in the medical terms to be linked into corresponding Chinese;
the label score of each Chinese character in the medical term to be linked is calculated using the following formula:
$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, and $A_{y_i, y_{i+1}}$ represents the probability of transition from label $y_i$ to label $y_{i+1}$;
selecting the output sequence with the highest score as the current label of the medical term to be linked;
extracting n first semantic components of the current tag;
training word vectors of each first semantic component in the n first semantic components by using a preset model;
the word vector for each semantic component is determined to be the current word vector.
The beneficial effects of the technical scheme are as follows: the current word vector is determined by explicit calculation, which improves the accuracy of the word vector used in the subsequent comparison with the preset word vector. This in turn helps ensure that the current medical entity is determined correctly and that a wrong medical entity is not obtained.
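Selecting the output sequence with the highest score can be sketched as follows. This is an illustration only: it assumes the score of a label sequence is the sum of per-position label scores and label-to-label transition scores, and it uses the standard Viterbi dynamic program to find the maximum, which the text does not name. The matrices are made-up numbers, and `sequence_score` and `best_tag_sequence` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def sequence_score(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Sum of per-position label scores plus label-to-label transition scores."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def best_tag_sequence(emissions: Matrix, transitions: Matrix) -> Tuple[List[int], float]:
    """Viterbi search for the label sequence that maximizes sequence_score."""
    n_tags = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for row in emissions[1:]:
        new_best, pointers = [], []
        for cur in range(n_tags):
            prev = max(range(n_tags), key=lambda p: best[p] + transitions[p][cur])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][cur] + row[cur])
        best = new_best
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda t: best[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Illustrative scores for 3 positions and 2 labels (values are assumptions).
emissions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.4]]
transitions = [[0.0, 0.5], [0.2, 0.0]]
path, score = best_tag_sequence(emissions, transitions)
print(path, round(score, 3))
```

In practice the per-position scores would come from the BiLSTM output layer and the transition scores from the trained CRF layer; both are replaced by literals here.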
In one embodiment, comparing the similarity of the current word vector and the preset word vector, and outputting the comparison similarity includes:
determining medical concepts corresponding to the medical terms to be linked;
retrieving all second medical terms related to the medical concept from a preset knowledge graph;
extracting a second semantic component in a second medical term;
training word vectors corresponding to the second semantic components by using a preset model;
determining word vectors corresponding to the second semantic components as preset word vectors;
the similarity between the current word vector and the preset word vector is calculated by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity between the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n components of the current word vector, and $b_1, b_2, \ldots, b_n$ are the n components of the preset word vector.
The beneficial effects of the technical scheme are as follows: comparing similarities determines whether the knowledge graph contains other medical terms of the same category as the medical term to be linked, and the current medical entity can then be determined from the similarity. There is no need to search through a large number of medical entities for the one corresponding to the medical term to be linked, which saves time.
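The similarity computation can be written in a few lines. The sketch below is a plain-Python cosine similarity between two equal-length vectors; returning 0.0 for a zero vector is a convention chosen here for the illustration, not something the text specifies, and the sample vectors are made up.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_a * norm_b)


current = [0.2, 0.7, 0.1]  # hypothetical current word vector
preset = [0.4, 1.4, 0.2]   # hypothetical preset word vector, same direction
print(round(cosine_similarity(current, preset), 6))
```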
In one embodiment, determining the current medical entity of the medical term to be linked according to the comparative similarity comprises:
confirming whether the similarity is larger than or equal to a preset threshold value;
if yes, confirming whether the similarity is one hundred percent;
if yes, determining a current medical entity corresponding to the current word vector according to a preset medical entity corresponding to the preset word vector;
otherwise, judging whether the current word vector and the preset word vector meet preset conditions or not:
if yes, determining a current medical entity corresponding to the current word vector according to a preset medical entity corresponding to the preset word vector;
otherwise, prompting that the current medical entity is not matched;
In this embodiment, the preset threshold may be eighty percent, and the preset condition may be either of the following: (1) the current word vectors include all of the preset word vectors, and there are more current word vectors than preset word vectors; (2) some of the current word vectors differ in surface form from some of the preset word vectors but are hypernyms (upper-level terms) of them, and the remaining current word vectors include the remaining preset word vectors.
The beneficial effects of the technical scheme are as follows: the similarity is checked twice to establish how far the current word vector differs from the preset word vector, which further improves accuracy. If the two are identical, the preset medical entity corresponding to the preset word vector can be taken directly as the current medical entity of the medical term to be linked, which saves matching time and improves efficiency.
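The two-stage check can be sketched as a small decision function. Two points are assumptions made for the illustration: a similarity below the threshold is treated as "no match" (the text does not state that branch), and the preset condition is passed in as a callback so that its details stay outside the sketch. The name `decide_entity` is hypothetical.

```python
import math
from typing import Callable, Optional


def decide_entity(
    similarity: float,
    preset_entity: str,
    meets_preset_condition: Callable[[], bool],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the matched entity, or None when no entity matches."""
    if similarity < threshold:
        return None  # assumption: below the threshold means no match
    if math.isclose(similarity, 1.0):
        return preset_entity  # identical vectors: take the preset entity directly
    if meets_preset_condition():
        return preset_entity  # not identical, but the preset condition holds
    return None  # prompt that no current medical entity matches


print(decide_entity(1.0, "operation", lambda: False))
print(decide_entity(0.9, "operation", lambda: True))
print(decide_entity(0.9, "operation", lambda: False))
```

Passing the condition as a callback means it is evaluated only when the similarity falls between the threshold and one hundred percent.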
In one embodiment, the method comprises:
step 1: extracting medical terms, such as diseases and operations, from the medical text, and determining the medical terms to be linked, namely those not present in the knowledge graph;
step 2: preprocessing the medical terms to be linked, and converting English components into the corresponding Chinese. For example, "IABP implantation" is converted to "aortic balloon counterpulsation implantation";
step 3: performing semantic component analysis on the preprocessed medical term obtained in step 2 by using Bert+BiLSTM+CRF. For example, "blepharotomy" (eyelid incision) can be parsed into site-eyelid and operation type-incision. The idea of Bert+BiLSTM+CRF is that word vectors trained by Bert replace the Word2Vec vectors fed to the BiLSTM, the BiLSTM model computes the most probable label for the current word, and the CRF uses transition features to keep the order of the labels consistent;
the predictive score formula is as follows:
$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of the BiLSTM, $y = (y_1, y_2, \ldots, y_n)$ represents the output tag sequence, $P_{i, y_i}$ represents the probability that the softmax layer of the BiLSTM outputs label $y_i$ for input $x_i$, and $A_{y_i, y_{i+1}}$ represents the transition probability from tag $y_i$ to tag $y_{i+1}$;
the tag sequence with the highest score is selected as the tag sequence of the input sequence. For example, the characters of the term for eyelid incision are tagged as follows:
eye (B-body part), eyelid (I-body part), cut (B-operation type), open (I-operation type), procedure (I-operation type). The semantic components can then be extracted: site-eyelid and operation type-incision;
step 4: training a word vector for each semantic component obtained in step 3 (such as eyelid and incision) by using the Bert model. For example, eyelid can be represented by an m-dimensional word vector $A = (a_1, a_2, \ldots, a_m)$;
step 5: for the medical concept (such as operation) corresponding to the medical term to be linked (such as eyelid incision), extracting from the knowledge graph the semantic components of the types parsed in step 3 (such as site and operation type), and training their word vectors by using the Bert model. For example, eyelid margin can be represented by an m-dimensional word vector $B = (b_1, b_2, \ldots, b_m)$;
step 6: linking each semantic component A from step 4 to a knowledge-graph semantic component B from step 5 by cosine similarity. The cosine similarity is given by:
$$\cos\theta = \frac{\sum_{i=1}^{m} a_i b_i}{\sqrt{\sum_{i=1}^{m} a_i^2}\,\sqrt{\sum_{i=1}^{m} b_i^2}}$$
if $\cos\theta > \xi$, where $\xi$ is a threshold, A is considered synonymous with B. If several B satisfy the condition, the B with the highest similarity is selected as the synonym of A. For example, if A is the site eyelid or the operation type incision, the same site eyelid or operation type incision can be found in the knowledge graph;
step 7: linking the entity to be linked to the corresponding medical entity in the knowledge graph based on ontology reasoning logic;
ontology reasoning logic determines the relationship between two entities from the attributes each entity contains and from the upper-lower (hypernym-hyponym) relationships between those attributes. Consider entity P (for example eyelid incision which, based on step 6, has the attribute components site-eyelid and operation type-incision in the knowledge graph) and entity Q (for example the operation entity eyelid margin incision in the knowledge graph, which has the attributes site-eyelid margin and operation type-incision);
P is synonymous with Q if P and Q have the same number of attributes and the attributes are all synonymous. Eyelid incision and eyelid margin incision have the same number of attributes, but the site eyelid is not synonymous with the site eyelid margin, so eyelid incision is not synonymous with eyelid margin incision;
P is an upper entity of Q if one of the following two conditions is satisfied: 1. the attributes of P include all of the attributes of Q, and P has more attributes than Q; 2. some attributes of P have no synonymous attribute in Q but are upper attributes of attributes of Q, while the other attributes of P include the other attributes of Q.
The site eyelid of P is an upper attribute of the site eyelid margin of Q, and the other attribute of P, operation type-incision, includes the other attribute of Q, operation type-incision, so eyelid incision is an upper entity of eyelid margin incision.
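The synonym and upper-entity rules can be sketched in code. This is a simplified illustration under several assumptions: each entity is a mapping from attribute type to a single value, synonym pairs and upper-level (hypernym) relations are supplied as small lookup tables, and the function names, attribute names, and sample values are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Attributes = Dict[str, str]  # attribute type -> attribute value


def same_or_synonym(a: str, b: str, synonyms: Set[FrozenSet[str]]) -> bool:
    return a == b or frozenset((a, b)) in synonyms


def is_upper(a: str, b: str, upper_of: Dict[str, Set[str]]) -> bool:
    """True when a is listed as an upper-level term of b."""
    return a in upper_of.get(b, set())


def relation(p: Attributes, q: Attributes,
             synonyms: Set[FrozenSet[str]],
             upper_of: Dict[str, Set[str]]) -> str:
    """Classify P relative to Q as 'synonym', 'upper', or 'unrelated'."""
    same_types = p.keys() == q.keys()
    if same_types and all(same_or_synonym(p[k], q[k], synonyms) for k in p):
        return "synonym"
    # Condition 1: P contains every attribute of Q and has more attributes.
    if len(p) > len(q) and all(
        k in p and same_or_synonym(p[k], q[k], synonyms) for k in q
    ):
        return "upper"
    # Condition 2: some attributes of P are upper-level terms of Q's
    # attributes, and the remaining attributes match.
    if same_types:
        broader = [k for k in p if is_upper(p[k], q[k], upper_of)]
        rest_match = all(
            same_or_synonym(p[k], q[k], synonyms) for k in p if k not in broader
        )
        if broader and rest_match:
            return "upper"
    return "unrelated"


p = {"site": "eyelid", "operation type": "incision"}
q = {"site": "eyelid margin", "operation type": "incision"}
upper_of = {"eyelid margin": {"eyelid"}}  # eyelid is broader than eyelid margin
print(relation(p, q, set(), upper_of))
```

A production knowledge graph would hold multi-valued attributes and transitive upper-level chains; both are omitted to keep the rules visible.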
The beneficial effects of the technical scheme are as follows: applying the Bert+BiLSTM+CRF deep learning model to NER extracts more features from the text, so component analysis is more accurate. Ontology reasoning over the knowledge graph makes entity linking more accurate and more interpretable.
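Once a tag sequence has been chosen, turning it into semantic components is a simple grouping pass over B-/I- tags. The sketch below assumes character-level input and BIO-style tags of the form `B-type` and `I-type`; the Chinese characters in the example are an assumed rendering of "eyelid incision" and the tag names are illustrative, as is the function name `extract_components`.

```python
from typing import List, Sequence, Tuple


def extract_components(tokens: Sequence[str], tags: Sequence[str],
                       joiner: str = "") -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (component type, text) pairs."""
    components: List[Tuple[str, str]] = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new component starts; flush the one in progress.
            if current_tokens:
                components.append((current_type, joiner.join(current_tokens)))
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" or an unknown tag closes any open component.
            if current_tokens:
                components.append((current_type, joiner.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        components.append((current_type, joiner.join(current_tokens)))
    return components


# Assumed character-level example for the term "eyelid incision".
tokens = list("眼睑切开术")
tags = ["B-site", "I-site", "B-operation type", "I-operation type", "I-operation type"]
print(extract_components(tokens, tags))
```

The extracted components are what would then be embedded and compared against knowledge-graph components.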
The embodiment also discloses an entity linking device, as shown in fig. 3, which comprises:
an acquisition module 301, configured to acquire a current medical text, and determine a medical term to be linked from the current medical text;
an obtaining module 302, configured to obtain a current word vector based on the medical terms to be linked;
the comparison module 303 is configured to compare the similarity between the current word vector and the preset word vector, and output a comparison similarity;
a determining module 304, configured to determine a current medical entity of the medical terms to be linked according to the comparison similarity;
a linking module 305 for linking the medical term to be linked with the current medical entity.
In one embodiment, as shown in fig. 4, the acquisition module includes:
a first extraction sub-module 3011, configured to extract all first medical terms from the current medical text;
the first retrieval submodule 3012 is used for inputting a first medical term into a preset knowledge graph for retrieval;
a first determination submodule 3013 for determining the medical terms to be linked by retrieval.
In one embodiment, the obtaining module comprises:
the preprocessing sub-module is used for preprocessing the medical terms to be linked and converting English components in the medical terms to be linked into corresponding Chinese;
a first calculation sub-module, configured to calculate a label score for each Chinese character in the medical term to be linked using the following formula:
$$\mathrm{score}(X, y) = \sum_{i=1}^{n} P_{x_i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{x_i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, and $A_{y_i, y_{i+1}}$ represents the probability of transition from label $y_i$ to label $y_{i+1}$;
the selecting sub-module is used for selecting the output sequence with the highest score as the current label of the medical term to be linked;
the second extraction submodule is used for extracting n first semantic components of the current tag;
the first training submodule is used for training word vectors of each first semantic component in the n first semantic components by using a preset model;
and the second determining submodule is used for determining the word vector of each semantic component as the current word vector.
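The label scoring and selection steps performed by the first calculation sub-module and the selecting sub-module can be sketched as follows. This is a minimal illustration that assumes the usual CRF-style sequence score, namely the sum of per-position emission scores and label-to-label transition scores, and uses Viterbi decoding to pick the highest-scoring output sequence. The function names, the two-label setup, and all numeric values are hypothetical; in the described system the emission scores would come from the BERT+BiLSTM layers.

```python
def sequence_score(emissions, transitions, labels):
    """Sum of emission scores for (position i, label y_i) and of
    transition scores for (label y_i, label y_{i+1})."""
    score = sum(emissions[i][y] for i, y in enumerate(labels))
    score += sum(transitions[a][b] for a, b in zip(labels, labels[1:]))
    return score


def best_sequence(emissions, transitions):
    """Viterbi decoding: return (best_labels, best_score)."""
    n_labels = len(transitions)
    # best[y] = (score of the best path ending in label y, that path)
    best = [(emissions[0][y], [y]) for y in range(n_labels)]
    for row in emissions[1:]:
        new_best = []
        for y in range(n_labels):
            prev_score, prev_path = max(
                ((best[p][0] + transitions[p][y], best[p][1])
                 for p in range(n_labels)),
                key=lambda t: t[0],
            )
            new_best.append((prev_score + row[y], prev_path + [y]))
        best = new_best
    score, path = max(best, key=lambda t: t[0])
    return path, score


# Hypothetical scores for a 3-character term and 2 labels
# (label 0 could stand for "site", label 1 for "operation").
emissions = [[2.0, 0.5], [1.5, 1.0], [0.2, 2.5]]
transitions = [[0.5, 0.1], [-1.0, 0.8]]
path, score = best_sequence(emissions, transitions)
print(path)
```

Viterbi decoding returns the same sequence as scoring every possible output sequence and keeping the maximum, but it avoids enumerating all label combinations, whose number grows exponentially with the length of the term.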
In one embodiment, the comparison module comprises:
a third determining submodule, configured to determine medical concepts corresponding to medical terms to be linked;
the second retrieval sub-module is used for retrieving all second medical terms related to the medical concept from a preset knowledge graph;
a third extraction sub-module for extracting a second semantic component in a second medical term;
the second training submodule is used for training word vectors corresponding to the second semantic components by using a preset model;
the third determining submodule is used for determining word vectors corresponding to the second semantic components as preset word vectors;
the second computing sub-module is used for computing the similarity between the current word vector and the preset word vector by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity between the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n word vectors in the current word vector, and $b_1, b_2, \ldots, b_n$ are the n word vectors in the preset word vector.
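The similarity computation performed by the second computing sub-module can be sketched as a standard cosine similarity, which is what the name cos θ suggests. This is an illustrative sketch only: the function name is hypothetical, the inputs are treated as plain lists of numeric components, and returning 0.0 for a zero vector is an assumption the source does not address.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so report no similarity.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Parallel vectors are maximally similar; orthogonal vectors are not similar.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the measure depends only on the angle between the vectors, two terms whose vectors point in the same direction reach the maximum similarity regardless of vector length.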
In one embodiment, the determining module includes:
a first confirming sub-module, configured to confirm whether the similarity is greater than or equal to a preset threshold;
a second confirming sub-module, configured to confirm whether the similarity is one hundred percent if the first confirming sub-module confirms that the similarity is greater than or equal to the preset threshold;
a fourth determining sub-module, configured to determine the current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector if the second confirming sub-module confirms that the similarity is one hundred percent;
a judging sub-module, configured to judge whether the current word vector and the preset word vector meet a preset condition when the second confirming sub-module does not confirm that the similarity is one hundred percent:
if yes, the fourth determining sub-module determines the current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
a prompting sub-module, configured to prompt that there is no matching current medical entity when the judging sub-module judges that the current word vector and the preset word vector do not meet the preset condition.
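The decision flow of the determining module can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the preset condition is passed in as a callable (for instance an ontology check on the two terms), and a similarity below the threshold is treated as "no match", a case this passage does not spell out.

```python
import math
from typing import Callable, Optional


def determine_entity(
    similarity: float,
    preset_entity: str,
    threshold: float,
    meets_preset_condition: Callable[[], bool],
) -> Optional[str]:
    """Return the linked entity, or None when there is no matching entity."""
    # Assumption: a similarity below the threshold is treated as "no match".
    if similarity < threshold:
        return None
    # A similarity of one hundred percent links directly to the preset entity.
    if math.isclose(similarity, 1.0):
        return preset_entity
    # Otherwise the preset condition decides whether the link is accepted.
    if meets_preset_condition():
        return preset_entity
    return None


# Hypothetical call: similarity 0.9, threshold 0.8, preset condition satisfied.
print(determine_entity(0.9, "blepharotomy", 0.8, lambda: True))
```

Returning `None` stands in for the prompt that no matching current medical entity exists; a caller could turn it into a user-facing message.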
It will be appreciated by those skilled in the art that the terms "first" and "second" in the present invention merely refer to different stages of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (4)

1. An entity linking method, comprising the steps of:
acquiring a current medical text, and determining medical terms to be linked from the current medical text;
obtaining a current word vector based on the medical terms to be linked;
comparing the similarity of the current word vector and a preset word vector, and outputting comparison similarity;
determining the current medical entity of the medical term to be linked according to the comparison similarity;
linking the medical term to be linked with the current medical entity;
wherein the comparing the similarity of the current word vector and the preset word vector and outputting the comparison similarity comprises the following steps:
determining medical concepts corresponding to the medical terms to be linked;
retrieving all second medical terms related to the medical concept from a preset knowledge graph;
extracting a second semantic component in the second medical term;
training word vectors corresponding to the second semantic components by using a preset model;
determining a word vector corresponding to the second semantic component as a preset word vector;
calculating the similarity between the current word vector and the preset word vector by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity of the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n word vectors in the current word vector, and $b_1, b_2, \ldots, b_n$ are the n word vectors in the preset word vector;
the acquiring the current medical text and determining the medical term to be linked from the current medical text comprises the following steps:
extracting all first medical terms from the current medical text;
inputting the first medical term into a preset knowledge graph for retrieval;
determining the medical term to be linked through retrieval;
the obtaining the current word vector based on the medical term to be linked comprises the following steps:
preprocessing the medical terms to be linked, and converting English components in the medical terms to be linked into corresponding Chinese;
calculating the label score of each Chinese character in the medical term to be linked by using the following formula:
$$\mathrm{score}(X, y) = \sum_{i=1}^{n} P_{x_i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{x_i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, $A_{y_i, y_{i+1}}$ represents the probability of transition from label $y_i$ to label $y_{i+1}$, n represents the number of labels, and i represents the i-th label;
selecting the output sequence with the highest score as the current label of the medical term to be linked;
extracting n first semantic components of the current tag;
training word vectors of each first semantic component in the n first semantic components by using a preset model;
and determining the word vector of each semantic component as the current word vector.
2. The entity linking method according to claim 1, wherein said determining the current medical entity of the medical term to be linked according to the comparative similarity comprises:
confirming whether the similarity is larger than or equal to a preset threshold value;
if yes, confirming whether the similarity is one hundred percent;
if yes, determining a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
otherwise, judging whether the current word vector and the preset word vector meet preset conditions or not:
if yes, determining a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
otherwise, prompting that there is no matching current medical entity.
3. An entity linking apparatus, comprising:
the acquisition module is used for acquiring a current medical text and determining medical terms to be linked from the current medical text;
the obtaining module is used for obtaining a current word vector based on the medical terms to be linked;
the comparison module is used for comparing the similarity of the current word vector and the preset word vector and outputting comparison similarity;
the determining module is used for determining the current medical entity of the medical term to be linked according to the comparison similarity;
a linking module for linking the medical term to be linked with the current medical entity;
the comparison module comprises:
a third determining submodule, configured to determine a medical concept corresponding to the medical term to be linked;
the second retrieval sub-module is used for retrieving all second medical terms related to the medical concept from a preset knowledge graph;
a third extraction sub-module for extracting a second semantic component in the second medical term;
the second training submodule is used for training word vectors corresponding to the second semantic components by using a preset model;
a third determining submodule, configured to determine a word vector corresponding to the second semantic component as a preset word vector;
the second computing sub-module is used for computing the similarity between the current word vector and the preset word vector by using the following formula:
$$\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$
wherein $\cos\theta$ is the similarity of the current word vector and the preset word vector, $a_1, a_2, \ldots, a_n$ are the n word vectors in the current word vector, and $b_1, b_2, \ldots, b_n$ are the n word vectors in the preset word vector;
the acquisition module comprises:
a first extraction sub-module for extracting all first medical terms from the current medical text;
the first retrieval submodule is used for inputting the first medical term into a preset knowledge graph for retrieval;
a first determination sub-module for determining the medical terms to be linked by retrieving;
the obtaining module comprises:
the preprocessing sub-module is used for preprocessing the medical terms to be linked and converting English components in the medical terms to be linked into corresponding Chinese;
a first calculation sub-module, configured to calculate the label score of each Chinese character in the medical term to be linked by using the following formula:
$$\mathrm{score}(X, y) = \sum_{i=1}^{n} P_{x_i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}$$
wherein $X = (x_1, x_2, \ldots, x_n)$ represents the input sequence of each word in the medical term to be linked, $y = (y_1, y_2, \ldots, y_n)$ represents the output sequence of each word in the medical term to be linked, $P_{x_i, y_i}$ represents the probability that the input is $x_i$ and the output is label $y_i$, $A_{y_i, y_{i+1}}$ represents the probability of transition from label $y_i$ to label $y_{i+1}$, n represents the number of labels, and i represents the i-th label;
the selecting submodule is used for selecting the output sequence with the highest score as the current label of the medical term to be linked;
the second extraction submodule is used for extracting n first semantic components of the current tag;
the first training submodule is used for training word vectors of each first semantic component in the n first semantic components by using a preset model;
and the second determining submodule is used for determining the word vector of each semantic component as the current word vector.
4. The entity-linking apparatus of claim 3, wherein the determining module comprises:
the first confirming sub-module is used for confirming whether the similarity is larger than or equal to a preset threshold value;
the second confirming sub-module is used for confirming whether the similarity is one hundred percent when the first confirming sub-module confirms that the similarity is larger than or equal to the preset threshold value;
a fourth determining submodule, configured to determine, according to a preset medical entity corresponding to the preset word vector, a current medical entity corresponding to the current word vector if the second confirming submodule confirms that the similarity is one hundred percent;
the judging sub-module is used for judging whether the current word vector and the preset word vector meet preset conditions when the second confirming sub-module does not confirm that the similarity is one hundred percent:
if yes, the fourth determining submodule determines a current medical entity corresponding to the current word vector according to the preset medical entity corresponding to the preset word vector;
and the prompting submodule is used for prompting that the current medical entity is not matched if the judging submodule judges that the current word vector and the preset word vector do not meet the preset condition.
CN202010099197.4A 2020-02-18 2020-02-18 Entity linking method and device Active CN111415748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099197.4A CN111415748B (en) 2020-02-18 2020-02-18 Entity linking method and device

Publications (2)

Publication Number Publication Date
CN111415748A CN111415748A (en) 2020-07-14
CN111415748B true CN111415748B (en) 2023-08-08

Family

ID=71492802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099197.4A Active CN111415748B (en) 2020-02-18 2020-02-18 Entity linking method and device

Country Status (1)

Country Link
CN (1) CN111415748B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967270B (en) * 2020-08-16 2023-11-21 云知声智能科技股份有限公司 Method and equipment based on fusion of characters and semantics
CN114266245A (en) * 2020-09-16 2022-04-01 北京金山数字娱乐科技有限公司 Entity linking method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808124A (en) * 2017-10-09 2018-03-16 平安科技(深圳)有限公司 Electronic installation, the recognition methods of medical text entities name and storage medium
CN109582955A (en) * 2018-11-14 2019-04-05 金色熊猫有限公司 Standardized method, device and the medium of medical terms
KR101968200B1 (en) * 2018-10-20 2019-04-12 최정민 Medical information recommendation system based on diagnosis name, operation name and treatment name
CN109670177A (en) * 2018-12-20 2019-04-23 翼健(上海)信息科技有限公司 One kind realizing the semantic normalized control method of medicine and control device based on LSTM

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2487403C1 (en) * 2011-11-30 2013-07-10 Федеральное государственное бюджетное учреждение науки Институт системного программирования Российской академии наук Method of constructing semantic model of document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant