CN111414393B - Semantic similar case retrieval method and equipment based on medical knowledge graph - Google Patents

Info

Publication number
CN111414393B
Authority
CN
China
Prior art keywords
case
entity
similarity
matching
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010221246.7A
Other languages
Chinese (zh)
Other versions
CN111414393A (en
Inventor
武学鸿
李建华
费耀平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Original Assignee
HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HUNAN CREATOR INFORMATION TECHNOLOGIES CO LTD
Priority to CN202010221246.7A
Publication of CN111414393A
Application granted
Publication of CN111414393B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a semantic similar case retrieval method and equipment based on a medical knowledge graph. The method comprises the following steps: acquiring an electronic case that meets the case content specification; structuring the electronic case text with the aid of a medical knowledge graph to obtain a structured electronic case with unified standard terms; calculating the similarity between the structured electronic case and each case in the library from the content matching degree and the scale similarity; and sorting and outputting the cases in the library according to the calculated similarity. The similarity calculation combines alignment against the medical knowledge graph with a semantic similarity model built on the semantic network of the knowledge graph. It considers both the number of matches and the matching metric value, so that similarity is defined by how many entities match and by how accurately they match, which refines the granularity and improves the accuracy of similar case matching.

Description

Semantic similar case retrieval method and equipment based on medical knowledge graph
Technical Field
The invention relates to the field of similar case retrieval, in particular to a semantic similar case retrieval method and semantic similar case retrieval equipment based on a medical knowledge graph.
Background
With the development of computer technology, retrieval has become a common means of acquiring information in daily life. In the medical field, similar case retrieval is of great significance for both scientific research and clinical practice: past similar cases help doctors diagnose and analyse the current case more accurately, and their treatment schemes can inform the treatment plan for the current case, shortening the patient's cure period and improving treatment efficiency.
A knowledge graph, also called a scientific knowledge graph and known in library and information science as knowledge domain visualization or knowledge domain mapping, is a series of graphs that display the development process and structural relations of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyse, construct, draw and display knowledge and the relations among its elements. With the introduction of knowledge graphs, information can be searched and queried more conveniently, clearly and accurately, and a growing number of industries have built their own professional knowledge graphs, such as medical knowledge graphs.
The traditional similar case retrieval method searches the library using medical features extracted from the input text and returns the matching similar cases. However, the complicated relations among medical features often make the definition of similarity imprecise, so the retrieval granularity is coarse and the results are inaccurate.
Disclosure of Invention
The invention provides a semantic similar case retrieval method based on a medical knowledge graph, which aims to solve the problems of inaccurate definition, coarse retrieval granularity and inaccurate retrieval in the conventional similar case retrieval.
The technical scheme adopted by the invention is as follows:
A semantic similar case retrieval method based on a medical knowledge graph comprises the following steps:
acquiring an electronic case that meets the case content specification;
structuring the electronic case text with the aid of a medical knowledge graph to obtain a structured electronic case with unified standard terms;
calculating the similarity between the structured electronic case and each case in the library from the content matching degree and the scale similarity;
and sorting and outputting the cases in the library according to the calculated similarity.
As a possible embodiment, the electronic case that meets the case content specification includes basic patient information (name, sex, age and marital status) and basic health information (chief complaint, current medical history, past medical history, personal history, family history and physical examination).
As a possible embodiment, structuring the electronic case text with the aid of the medical knowledge graph to obtain a structured electronic case with unified standard terms specifically includes the steps of:
extracting medical entities from the basic health information of the patient by using an entity extraction model;
aligning and standardizing the extracted medical entities against the medical knowledge graph, so that non-professional expressions are mapped to professional terms, to obtain medical entities with standard terms;
and classifying the medical entities with standard terms according to preset entity categories to obtain the structured electronic case with unified standard terms.
As a possible embodiment, the entity extraction model is the named entity recognition model BiLSTM-CRF, trained on electronic case text; when the extracted medical entities are aligned and standardized against the medical knowledge graph, an encoder-decoder translation model, BiLSTM-attention, is used, trained on the unified, standardized medical term system in the medical knowledge graph.
As a possible embodiment, the preset entity categories are obtained by subdividing several classes of clinical features according to the source of the entity and whether the entity is negative or positive. The entity categories include: chief symptoms, chief signs, non-chief symptoms, non-chief signs, current illness, historical illness, current causes, historical causes, familial illness, current medications, historical medications, current surgery, historical surgery, current examination items, historical examination items, current examination results, historical examination results, current physical examination, historical physical examination, current occupation, historical occupation, physical constitution and physical condition. The clinical features include chief symptoms, chief signs, non-chief symptoms, non-chief signs, illness, causes, surgery, medication, physical condition, physical constitution, occupation, physical examination, examination items and examination results. The negative or positive status of an entity indicates whether the entity is present: positive means present and negative means absent.
As a possible embodiment, calculating the similarity between the structured electronic case and the case in the library from the content matching degree and the scale similarity specifically comprises the following steps:
calculating the content matching degree between the structured electronic case and the case in the library, where the content matching degree is the entity matching score between the structured electronic case and the case in the library divided by the total entity score of the structured electronic case:
$$M = \frac{S_1}{S_2} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} w_i f_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} w_i}$$
where M is the content matching degree; S_1 is the entity matching score between the structured electronic case and the case in the library; S_2 is the total entity score of the structured electronic case; w_i is the weight of the i-th entity category; m is the total number of entity categories; i is the index of the entity category currently traversed; n_i is the number of entities in the i-th category; j is the index of the entity currently traversed; and f is the matching factor, i.e. the result of entity matching, with a value between 0 and 1: f = 1 for a complete match and f = 0 for a complete mismatch. The matching factor f between any two entities is calculated on the tree structure formed by the subordinate (parent-child) relations between entities in the medical knowledge graph:
$$f_{ab} = \frac{1}{1 + n}$$
where n here denotes the distance travelled when searching for b from entity a toward the root node, or for a from entity b toward the root node. If neither search succeeds, n is infinite and the matching factor between a and b is 0; if a and b are the same entity, n is 0 and the matching factor is 1;
calculating the scale similarity between the structured electronic case and the case in the library by the formula:
$$C = \frac{N_1}{N_2}, \quad N_2 \ge N_1$$
where C is the scale similarity, N_1 is the total number of entities in the case with fewer entities, and N_2 is the total number of entities in the case with more entities;
calculating the similarity between the structured electronic case and the case in the library according to the formula
Figure GDA0002892229120000041
As a possible embodiment, sorting and outputting the cases in the library according to the calculated similarity includes:
obtaining, from the calculated similarity, a list of the cases in the library ordered from high to low similarity to the structured electronic case;
and traversing the case list and filtering it by a preset similarity threshold t, storing the library cases whose similarity is greater than or equal to t, in order, in a final return list, and outputting that list.
A semantic similar case retrieval device based on a medical knowledge graph comprises:
a case acquisition module, used for acquiring an electronic case that meets the case content specification;
a case structuring module, used for structuring the electronic case text with the aid of a medical knowledge graph to obtain a structured electronic case with unified standard terms;
a similarity calculation module, used for calculating the similarity between the structured electronic case and each case in the library from the content matching degree and the scale similarity;
and an output module, used for sorting and outputting the cases in the library according to the calculated similarity.
A storage medium comprising a stored program which, when executed, controls a device on which the storage medium is located to perform the method for semantic similar case retrieval based on medical knowledge-graph.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the medical knowledge-graph based semantic similar case retrieval method as described when executing the program.
Compared with the prior art, the invention has the following beneficial effects:
The invention extracts the clinical manifestation information of the case content through an entity extraction model and structures the electronic case text with the aid of the medical knowledge graph to obtain a structured electronic case with unified standard terms. It then calculates the similarity between the structured electronic case and the cases in the library from the content matching degree and the scale similarity, and obtains and outputs the similar cases in the library. This ensures the correctness and standardization of the conversion from unstructured to structured cases. In addition, because the similarity is calculated by a semantic similarity model built on the semantic network of the knowledge graph, both the number of matches and the matching metric value are considered: similarity is defined by how many entities match and by how accurately they match, which refines the granularity and improves the accuracy of similar case matching.
In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a flow chart of the semantic similar case retrieval method based on a medical knowledge graph according to the preferred embodiment of the invention.
Fig. 2 is a sample of the basic health information in an electronic case.
Fig. 3 is a schematic diagram of the structured extraction of an electronic case according to the preferred embodiment of the present invention.
Fig. 4 is a diagram illustrating the effect of normalizing a structured electronic case according to the present invention.
Fig. 5 is a schematic diagram of the interface that outputs similar cases in accordance with a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of the tree structure of subordinate relations between entities in a medical knowledge graph.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in Fig. 1, a semantic similar case retrieval method based on a medical knowledge graph includes the steps:
S1, acquiring an electronic case that meets the case content specification;
S2, structuring the electronic case text with the aid of a medical knowledge graph to obtain a structured electronic case with unified standard terms;
S3, calculating the similarity between the structured electronic case and each case in the library from the content matching degree and the scale similarity;
and S4, sorting and outputting the cases in the library according to the calculated similarity.
This embodiment extracts the clinical manifestation information of the case content through an entity extraction model and structures the electronic case text with the aid of the medical knowledge graph to obtain a structured electronic case with unified standard terms. It then calculates the similarity between the structured electronic case and the cases in the library from the content matching degree and the scale similarity, and obtains and outputs the similar cases in the library. This ensures the correctness and standardization of the conversion from unstructured to structured cases. In addition, because the similarity is calculated by a semantic similarity model built on the semantic network of the knowledge graph, both the number of matches and the matching metric value are considered: similarity is defined by how many entities match and by how accurately they match, which refines the granularity and improves the accuracy of similar case matching.
As a possible embodiment, as shown in Table 1, the electronic case that meets the case content specification includes basic patient information (name, sex, age, marital status and occupation) and basic health information (chief complaint, current medical history, past medical history, personal history, family history and physical examination). The basic patient information is structured, whereas the basic health information is unstructured text that requires further structured extraction.
TABLE 1 basic patient information and basic health information
Figure GDA0002892229120000071
An example of the basic health information in an electronic case is shown in Fig. 2; because it is unstructured text, further structured extraction is required.
As a possible embodiment, structuring the electronic case text with the aid of the medical knowledge graph to obtain a structured electronic case with unified standard terms specifically includes the steps of:
S21, extracting medical entities from the basic health information of the patient by using an entity extraction model, where the entity extraction model is the named entity recognition model BiLSTM-CRF, trained on electronic case text;
S22, aligning and standardizing the extracted medical entities against the medical knowledge graph, so that non-professional expressions are mapped to professional terms, to obtain medical entities with standard terms; the alignment uses an encoder-decoder translation model, BiLSTM-attention, trained on the unified, standardized medical term system in the medical knowledge graph;
and S23, classifying the medical entities with standard terms according to preset entity categories to obtain the structured electronic case with unified standard terms.
As shown in Table 2, the preset entity categories are obtained by subdividing several classes of clinical features according to the source of the entity and whether the entity is negative or positive. The entity categories include: chief symptoms, chief signs, non-chief symptoms, non-chief signs, current illness, historical illness, current causes, historical causes, familial illness, current medications, historical medications, current surgery, historical surgery, current examination items, historical examination items, current examination results, historical examination results, current physical examination, historical physical examination, current occupation, historical occupation, physical constitution and physical condition. The clinical features include chief symptoms, chief signs, non-chief symptoms, non-chief signs, illness, causes, surgery, medication, physical condition, physical constitution, occupation, physical examination, examination items and examination results. The negative or positive status of an entity indicates whether the entity is present: positive means present and negative means absent.
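Steps S21 to S23 can be sketched as a three-stage pipeline. The sketch below is only illustrative: it replaces the trained BiLSTM-CRF and BiLSTM-attention models with simple dictionary lookups, ignores negative/positive status, and the tables `ALIGNMENT` and `CATEGORY_RULES`, the section names and the function names are all hypothetical (the actual classification rules are those of Table 2).

```python
from collections import defaultdict

# Hypothetical lay expression -> (standard term, clinical feature) table.
# Stand-in for the trained extraction and alignment models named in S21/S22.
ALIGNMENT = {
    "coughing over and over": ("repeated cough", "symptom"),
    "high blood pressure": ("hypertension", "illness"),
}

# Hypothetical (case section, clinical feature) -> entity category rules.
CATEGORY_RULES = {
    ("chief complaint", "symptom"): "chief symptom",
    ("present history", "symptom"): "non-chief symptom",
    ("past history", "illness"): "historical illness",
}


def extract_entities(section_text):
    """S21 stand-in: return every known expression that occurs in the text."""
    return [expr for expr in ALIGNMENT if expr in section_text]


def structure_case(sections):
    """Run S21-S23 over a {section name: free text} mapping."""
    structured = defaultdict(list)
    for section, text in sections.items():
        for mention in extract_entities(text):
            term, feature = ALIGNMENT[mention]                 # S22: align to the standard term
            category = CATEGORY_RULES.get((section, feature))  # S23: classify by source
            if category is not None:
                structured[category].append(term)
    return dict(structured)


case = structure_case({
    "chief complaint": "coughing over and over for two weeks",
    "past history": "high blood pressure for ten years",
})
print(case)
# {'chief symptom': ['repeated cough'], 'historical illness': ['hypertension']}
```

Keeping the result as a mapping from entity category to a list of standard terms makes the later per-category weighting straightforward.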
Table 2 entity classes for structured extraction
Figure GDA0002892229120000081
Figure GDA0002892229120000091
In the above embodiment, structured extraction proceeds in two main steps: first, entities are extracted from the basic health information of the patient; second, the extracted entities are classified according to the classification rules in Table 2.
After structured extraction is performed on the basic health information of the patient, the extraction result shown in Fig. 3 is obtained.
In order to normalize the extracted entities, all entities are further structured and aligned: non-professional expressions are aligned with professional expressions, since every entity in the knowledge graph is expressed in professional terms. The effect of normalization is shown in Fig. 4. First, the entities are converted into a structured JSON format; second, the entities are converted into the standard terms of the medical knowledge graph through alignment. For example, an entity is described as "iterating" because the knowledge graph has no "reappearance" term. After all entities are extracted and normalized, they are classified and organized according to the classification in Table 2, finally yielding the structured electronic case with normalized terms.
As a possible embodiment, calculating the similarity between the structured electronic case and the case in the library from the content matching degree and the scale similarity specifically comprises the following steps:
calculating the content matching degree between the structured electronic case and the case in the library, where the content matching degree is the entity matching score between the structured electronic case and the case in the library divided by the total entity score of the structured electronic case:
$$M = \frac{S_1}{S_2} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} w_i f_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} w_i}$$
where M is the content matching degree; S_1 is the entity matching score between the structured electronic case and the case in the library; S_2 is the total entity score of the structured electronic case; w_i is the weight of the i-th entity category; m is the total number of entity categories; i is the index of the entity category currently traversed; n_i is the number of entities in the i-th category; j is the index of the entity currently traversed; and f is the matching factor, i.e. the result of entity matching, with a value between 0 and 1: f = 1 for a complete match and f = 0 for a complete mismatch. Entities in the knowledge graph are linked by subordinate (parent-child) relations and therefore form a tree structure, as shown in Fig. 6, so the matching factor f between any two entities is calculated on the tree structure formed by these relations in the medical knowledge graph:
$$f_{ab} = \frac{1}{1 + n}$$
where n here denotes the distance travelled when searching for b from entity a toward the root node, or for a from entity b toward the root node. If neither search succeeds, n is infinite and the matching factor between a and b is 0; if a and b are the same entity, n is 0 and the matching factor is 1. In the tree structure shown in Fig. 6, irritant dry cough belongs to dry cough, dry cough belongs to cough, cough belongs to respiratory system symptoms, and repeated cough belongs to cough. If entity a is irritant dry cough and entity b is cough, the distance n from a to b is 2 and the matching factor f between them is 1/3. If entity a is repeated cough and entity b is cough, the distance n from a to b is 1 and the matching factor f between them is 1/2;
calculating the scale similarity between the structured electronic case and the case in the library by the formula:
$$C = \frac{N_1}{N_2}, \quad N_2 \ge N_1$$
where C is the scale similarity, N_1 is the total number of entities in the case with fewer entities, and N_2 is the total number of entities in the case with more entities;
calculating the similarity between the structured electronic case and the case in the library according to the formula
Figure GDA0002892229120000111
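The matching factor f_ab = 1/(1 + n) described above can be sketched with child-to-parent links. The snippet below is a minimal illustration that encodes only the fragment of Fig. 6 spelled out in the text; the names `PARENT`, `steps_up` and `matching_factor` are hypothetical, and a real knowledge graph would supply the parent links.

```python
import math

# Child -> parent links for the fragment of Fig. 6 described in the text.
PARENT = {
    "irritant dry cough": "dry cough",
    "dry cough": "cough",
    "repeated cough": "cough",
    "cough": "respiratory system symptoms",
}


def steps_up(start, target):
    """Count parent links from start up to target; infinity if target is not an ancestor."""
    node, steps = start, 0
    while node is not None:
        if node == target:
            return steps
        node = PARENT.get(node)  # None once the root has been passed
        steps += 1
    return math.inf


def matching_factor(a, b):
    """f_ab = 1 / (1 + n), searching from a toward the root for b and from b for a."""
    n = min(steps_up(a, b), steps_up(b, a))
    return 1.0 / (1.0 + n)  # evaluates to 0.0 when n is infinite


print(matching_factor("irritant dry cough", "cough"))  # 0.3333333333333333
print(matching_factor("repeated cough", "cough"))      # 0.5
print(matching_factor("dry cough", "repeated cough"))  # 0.0
```

Under this definition two siblings such as dry cough and repeated cough score 0, because neither lies on the other's path to the root.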
As a possible embodiment, sorting and outputting the cases in the library according to the calculated similarity includes:
obtaining, from the calculated similarity, a list of the cases in the library ordered from high to low similarity to the structured electronic case;
and traversing the case list and filtering it by a preset similarity threshold t (0.5 by default), storing the cases whose similarity is greater than or equal to t, in order, in a final return list, and outputting that list.
In the above embodiment, the weight w of each entity category is determined by several professional doctors on the basis of years of medical experience, and the accuracy of the model can be tuned through these weight parameters. The weights corresponding to the entity categories are shown in Table 3.
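The sorting and filtering step can be sketched in a few lines. This is an illustrative sketch only: the function name `rank_cases`, the case identifiers and the similarity values are hypothetical, while the default threshold of 0.5 follows the text.

```python
def rank_cases(similarities, threshold=0.5):
    """Sort library cases by similarity (high to low) and keep those at or above the threshold."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [(case_id, score) for case_id, score in ranked if score >= threshold]


# Hypothetical similarity values for three library cases.
result = rank_cases({"case-1": 0.42, "case-2": 0.91, "case-3": 0.5})
print(result)  # [('case-2', 0.91), ('case-3', 0.5)]
```

Sorting before filtering mirrors the order of the two steps in the text; filtering first would give the same result with less work on large libraries.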
TABLE 3 weight definition and initialization values for entity classes
Figure GDA0002892229120000121
The similarity in this embodiment is calculated from two components, the content matching degree and the scale similarity, which are the two dimensions the calculation takes into account: both the number of matches and the matching metric value are considered, i.e. similarity is defined not only by how many entities match but also by how well they match.
The advantages of similarity calculation in this embodiment include:
(1) Based on the similarity model, 17 classes of matching factors are provided, and chief-complaint symptoms are distinguished from non-chief-complaint symptoms.
(2) The matching value of two entities such as "cough" and "repeated cough" is calculated from the semantic relations in the knowledge graph. It is neither 1 nor 0: repeated cough is a kind of cough that carries the attribute "repeated", so the two stand in a hypernym-hyponym relation in the knowledge graph.
(3) Rich weight parameters are provided. Different people understand similarity differently, and the accuracy of the model can be tuned through the weight parameters.
In the above embodiment, every output similar case carries a similarity value in the range 0 to 1, where 1 indicates that the cases are completely consistent and 0 that they are completely different. The higher the threshold, the more similar the output cases. Because cases with higher similarity have higher reference value, the similar cases are filtered by a similarity threshold t, both to reduce the number of output cases and to let the doctor focus on the top-ranked ones: cases whose similarity is below t are filtered out, and only cases whose similarity is greater than or equal to t are output. The similarity threshold selected in the above embodiment is t = 0.5. The filtered similar case information is output as shown in Fig. 5.
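The two components can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: the category weights are hypothetical (the actual values are in Table 3), the matching factor is passed in as a function (an exact-match stand-in is used here), and taking f for each query entity as its best factor against same-category entities of the library case is an assumption. The formula that combines M and C into the final similarity is given only as an image in the source, so it is not computed here.

```python
def content_matching_degree(query, candidate, weights, factor):
    """M = S1 / S2, with S1 = sum of w_i * f_ij and S2 = sum of w_i over the query's entities."""
    s1 = s2 = 0.0
    for category, entities in query.items():
        w = weights[category]
        others = candidate.get(category, [])
        for entity in entities:
            # Assumption: f_ij is the best factor against same-category entities.
            f = max((factor(entity, other) for other in others), default=0.0)
            s1 += w * f
            s2 += w
    return s1 / s2 if s2 else 0.0


def scale_similarity(query, candidate):
    """C = N1 / N2: the smaller total entity count divided by the larger."""
    n_query = sum(len(v) for v in query.values())
    n_candidate = sum(len(v) for v in candidate.values())
    small, large = sorted((n_query, n_candidate))
    return small / large if large else 0.0


def exact_match(a, b):
    """Stand-in matching factor: 1 for identical terms, 0 otherwise."""
    return 1.0 if a == b else 0.0


weights = {"chief symptom": 2.0, "historical illness": 1.0}  # hypothetical values
query = {"chief symptom": ["cough", "fever"], "historical illness": ["hypertension"]}
candidate = {
    "chief symptom": ["cough"],
    "historical illness": ["hypertension", "diabetes"],
    "current medication": ["aspirin"],
}

M = content_matching_degree(query, candidate, weights, exact_match)
C = scale_similarity(query, candidate)
print(M, C)  # 0.6 0.75
```

Here S1 = 2·1 + 2·0 + 1·1 = 3 and S2 = 2 + 2 + 1 = 5, so M = 0.6; the query has 3 entities and the library case 4, so C = 3/4.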
Another embodiment of the present invention provides a semantic similar case retrieval apparatus based on a medical knowledge graph, including:
a case acquisition module, used for acquiring an electronic case that meets the case content specification;
a case structuring module, used for structuring the electronic case text with the aid of a medical knowledge graph to obtain a structured electronic case with unified standard terms;
a similarity calculation module, used for calculating the similarity between the structured electronic case and each case in the library from the content matching degree and the scale similarity;
and an output module, used for sorting and outputting the cases in the library according to the calculated similarity.
Another embodiment of the present invention provides a storage medium comprising a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the semantic similar case retrieval method based on the medical knowledge-graph.
Another embodiment of the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the medical knowledge graph-based semantic similar case retrieval method.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
The functions of the method of the present embodiment, if implemented in the form of software functional units and sold or used as independent products, may be stored in one or more storage media readable by a computing device. Based on this understanding, the part of the embodiments of the present invention that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or part of the steps of the method described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A semantic similar case retrieval method based on a medical knowledge graph is characterized by comprising the following steps:
acquiring an electronic case meeting the requirements of case content specifications;
structuring the electronic case text in combination with a medical knowledge graph to obtain a structured electronic case with unified standard terms;
calculating the similarity between the structured electronic case and the case in the library according to the content matching degree and the scale similarity;
sorting and outputting the cases in the library according to the calculated similarity;
the step of calculating the similarity between the structured electronic case and the case in the library according to the content matching degree and the scale similarity specifically comprises:
calculating a content matching degree of the structured electronic case and the case in the library, wherein the content matching degree is obtained by dividing the entity matching score of the structured electronic case and the case in the library by the total entity score of the structured electronic case:
Figure FDA0002892229110000011
wherein M represents the content matching degree; S_1 represents the entity matching score between the structured electronic case and the case in the library; S_2 represents the total entity score of the structured electronic case; w represents the entity category weight; m represents the total number of entity types; i represents the ordinal number of the currently traversed entity type; n represents the total number of entities corresponding to the i-th entity type; j represents the ordinal number of the currently traversed entity; and f represents the result of entity matching (the matching factor), whose value ranges from 0 to 1: the matching factor equals 1 if the match succeeds completely and 0 if it fails completely; wherein the matching factor f between any two entities is calculated based on a tree structure formed by the subordinate relations between the entities in the medical knowledge graph:
f_{ab} = 1/(1+n)
wherein n is the distance at which b is found when searching from entity a toward the root node, or at which a is found when searching from entity b toward the root node; if neither is found, the distance n is infinite and the matching factor between entity a and entity b is 0; if a = b, the distance n is 0 and the matching factor is 1;
calculating the scale similarity between the structured electronic case and the case in the library, wherein the calculation formula is as follows:
C = N_1/N_2, N_2 ≥ N_1
wherein C represents the scale similarity, N_1 represents the total number of entities of the case with fewer entities, and N_2 represents the total number of entities of the case with more entities;
calculating the similarity between the structured electronic case and the case in the library according to the formula
Figure FDA0002892229110000021
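The matching factor f_{ab} = 1/(1+n) defined in claim 1 can be illustrated with a minimal Python sketch. It assumes the subordinate relations of the knowledge graph are available as a child-to-parent mapping; the `PARENT` table, the entity names in it, and the function names are hypothetical and serve only as an illustration, not as the patented implementation.

```python
from typing import Dict, Optional

# Hypothetical fragment of a knowledge-graph tree: child -> parent.
# The root ("disease") has no entry, so the upward walk stops there.
PARENT: Dict[str, str] = {
    "bacterial pneumonia": "pneumonia",
    "pneumonia": "lung disease",
    "lung disease": "disease",
    "fracture": "disease",
}


def ancestor_distance(start: str, target: str,
                      parent: Dict[str, str]) -> Optional[int]:
    """Count the edges walked from `start` toward the root until `target`
    is reached; return None if `target` is not on that path."""
    steps = 0
    node: Optional[str] = start
    while node is not None:
        if node == target:
            return steps
        node = parent.get(node)
        steps += 1
    return None


def matching_factor(a: str, b: str, parent: Dict[str, str]) -> float:
    """f_ab = 1 / (1 + n); 0 when neither entity lies on the other's
    path to the root (the distance is treated as infinite)."""
    n = ancestor_distance(a, b, parent)
    if n is None:
        n = ancestor_distance(b, a, parent)
    return 0.0 if n is None else 1.0 / (1 + n)
```

Under these assumptions, identical entities give a factor of 1, a direct parent and child give 1/2, and two siblings such as "pneumonia" and "fracture" give 0, because neither lies on the other's path to the root.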
2. The semantic similar case retrieval method based on medical knowledge-graph according to claim 1, wherein the electronic case meeting the case content specification comprises basic patient information and basic health information, the basic patient information comprises patient name, gender, age and marital status, and the basic health information comprises chief complaints, history of present illness, past medical history, personal history, family history and physical examination.
3. The semantic similar case retrieval method based on medical knowledge-graph as claimed in claim 2, wherein the step of structuring the electronic case text in combination with the medical knowledge graph to obtain the structured electronic case with unified standard terms comprises:
extracting a medical entity from the basic health information of the patient by using an entity extraction model;
aligning and standardizing the extracted medical entity with the medical knowledge graph, and aligning the non-professional term expression with the professional term expression to obtain the medical entity with standard terms;
and classifying the medical entities with the standard terms according to preset entity categories to obtain the structured electronic cases with the uniform standard terms.
4. The medical knowledge graph-based semantic similar case retrieval method according to claim 3, wherein the entity extraction model adopts the named entity recognition model bilstm-crf and is trained on electronic case texts; when the extracted medical entities are aligned and standardized against the medical knowledge graph, a translation model bilstm-attention based on encoder-decoder technology is adopted and is trained on the unified, standardized medical term system in the medical knowledge graph.
5. The medical knowledge-graph-based semantic similar case retrieval method according to claim 4, wherein the preset entity categories are obtained by classifying a plurality of categories of medical clinical features according to the different sources of the entities and the negative or positive status of the entities, and include: chief symptoms, chief signs, non-chief symptoms, non-chief signs, current illness, historical illness, current causes, historical causes, familial illness, current medications, historical medications, current surgery, historical surgery, current examination items, historical examination items, current examination results, historical examination results, current physical examination, historical physical examination, current occupation, historical occupation, physical constitution, and physical condition; the plurality of medical clinical features include chief symptoms, chief signs, non-chief symptoms, non-chief signs, illness, causes, surgery, medication, physical condition, physical constitution, occupation, physical examination, examination items, and examination results; and the negative or positive status of an entity indicates whether the entity is present, positive indicating presence and negative indicating absence.
6. The semantic similar case retrieval method based on medical knowledge-graph according to claim 1, wherein the sorting and outputting the cases in the library according to the calculated similarity comprises:
obtaining, according to the calculated similarity, a list of the cases in the library sorted from highest to lowest similarity to the structured electronic case;
and traversing the case list, filtering it according to a preset similarity threshold t, and sequentially storing the cases in the library whose similarity is greater than or equal to the similarity threshold t into a final return list for output.
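The sorting and filtering steps of claim 6 can be sketched in a few lines of Python. The function name `rank_and_filter` and the (case identifier, similarity) tuple layout are assumptions made for illustration; the default threshold of 0.5 is the value used in the described embodiment, and the comparison is inclusive as stated in claim 6.

```python
from typing import List, Tuple


def rank_and_filter(scored_cases: List[Tuple[str, float]],
                    t: float = 0.5) -> List[Tuple[str, float]]:
    """Sort (case_id, similarity) pairs from highest to lowest similarity,
    then keep only those whose similarity is at least the threshold t."""
    ranked = sorted(scored_cases, key=lambda item: item[1], reverse=True)
    return [(case_id, sim) for case_id, sim in ranked if sim >= t]


# Hypothetical similarity scores for four cases in the library.
results = rank_and_filter(
    [("case-A", 0.42), ("case-B", 0.91), ("case-C", 0.50), ("case-D", 0.77)]
)
```

With these hypothetical scores, `results` holds case-B, case-D and case-C in that order, and case-A is dropped because its similarity is below t.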
7. A semantic similar case retrieval device based on a medical knowledge graph, characterized by comprising:
a case acquisition module, configured to acquire an electronic case that meets the case content specification requirements;
a case structuring module, configured to structure the electronic case text in combination with a medical knowledge graph to obtain a structured electronic case with unified standard terms;
a similarity calculation module, configured to calculate the similarity between the structured electronic case and the case in the library according to the content matching degree and the scale similarity: calculating a content matching degree of the structured electronic case and the case in the library, wherein the content matching degree is obtained by dividing the entity matching score of the structured electronic case and the case in the library by the total entity score of the structured electronic case:
Figure FDA0002892229110000041
wherein M represents the content matching degree; S_1 represents the entity matching score between the structured electronic case and the case in the library; S_2 represents the total entity score of the structured electronic case; w represents the entity category weight; m represents the total number of entity types; i represents the ordinal number of the currently traversed entity type; n represents the total number of entities corresponding to the i-th entity type; j represents the ordinal number of the currently traversed entity; and f represents the result of entity matching (the matching factor), whose value ranges from 0 to 1: the matching factor equals 1 if the match succeeds completely and 0 if it fails completely; wherein the matching factor f between any two entities is calculated based on a tree structure formed by the subordinate relations between the entities in the medical knowledge graph:
f_{ab} = 1/(1+n)
wherein n is the distance at which b is found when searching from entity a toward the root node, or at which a is found when searching from entity b toward the root node; if neither is found, the distance n is infinite and the matching factor between entity a and entity b is 0; if a = b, the distance n is 0 and the matching factor is 1;
calculating the scale similarity between the structured electronic case and the case in the library, wherein the calculation formula is as follows:
C = N_1/N_2, N_2 ≥ N_1
wherein C represents the scale similarity, N_1 represents the total number of entities of the case with fewer entities, and N_2 represents the total number of entities of the case with more entities;
calculating the similarity between the structured electronic case and the case in the library according to the formula
Figure FDA0002892229110000051
and an output module, configured to sort and output the cases in the library according to the calculated similarity.
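As an illustration of the two quantities the similarity calculation module works with, the following Python sketch computes the content matching degree M = S_1/S_2 and the scale similarity C = N_1/N_2. It is one reading of the symbol definitions, not the formula from the figure: S_1 is assumed to sum w·f over every entity of the structured case, and S_2 to sum w over the same entities. The function names, the example weights, and the choice to return 0 for an empty case are all assumptions. The final combination of M and C is given only in the figure and is deliberately not reproduced here.

```python
from typing import Dict, List


def content_matching_degree(weights: Dict[str, float],
                            factors: Dict[str, List[float]]) -> float:
    """M = S1 / S2, where `factors[category]` holds the matching factor f
    (0..1) of each entity of the structured case in that category and
    `weights[category]` is the entity category weight w."""
    s1 = 0.0  # entity matching score
    s2 = 0.0  # total entity score of the structured case
    for category, category_factors in factors.items():
        w = weights[category]
        s1 += w * sum(category_factors)
        s2 += w * len(category_factors)
    # Returning 0 for a case without entities is an assumption.
    return s1 / s2 if s2 > 0 else 0.0


def scale_similarity(count_a: int, count_b: int) -> float:
    """C = N1 / N2, with N1 the smaller and N2 the larger entity count."""
    smaller, larger = sorted((count_a, count_b))
    return smaller / larger if larger > 0 else 0.0


# Hypothetical weights and matching factors for a two-category case.
M = content_matching_degree(
    {"chief symptoms": 3.0, "current illness": 2.0},
    {"chief symptoms": [1.0, 0.5], "current illness": [0.0]},
)
C = scale_similarity(3, 4)
```

In this example S_1 = 3·(1.0 + 0.5) + 2·0.0 = 4.5 and S_2 = 3·2 + 2·1 = 8, so M = 0.5625, while C = 3/4 = 0.75.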
8. A storage medium comprising a stored program, characterized in that when the program is run, the apparatus on which the storage medium is located is controlled to execute the medical knowledge-graph based semantic similar case retrieval method according to any one of claims 1 to 6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the medical knowledge graph-based semantic similar case retrieval method according to any one of claims 1 to 6.
CN202010221246.7A 2020-03-26 2020-03-26 Semantic similar case retrieval method and equipment based on medical knowledge graph Active CN111414393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221246.7A CN111414393B (en) 2020-03-26 2020-03-26 Semantic similar case retrieval method and equipment based on medical knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221246.7A CN111414393B (en) 2020-03-26 2020-03-26 Semantic similar case retrieval method and equipment based on medical knowledge graph

Publications (2)

Publication Number Publication Date
CN111414393A CN111414393A (en) 2020-07-14
CN111414393B true CN111414393B (en) 2021-02-23

Family

ID=71491424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221246.7A Active CN111414393B (en) 2020-03-26 2020-03-26 Semantic similar case retrieval method and equipment based on medical knowledge graph

Country Status (1)

Country Link
CN (1) CN111414393B (en)

Also Published As

Publication number Publication date
CN111414393A (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant