CN110299209B - Similar medical record searching method, device and equipment and readable storage medium - Google Patents

Similar medical record searching method, device and equipment and readable storage medium

Publication number
CN110299209B
Authority
CN
China
Prior art keywords
graph structure
structure data
class
similarity
subgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910557217.5A
Other languages
Chinese (zh)
Other versions
CN110299209A (en)
Inventor
魏巍
陈俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910557217.5A
Publication of CN110299209A
Application granted
Publication of CN110299209B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, equipment and a readable storage medium for searching similar medical records. The method comprises: acquiring query medical record data and a plurality of historical medical record data; acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data each comprise a first-class subgraph and a second-class subgraph, and the intermediate nodes and leaf nodes of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph; acquiring the similarity degree between each historical graph structure data and the query graph structure data according to the root node similarity degree, the first-class subgraph similarity degree and the second-class subgraph similarity degree; and determining a similar medical record search result for the query medical record data according to a preset selection rule and the similarity degree. In this way, the inherent subgraphs and the recognizable subgraphs in the medical record data are extracted and the relevance of the content of corresponding subgraphs is measured during comparison, which improves the accuracy of similar medical record searching.

Description

Similar medical record searching method, device and equipment and readable storage medium
Technical Field
The invention relates to the technical field of information processing, in particular to a method, a device, equipment and a readable storage medium for searching similar medical records.
Background
In the medical field, similar medical record retrieval is of great significance for both scientific research and clinical practice. For example, during a patient's visit, a doctor can quickly retrieve medical records similar to the patient's and make a timely, well-founded judgment based on the diagnosis and treatment paths and outcomes recorded in them. When analyzing a particular medical record or writing a report on it, a doctor can consult historical medical records with a certain degree of similarity to obtain diagnostic opinions and treatment methods for reference. In clinical research, it is sometimes necessary to start from a particular medical record and search for further similar medical records for study and discussion.
The conventional approach to medical record matching and retrieval is generally a search over the full text of the medical records. For example, if "fever" and "dyspnea" are used as search keywords, all pre-stored medical records containing both keywords are retrieved.
However, the same symptoms may correspond to very different diseases, so the accuracy of the conventional similar medical record retrieval approach is low.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for searching similar medical records and a readable storage medium, which improve the accuracy and reliability of searching similar medical records.
According to a first aspect of the present invention, a method for searching similar medical records is provided, which includes:
acquiring query medical record data and a plurality of historical medical record data;
acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data both comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph;
acquiring the similarity degree between each historical graph structure data and the query graph structure data according to the root node similarity degree, the first-class subgraph similarity degree and the second-class subgraph similarity degree between that historical graph structure data and the query graph structure data;
and determining, among the plurality of historical medical record data, a similar medical record search result for the query medical record data according to a preset selection rule and the similarity degree, wherein the historical graph structure data corresponding to the similar medical record search result has a similarity degree that satisfies the preset selection rule.
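The final selection step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say what the preset selection rule is, so the sketch assumes a rule of the form "keep records whose similarity degree reaches a threshold, then return at most k of them, highest first". The names `select_similar`, `threshold` and `top_k` and the record IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def select_similar(
    scores: Dict[str, float], threshold: float = 0.5, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Apply an assumed selection rule: threshold filter, then top-k by score."""
    kept = [(record_id, s) for record_id, s in scores.items() if s >= threshold]
    # Highest similarity degree first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Hypothetical similarity degrees of four historical records to the query.
scores = {"rec-1": 0.91, "rec-2": 0.12, "rec-3": 0.67, "rec-4": 0.55}
result = select_similar(scores, threshold=0.5, top_k=2)
```

Other rules, such as a pure threshold or a pure top-k cut, would fit the same interface.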
According to a second aspect of the present invention, there is provided a similar medical record searching apparatus, including:
the medical record acquisition module is used for acquiring query medical record data and a plurality of historical medical record data;
the graph structuring module is used for acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data both comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph;
the processing module is used for acquiring the similarity degree between each historical graph structure data and the query graph structure data according to the root node similarity degree, the first-class subgraph similarity degree and the second-class subgraph similarity degree between that historical graph structure data and the query graph structure data;
and the selecting module is used for determining, among the plurality of historical medical record data, a similar medical record search result for the query medical record data according to a preset selection rule and the similarity degree, wherein the historical graph structure data corresponding to the similar medical record search result has a similarity degree that satisfies the preset selection rule.
According to a third aspect of the invention, there is provided equipment comprising: a memory, a processor and a computer program, wherein the computer program is stored in the memory, and the processor runs the computer program to execute the similar medical record searching method according to the first aspect and the various possible designs of the first aspect.
According to a fourth aspect of the present invention, there is provided a readable storage medium, in which a computer program is stored, and the computer program is used for implementing the similar medical record searching method according to the first aspect and various possible designs of the first aspect of the present invention when executed by a processor.
According to the method, device, equipment and readable storage medium for searching similar medical records provided by the invention, query medical record data and a plurality of historical medical record data are acquired; query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data are acquired, wherein the query graph structure data and the historical graph structure data both comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph; the similarity degree between each historical graph structure data and the query graph structure data is acquired according to the root node similarity degree, the first-class subgraph similarity degree and the second-class subgraph similarity degree; and a similar medical record search result for the query medical record data is determined among the plurality of historical medical record data according to a preset selection rule and the similarity degree. In this way, the inherent subgraphs and the recognizable subgraphs in the medical record data are extracted, the relevance between the data in corresponding subgraphs of the query medical record data and the historical medical record data is measured, and the accuracy of similar medical record searching is improved.
Drawings
Fig. 1 is a schematic flowchart of a method for searching similar medical records according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of query graph structure data according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a first-class subgraph and a second-class subgraph according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of an optional implementation of step S103 in Fig. 1 according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a similar medical record searching apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a hardware structure of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more.
It should be understood that in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A, and B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. "A matches B" means that the similarity between A and B is greater than or equal to a preset threshold value.
As used herein, "if" may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
A piece of medical record data generally includes the main diagnosis information, a plurality of fields inherent to the medical record, and the contents of those fields. The field contents are what the patient or doctor fills in under the field names of a standardized medical record, and the field names include: gender, age, department, time of visit, diagnosis and treatment type, medical orders, chief complaint, present medical history, physical examination, and auxiliary examination. Current medical record matching and retrieval searches either the full text of the medical record or the contents of these fields. For example, if "fever" and "dyspnea" are used as search keywords, all pre-stored medical records containing both keywords are retrieved; alternatively, medical records whose chief complaint contains the keyword "fever" can be returned as search results. However, the same chief complaint may correspond to completely different symptoms and diseases. For example, fever caused by a cold and fever caused by an allergy differ greatly in diagnosis and treatment, so such records have little value for similar medical record comparison. Therefore, the accuracy of the conventional similar medical record retrieval approach is low.
To address the low accuracy of existing similar medical record retrieval, an embodiment of the invention provides a similar medical record searching method. The method constructs query graph structure data and historical graph structure data, each having first-class subgraphs and second-class subgraphs, wherein the second-class subgraphs are obtained by performing feature recognition on the first-class subgraphs. The similarity between the query medical record data and the historical medical record data is then determined according to the root node similarity, the first-class subgraph similarity and the second-class subgraph similarity, which improves the accuracy of similar medical record searching.
Fig. 1 is a schematic flowchart of a similar medical record searching method according to an embodiment of the present invention. The method shown in Fig. 1 may be executed by a software and/or hardware device, for example a server. The method includes steps S101 to S104, as follows:
S101, acquiring query medical record data and a plurality of historical medical record data.
For example, when the server receives query medical record data for which similar medical records are to be found, it acquires a plurality of historical medical record data. The historical medical record data may be pre-stored in a medical record database or acquired from a distributed storage unit. In some embodiments, medical record data related to the query medical record data may be taken as the historical medical record data. For example, the server acquires the query medical record data and then determines full-text search keywords from it. A full-text search keyword is, for example, a term extracted from the query medical record data, such as "cold", "cough" or "XX drug allergy". Finally, a full-text search is performed in the medical record library using the full-text search keywords to obtain a plurality of historical medical record data containing those keywords.
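The keyword-based pre-selection of historical records described above can be sketched as follows. This is a minimal illustration, assuming the medical record library is an in-memory mapping from a record ID to the record's full text and that a record qualifies only if it contains every keyword; the text does not say whether all or any keywords must match. The name `full_text_candidates` and the sample records are hypothetical.

```python
from typing import Dict, List


def full_text_candidates(
    keywords: List[str], record_library: Dict[str, str]
) -> List[str]:
    """Return IDs of records whose full text contains every keyword (assumed rule)."""
    return [
        record_id
        for record_id, text in record_library.items()
        if all(keyword in text for keyword in keywords)
    ]


# Hypothetical medical record library: record ID -> full text.
library = {
    "rec-1": "chief complaint: cough for 1 month; diagnosis: cold",
    "rec-2": "chief complaint: fever; XX drug allergy",
    "rec-3": "chief complaint: cough and fever; diagnosis: cold",
}
candidates = full_text_candidates(["cold", "cough"], library)
```

A production system would more likely query a full-text index than scan strings, but the filtering role in step S101 is the same.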
S102, acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data both comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph.
For example, feature engineering is used to construct structured medical record data, and a graph structure data body is generated for the query medical record data and for each historical medical record data, yielding the query graph structure data and the historical graph structure data. The historical graph structure data may be obtained by structuring in the same manner as the query graph structure data, or may be acquired externally in advance and stored in association with the historical medical record data, which is not limited herein. This embodiment uses the acquisition of the query graph structure data as an example.
Fig. 2 is a schematic diagram of query graph structure data according to an embodiment of the present invention. This embodiment takes the query graph structure data corresponding to the query medical record data as an example. The arrows in Fig. 2 point from each child node to its parent node, and the root node is the main diagnosis.
Generally, medical record data includes three main categories of data: the main diagnosis, basic information and basic condition. The main diagnosis is text-type data. The basic information may include, for example, the following items filled in by the doctor: age and gender. The basic condition may include, for example, the following items filled in by the doctor: medical orders, physical examination, diagnosis and treatment type, chief complaint, auxiliary examination, department, present medical history, and time of visit.
For example, the main diagnosis of the query medical record may be used as the description information of the root node. The intermediate nodes under the root node fall into two classes. The first-class intermediate nodes may be the field types in the query medical record data, such as basic information and basic condition (see Fig. 2). The second-class intermediate nodes may be preset features, that is, features intended to be recognized from the contents filled in by the doctor. As shown in Fig. 2, the second-class intermediate nodes may include, for example: allergy, surgery, medicine, population attribute, symptom, disease, sign, test, and examination.
For the first-class intermediate nodes, each medical record field type included in the query medical record may be used as a first-class intermediate node, the fields belonging to that field type are used as first-class leaf nodes under it, and the description information of each field is used as the description information of the corresponding first-class leaf node. For example, if the field type is "basic information", then "basic information" is a first-class intermediate node, and "age" and "gender" are the 2 first-class leaf nodes under it. The first-class intermediate node "basic condition" has 8 first-class leaf nodes: "medical orders", "physical examination", "diagnosis and treatment type", "chief complaint", "auxiliary examination", "department", "present medical history" and "time of visit", as shown in the graph structure data in Fig. 2.
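The first-class part of the graph (root node, field-type intermediate nodes, field leaf nodes) can be sketched as a small tree. This is an illustration only: the `Node` class, the function `build_first_class` and the sample record values are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    description: str = ""
    children: List["Node"] = field(default_factory=list)


def build_first_class(
    main_diagnosis: str, field_types: Dict[str, Dict[str, str]]
) -> Node:
    """Root = main diagnosis; intermediate = field type; leaf = field + content."""
    root = Node("main diagnosis", main_diagnosis)
    for type_name, fields in field_types.items():
        intermediate = Node(type_name)
        for field_name, content in fields.items():
            intermediate.children.append(Node(field_name, content))
        root.children.append(intermediate)
    return root


# Hypothetical query medical record (values are made up for illustration).
root = build_first_class(
    "upper respiratory tract infection",
    {
        "basic information": {"age": "67", "gender": "male"},
        "basic condition": {
            "chief complaint": "cough for 1 month",
            "department": "respiratory medicine",
        },
    },
)
```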
For the second-class intermediate nodes, a preset feature may first be used as a second-class intermediate node. For example, in Fig. 2, the preset feature "symptom" is a second-class intermediate node. The second-class intermediate nodes may be predefined features that need to be recognized. The server performs recognition of each feature on the fields of the query medical record data and their description information to obtain the description information of the feature, or to obtain the description information of the feature together with additive factor attribute information or multiplicative factor attribute information of that description information. For example, natural language understanding (NLU) processing for the feature "symptom" is performed on fields such as the present medical history and the chief complaint and on their description information, yielding a cough with its duration (1 month) and degree (mild), and a fever with its time (the last 24 hours). The description information "cough" and "fever" may then be used as 2 second-class leaf nodes under the second-class intermediate node "symptom". Further, "present" may be used as factor attribute information of the description information "cough", and "mild degree" and "lasting for 1 month" may also be used as factor attribute information of "cough". Likewise, "last 24 hours" may be used as factor attribute information of the description information "fever". As another example, NLU processing for the feature "population attribute" is performed on the same fields and description information, yielding "elderly". The description information "elderly" may be used as a second-class leaf node under the second-class intermediate node "population attribute".
After determining the description information of a feature, the server may use it as a second-class leaf node of that feature, wherein a second-class leaf node that has additive factor attribute information or multiplicative factor attribute information is of a complex data type.
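The second-class leaf nodes and their factor attributes can be sketched as follows. The NLU step itself is not implemented here; its output is hard-coded as a hypothetical dictionary. The class `FeatureLeaf`, the attribute keys, and the label "simple" for leaves without factor attributes are assumptions made for illustration; the text only states that leaves with factor attribute information are of a complex data type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureLeaf:
    description: str
    factor_attributes: Dict[str, str] = field(default_factory=dict)

    @property
    def data_type(self) -> str:
        # Leaves carrying factor attribute information are "complex";
        # "simple" is an assumed label for all other leaves.
        return "complex" if self.factor_attributes else "simple"


def build_feature_nodes(nlu_output: Dict[str, list]) -> Dict[str, List[FeatureLeaf]]:
    """Map each preset feature (second-class intermediate node) to its leaves."""
    return {
        feature: [
            FeatureLeaf(item["description"], dict(item["attributes"]))
            for item in items
        ]
        for feature, items in nlu_output.items()
    }


# Hypothetical NLU output mirroring the example in the text.
nlu_output = {
    "symptom": [
        {
            "description": "cough",
            "attributes": {"presence": "present", "degree": "mild", "duration": "1 month"},
        },
        {"description": "fever", "attributes": {"time": "last 24 hours"}},
    ],
    "population attribute": [{"description": "elderly", "attributes": {}}],
}
nodes = build_feature_nodes(nlu_output)
```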
Finally, the query graph structure data corresponding to the query medical record data can be obtained from the root node, the first-class intermediate nodes, the first-class leaf nodes, the second-class intermediate nodes and the second-class leaf nodes, wherein the root node, the first-class intermediate nodes and the first-class leaf nodes form the first-class subgraphs, and the root node, the second-class intermediate nodes and the second-class leaf nodes form the second-class subgraphs. Fig. 3 is a schematic diagram of a first-class subgraph and a second-class subgraph according to an embodiment of the present invention. Fig. 3 illustrates 2 first-class subgraphs and 2 second-class subgraphs. The root node of both the first-class subgraphs and the second-class subgraphs is the main diagnosis, but the first-class intermediate nodes of the first-class subgraphs are field types, such as basic information or basic condition, while the second-class intermediate nodes of the second-class subgraphs are features, such as symptom or sign. The first-class subgraphs are split by leaf node, so each first-class subgraph contains a single first-class leaf node. The second-class subgraphs are split by intermediate node, so each second-class subgraph contains a single second-class intermediate node but may contain a plurality of second-class leaf nodes.
After the query graph structure data is obtained, it may first be split into a plurality of first-class subgraphs and a plurality of second-class subgraphs, so that the subsequent similarity measurement can use subgraphs as the units of comparison.
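The splitting rule can be sketched as follows: one first-class subgraph per first-class leaf node, and one second-class subgraph per second-class intermediate node with all of its leaves. The tuple layout and the function name `split_into_subgraphs` are assumptions chosen for brevity.

```python
from typing import Dict, List, Tuple

FirstClassSubgraph = Tuple[str, str, str]               # (root, field type, leaf)
SecondClassSubgraph = Tuple[str, str, Tuple[str, ...]]  # (root, feature, leaves)


def split_into_subgraphs(
    root: str,
    first_class: Dict[str, List[str]],
    second_class: Dict[str, List[str]],
) -> Tuple[List[FirstClassSubgraph], List[SecondClassSubgraph]]:
    # First class: split by leaf node, one leaf per subgraph.
    first = [
        (root, field_type, leaf)
        for field_type, leaves in first_class.items()
        for leaf in leaves
    ]
    # Second class: split by intermediate node, keeping all of its leaves.
    second = [
        (root, feature, tuple(leaves)) for feature, leaves in second_class.items()
    ]
    return first, second


first, second = split_into_subgraphs(
    "main diagnosis",
    {"basic information": ["age", "gender"], "basic condition": ["chief complaint"]},
    {"symptom": ["cough", "fever"]},
)
```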
S103, acquiring the similarity degree between each historical graph structure data and the query graph structure data according to the root node similarity degree, the first-class subgraph similarity degree and the second-class subgraph similarity degree between that historical graph structure data and the query graph structure data.
In medical record similarity comparison, if the main diagnoses at the root nodes are completely different, the two records generally are not similar medical records. Using the root node similarity as one basis for comparing the historical graph structure data with the query graph structure data therefore improves the accuracy of similar medical record searching.
Specifically, the root node, the first-class subgraphs and the second-class subgraphs of the historical graph structure data and of the query graph structure data are obtained. The two root nodes are then compared to obtain the root node similarity, corresponding first-class subgraphs are compared to obtain the first-class subgraph similarity, and corresponding second-class subgraphs are compared to obtain the second-class subgraph similarity.
To describe step S103 more clearly, it is explained below with reference to the accompanying drawings and specific embodiments.
Fig. 4 is a schematic flowchart of an optional implementation of step S103 in Fig. 1 according to an embodiment of the present invention. The method shown in Fig. 4 includes steps S201 to S204, as follows:
S201, determining a first similarity metric value between the historical graph structure data and the query graph structure data according to the root node similarity between the historical graph structure data and the query graph structure data.
For example, the root node similarity may be used directly as the first similarity metric value between the historical graph structure data and the query graph structure data.
Since the main diagnosis of the root node is usually text-type data, before the first similarity metric value is determined, or before step S103 shown in Fig. 1, the root node similarity between each historical graph structure data and the query graph structure data may be determined according to a preset text-type similarity determination model and the description information of the root nodes. The text-type similarity determination model may be, for example, the similarity determination model corresponding to text-type data shown in Table 1 below.
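As an illustration of a text-type similarity for the root nodes, the sketch below uses Jaccard similarity over lowercase word sets. This is a stand-in chosen for simplicity, not the model referred to in the text, and the function name and sample diagnoses are hypothetical.

```python
def jaccard_text_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two texts (stand-in model)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# The root node similarity is used directly as the first similarity metric value.
first_metric = jaccard_text_similarity(
    "acute upper respiratory infection", "upper respiratory infection"
)
```

Here the two diagnoses share 3 of 4 distinct words, so the first similarity metric value is 0.75; completely different diagnoses yield 0.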
S202, determining, among the first-class subgraphs, first-class significant subgraphs and first-class non-significant subgraphs according to the significance class of the leaf nodes.
The significance class of a leaf node can be set according to the application requirements of the actual similarity matching. Suppose the application requires that "age", "gender", "diagnosis and treatment type" and "time of visit" serve as screening conditions for similar medical record results. Then the first-class subgraphs containing "age", "gender", "time of visit" and "diagnosis and treatment type" may be taken as first-class significant subgraphs. Here, "gender" and "diagnosis and treatment type" are matching and screening conditions of a Boolean data type. For example, if the screening condition is "gender: male", only results matching "gender: male" are output. "Age" and "time of visit" are matching and screening conditions of a numerical data type, to which a numerical decay function is applied. For example, if the screening condition is "age: 20", decay matching is applied to the age: when the other matching items of the search results are essentially the same, the closer a record's age is to 20, the higher its similarity. In this embodiment, the first-class subgraphs containing "medical orders", "physical examination", "chief complaint", "auxiliary examination", "department" and "present medical history" may be taken as first-class non-significant subgraphs.
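The partition into significant and non-significant subgraphs and the two kinds of leaf comparison can be sketched as follows. The set of significant leaf names mirrors the example above. The text does not specify the numerical decay function, so the exponential decay and its `scale` parameter are assumptions, as are all function names.

```python
import math
from typing import List, Tuple

# Leaf names treated as significant; mirrors the example in the text.
SIGNIFICANT_LEAVES = {"age", "gender", "diagnosis and treatment type", "time of visit"}


def partition_first_class(leaf_names: List[str]) -> Tuple[List[str], List[str]]:
    """Split first-class leaves into (significant, non-significant)."""
    significant = [n for n in leaf_names if n in SIGNIFICANT_LEAVES]
    non_significant = [n for n in leaf_names if n not in SIGNIFICANT_LEAVES]
    return significant, non_significant


def boolean_similarity(query_value: str, history_value: str) -> float:
    """Boolean-type match: 1.0 on exact match, otherwise 0.0."""
    return 1.0 if query_value == history_value else 0.0


def numeric_decay_similarity(
    query_value: float, history_value: float, scale: float = 10.0
) -> float:
    """Assumed decay: exponential in the absolute difference of the values."""
    return math.exp(-abs(query_value - history_value) / scale)


significant, non_significant = partition_first_class(
    ["age", "chief complaint", "gender", "department"]
)
```

With this decay, a historical record aged 25 scores higher against a query age of 20 than one aged 40, matching the behavior described in the text.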
S203, according to the similarity of the first type of significant subgraphs in the historical graph structure data and the query graph structure data, determining a second similarity metric value between the historical graph structure data and the query graph structure data.
The historical graph structure data and the query graph structure data may each include a plurality of first-class significant subgraphs, and each first-class significant subgraph of the historical graph structure data may be compared with the corresponding first-class significant subgraph in the query graph structure data to obtain a first-class significant subgraph similarity. The server may then use the product of all first-class significant subgraph similarities between the historical graph structure data and the query graph structure data as the second similarity metric value. The product form increases the influence of each first-class significant subgraph similarity on the second similarity metric value. For example, if any first-class significant subgraph similarity is 0, that is, the corresponding first-class significant subgraphs of the two pieces of graph structure data are completely uncorrelated, the second similarity metric value is directly 0.
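The effect of the product form can be seen in a short sketch. The function name and the example similarity values are hypothetical; only the rule "multiply all first-class significant subgraph similarities" comes from the text.

```python
import math

def second_similarity_metric(significant_similarities):
    """Multiply all first-class significant subgraph similarities together."""
    return math.prod(significant_similarities)

# Hypothetical similarities for the "age", "sex" and "time of visit" subgraphs.
print(second_similarity_metric([0.9, 1.0, 0.5]))
# A single zero (for example, a mismatch on "sex") forces the metric to zero.
print(second_similarity_metric([0.9, 0.0, 1.0]))
```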
S204, determining a third similarity metric value between the historical graph structure data and the query graph structure data according to the similarities of the first-class non-significant subgraphs and of the second-class subgraphs in the historical graph structure data and the query graph structure data.
Once the similarities of the first-class non-significant subgraphs and of the second-class subgraphs between the historical graph structure data and the query graph structure data are obtained, the third similarity metric value can be derived from them.
Because the first-class non-significant subgraphs and the second-class subgraphs generally correspond to specific medical conditions and do not play a decisive role in the similarity, the third similarity metric value may, for example, be the sum of the similarity of each first-class non-significant subgraph and the similarity of each second-class subgraph between the historical graph structure data and the query graph structure data. In other words, the third similarity metric value is the sum of all first-class non-significant subgraph similarities and all second-class subgraph similarities.
In this embodiment, the steps S201, S203, and S204 are not limited by the order described in fig. 4, and the steps S201, S203, and S204 may be performed in other orders or simultaneously, which is not limited herein.
S205, determining the similarity degree of each historical graph structure data and the query graph structure data according to the first similarity metric value, the second similarity metric value and the third similarity metric value of each historical graph structure data and the query graph structure data.
Specifically, the similarity between the history graph structure data and the query graph structure data can be determined according to the following formula I.
Figure BDA0002107216530000091
Wherein, $A = I - M$;
$r(d, q)$ represents the similarity degree of the historical graph structure data $d$ and the query graph structure data $q$;
$\mathrm{Sim}$ is a similarity operator, $\mathrm{Sim}(v_{d,\mathrm{root}}, v_{q,\mathrm{root}})$ is the first similarity metric value, $v_{d,\mathrm{root}}$ is the root node of the historical graph structure data, and $v_{q,\mathrm{root}}$ is the root node of the query graph structure data;
$\prod_{u_i \in M,\, u_j \in M} \mathrm{Sim}(S_{d,i}, S_{q,j})$
is the second similarity metric value, where $u_i \in M$ and $u_j \in M$ indicate that $M$ is the set of leaf nodes of the first-class significant subgraphs, so that $S_{d,i}$ in the second similarity metric value is the $i$-th first-class significant subgraph of the historical graph structure data and $S_{q,j}$ is the $j$-th first-class significant subgraph of the query graph structure data;
$\sum_{u_i \in A,\, u_j \in A} \mathrm{Sim}(S_{d,i}, S_{q,j})$
is the third similarity metric value, where $I$ is the set of all leaf nodes, and $u_i \in A$ and $u_j \in A$ denote leaf nodes outside the set of leaf nodes of the first-class significant subgraphs, so that $S_{d,i}$ in the third similarity metric value is the $i$-th first-class non-significant subgraph or second-class subgraph of the historical graph structure data and $S_{q,j}$ is the $j$-th first-class non-significant subgraph or second-class subgraph of the query graph structure data.
In formula I, the root node is, for example, the main diagnosis shown in fig. 2. The subgraphs corresponding to the set M are those to which its leaf nodes belong, for example the first-class significant subgraphs containing "age", "sex", "time of visit" and "diagnosis and treatment type" shown in fig. 2. The subgraphs corresponding to the set A are, for example, the subgraphs shown in fig. 2 other than those of "age", "sex", "time of visit" and "diagnosis and treatment type", and include the second-class subgraphs and the first-class non-significant subgraphs.
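The three metric values can be assembled as in the sketch below. Formula I survives only as an image reference in this text, so the way the three values are combined is not recoverable here. Multiplying them together is an assumption made purely for illustration, and the `combine` parameter, like every other name in the block, is hypothetical.

```python
import math

def overall_similarity(root_sim, significant_sims, other_sims, combine=None):
    """Combine the three similarity metric values for one historical record.

    root_sim         : first metric, similarity of the two root nodes
    significant_sims : similarities of subgraphs whose leaves are in the set M
    other_sims       : similarities of subgraphs whose leaves are in A = I - M
    combine          : how the three metrics are merged (ASSUMPTION: product)
    """
    first = root_sim
    second = math.prod(significant_sims)  # product over the set M
    third = sum(other_sims)               # sum over the set A
    if combine is None:
        # Assumed combination; the source formula image is not available.
        combine = lambda a, b, c: a * b * c
    return combine(first, second, third)

print(overall_similarity(1.0, [1.0, 0.5], [0.2, 0.3]))
```

Passing a different `combine` callable lets the same skeleton be reused if the actual formula merges the three values in another way.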
S104, determining, according to a preset selection rule and the similarity degree, a similar medical record search result for the query medical record data among the plurality of historical medical record data, wherein the historical graph structure data corresponding to the similar medical record search result has a similarity degree that meets the preset selection rule.
For example, the number of pieces of historical medical record data output in the similar medical record search result may be controlled by a decision-maker model. For example, the historical medical record data are sorted by similarity degree from high to low, and then, according to a preset truncation ratio (for example, 50%), the top-ranked pieces of historical medical record data that fall within the truncation ratio are used as the similar medical record search result. Alternatively, according to a preset truncation number (for example, 5), the top-ranked pieces of historical medical record data up to the truncation number are used as the similar medical record search result. The preset selection rule may be number-based or ratio-based, and is not limited herein.
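Both truncation rules can be sketched in a few lines. The function and parameter names are hypothetical, and rounding the ratio-based cut-off up to a whole record is an assumption, since the text does not say how fractional counts are handled.

```python
import math

def select_similar_records(scored_records, top_k=None, ratio=None):
    """Rank (record_id, similarity) pairs and truncate the ranking.

    top_k : keep this many top-ranked records (truncation number)
    ratio : keep this fraction of the ranking (truncation ratio);
            rounding up is an ASSUMPTION
    """
    ranked = sorted(scored_records, key=lambda item: item[1], reverse=True)
    if top_k is not None:
        return ranked[:top_k]
    if ratio is not None:
        return ranked[:math.ceil(len(ranked) * ratio)]
    return ranked

records = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(select_similar_records(records, top_k=2))
print(select_similar_records(records, ratio=0.5))
```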
According to the similar medical record searching method provided by the embodiment, the medical record data to be searched and a plurality of historical medical record data are obtained; acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data both comprise a first-class subgraph and a second-class subgraph, a middle node of the first-class subgraph is of a medical record field type, and a middle node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph; acquiring the similarity degree of each historical graph structure data and the query graph structure data according to the similarity degree of each historical graph structure data and a root node in the query graph structure data, the similarity degree of a first class of subgraph and the similarity degree of a second class of subgraph; and determining a similar medical record searching result of the inquired medical record data in the plurality of historical medical record data according to a preset selection rule and the similarity degree, thereby extracting an inherent subgraph and an identifiable subgraph in the inquired medical record data, measuring the relevance of the inquired medical record data and the data in the corresponding subgraph in the historical medical record data, and improving the accuracy of similar medical record searching.
Because the forming modes and embodied content types of the first class subgraph and the second class subgraph are different, the similarity of the first class subgraph and the similarity of the second class subgraph can be determined respectively in two different similarity determination modes.
Determining the similarity of the first-class subgraphs includes determining the similarity of the first-class significant subgraphs and the similarity of the first-class non-significant subgraphs.
In the above embodiment, it may be understood that, before step S203 (determining a second similarity metric value between the history graph structure data and the query graph structure data according to the similarity between the history graph structure data and the first type significant subgraph in the query graph structure data) and step S204 (determining a third similarity metric value between the history graph structure data and the query graph structure data according to the similarity between the history graph structure data and the first type insignificant subgraph and the second type subgraph in the query graph structure data), a process of calculating the similarity between the first type significant subgraph and the first type insignificant subgraph may also be included.
For example, among the first-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the similarity of the first-class subgraph between the historical graph structure data and the query graph structure data can be obtained according to the leaf node similarity in the first-class subgraph, the preset weight of the edge between the root node and the leaf node, and the root node similarity. The first-class subgraph similarity covers both the first-class significant subgraph similarity and the first-class non-significant subgraph similarity; that is, the same calculation method applies to both.
In some embodiments, the similarity between the history graph structure data and the first-class subgraph in the query graph structure data can be obtained by the following formula two.
$\mathrm{Sim}(S_d, S_q) = \mathrm{Sim}(u_d, u_q) \cdot \mathrm{weight}(uv) \cdot \mathrm{Sim}(v_d, v_q)$    (Formula two)
Wherein, $\mathrm{Sim}$ represents the similarity operator; $S_d$ represents a first-class subgraph of the historical graph structure data, and $S_q$ represents a first-class subgraph of the query graph structure data; $u_d$ represents the leaf node of the first-class subgraph $S_d$ of the historical graph structure data, and $u_q$ represents the leaf node of the first-class subgraph $S_q$ of the query graph structure data; $\mathrm{weight}(uv)$ represents the preset weight of the edge between the root node and the leaf node in the first-class subgraph; $v_d$ represents the root node of the first-class subgraph $S_d$ of the historical graph structure data, and $v_q$ represents the root node of the first-class subgraph $S_q$ of the query graph structure data.
The first-class subgraphs in formula two may be the subgraphs in the graph structure data shown in fig. 2 whose leaf nodes are age, sex, medical order, physical examination, diagnosis and treatment type, chief complaint, auxiliary examination, department, current medical history and time of visit. Among them, the subgraphs whose leaf nodes are age, sex, diagnosis and treatment type and time of visit are first-class significant subgraphs, and the rest are first-class non-significant subgraphs. The preset weight of the edge between the root node and a leaf node may be set in advance according to the experience of medical experts.
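Formula two can be applied field by field as in the following sketch. The dictionary layout, the function name and the numeric values are hypothetical; in practice the edge weights would be the expert-preset values mentioned above.

```python
def first_class_subgraph_similarities(leaf_sims, edge_weights, root_sim):
    """Apply formula two to every first-class subgraph.

    Sim(S_d, S_q) = Sim(u_d, u_q) * weight(uv) * Sim(v_d, v_q)

    leaf_sims    : {field: leaf node similarity Sim(u_d, u_q)}
    edge_weights : {field: preset weight of the root-leaf edge}
    root_sim     : root node similarity Sim(v_d, v_q)
    """
    return {
        field: leaf_sims[field] * edge_weights[field] * root_sim
        for field in leaf_sims
    }

# Hypothetical leaf similarities and expert-preset weights.
result = first_class_subgraph_similarities(
    leaf_sims={"age": 0.8, "sex": 1.0},
    edge_weights={"age": 0.5, "sex": 1.0},
    root_sim=0.9,
)
print(result)
```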
It can be understood that, before step S204 (determining a third similarity metric value between the historical graph structure data and the query graph structure data according to the similarities of the first-class non-significant subgraphs and the second-class subgraphs in the historical graph structure data and the query graph structure data), the similarity of the second-class subgraphs between the historical graph structure data and the query graph structure data may also be obtained. Specifically, the preset weight of the edge between the root node and the intermediate node in the second-class subgraph and the statistical weight of the edge between the leaf node and the intermediate node in the second-class subgraph may be obtained first. It should be understood that, in the historical graph structure data, if $u_{d,i}$ denotes a leaf node of a second-class subgraph, $c_d$ denotes the intermediate node and $v_d$ denotes the root node, then the edges satisfy $u_{d,i}c_d + c_d v_d = u_{d,i}v_d$. For different leaf nodes $u_{d,i}$ and $u_{d,j}$ under the same intermediate node $c_d$, that is, leaf nodes of the same type, the weight of the edge $c_d v_d$ between the root node and the intermediate node is the same: $c_d v_d = u_{d,i}v_d - u_{d,i}c_d$ and $c_d v_d = u_{d,j}v_d - u_{d,j}c_d$ give the same value. The weight of the edge $u_{d,i}c_d$ between the intermediate node and a leaf node is therefore positively linearly correlated with the weight of the edge $u_{d,i}v_d$ between the root node and that leaf node. For example, the weight of the edge $u_{d,i}v_d$ expresses the correlation statistics between the main diagnosis and leaf nodes such as symptom A and symptom B, and is medically interpretable, so the statistical weight of the edge between a leaf node and the intermediate node is determined from the correlation statistics between the root node and the leaf node. For example, it may be a weighted average of the mutual information and the chi-square statistic between the main diagnosis and the characterization information.
For example, at least one leaf node under the intermediate node in the second-class subgraph may be obtained first, and the mutual information between the root node and the at least one leaf node in the second-class subgraph is acquired. Then, the chi-square statistic of the root node and the at least one leaf node in the second-class subgraph is obtained. Finally, the weighted sum of the mutual information value and the chi-square statistic value corresponding to the intermediate node is used as the statistical weight of the edge between the leaf node and the intermediate node in the second-class subgraph. After that, among the second-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the similarity of the second-class subgraph between the historical graph structure data and the query graph structure data is acquired according to the leaf node similarity in the second-class subgraph, the preset weight of the edge between the root node and the intermediate node, the statistical weight of the edge between the leaf node and the intermediate node, and the root node similarity. For example, the similarity is calculated between the second-class subgraph whose intermediate node is "symptom" in the historical graph structure data and the second-class subgraph whose intermediate node is "symptom" in the query graph structure data, yielding the similarity of the second-class subgraph whose intermediate node is "symptom".
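One way to compute such a statistical edge weight is sketched below, assuming the statistics are taken from a 2x2 co-occurrence table of the main diagnosis and one leaf node (for example, a symptom) over the historical records. The table layout, the equal default weights and all names are assumptions; the text only specifies a weighted sum of mutual information and the chi-square statistic.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in nats) of two binary events from a 2x2 count table.

    n11: diagnosis and symptom both present   n10: diagnosis only
    n01: symptom only                         n00: neither
    """
    total = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    mi = 0.0
    for joint, row_total, col_total in cells:
        if joint > 0:  # empty cells contribute nothing
            mi += (joint / total) * math.log(joint * total / (row_total * col_total))
    return mi

def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic of a 2x2 count table."""
    total = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return total * (n11 * n00 - n10 * n01) ** 2 / denom

def edge_statistical_weight(counts, mi_weight=0.5, chi_weight=0.5):
    """Weighted sum of the two statistics (the weights are hypothetical)."""
    return mi_weight * mutual_information(*counts) + chi_weight * chi_square(*counts)

print(edge_statistical_weight((20, 0, 0, 20)))   # strongly associated
print(edge_statistical_weight((10, 10, 10, 10)))  # independent
```

The two statistics live on different scales, so a practical system would probably normalise them before mixing; the text does not say whether or how this is done.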
For example, the similarity between the history graph structure data and the second-class subgraph in the query graph structure data can be obtained according to the following formula three.
Figure BDA0002107216530000131
Wherein, the second-class subgraph of the historical graph structure data is $S_d = (\{u_{d,1}, u_{d,2}, \ldots, u_{d,m}, c_d, v_d\}, \{u_{d,1}c_d, u_{d,2}c_d, \ldots, u_{d,m}c_d, c_d v_d\})$, where $u_{d,m}$ is the $m$-th leaf node of the second-class subgraph of the historical graph structure data, $c_d$ is the intermediate node of the second-class subgraph of the historical graph structure data, $v_d$ is the root node of the second-class subgraph of the historical graph structure data, $u_{d,m}c_d$ is the edge between leaf node $u_{d,m}$ and intermediate node $c_d$, and $c_d v_d$ is the edge between intermediate node $c_d$ and root node $v_d$; the second-class subgraph of the query graph structure data is $S_q = (\{u_{q,1}, u_{q,2}, \ldots, u_{q,n}, c_q, v_q\}, \{u_{q,1}c_q, u_{q,2}c_q, \ldots, u_{q,n}c_q, c_q v_q\})$, where $u_{q,n}$ is the $n$-th leaf node of the second-class subgraph of the query graph structure data, $c_q$ is the intermediate node of the second-class subgraph of the query graph structure data, $v_q$ is the root node of the second-class subgraph of the query graph structure data, $u_{q,n}c_q$ is the edge between leaf node $u_{q,n}$ and intermediate node $c_q$, and $c_q v_q$ is the edge between intermediate node $c_q$ and root node $v_q$;
$\mathrm{weight}(u_{d,i}c_d)$ represents the statistical weight of the edge between a leaf node and the intermediate node in the second-class subgraph;
$\mathrm{weight}(c_d v_d)$ represents the preset weight of the edge between the root node and the intermediate node in the second-class subgraph;
operator symbol
Figure BDA0002107216530000132
Wherein, alpha and beta are constants.
Table One
Figure BDA0002107216530000133
In the above embodiment of determining the first-class subgraph similarity, before the similarity of the first-class subgraph between the historical graph structure data and the query graph structure data is obtained, a step of determining the similarity of each leaf node of the first-class subgraph in a manner selected according to the data type of the leaf node may further be included. Specifically, the data types of the leaf nodes of the first-class subgraphs in the historical graph structure data and the query graph structure data may be obtained. Referring to Table One, the embodiment of the present invention provides selectable similarity determination models for four data types. In the similarity determination model corresponding to the complex data type shown in Table One, the multiplicative factor attribute set is the set of all multiplicative factor attribute information of node x, and the additive factor attribute set is the set of all additive factor attribute information of node x, where the nodes x and y being compared have multiplicative factor attribute information in one-to-one correspondence and additive factor attribute information in one-to-one correspondence. The data types of the leaf nodes of the first-class subgraphs are typically all simple data types, such as the numerical data type, the Boolean data type and the text data type in Table One. A target similarity determination model corresponding to the data type is then acquired; the correspondence is shown, for example, in fig. 1. Finally, the similarity of each leaf node of the first-class subgraph between the historical graph structure data and the query graph structure data is obtained according to the target similarity determination model and the description information of the leaf nodes of the first-class subgraphs in the historical graph structure data and the query graph structure data.
The target similarity determination model is applied to the description information of the leaf nodes in the first-class subgraphs to obtain the similarity of each leaf node of the first-class subgraphs. The numerical data type applies, for example, to the description information of fields such as age and time of visit. The Boolean data type applies, for example, to the description information of the sex field (for example, male is preset to 1 and female to 0). The text data type applies, for example, to the description information of fields such as the chief complaint and the current medical history. Calculating the similarity of each leaf node with the corresponding target similarity determination model improves the accuracy of the leaf node similarities and, in turn, the accuracy of the final search result.
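Selecting a model by data type amounts to a lookup followed by a call, as in the sketch below. The models themselves are given only in Table One, which is not reproduced in this text, so the three functions here are stand-ins chosen for illustration (an exponential decay, an equality test and a token-overlap ratio); none of them should be read as the patent's actual models, and all names are hypothetical.

```python
import math

def numeric_model(x, y, scale=10.0):
    """Stand-in numeric model: exponential decay of the absolute difference."""
    return math.exp(-abs(x - y) / scale)

def boolean_model(x, y):
    """Stand-in Boolean model: 1.0 if equal, else 0.0."""
    return 1.0 if x == y else 0.0

def text_model(x, y):
    """Stand-in text model: Jaccard overlap of whitespace-separated tokens."""
    a, b = set(x.split()), set(y.split())
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical correspondence between data type and target model.
SIMILARITY_MODELS = {
    "numeric": numeric_model,
    "boolean": boolean_model,
    "text": text_model,
}

def leaf_similarity(data_type, history_value, query_value):
    """Pick the target model for the leaf's data type and apply it."""
    return SIMILARITY_MODELS[data_type](history_value, query_value)

print(leaf_similarity("boolean", 1, 1))
print(leaf_similarity("text", "dry cough", "dry cough fever"))
```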
In the above embodiment of determining the second-class subgraph similarity, before the similarity of the second-class subgraph between the historical graph structure data and the query graph structure data is obtained, a step of determining the similarity of each leaf node of the second-class subgraph in a manner selected according to the data type of the leaf node may also be included. Specifically, the data types of the leaf nodes of the second-class subgraphs in the historical graph structure data and the query graph structure data may be obtained. The data type of a leaf node of a second-class subgraph may be any of the four data types in Table One. The numerical data type, the Boolean data type and the text data type are simple data types; the fourth is the complex data type.
If the data type is a simple data type, the target similarity determination model is determined according to the data type of the leaf node. The similarity of each simple-data-type leaf node of the second-class subgraph between the historical graph structure data and the query graph structure data is then obtained according to the target similarity determination model and the description information of the leaf node. The similarity calculation for simple-data-type leaf nodes of the second-class subgraphs is implemented in a manner similar to that for the leaf nodes of the first-class subgraphs, and is not described herein again.
If the data type is the complex data type, the multiplier attribute information and the additive attribute information corresponding to the leaf node of the complex data type are acquired. For example, in a second-class subgraph of the query graph structure data whose intermediate node is "symptom", the leaf nodes include "cough"; the multiplier attribute information of "cough" is "1" (indicating presence), and its additive attribute information is, for example, "severe" and "24 hours". Then, the product of the similarities of the multiplier attribute information corresponding to the leaf node of the complex data type, multiplied by the weighted sum of the similarities of the additive attribute information corresponding to that leaf node, may be used as the similarity of each complex-data-type leaf node of the second-class subgraph between the historical graph structure data and the query graph structure data. For example, if the multiplier attribute information of the corresponding leaf node in the historical graph structure data is "1" (indicating presence) and its additive attribute information is, for example, "light" and "1 hour", then the multiplier pair (1, 1) is evaluated with the Boolean data type similarity determination model, the additive pair ("severe", "light") with the text data type similarity determination model, and the additive pair ("24 hours", "1 hour") with the numerical data type similarity determination model. Each item of description information of a leaf node in the second-class subgraph corresponds to either multiplier attribute information or additive attribute information, and this correspondence can be preset.
When the description information of each leaf node in the second-class subgraph is detected, the multiplier attribute information and the additive attribute information corresponding to each leaf node are obtained by looking up the description information in the preset correspondence.
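The rule for complex-data-type leaf nodes (product of the multiplier similarities times a weighted sum of the additive similarities) can be sketched as follows. The function name, the equal weights and the per-attribute similarity values are hypothetical; in the "cough" example they would come from the Boolean, text and numerical models respectively.

```python
import math

def complex_leaf_similarity(multiplier_sims, additive_sims, additive_weights):
    """Similarity of a complex-data-type leaf node.

    multiplier_sims  : similarities of the multiplier attributes (e.g. presence)
    additive_sims    : similarities of the additive attributes (e.g. severity)
    additive_weights : one hypothetical weight per additive attribute
    """
    multiplier_part = math.prod(multiplier_sims)
    additive_part = sum(w * s for w, s in zip(additive_weights, additive_sims))
    return multiplier_part * additive_part

# "cough" present in both records; severity and duration only partly agree.
print(complex_leaf_similarity([1.0], [0.0, 0.1], [0.5, 0.5]))
# If the symptom is absent in one record, the multiplier zeroes the result.
print(complex_leaf_similarity([0.0], [1.0, 1.0], [0.5, 0.5]))
```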
The above embodiment of the similar medical record searching method implements the matching and re-ranking of similar medical records, including the calculation of the degree of relationship among medical record features and the measurement of each feature attribute of the medical records, thereby improving the accuracy of similar medical record searching.
Referring to fig. 5, which is a schematic structural diagram of a similar medical record searching apparatus according to an embodiment of the present invention, the similar medical record searching apparatus 50 shown in fig. 5 includes:
A medical record obtaining module 51, configured to obtain query medical record data and a plurality of historical medical record data.
A graph structuring module 52, configured to obtain query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, where the query graph structure data and the historical graph structure data both include a first-class subgraph and a second-class subgraph, a middle node of the first-class subgraph is a medical record field type, and a middle node and a leaf node of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph.
A processing module 53, configured to obtain the similarity degree between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-class subgraph similarity and the second-class subgraph similarity between each piece of historical graph structure data and the query graph structure data.
A selecting module 54, configured to determine, according to a preset selection rule and the similarity degree, a similar medical record search result of the query medical record data in the plurality of historical medical record data, where the historical graph structure data corresponding to the similar medical record search result has the similarity degree meeting the preset selection rule.
The similar medical record searching apparatus in the embodiment shown in fig. 5 can be correspondingly used to execute the steps executed by the server in the embodiment of the method shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
Optionally, the processing module 53 is configured to determine a first similarity metric value between the historical graph structure data and the query graph structure data according to a similarity between the historical graph structure data and a root node in the query graph structure data; according to the significance type of the leaf nodes, in the first-type subgraph, a first-type significant subgraph and a first-type non-significant subgraph are determined; determining a second similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the historical graph structure data and the first type of significant subgraphs in the query graph structure data; determining a third similarity metric value of the historical graph structural data and the query graph structural data according to the similarity between the historical graph structural data and the first type of non-significant subgraph and the similarity between the second type of subgraph in the query graph structural data; and determining the similarity degree of each historical graph structure data and the query graph structure data according to the first similarity metric value, the second similarity metric value and the third similarity metric value of each historical graph structure data and the query graph structure data.
Optionally, the processing module 53 is further configured to, before determining the second similarity metric value between the historical graph structure data and the query graph structure data according to the similarity of the first-class significant subgraphs in the historical graph structure data and the query graph structure data, and before determining the third similarity metric value between the historical graph structure data and the query graph structure data according to the similarities of the first-class non-significant subgraphs and the second-class subgraphs in the historical graph structure data and the query graph structure data, obtain the similarity of the first-class subgraph between the historical graph structure data and the query graph structure data according to the leaf node similarity in the first-class subgraph, the preset weight of the edge between the root node and the leaf node, and the root node similarity, where the first-class subgraph similarity includes the first-class significant subgraph similarity and the first-class non-significant subgraph similarity.
Optionally, the processing module 53 is configured to use a product of the historical graph structure data and each first-type significant subgraph similarity of the query graph structure data as a second similarity metric value of the historical graph structure data and the query graph structure data.
Optionally, the processing module 53 is further configured to, before determining a third similarity metric value between the historical graph structure data and the query graph structure data according to the similarity between the historical graph structure data and the first-class non-significant subgraph and the second-class subgraph in the query graph structure data, obtain edge statistical weights of leaf nodes and intermediate nodes in the second-class subgraph; and in the second class of subgraphs of the historical graph structure data and the query graph structure data with the same intermediate nodes, acquiring the similarity between the historical graph structure data and the second class of subgraphs in the query graph structure data according to the similarity between leaf nodes in the second class of subgraphs, the preset weight between the root nodes and the intermediate nodes, the statistical weight between the leaf nodes and the intermediate nodes and the similarity between the root nodes.
Optionally, the processing module 53 is configured to obtain similarity between the history graph structure data and a second class of subgraph in the query graph structure data according to the following formula three:
Figure BDA0002107216530000171
wherein, the second-class subgraph of the historical graph structure data is $S_d = (\{u_{d,1}, u_{d,2}, \ldots, u_{d,m}, c_d, v_d\}, \{u_{d,1}c_d, u_{d,2}c_d, \ldots, u_{d,m}c_d, c_d v_d\})$, where $u_{d,m}$ is the $m$-th leaf node of the second-class subgraph of the historical graph structure data, $c_d$ is the intermediate node of the second-class subgraph of the historical graph structure data, $v_d$ is the root node of the second-class subgraph of the historical graph structure data, $u_{d,m}c_d$ is the edge between leaf node $u_{d,m}$ and intermediate node $c_d$, and $c_d v_d$ is the edge between intermediate node $c_d$ and root node $v_d$; the second-class subgraph of the query graph structure data is $S_q = (\{u_{q,1}, u_{q,2}, \ldots, u_{q,n}, c_q, v_q\}, \{u_{q,1}c_q, u_{q,2}c_q, \ldots, u_{q,n}c_q, c_q v_q\})$, where $u_{q,n}$ is the $n$-th leaf node of the second-class subgraph of the query graph structure data, $c_q$ is the intermediate node of the second-class subgraph of the query graph structure data, $v_q$ is the root node of the second-class subgraph of the query graph structure data, $u_{q,n}c_q$ is the edge between leaf node $u_{q,n}$ and intermediate node $c_q$, and $c_q v_q$ is the edge between intermediate node $c_q$ and root node $v_q$;
$\mathrm{weight}(u_{d,i}c_d)$ represents the statistical weight of the edge between a leaf node and the intermediate node in the second-class subgraph;
$\mathrm{weight}(c_d v_d)$ represents the preset weight of the edge between the root node and the intermediate node in the second-class subgraph;
operator symbol
Figure BDA0002107216530000181
Wherein, alpha and beta are constants.
Optionally, the processing module 53 is configured to obtain at least one leaf node below the intermediate node in the second class of subgraph; acquiring mutual information between a root node and the at least one leaf node in the second class of subgraph; acquiring chi-square statistic values of a root node and the at least one leaf node in the second class of subgraph; and taking the weighted sum of the mutual information value corresponding to the intermediate node and the chi-square statistic value as the edge statistic weight of the leaf node and the intermediate node in the second class subgraph.
Optionally, the processing module 53 is configured to use a sum of the similarity between the history graph structure data and each of the first class non-significant sub-graphs and the similarity between the history graph structure data and each of the second class sub-graphs in the query graph structure data as a third similarity metric value between the history graph structure data and the query graph structure data.
Optionally, the processing module 53 is configured to, in the first-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, obtain the similarity of the first-class subgraphs in the historical graph structure data and the query graph structure data according to the leaf node similarity in the first-class subgraph, the preset edge weight between the root node and the leaf nodes, and the root node similarity, where the first-class subgraph similarity includes the first-class significant subgraph similarity and the first-class non-significant subgraph similarity. The processing module 53 is further configured to: obtain the data types of the leaf nodes of the first-class subgraphs in the historical graph structure data and the query graph structure data; obtain a target similarity determination model corresponding to the data type; and obtain the similarity of each leaf node of the first-class subgraphs in the historical graph structure data and the query graph structure data according to the target similarity determination model and the description information of those leaf nodes.
Optionally, the processing module 53 is further configured to perform the following before obtaining, in the second-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the similarity of the second-class subgraphs according to the leaf node similarity of the second-class subgraph, the preset edge weight between the root node and the intermediate node, the edge statistical weight between the leaf nodes and the intermediate node, and the root node similarity: obtain the data types of the leaf nodes of the second-class subgraphs in the historical graph structure data and the query graph structure data; if the data type is a simple data type, determine a target similarity determination model according to the data type of the leaf node of the simple data type, and obtain, according to the target similarity determination model and the description information of the leaf nodes of the simple data type, the similarity of each leaf node of the simple data type of the second-class subgraphs in the historical graph structure data and the query graph structure data; if the data type is a complex data type, obtain multiplicative factor attribute information and additive factor attribute information corresponding to the leaf node of the complex data type, and take the product of (i) the product of the similarities of the multiplicative factor attribute information and (ii) the weighted sum of the similarities of the additive factor attribute information corresponding to the leaf node of the complex data type as the similarity of each leaf node of the complex data type of the second-class subgraphs in the historical graph structure data and the query graph structure data.
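The leaf node similarity rules above can be illustrated with a short sketch. The two similarity models for simple data types (`numeric_similarity`, `text_similarity`) and the registry `SIMPLE_MODELS` are hypothetical stand-ins, since the passage does not say which models are used; only the rule for complex data types (product of multiplicative factor similarities times the weighted sum of additive factor similarities) is taken from the text.

```python
import math

def numeric_similarity(a, b):
    """Hypothetical model for numeric leaves: 1 minus the relative difference."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))

def text_similarity(a, b):
    """Hypothetical model for text leaves: Jaccard overlap of word sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Data type -> target similarity determination model (assumed registry).
SIMPLE_MODELS = {"numeric": numeric_similarity, "text": text_similarity}

def simple_leaf_similarity(data_type, history_value, query_value):
    """Pick the model that matches the leaf's simple data type and apply it."""
    return SIMPLE_MODELS[data_type](history_value, query_value)

def complex_leaf_similarity(multiplicative_sims, additive_sims, additive_weights):
    """Similarity of a complex-type leaf.

    Product of the multiplicative factor similarities, multiplied by the
    weighted sum of the additive factor similarities.
    """
    product_part = math.prod(multiplicative_sims)
    weighted_sum = sum(w * s for w, s in zip(additive_weights, additive_sims))
    return product_part * weighted_sum
```

For example, a multiplicative factor such as negation can veto a match outright (similarity 0), while additive factors such as duration or severity only shift the score in proportion to their weights.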
Optionally, the processing module 53 is further configured to, before obtaining the similarity degree between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-class subgraph similarity and the second-class subgraph similarity, determine the similarity of the root nodes in each piece of historical graph structure data and the query graph structure data according to a preset text-type similarity determination model and the description information of the root nodes.
Optionally, the graph structuring module 52 is configured to: take the main diagnosis of the query medical record as the description information of the root node; take the medical record field types included in the query medical record as first-class intermediate nodes, and take the fields included in each field type as first-class leaf nodes of that field type; take the description information corresponding to each field as the description information of the first-class leaf node; take preset features as second-class intermediate nodes, and perform recognition of each feature on the fields and the description information of the fields to obtain description information of the feature, or to obtain description information of the feature together with additive factor attribute information or multiplicative factor attribute information of that description information; take the description information of the feature as a second-class leaf node of the feature, where the data type of a second-class leaf node having additive factor attribute information or multiplicative factor attribute information is a complex data type; and obtain the query graph structure data corresponding to the query medical record data according to the root node, the first-class intermediate nodes, the first-class leaf nodes, the second-class intermediate nodes and the second-class leaf nodes, where the root node, the first-class intermediate nodes and the first-class leaf nodes form the first-class subgraph, and the root node, the second-class intermediate nodes and the second-class leaf nodes form the second-class subgraph.
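The graph structuring steps above might look like the following sketch. The record layout (`main_diagnosis`, `fields`), the nested-dictionary graph representation, and the keyword-based `symptom_extractor` are all assumptions for illustration; the source does not describe how feature recognition is implemented.

```python
def build_graph(record, feature_extractors):
    """Turn one medical record into graph structure data.

    `record` is assumed to look like
    {"main_diagnosis": str, "fields": {field_type: {field_name: description}}}.
    `feature_extractors` maps a preset feature name to a function
    (field_name, description) -> list of leaf dicts.
    """
    graph = {
        "root": record["main_diagnosis"],  # root node description
        "first_class": {},                 # field type -> {field -> description}
        "second_class": {},                # feature -> list of recognised leaves
    }
    # First-class subgraph: field types are intermediate nodes, fields are leaves.
    for field_type, fields in record["fields"].items():
        graph["first_class"][field_type] = dict(fields)
    # Second-class subgraph: run each preset feature over every field.
    for feature, extractor in feature_extractors.items():
        leaves = []
        for fields in record["fields"].values():
            for field_name, description in fields.items():
                leaves.extend(extractor(field_name, description))
        if leaves:
            graph["second_class"][feature] = leaves
    return graph

SYMPTOMS = ("fever", "cough")  # hypothetical vocabulary

def symptom_extractor(field_name, description):
    """Toy feature recogniser: keyword match plus a naive negation check."""
    lowered = description.lower()
    found = []
    for symptom in SYMPTOMS:
        if symptom in lowered:
            found.append({
                "description": symptom,
                "data_type": "complex",  # it carries factor attributes
                "multiplicative": {"negated": ("no " + symptom) in lowered},
                "additive": {},
            })
    return found
```

A call such as `build_graph(record, {"symptom": symptom_extractor})` yields one root, one first-class intermediate node per field type, and one second-class intermediate node per feature that was actually recognised in the record.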
Optionally, the medical record obtaining module 51 is configured to: obtain query medical record data; determine full-text retrieval keywords according to the query medical record data; and perform a full-text medical record search in a medical record library according to the full-text retrieval keywords to obtain a plurality of historical medical record data containing the full-text retrieval keywords.
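The candidate retrieval step can be sketched with a small inverted index. This is an illustrative stand-in for whatever full-text search engine is actually used; the tokenizer, the in-memory `library` mapping, and the choice to require every keyword (rather than any keyword) are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase alphanumeric tokens of a text (simplistic, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(library):
    """Build an inverted index token -> set of record ids.

    `library` is assumed to map record ids to the full text of each record.
    """
    index = defaultdict(set)
    for record_id, text in library.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return index

def retrieve_candidates(index, keywords):
    """Return ids of records that contain every full-text retrieval keyword."""
    postings = [index.get(keyword.lower(), set()) for keyword in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)
```

Only the records returned here would then be converted to historical graph structure data and scored, which keeps the more expensive graph comparison off the full medical record library.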
Referring to fig. 6, which is a schematic diagram of a hardware structure of an apparatus according to an embodiment of the present invention, the apparatus 60 includes a processor 61, a memory 62 and a computer program, wherein:
The memory 62 is configured to store the computer program; the memory may also be a flash memory (flash). The computer program is, for example, an application program or a functional module that implements the above method.
The processor 61 is configured to execute the computer program stored in the memory to implement the steps executed by the server in the similar medical record searching method. For details, refer to the description of the preceding method embodiments.
Optionally, the memory 62 may be separate from or integrated with the processor 61.
When the memory 62 is a device independent of the processor 61, the apparatus may further include:
a bus 63 for connecting the memory 62 and the processor 61. The apparatus of fig. 6 can further include a transmitter (not shown) for sending out the similar medical record search results obtained by the processor 61 for the query medical record data.
The invention also provides a readable storage medium, wherein a computer program is stored in the readable storage medium, and when the computer program is executed by a processor, the computer program is used for realizing the similar medical record searching method provided by the various embodiments.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), and the ASIC may reside in user equipment. The processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device can read the execution instruction from the readable storage medium, and the at least one processor executes the execution instruction to enable the device to implement the similar medical record searching method provided by the above-mentioned various embodiments.
In the above embodiments of the apparatus, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (16)

1. A method for searching similar medical records is characterized by comprising the following steps:
acquiring query medical record data and a plurality of historical medical record data;
acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data each comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and leaf nodes of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph;
acquiring the similarity degree between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-class subgraph similarity and the second-class subgraph similarity of each piece of historical graph structure data and the query graph structure data; wherein the first-class subgraph similarity is obtained, in the first-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, according to the leaf node similarity in the first-class subgraph, the preset edge weight between the root node and the leaf nodes, and the root node similarity; and the second-class subgraph similarity is obtained, in the second-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, according to the leaf node similarity of the second-class subgraph, the preset edge weight between the root node and the intermediate node, the edge statistical weight between the leaf nodes and the intermediate node, and the root node similarity;
and determining a similar medical record search result for the query medical record data among the plurality of historical medical record data according to a preset selection rule and the similarity degree, wherein the historical graph structure data corresponding to the similar medical record search result has a similarity degree that satisfies the preset selection rule.
2. The method of claim 1, wherein the obtaining the similarity degree between each piece of historical graph structure data and the query graph structure data according to the similarity degree between each piece of historical graph structure data and a root node in the query graph structure data, the similarity degree between a first class subgraph and a second class subgraph comprises:
determining a first similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the root nodes in the historical graph structure data and the query graph structure data;
according to the significance classes of the leaf nodes, in the first class of subgraphs, determining a first class of significant subgraphs and a first class of non-significant subgraphs;
determining a second similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the historical graph structure data and the first type of significant subgraphs in the query graph structure data;
determining a third similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the first type of non-significant subgraph and the similarity of the second type of subgraph in the historical graph structure data and the query graph structure data;
and determining the similarity degree of each historical graph structure data and the query graph structure data according to the first similarity metric value, the second similarity metric value and the third similarity metric value of each historical graph structure data and the query graph structure data.
3. The method of claim 2, wherein before the determining a second similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the first-class significant subgraphs in the historical graph structure data and the query graph structure data, and
before the determining a third similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the first-class non-significant subgraphs and the similarity of the second-class subgraphs in the historical graph structure data and the query graph structure data, the method further comprises:
in the first-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, obtaining the similarity of the first-class subgraphs in the historical graph structure data and the query graph structure data according to the leaf node similarity in the first-class subgraph, the preset edge weight between the root node and the leaf nodes, and the root node similarity, wherein the first-class subgraph similarity comprises the first-class significant subgraph similarity and the first-class non-significant subgraph similarity.
4. The method of claim 2 or 3, wherein the determining a second similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the historical graph structure data and the first class of significant subgraphs in the query graph structure data comprises:
and taking the product of the similarities of the first-class significant subgraphs in the historical graph structure data and the query graph structure data as the second similarity metric value of the historical graph structure data and the query graph structure data.
5. The method of claim 2, wherein before the determining a third similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the first-class non-significant subgraphs and the similarity of the second-class subgraphs in the historical graph structure data and the query graph structure data, the method further comprises:
acquiring edge statistical weights of the leaf nodes and the middle nodes in the second class of subgraphs;
and in the second class of subgraphs of the historical graph structure data and the query graph structure data with the same intermediate nodes, acquiring the similarity between the historical graph structure data and the second class of subgraphs in the query graph structure data according to the similarity between leaf nodes in the second class of subgraphs, the preset weight between the root nodes and the intermediate nodes, the statistical weight between the leaf nodes and the intermediate nodes and the similarity between the root nodes.
6. The method of claim 5, wherein the obtaining the similarity of the second-class subgraphs in the historical graph structure data and the query graph structure data according to the leaf node similarity in the second-class subgraph, the preset edge weight between the root node and the intermediate node, the edge statistical weight between the leaf nodes and the intermediate node, and the root node similarity comprises:
acquiring the similarity between the history graph structure data and a second class of subgraph in the query graph structure data according to the following formula:
Figure FDA0003570875220000031
wherein the second-class subgraph of the historical graph structure data is $S_d = (\{u_{d,1}, u_{d,2}, \ldots, u_{d,m}, c_d, v_d\}, \{u_{d,1}c_d, u_{d,2}c_d, \ldots, u_{d,m}c_d, c_d v_d\})$, $u_{d,m}$ is the $m$-th leaf node of the second-class subgraph of the historical graph structure data, $c_d$ is the intermediate node of the second-class subgraph of the historical graph structure data, $v_d$ is the root node of the second-class subgraph of the historical graph structure data, $u_{d,m}c_d$ is the edge between leaf node $u_{d,m}$ and intermediate node $c_d$, and $c_d v_d$ is the edge between intermediate node $c_d$ and root node $v_d$; the second-class subgraph of the query graph structure data is $S_q = (\{u_{q,1}, u_{q,2}, \ldots, u_{q,n}, c_q, v_q\}, \{u_{q,1}c_q, u_{q,2}c_q, \ldots, u_{q,n}c_q, c_q v_q\})$, $u_{q,n}$ is the $n$-th leaf node of the second-class subgraph of the query graph structure data, $c_q$ is the intermediate node of the second-class subgraph of the query graph structure data, $v_q$ is the root node of the second-class subgraph of the query graph structure data, $u_{q,n}c_q$ is the edge between leaf node $u_{q,n}$ and intermediate node $c_q$, and $c_q v_q$ is the edge between intermediate node $c_q$ and root node $v_q$;
$\mathrm{weight}(u_{d,i}c_d)$ represents the edge statistical weight between a leaf node and the intermediate node in the second-class subgraph;
$\mathrm{weight}(c_d v_d)$ represents the preset edge weight between the root node and the intermediate node in the second-class subgraph;
operator symbol
Figure FDA0003570875220000032
Wherein, alpha and beta are constants.
7. The method of claim 5, wherein the obtaining the edge statistical weights of the leaf nodes and the intermediate nodes in the subgraph of the second class comprises:
acquiring at least one leaf node below the middle node in the second class of subgraph;
acquiring a mutual information value between a root node and the at least one leaf node in the second class of subgraph;
acquiring chi-square statistic values of a root node and the at least one leaf node in the second class of subgraph;
and taking the weighted sum of the mutual information value corresponding to the intermediate node and the chi-square statistic value as the edge statistic weight of the leaf node and the intermediate node in the second class subgraph.
8. The method of claim 2, 3, 5, 6 or 7, wherein the determining a third similarity metric value of the historical graph structure data and the query graph structure data according to the similarity of the first-class non-significant subgraphs and the similarity of the second-class subgraphs in the historical graph structure data and the query graph structure data comprises:
taking the sum of the similarities of the first-class non-significant subgraphs and the similarities of the second-class subgraphs in the historical graph structure data and the query graph structure data as the third similarity metric value of the historical graph structure data and the query graph structure data.
9. The method of claim 3, wherein the obtaining the similarity of the first-class subgraphs in the historical graph structure data and the query graph structure data according to the leaf node similarity in the first-class subgraph, the preset edge weight between the root node and the leaf nodes, and the root node similarity, the first-class subgraph similarity comprising the first-class significant subgraph similarity and the first-class non-significant subgraph similarity, further comprises:
acquiring the data types of the leaf nodes of the first-class subgraphs in the historical graph structure data and the query graph structure data;
acquiring a target similarity determination model corresponding to the data type;
and obtaining the similarity of each leaf node of the first-class subgraphs in the historical graph structure data and the query graph structure data according to the target similarity determination model and the description information of the leaf nodes of the first-class subgraphs in the historical graph structure data and the query graph structure data.
10. The method of claim 5, wherein before the obtaining, in the second-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the similarity of the second-class subgraphs in the historical graph structure data and the query graph structure data according to the leaf node similarity in the second-class subgraph, the preset edge weight between the root node and the intermediate node, the edge statistical weight between the leaf nodes and the intermediate node, and the root node similarity, the method further comprises:
acquiring the data types of the leaf nodes of the second-class subgraphs in the historical graph structure data and the query graph structure data;
if the data type is a simple data type, determining a target similarity determination model according to the data type of the leaf node of the simple data type, and obtaining, according to the target similarity determination model and the description information of the leaf nodes of the simple data type, the similarity of each leaf node of the simple data type of the second-class subgraphs in the historical graph structure data and the query graph structure data;
if the data type is a complex data type, acquiring multiplicative factor attribute information and additive factor attribute information corresponding to the leaf node of the complex data type, and taking the product of (i) the product of the similarities of the multiplicative factor attribute information and (ii) the weighted sum of the similarities of the additive factor attribute information corresponding to the leaf node of the complex data type as the similarity of each leaf node of the complex data type of the second-class subgraphs in the historical graph structure data and the query graph structure data.
11. The method according to claim 1, 9 or 10, wherein before obtaining the similarity degree between each of the historical graph structure data and the query graph structure data according to the similarity degree between each of the historical graph structure data and the root node in the query graph structure data, the similarity degree between the first-class subgraph and the second-class subgraph, the method further comprises:
and determining the similarity of each historical graph structure data and the root node in the query graph structure data according to a preset text type similarity determination model and the description information of the root node.
12. The method according to claim 1, 9 or 10, wherein the acquiring query graph structure data corresponding to the query medical record data, wherein the query graph structure data and the historical graph structure data each comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and leaf nodes of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph, comprises:
taking the main diagnosis of the query medical record as the description information of a root node;
taking the medical record field types included in the query medical record as first-class intermediate nodes, and taking the fields included in each field type as first-class leaf nodes of that field type; taking the description information corresponding to each field as the description information of the first-class leaf node;
taking preset features as second-class intermediate nodes, and performing recognition of each feature on the fields and the description information of the fields to obtain description information of the feature, or to obtain description information of the feature together with additive factor attribute information or multiplicative factor attribute information of that description information; taking the description information of the feature as a second-class leaf node of the feature, wherein the data type of a second-class leaf node having additive factor attribute information or multiplicative factor attribute information is a complex data type;
obtaining query graph structure data corresponding to the query medical record data according to the root node, the first-class intermediate nodes, the first-class leaf nodes, the second-class intermediate nodes and the second-class leaf nodes, wherein the root node, the first-class intermediate nodes and the first-class leaf nodes form a first-class subgraph; and the root node, the second type intermediate nodes and the second type leaf nodes form a second type subgraph.
13. The method of claim 1, wherein obtaining query medical record data and a plurality of historical medical record data comprises:
acquiring query medical record data;
determining full-text retrieval keywords according to the query medical record data;
and performing a full-text medical record search in a medical record library according to the full-text retrieval keywords to obtain a plurality of historical medical record data containing the full-text retrieval keywords.
14. A similar medical record searching device is characterized by comprising:
the medical record acquisition module is used for acquiring query medical record data and a plurality of historical medical record data;
the graph structuring module is used for acquiring query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each historical medical record data, wherein the query graph structure data and the historical graph structure data each comprise a first-class subgraph and a second-class subgraph, an intermediate node of the first-class subgraph is a medical record field type, and an intermediate node and leaf nodes of the second-class subgraph are obtained by performing feature recognition on the first-class subgraph;
the processing module is used for acquiring the similarity degree between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-class subgraph similarity and the second-class subgraph similarity of each piece of historical graph structure data and the query graph structure data; wherein the first-class subgraph similarity is obtained, in the first-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, according to the leaf node similarity in the first-class subgraph, the preset edge weight between the root node and the leaf nodes, and the root node similarity; and the second-class subgraph similarity is obtained, in the second-class subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, according to the leaf node similarity of the second-class subgraph, the preset edge weight between the root node and the intermediate node, the edge statistical weight between the leaf nodes and the intermediate node, and the root node similarity;
and the selection module is used for determining a similar medical record search result for the query medical record data among the plurality of historical medical record data according to a preset selection rule and the similarity degree, wherein the historical graph structure data corresponding to the similar medical record search result has a similarity degree that satisfies the preset selection rule.
15. An electronic device, comprising: the device comprises a memory, a processor and a computer program, wherein the computer program is stored in the memory, and the processor runs the computer program to execute the similar medical record searching method according to any one of claims 1 to 13.
16. A readable storage medium, wherein a computer program is stored in the readable storage medium, and the computer program, when executed by a processor, implements the similar medical record searching method according to any one of claims 1 to 13.
CN201910557217.5A 2019-06-25 2019-06-25 Similar medical record searching method, device and equipment and readable storage medium Active CN110299209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910557217.5A CN110299209B (en) 2019-06-25 2019-06-25 Similar medical record searching method, device and equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910557217.5A CN110299209B (en) 2019-06-25 2019-06-25 Similar medical record searching method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110299209A CN110299209A (en) 2019-10-01
CN110299209B true CN110299209B (en) 2022-05-20

Family

ID=68028806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910557217.5A Active CN110299209B (en) 2019-06-25 2019-06-25 Similar medical record searching method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110299209B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111370086A (en) * 2020-02-27 2020-07-03 平安国际智慧城市科技股份有限公司 Electronic case detection method, electronic case detection device, computer equipment and storage medium
CN113767401A (en) * 2020-04-03 2021-12-07 清华大学 Network representation learning algorithm across medical data sources
CN111599427B (en) * 2020-05-14 2023-03-31 郑州大学第一附属医院 Recommendation method and device for unified diagnosis, electronic equipment and storage medium
CN111613339B (en) * 2020-05-15 2021-07-09 山东大学 Similar medical record searching method and system based on deep learning
CN111767707B (en) * 2020-06-30 2023-10-31 平安科技(深圳)有限公司 Method, device, equipment and storage medium for detecting Leideogue cases
CN113010746B (en) * 2021-03-19 2023-08-29 厦门大学 Medical record graph sequence retrieval method and system based on sub-tree inverted index
CN113590777A (en) * 2021-06-30 2021-11-02 北京百度网讯科技有限公司 Text information processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017174406A (en) * 2016-03-24 2017-09-28 富士通株式会社 Healthcare risk estimation system and method
CN108388580A (en) * 2018-01-24 2018-08-10 平安医疗健康管理股份有限公司 Dynamic knowledge graph update method fusing medical knowledge and application cases
CN109215754A (en) * 2018-09-10 2019-01-15 平安科技(深圳)有限公司 Medical record data processing method, device, computer equipment and storage medium
CN109785968A (en) * 2018-12-27 2019-05-21 东软集团股份有限公司 Event prediction method, apparatus, equipment and program product

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8589400B2 (en) * 2001-11-30 2013-11-19 Intelligent Medical Objects, Inc. Longitudinal electronic record system and method
US20130096944A1 (en) * 2011-10-13 2013-04-18 The Board of Trustees of the Leland Stanford, Junior, University Method and System for Ontology Based Analytics

Non-Patent Citations (1)

Title
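Research on electronic medical record retrieval technology based on user intent analysis; 王超 (Wang Chao); China Master's Theses Full-text Database, Medicine and Health Sciences; 20170815; E053-70 *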

Also Published As

Publication number Publication date
CN110299209A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
CN110299209B (en) Similar medical record searching method, device and equipment and readable storage medium
JP6693252B2 (en) Similarity calculation device, side effect determination device and system for calculating similarity of drugs and estimating side effects using the similarity
US20210233658A1 (en) Identifying Relevant Medical Data for Facilitating Accurate Medical Diagnosis
CN113724848A (en) Medical resource recommendation method, device, server and medium based on artificial intelligence
AU2017250467B2 (en) Query optimizer for combined structured and unstructured data records
KR101897080B1 (en) Method and Apparatus for generating association rules between medical words in medical record document
WO2013090413A1 (en) Storing structured and unstructured clinical information for information retrieval
CN112635011A (en) Disease diagnosis method, disease diagnosis system, and readable storage medium
JP6567484B2 (en) Estimated model construction system, estimated model construction method and program
US20200058408A1 (en) Systems, methods, and apparatus for linking family electronic medical records and prediction of medical conditions and health management
Wang et al. Automatic diagnosis with efficient medical case searching based on evolving graphs
CN111091883B (en) Medical text processing method, device, storage medium and equipment
CN113658712A (en) Doctor-patient matching method, device, equipment and storage medium
CN114496140B (en) Data matching method, device, equipment and medium for query conditions
CN111785383A (en) Data processing method and related equipment
US20200051698A1 (en) Precision clinical decision support with data driven approach on multiple medical knowledge modules
CN111696656A (en) Doctor evaluation method and device of Internet medical platform
CN114201598A (en) Text recommendation method and text recommendation device
CN112635072A (en) ICU (intensive care unit) similar case retrieval method and system based on similarity calculation and storage medium
CN110473636B (en) Intelligent medical advice recommendation method and system based on deep learning
CN111640517A (en) Medical record encoding method and device, storage medium and electronic equipment
CN109144999B (en) Data positioning method, device, storage medium and program product
CN113724883B (en) Medical expense prediction method, medical expense prediction device, storage medium and computer equipment
US20150339602A1 (en) System and method for modeling health care costs
CN114882985A (en) Medicine multimedia management system and method based on database and AI algorithm identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant