CN117012375B

CN117012375B - Clinical decision support method and system based on patient topological feature similarity

Info

Publication number: CN117012375B
Application number: CN202311284104.5A
Authority: CN
Inventors: 张阳; 李劲松; 池胜强; 周天舒; 田雨
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2024-03-26
Anticipated expiration: 2043-10-07
Also published as: CN117012375A

Abstract

The invention discloses a clinical decision support method and system based on the similarity of topological features of patients. The method comprises the following steps: collecting electronic health record data of patients; preprocessing the electronic health record data to obtain standardized diagnosis and treatment events for each patient; constructing a heterogeneous graph of the patients according to the diagnosis and treatment events of each patient and a medical knowledge graph; calculating the first-order similarity between patients through shared first-order neighbors, mining the high-order similarity between patients through heterogeneous paths, and fusing the two to obtain the similarity between patients; and sorting the similarities to obtain the M most similar patients, and performing clinical analysis on the electronic health record data of those M patients to obtain an analysis result that guides doctors in designing treatment plans. The invention requires no expert knowledge base, is compatible with different medical fields, and avoids lengthy training and black-box behavior, which saves time and improves generalization and interpretability.

Description

Clinical decision support method and system based on patient topological feature similarity

Technical Field

The invention belongs to the technical field of medical health information, and particularly relates to a clinical decision support method and system based on similarity of topological features of patients.

Background

With the continuous development and popularization of information technology, massive amounts of medical data are being generated, and it is becoming increasingly difficult for the human brain alone to reason by analogy over high-dimensional, explosively growing data. In clinical practice, doctors usually make diagnosis and treatment decisions according to clinical guidelines or clinical experience, which does not meet the need for personalized treatment strategies under the current precision medicine paradigm. How to mine electronic medical record information, integrate patient similarity measurement into a clinical decision support system (CDSS), and realize clinical decision support based on analogical reasoning has become a research hotspot in recent years.

Electronic medical record data is complex: it contains not only multidimensional data such as demographics, biomarkers and clinical characteristics, but also complex longitudinal time-series information, since a patient may have multiple visits and different diagnosis and treatment events at different times. Extracting important features from electronic medical records for patient similarity measurement is an important approach in precision medicine. The usual approach converts a patient's electronic medical record information into vectors through NLP text analysis, network representation or the like, and then measures the similarity between the vectors with cosine similarity, Minkowski distance, Jaccard distance and so on, taking the result as the patient similarity. Although these methods protect patient privacy well, their results are not interpretable, their generalization ability is poor, and the reliability of the models remains to be verified.

Clinical decision support systems can generally be divided into two main categories according to the technology used. (1) Rule-based CDSS. Rule-based methods originate from early expert systems: a knowledge base represents expert experience in a field in the form of rules, covering problems and their solutions. This approach requires a manually defined knowledge base, possibly with patient information entered manually by a physician, and then infers a series of diagnostic options according to the rules. However, each medical specialty has its own way of thinking, doctors at different levels reason at different depths, and it is impossible to program all diagnostic reasoning into a fixed pattern the way a computer requires. The CDSSs currently used successfully for diagnosis are often limited to a single field and are not compatible across medical fields, and they must be supported by an expert knowledge base. (2) Artificial-intelligence-based CDSS. Artificial intelligence methods can learn automatically from medical big data and extract implicit rules and models, thereby providing intelligent decision references. Common artificial intelligence methods include machine learning, neural networks, natural language processing and the like. These methods do not require rules to be written by hand or expert input to be sought, but most of them rely on existing artificial intelligence methods: the training process is very time-consuming, generalization is poor, and the reliability of the models remains to be verified. Some systems target a single disease and often do not fully account for the patient's overall condition. In addition, such models are typically black boxes with poor interpretability.

The clinical decision support method based on patient topological feature similarity calculates patient similarity with traditional graph-theoretic methods. It requires neither neural network training nor additional expert annotation, has good interpretability, and fully considers the homogeneity and heterogeneity of different diagnosis and treatment events as well as the high-order correlations between different patients.

Disclosure of Invention

The invention aims at overcoming the defects of the prior art and provides a clinical decision support method and a system based on the similarity of topological features of patients.

The aim of the invention is realized by the following technical scheme: the first aspect of the embodiment of the invention provides a clinical decision support method based on the similarity of topological features of patients, which comprises the following steps:

S1, acquiring electronic health record data of patients, wherein the electronic health record data comprise basic information, visit record information, diagnostic data during an observation window, laboratory test data, surgery and treatment measure data, and medication data;

S2, preprocessing the acquired electronic health record data to obtain the diagnosis and treatment events of each patient;

S3, taking the patients and the diagnosis and treatment events as nodes, determining the relations between the nodes as edges according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, determining the relations between different diagnosis and treatment events as edges according to a medical knowledge graph to construct a heterogeneous graph;

S4, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; extracting patient-centered sub-graphs from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the high-order similarity between two patients according to the meta-paths of the patients; and obtaining the similarity between the two patients according to the first-order similarity and the high-order similarity;

S5, sorting the similarities between patients, selecting the M patients with the highest similarity, and providing auxiliary decisions for clinicians in combination with the electronic health record data of the M patients.
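Steps S4 and S5 end with a fusion of the two similarities followed by a top-M selection. The following Python sketch illustrates one possible form of that last stage. The fusion rule is not specified at this point in the text, so the weighted sum and its weight `alpha` are assumptions made only for illustration, as are the function name `top_m_similar` and the sample scores.

```python
from typing import Dict, List, Tuple


def top_m_similar(
    first_order: Dict[str, float],
    high_order: Dict[str, float],
    m: int,
    alpha: float = 0.5,  # hypothetical fusion weight, not taken from the source
) -> List[Tuple[str, float]]:
    """Fuse two similarity scores per candidate patient and return the top M."""
    candidates = set(first_order) | set(high_order)
    fused = {
        pid: alpha * first_order.get(pid, 0.0)
        + (1.0 - alpha) * high_order.get(pid, 0.0)
        for pid in candidates
    }
    # Sort by descending similarity; ties are broken by patient id.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:m]


# Similarities of three candidate patients to one query patient (made-up values).
first = {"P2": 0.8, "P3": 0.2, "P4": 0.5}
high = {"P2": 0.4, "P3": 0.6, "P4": 0.5}
top = top_m_similar(first, high, m=2)
print([pid for pid, _ in top])  # ['P2', 'P4']
```

The electronic health record data of the returned patients would then be retrieved for clinical analysis; that retrieval is outside the scope of this sketch.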

Further, the step S2 includes the following sub-steps:

S21, performing missing value processing on the electronic health record data according to the diagnostic data during the observation window;

S22, deleting outlier records that exceed the medical range from the laboratory test data to form a medical test set;

S23, performing ICD-10-CM coding on the historical diagnostic data in the patient's medical history and on the diagnostic data during the observation window, assigning the historical diagnostic data to a basic information set, and forming a medical diagnosis set from the diagnostic data of the current observation window;

S24, performing CPT coding on the surgery and treatment measure data to form a medical operation set;

S25, performing ATC coding on the medication data to form a medication set;

S26, forming a patient set from the basic information, and obtaining the diagnosis and treatment events of each patient in the patient set from the basic information set, the medical test set, the medication set, the medical diagnosis set and the medical operation set.

Further, in step S3, taking the patients and the diagnosis and treatment events as nodes, determining the relations between the nodes as edges according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, determining the relations between different diagnosis and treatment events as edges according to the medical knowledge graph to construct the heterogeneous graph specifically includes: first, taking the patients and the basic information, medical tests, medications, medical diagnoses and medical operations in the diagnosis and treatment events as nodes, and representing different types of nodes with different shapes, wherein the numbers of patient nodes, basic information nodes, medical test nodes, medication nodes, medical diagnosis nodes and medical operation nodes are respectively the total number of patients and the global numbers of categories of basic information, medical tests, medications, medical diagnoses and medical operations; then traversing the diagnosis and treatment events of each patient and judging whether basic information, medical test, medication, medical diagnosis or medical operation data exist in the patient's diagnosis and treatment events; if so, establishing a connecting edge of the corresponding type, and otherwise establishing no connecting edge; no connecting edge is established between patient nodes; and finally, constructing association edges between pairs of basic information nodes, medical test nodes, medication nodes, medical diagnosis nodes and medical operation nodes according to the medical knowledge graph.

Further, the medical knowledge graph includes UMLS, SNOMED CT and CUMLS.

Further, in step S4, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events specifically includes: calculating authoritative ranking scores of the patients and the diagnosis and treatment events according to the topological information of the heterogeneous graph; fusing the authoritative ranking scores of the patients and the diagnosis and treatment events with the transition probabilities of the nodes to obtain fused node transition probabilities; and extracting the first-order neighbor nodes of the patient nodes according to the topological information of the constructed heterogeneous graph, and iteratively calculating the first-order similarity between two patients by introducing an evidence matrix and combining the fused node transition probabilities.

Further, the expressions of the authoritative ranking scores of the patients and the diagnosis and treatment events are respectively:

[Formula images not reproduced in this text.]

wherein the expressions involve the following quantities: the authoritative ranking scores of the different diagnosis and treatment events; the authoritative ranking score of diagnosis and treatment event a; the authoritative ranking score of diagnosis and treatment event i; the authoritative ranking score of the patient; a hyperparameter; the reciprocal of the number of times diagnosis and treatment event a occurs across all patients; the number of times all diagnosis and treatment events co-occur with diagnosis and treatment event a; the number of occurrences of all diagnosis and treatment events of a given patient; the global numbers of categories of basic information, medical tests, medications, medical diagnoses and medical operations, and the total number of patients; the connecting edges between nodes of different diagnosis and treatment events; and the connecting edges between patient nodes and diagnosis and treatment event nodes, which are symmetric;

the expression of the first order similarity between the two patients is:

[Formula image not reproduced in this text.]

wherein the expression gives the first-order similarity between patient i and patient j at iteration round t, computed over the first-order neighbor node set of patient i and the first-order neighbor node set of patient j in the diagnosis and treatment event set; a and b are the indices of any diagnosis and treatment event in these two sets, each ranging over the elements of the respective set, and index an element of the first-order neighbor nodes of patient i and an element of the first-order neighbor nodes of patient j respectively; the similarity is initialized to 1 under the initial condition given in the expression, and takes the value given in the expression if either of the two neighbor sets is empty; C is an attenuation factor; V is the node set of the bipartite graph; t is the current iteration round; E is the evidence matrix introduced into the expression, defined in terms of the number of commonly connected edges; and the remaining terms are the probability of transitioning from a diagnosis and treatment event to patient i and the probability of transitioning from a diagnosis and treatment event to patient j.
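The iterative calculation described above, with an attenuation factor, first-order neighbor sets and an evidence term based on shared edges, resembles bipartite SimRank with an evidence factor. The following Python sketch illustrates that general technique only and is not the expression of this embodiment. It assumes uniform transition probabilities (one over the number of neighbors) instead of the fused, authority-weighted probabilities, and it assumes a SimRank++-style evidence term 1 - 2^(-n) for n shared events. The function names and sample data are hypothetical.

```python
from itertools import product
from typing import Dict, Set, Tuple


def evidence(n_common: int) -> float:
    """Assumed evidence term: sum of 2^-k for k = 1..n, i.e. 1 - 2^-n."""
    return 1.0 - 0.5 ** n_common


def bipartite_simrank(
    patient_events: Dict[str, Set[str]],
    decay: float = 0.8,  # attenuation factor C (value chosen for illustration)
    rounds: int = 5,
) -> Dict[Tuple[str, str], float]:
    """Iteratively compute patient-patient similarity on a patient-event graph."""
    patients = sorted(patient_events)
    events = sorted(set().union(*patient_events.values()))
    event_patients = {
        e: {p for p in patients if e in patient_events[p]} for e in events
    }

    # Round 0: every node is fully similar to itself and to nothing else.
    sim_p = {(i, j): float(i == j) for i, j in product(patients, repeat=2)}
    sim_e = {(a, b): float(a == b) for a, b in product(events, repeat=2)}

    for _ in range(rounds):
        new_p, new_e = {}, {}
        for i, j in product(patients, repeat=2):
            ni, nj = patient_events[i], patient_events[j]
            if i == j:
                new_p[(i, j)] = 1.0
            elif not ni or not nj:
                new_p[(i, j)] = 0.0  # a patient without events matches no one
            else:
                total = sum(sim_e[(a, b)] for a in ni for b in nj)
                weight = evidence(len(ni & nj)) * decay
                new_p[(i, j)] = weight * total / (len(ni) * len(nj))
        for a, b in product(events, repeat=2):
            na, nb = event_patients[a], event_patients[b]
            if a == b:
                new_e[(a, b)] = 1.0
            else:
                total = sum(sim_p[(i, j)] for i in na for j in nb)
                new_e[(a, b)] = decay * total / (len(na) * len(nb))
        sim_p, sim_e = new_p, new_e
    return sim_p


patient_events = {
    "P1": {"D1", "M1"},
    "P2": {"D1", "M1"},
    "P3": {"D1"},
    "P4": {"L9"},
}
sim = bipartite_simrank(patient_events)
```

In this toy data, P1 and P2 share two events and P1 and P3 share one, so the evidence term ranks P2 above P3 for P1, while P4 shares nothing with P1 and receives a first-order similarity of zero.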

Further, in step S4, extracting patient-centered sub-graphs from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the high-order similarity between two patients according to the meta-paths of the patients specifically includes: processing the heterogeneous graph with each patient as a center, extracting the K-order sub-graph of the patient, and counting the number of occurrences of each meta-path in each K-order sub-graph to obtain the importance of each meta-path, wherein the value of K is determined by the length of the meta-path; calculating the meta-path-based transition probability between two adjacent patients; and calculating the high-order similarity between two patients according to the meta-path-based transition probability between two adjacent patients and the importance of the meta-paths.
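The counting of meta-path occurrences within a patient-centered sub-graph can be sketched as follows. This is a minimal Python illustration under several assumptions that do not come from the source: node types are encoded as the first letter of the node name, a meta-path is identified by the sequence of node types on a simple path from the center patient to the first other patient reached, and importance is taken to be the normalized occurrence count.

```python
from collections import Counter
from typing import Dict, List, Set


def count_meta_paths(
    adj: Dict[str, Set[str]],
    node_type: Dict[str, str],
    start: str,
    max_hops: int,
) -> Counter:
    """Count simple paths from `start` to another patient, grouped by type sequence."""
    counts: Counter = Counter()

    def walk(path: List[str]) -> None:
        node = path[-1]
        if len(path) > 1 and node_type[node] == "P":
            counts["-".join(node_type[n] for n in path)] += 1
            return  # stop at the first patient reached
        if len(path) - 1 == max_hops:
            return  # the K-order sub-graph ends here
        for nxt in sorted(adj[node]):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return counts


def importance(counts: Counter) -> Dict[str, float]:
    """Assumed importance: share of each meta-path among all counted paths."""
    total = sum(counts.values())
    return {mp: c / total for mp, c in counts.items()} if total else {}


# Toy heterogeneous graph: P1-D1-P2 and P1-M1-D2-P3, where M1-D2 stands for
# a knowledge-graph edge between a medication and a diagnosis.
adj = {
    "P1": {"D1", "M1"},
    "P2": {"D1"},
    "P3": {"D2"},
    "D1": {"P1", "P2"},
    "M1": {"P1", "D2"},
    "D2": {"M1", "P3"},
}
node_type = {n: n[0] for n in adj}
counts = count_meta_paths(adj, node_type, "P1", max_hops=3)
weights = importance(counts)
```

With `max_hops=3` the sub-graph around P1 contains one P-D-P path and one P-M-D-P path; with `max_hops=2` only the P-D-P path is found, which shows how K bounds the meta-path length.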

Further, calculating the meta-path-based transition probability between two adjacent patients specifically includes: determining the transition probability from a patient node to the next patient node as the product of the transition probabilities from each node in the meta-path to the next node, taking this transition probability as the weight of the edge, and summing the weights of all first-order edges of the patient along the meta-path to obtain the meta-path-based transition probability between two adjacent patients.
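The product-then-sum rule described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that each step of the meta-path moves with uniform probability to one of the neighbors of the required type; the source instead uses its own node transition probabilities. The function name `meta_path_transition` and the toy graph are hypothetical.

```python
from typing import Dict, List, Set


def meta_path_transition(
    adj: Dict[str, Set[str]],
    node_type: Dict[str, str],
    source: str,
    meta_path: List[str],
) -> Dict[str, float]:
    """Probability of reaching each end patient from `source` along `meta_path`."""
    if node_type[source] != meta_path[0]:
        raise ValueError("source node type must match the start of the meta-path")
    frontier = {source: 1.0}
    for next_type in meta_path[1:]:
        next_frontier: Dict[str, float] = {}
        for node, prob in frontier.items():
            candidates = [n for n in adj[node] if node_type[n] == next_type]
            for n in candidates:
                # Multiply along the path, then sum over all paths ending at n.
                step = prob / len(candidates)  # uniform step (assumption)
                next_frontier[n] = next_frontier.get(n, 0.0) + step
        frontier = next_frontier
    frontier.pop(source, None)  # a patient is not treated as its own neighbor
    return frontier


adj = {
    "P1": {"D1", "D2"},
    "P2": {"D1", "D2"},
    "P3": {"D2"},
    "D1": {"P1", "P2"},
    "D2": {"P1", "P2", "P3"},
}
node_type = {n: n[0] for n in adj}
probs = meta_path_transition(adj, node_type, "P1", ["P", "D", "P"])
```

Here P1 reaches P2 through both D1 and D2, giving 1/2 × 1/2 + 1/2 × 1/3 = 5/12, while P3 is reachable only through D2, giving 1/2 × 1/3 = 1/6.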

A second aspect of the present invention provides a clinical decision support system based on patient topological feature similarity, for implementing the above clinical decision support method based on patient topological feature similarity, the system comprising:

the data acquisition module is used for acquiring electronic health record data of patients, wherein the electronic health record data comprise basic information, visit record information, diagnostic data during an observation window, laboratory test data, surgery and treatment measure data, and medication data;

the data preprocessing module is used for preprocessing the electronic health record data acquired by the data acquisition module to obtain the diagnosis and treatment events of each patient, wherein the diagnosis and treatment events comprise a basic information set, a medical test set, a medication set, a medical diagnosis set and a medical operation set;

the patient graph structure construction module is used for constructing a heterogeneous graph of the patients, wherein the patients and the diagnosis and treatment events are taken as nodes, the relations between the nodes are determined as edges according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, the relations between different diagnosis and treatment events are determined as edges according to the medical knowledge graph to construct the heterogeneous graph;

the patient similarity calculation module is used for calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; extracting patient-centered sub-graphs from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the high-order similarity between two patients according to the meta-paths of the patients; and obtaining the similarity between the two patients according to the first-order similarity and the high-order similarity; and

the auxiliary clinical decision module is used for sorting the similarities between patients, selecting the M patients with the highest similarity, and providing auxiliary decisions for clinicians in combination with the electronic health record data of the M patients.

Further, the patient similarity calculation module comprises a first-order similarity calculation module and a high-order similarity calculation module;

the first-order similarity calculation module is used for calculating authoritative ranking scores of the patients and the diagnosis and treatment events according to the topological information of the heterogeneous graph; fusing the authoritative ranking scores of the patients and the diagnosis and treatment events with the transition probabilities of the nodes to obtain fused node transition probabilities; and extracting the first-order neighbor nodes of the patient nodes according to the topological information of the constructed heterogeneous graph, and iteratively calculating the first-order similarity between two patients by introducing an evidence matrix and combining the fused node transition probabilities;

the high-order similarity calculation module is used for processing the heterogeneous graph with each patient as a center, extracting the K-order sub-graph of the patient, and counting the number of occurrences of each meta-path in each K-order sub-graph to obtain the importance of each meta-path, wherein the value of K is determined by the length of the meta-path; calculating the meta-path-based transition probability between two adjacent patients; and calculating the high-order similarity between two patients according to the meta-path-based transition probability between two adjacent patients and the importance of the meta-paths.

The invention has the following beneficial effects: the method for constructing the patient heterogeneous bipartite graph can build patient EHR data into a patient graph structure through a series of standardized codes and an existing knowledge graph; the first-order similarity between different patients is calculated through the first-order neighbor nodes of the patients to obtain the co-occurrence information between different patients; the implicit association information between different patients is then calculated from the high-order information of different patients connected through different meta-paths, and the low-order and high-order information are combined to obtain the topological similarity of the patients; the invention requires no expert knowledge base, is compatible with different medical fields and has strong applicability; it avoids the lengthy training and black-box behavior of neural networks, which saves time, enhances generalization ability and interpretability, and improves reliability; and it uses graph-theoretic methods to screen and analyze the information of the patients most similar to a given patient, assisting doctors in making on-site decisions.

Drawings

FIG. 1 is a flow chart of a clinical decision support method based on patient topological feature similarity of the present invention;

FIG. 2 is a schematic diagram of the structure of the heterogeneous graph constructed in the present invention;

FIG. 3 is a flow chart of a patient first order similarity calculation in accordance with the present invention;

FIG. 4 is a schematic representation of similarity measures for patient 1 of the present invention;

FIG. 5 is a block diagram of the clinical decision support system based on patient topological feature similarity of the present invention.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.

The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.

Referring to fig. 1, the clinical decision support method based on the similarity of topological features of patients of the present invention comprises the steps of:

S1, collecting electronic health record data of patients, wherein the electronic health record data comprise basic information, visit record information, diagnostic data during an observation window, laboratory test data, surgery and treatment measure data, medication data and the like.

Specifically, electronic health record (EHR) data such as patient basic information, visit record information, diagnostic data during the observation window, laboratory test data, surgery and treatment measure data, and medication data may be collected from an electronic health record system.

S2, preprocessing the acquired electronic health record data to acquire diagnosis and treatment events of each patient.

S21, performing missing value processing on the electronic health record data according to the diagnosis data in the observation window period.

Specifically, patient records with missing diagnoses are deleted; that is, if the diagnostic data during the observation window is missing from a patient's electronic health record data, that patient's electronic health record data is deleted.

S22, deleting abnormal value records exceeding the medical range in laboratory test data to form a medical test set L.

It should be understood that different laboratory tests have different medical ranges, each with its own standard, and a value outside this range indicates that the laboratory test data for that index is an outlier. For example, in a routine blood test, the medical range of red blood cells (RBC) is (4.0-5.5) × 10^12/L for adult males and (3.5-5.0) × 10^12/L for adult females; when a value falls outside the medical range, the corresponding electronic health record data is deleted, and the remaining laboratory test data form the medical test set L.
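The range check of step S22 can be sketched in Python as follows. The RBC ranges are the ones quoted above; the record layout (`patient`, `sex`, `test`, `value` keys), the table name `REFERENCE_RANGES` and the choice to keep records for which no range is known are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# Medical ranges keyed by (test name, sex); RBC values are in units of 10^12/L.
REFERENCE_RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("RBC", "M"): (4.0, 5.5),
    ("RBC", "F"): (3.5, 5.0),
}


def filter_lab_records(records: List[dict]) -> List[dict]:
    """Drop laboratory records whose value lies outside the medical range."""
    kept = []
    for rec in records:
        bounds = REFERENCE_RANGES.get((rec["test"], rec["sex"]))
        if bounds is None:
            kept.append(rec)  # no range on file: keep the record (assumption)
            continue
        low, high = bounds
        if low <= rec["value"] <= high:
            kept.append(rec)
    return kept


records = [
    {"patient": "P1", "sex": "M", "test": "RBC", "value": 4.8},
    {"patient": "P2", "sex": "F", "test": "RBC", "value": 5.6},  # above 5.0
    {"patient": "P3", "sex": "F", "test": "RBC", "value": 3.5},  # on the boundary
]
medical_test_set = filter_lab_records(records)
```

Boundary values are treated as inside the range here; whether the embodiment treats them the same way is not stated.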

S23, performing ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification) coding on the historical diagnostic data in the patient's medical history and on the diagnostic data during the observation window, assigning the historical diagnostic data to the basic information set B, and forming the medical diagnosis set D from the diagnostic data of the current observation window.

It should be understood that the patient's electronic health record data is collected in time order; when the diagnostic data is coded, the patient's historical diagnostic data is queried according to the patient's basic information and visit record information, wherein the historical diagnostic data is the diagnostic data of observation windows earlier in the time sequence.

It is readily understood that the basic information set B contains the basic information and the historical diagnostic data.

S24, performing CPT (Current Procedural Terminology) coding on the surgery and treatment measure data to form the medical operation set H.

S25, performing ATC (Anatomical Therapeutic Chemical classification system) coding on the medication data to form the medication set M.

It should be understood that ICD-10-CM coding, CPT coding and ATC coding are all coding methods commonly used in medicine.

S26, forming the patient set P from the basic information, and obtaining the diagnosis and treatment event Y of each patient in the patient set P from the basic information set B, the medical test set L, the medication set M, the medical diagnosis set D and the medical operation set H.

Specifically, the patient set P is formed from the basic information and includes a plurality of patients, the i-th of which is denoted P_i. The diagnosis and treatment event Y of each patient comprises five medical data sets, namely the basic information set B, the medical test set L, the medication set M, the medical diagnosis set D and the medical operation set H, expressed as Y = {B, L, M, D, H}. The diagnosis and treatment event of the i-th patient is expressed in the same way as that patient's own five sets, each listing the patient's items of that category with indices running from 1 to T, where T represents the maximum index of the corresponding category of data that the patient has. It should be understood that the value of T varies from category to category and from patient to patient.
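One way to hold the five sets of each patient's diagnosis and treatment event in memory is sketched below in Python. The class name `PatientEvents`, the helper `build_events`, the row layout and the `hx:` prefix for historical diagnoses are all hypothetical; the codes shown are used only as sample values of ICD-10-CM, ATC and CPT codes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

CATEGORIES = ("B", "L", "M", "D", "H")


@dataclass
class PatientEvents:
    """Diagnosis and treatment event of one patient: five coded sets."""
    B: Set[str] = field(default_factory=set)  # basic information, incl. history
    L: Set[str] = field(default_factory=set)  # medical tests
    M: Set[str] = field(default_factory=set)  # medications (ATC codes)
    D: Set[str] = field(default_factory=set)  # diagnoses (ICD-10-CM codes)
    H: Set[str] = field(default_factory=set)  # operations (CPT codes)


def build_events(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, PatientEvents]:
    """Group (patient id, category, code) rows into per-patient event sets."""
    events: Dict[str, PatientEvents] = {}
    for patient_id, category, code in rows:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        patient = events.setdefault(patient_id, PatientEvents())
        getattr(patient, category).add(code)
    return events


rows = [
    ("P1", "B", "hx:I10"),   # historical diagnosis filed under basic information
    ("P1", "D", "E11.9"),    # diagnosis in the current observation window
    ("P1", "M", "A10BA02"),  # medication
    ("P2", "L", "RBC"),      # laboratory test
    ("P2", "H", "99213"),    # procedure
]
events = build_events(rows)
```

Using sets means that the number of items per category differs between patients and categories, which mirrors the remark that T varies from category to category and from patient to patient.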

S3, taking the patients and the diagnosis and treatment events as nodes, determining the relations between the nodes as edges according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, determining the relations between different diagnosis and treatment events as edges according to the medical knowledge graph to construct the heterogeneous graph.

In this embodiment, this step specifically includes: first, taking the patients P and the basic information B, medical tests L, medications M, medical diagnoses D and medical operations H in the diagnosis and treatment events Y as nodes, and representing different types of nodes with different shapes, wherein the numbers of patient P nodes, basic information B nodes, medical test L nodes, medication M nodes, medical diagnosis D nodes and medical operation H nodes are respectively the total number of patients and the global numbers of categories of basic information, medical tests, medications, medical diagnoses and medical operations; then traversing the diagnosis and treatment event Y of each patient and judging whether basic information B, medical test L, medication M, medical diagnosis D or medical operation H data exist in the patient's diagnosis and treatment event Y; if so, establishing a connecting edge of the corresponding type, and otherwise establishing no connecting edge; no connecting edge is established between any two patient P nodes; and finally, constructing association edges between pairs of basic information B nodes, medical test L nodes, medication M nodes, medical diagnosis D nodes and medical operation H nodes according to the medical knowledge graph.

Specifically, a heterogeneous graph with two kinds of nodes, patient P and diagnosis and treatment event Y, is constructed according to the diagnosis and treatment events of each patient. The diagnosis and treatment event Y can be subdivided into five kinds of nodes: basic information B, medical examination L, medication M, medical diagnosis D and medical operation H. Each kind of node can be represented by a different shape; as shown in FIG. 2, a circle represents a patient node, a hexagon a basic information B node, a diamond a medical examination L node, a square a medication M node, a triangle a medical diagnosis D node, and a trapezoid a medical operation H node. The numbers of patient P nodes, basic information B nodes, medical examination L nodes, medication M nodes, medical diagnosis D nodes and medical operation H nodes are $N^P$, $N^B$, $N^L$, $N^M$, $N^D$ and $N^H$, which denote, respectively, the total number of patients and the numbers of kinds of global basic information, medical examinations, medications, medical diagnoses and medical operations; the total number of diagnosis and treatment event Y nodes in the heterogeneous graph is $N^Y = N^B + N^L + N^M + N^D + N^H$. The diagnosis and treatment event Y of each patient is traversed; if basic information B, medical examination L, medication M, medical diagnosis D or medical operation H data exist in the patient's diagnosis and treatment event Y, an edge of the corresponding type is established between the patient P node and the node in the diagnosis and treatment event Y, denoted $W_{PY}$. For example, as shown in FIG. 2, if the diagnosis and treatment event of patient $P_1$ contains basic information $B_1$ and $B_2$, a connecting edge is established between patient node $P_1$ and basic information node $B_1$, and between patient node $P_1$ and basic information node $B_2$; as another example, if the diagnosis and treatment event of patient $P_1$ contains medical examination $L_1$ but not medical examination $L_2$, a connecting edge is established only between patient node $P_1$ and medical examination node $L_1$, and there is no connecting edge between patient node $P_1$ and medical examination node $L_2$. Proceeding in the same way, the bipartite graph is finally constructed. No connecting edge is established between any two patient P nodes. In addition, on the basis of the bipartite graph, association edges between every two nodes among the basic information B nodes, medical examination L nodes, medication M nodes, medical diagnosis D nodes and medical operation H nodes are constructed according to the medical knowledge graph, denoted $W_{xy}$, where x and y each take values in {B, L, M, D, H}. Thus, this embodiment constructs, from the patients' electronic health record data, a heterogeneous graph with 6 node types and 21 edge types, in which patients are not connected to one another.
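The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the record layout, the function name `build_bipartite_edges` and the edge-type labels such as `"PB"` are assumptions made for the example.

```python
from collections import defaultdict

# The five event node types named in the text.
EVENT_TYPES = ("B", "L", "M", "D", "H")

def build_bipartite_edges(records):
    """Return {edge_type: set of (patient, event)} for edge types 'PB', 'PL', ...

    `records` is a hypothetical layout: patient id -> {event type -> set of event ids}.
    An edge is created only when the event is present in the patient's record.
    """
    edges = defaultdict(set)
    for patient, events in records.items():
        for etype in EVENT_TYPES:
            for code in events.get(etype, ()):
                edges["P" + etype].add((patient, code))
    return dict(edges)

# Toy data mirroring the FIG. 2 example: P1 has B1, B2 and L1 but not L2.
records = {
    "P1": {"B": {"B1", "B2"}, "L": {"L1"}},
    "P2": {"B": {"B1"}, "D": {"D1"}},
}
edges = build_bipartite_edges(records)
```

Because edges are only ever created from a patient to an event, no patient-to-patient edge can appear, which matches the bipartite constraint stated in the text.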

Further, the medical knowledge graph includes UMLS, SNOMED CT, CUMLS, etc., where UMLS and SNOMED CT are medical knowledge graphs commonly used abroad, and CUMLS is a medical knowledge graph commonly used in China.

In this embodiment, the medical knowledge graph SNOMED CT is used to construct the association edges between different diagnosis and treatment events Y. It should be understood that SNOMED CT covers most clinical information, organized into 19 categories of clinical content such as body structure, clinical findings, clinical procedures, events and medicines, and includes entities, descriptions and relationships, so that the relationships between nodes can be obtained from the medical knowledge graph.

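In this embodiment, the constructed heterogeneous graph can be represented by a block adjacency matrix W, as shown in formula (1), where $W_{PP}$ is initially 0, i.e. there are no connecting edges between patient nodes; $W_{PY}$ and $W_{YP}$ are the connecting edges between patient nodes and diagnosis and treatment event nodes, with Y = concat(B, L, M, D, H), and are symmetric; $W_{YY}$, the connecting edges among different diagnosis and treatment event nodes, is itself composed of different block adjacency matrices, whose edges are directed and not symmetric, as shown in formula (2):

$$W = \begin{bmatrix} W_{PP} & W_{PY} \\ W_{YP} & W_{YY} \end{bmatrix} \qquad (1)$$

$$W_{YY} = \begin{bmatrix} W_{BB} & W_{BL} & W_{BM} & W_{BD} & W_{BH} \\ W_{LB} & W_{LL} & W_{LM} & W_{LD} & W_{LH} \\ W_{MB} & W_{ML} & W_{MM} & W_{MD} & W_{MH} \\ W_{DB} & W_{DL} & W_{DM} & W_{DD} & W_{DH} \\ W_{HB} & W_{HL} & W_{HM} & W_{HD} & W_{HH} \end{bmatrix} \qquad (2)$$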
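The block structure of W can be illustrated with a small pure-Python sketch. The function name `block_adjacency`, the node ordering (patients first, then events) and the 0/1 edge weights are assumptions for illustration only; the text does not state how W is stored.

```python
def block_adjacency(patients, events, patient_event_edges, event_event_edges):
    """Assemble W as a list of lists with patients indexed before events.

    Patient-event edges fill the W_PY and W_YP blocks symmetrically;
    event-event edges from the knowledge graph fill W_YY as directed edges;
    the W_PP block is left at zero.
    """
    nodes = list(patients) + list(events)
    index = {node: k for k, node in enumerate(nodes)}
    size = len(nodes)
    W = [[0] * size for _ in range(size)]
    for p, y in patient_event_edges:
        W[index[p]][index[y]] = 1  # W_PY block
        W[index[y]][index[p]] = 1  # W_YP block (symmetric)
    for src, dst in event_event_edges:
        W[index[src]][index[dst]] = 1  # W_YY block (directed)
    return nodes, W

# Hypothetical toy graph: two patients, three events, one knowledge-graph edge.
nodes, W = block_adjacency(
    ["P1", "P2"],
    ["B1", "L1", "D1"],
    [("P1", "B1"), ("P1", "L1"), ("P2", "B1"), ("P2", "D1")],
    [("L1", "D1")],
)
```

For realistic patient counts a sparse matrix representation would be the more natural choice; dense lists are used here only to keep the example dependency-free.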

S4, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; extracting patient-centered subgraphs from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the higher-order similarity between two patients according to the meta-paths between the patients; and calculating the similarity between two patients according to the first-order similarity and the higher-order similarity.

In this embodiment, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events, as shown in FIG. 3, specifically includes: calculating the authoritative ranking scores of the patients and the diagnosis and treatment events according to the topological information of the heterogeneous graph; fusing the authoritative ranking scores of the patients and the diagnosis and treatment events with the transition probabilities of the nodes to obtain the fused transition probabilities of the nodes; and extracting the first-order neighbor nodes of the patient nodes according to the topological information of the constructed heterogeneous graph, and iteratively calculating the first-order similarity between two patients by introducing an evidence matrix and combining it with the fused transition probabilities of the nodes.

In this embodiment, the expression of the first order similarity between two patients is:

（3）

where $s_t(p_i, p_j)$ denotes the first-order similarity between patient i and patient j at iteration round t; $I(p_i)$ is the set of first-order neighbor nodes of patient i in the diagnosis and treatment event set, and $I(p_j)$ is the set of first-order neighbor nodes of patient j in the diagnosis and treatment event set; a and b are the subscripts of any diagnosis and treatment event in the sets $I(p_i)$ and $I(p_j)$, with value ranges $[1, |I(p_i)|]$ and $[1, |I(p_j)|]$, respectively; $y_a$ denotes an element of the first-order neighbor nodes of patient i, and $y_b$ denotes an element of the first-order neighbor nodes of patient j; when $i = j$, $s_t(p_i, p_j)$ is initialized to 1; if either $I(p_i)$ or $I(p_j)$ is empty, then $s_t(p_i, p_j) = 0$; C is the attenuation factor; V is the node set of the bipartite graph; t is the current iteration round; $E(p_i, p_j)$ is the introduced evidence matrix E, in which $|I(p_i) \cap I(p_j)|$ denotes the number of commonly connected edges: the more edges two patients have in common, the more similar the two patients are; $\phi(y_a, p_i)$ denotes the probability of transition from diagnosis and treatment event $y_a$ to patient i, whose value depends on whether diagnosis and treatment event $y_a$ and patient i have a connecting edge; similarly, $\phi(y_b, p_j)$ denotes the probability of transition from diagnosis and treatment event $y_b$ to patient j, whose value depends on whether diagnosis and treatment event $y_b$ and patient j have a connecting edge; $P_i$ denotes patient i and $P_j$ denotes patient j.

It should be appreciated that the similarity between two patients is measured by counting the common neighbor nodes between them: the more common neighbor nodes two patients have, the greater the similarity between them. If two patients are similar, the diagnosis and treatment events associated with them are also similar; if two diagnosis and treatment events are similar, the patients associated with them are also similar. A bipartite graph is a graph whose nodes can be divided into two subsets such that the two end points of every edge come from different subsets and no edge connects nodes within the same subset.
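The mutual reinforcement described above (similar patients share similar events, and vice versa) is the idea behind SimRank-style iteration on a bipartite graph. The sketch below is an illustration under stated assumptions, not a reproduction of formula (3): it assumes a uniform transition probability of one over the number of neighbors, and it assumes a geometric evidence term (1/2 + 1/4 + ... over the shared events) in place of the evidence matrix E, whose exact form is not recoverable from this text. The name `first_order_similarity` and the defaults `C=0.8` and `rounds=5` are likewise hypothetical.

```python
def evidence(n_common):
    """Assumed evidence term: grows toward 1 as the number of shared events grows."""
    return sum(0.5 ** k for k in range(1, n_common + 1))

def first_order_similarity(neighbors, C=0.8, rounds=5):
    """neighbors: {patient: set of events}. Returns {(p, q): similarity}."""
    patients = sorted(neighbors)
    events = sorted(set().union(*neighbors.values()))
    holders = {y: {p for p in patients if y in neighbors[p]} for y in events}
    # A node is fully similar to itself; everything else starts at 0.
    sp = {(p, q): 1.0 if p == q else 0.0 for p in patients for q in patients}
    sy = {(a, b): 1.0 if a == b else 0.0 for a in events for b in events}
    for _ in range(rounds):
        new_sp = {}
        for p in patients:
            for q in patients:
                if p == q:
                    new_sp[p, q] = 1.0
                elif not neighbors[p] or not neighbors[q]:
                    new_sp[p, q] = 0.0  # an empty neighbor set gives similarity 0
                else:
                    total = sum(sy[a, b] for a in neighbors[p] for b in neighbors[q])
                    norm = len(neighbors[p]) * len(neighbors[q])
                    shared = len(neighbors[p] & neighbors[q])
                    new_sp[p, q] = evidence(shared) * C * total / norm
        new_sy = {}
        for a in events:
            for b in events:
                if a == b:
                    new_sy[a, b] = 1.0
                else:
                    total = sum(sp[p, q] for p in holders[a] for q in holders[b])
                    new_sy[a, b] = C * total / (len(holders[a]) * len(holders[b]))
        sp, sy = new_sp, new_sy
    return sp

# Hypothetical toy data: P1 and P2 share two events, P3 shares none.
sim = first_order_similarity(
    {"P1": {"B1", "L1", "D1"}, "P2": {"B1", "L1"}, "P3": {"M1"}}
)
```

In this toy run, patients with shared events receive a positive score, while a patient with no shared events scores zero because the assumed evidence term vanishes.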

Further, the expressions of the authoritative ranking scores of the patients and the medical events are respectively:

（4）

where $\gamma_Y(\cdot)$ denotes the authoritative ranking scores of the different diagnosis and treatment events, initialized to 1, a larger value indicating a more important diagnosis and treatment event; $\gamma_Y(a)$ denotes the authoritative ranking score of diagnosis and treatment event a, and $\gamma_Y(i)$ denotes the authoritative ranking score of diagnosis and treatment event i; $\alpha$ is a hyperparameter that can be set manually; the formula further uses the reciprocal of the number of occurrences of diagnosis and treatment event a across all patients (the more often a diagnosis and treatment event occurs, the less important the information it carries) and the number of times all diagnosis and treatment events co-occur with diagnosis and treatment event a; $\gamma_P(\cdot)$ denotes the authoritative ranking scores of the patients, initialized to 1, a larger value indicating that more diagnosis and treatment events were performed and that the patient is more likely to be ill; $\gamma_P(i)$ denotes the authoritative ranking score of patient $P_i$; the formula also uses the number of occurrences of all diagnosis and treatment events of patient $P_i$. $N^Y = N^B + N^L + N^M + N^D + N^H$, where $N^B$, $N^L$, $N^M$, $N^D$, $N^H$ and $N^P$ denote, respectively, the numbers of kinds of global basic information, medical examinations, medications, medical diagnoses and medical operations and the total number of patients; $W_{YY}$ denotes the connecting edges among the nodes of different diagnosis and treatment events; $W_{PY}$ and $W_{YP}$ denote the connecting edges between patient nodes and diagnosis and treatment event nodes, and $W_{PY}$ and $W_{YP}$ are symmetric.

It should be noted that different diagnosis and treatment events differ in their importance to a doctor's decision. For example, most patients are connected by edges to all the necessary basic information and routine examinations, but such general information does little to assist the doctor's decision. If the importance of diagnosis and treatment events is not distinguished, the calculation tends to focus on general information and to neglect information that is small in volume but important. In practice, the more unique events are usually the important ones; for example, the diagnosis of acute kidney injury often depends on elevated serum creatinine and reduced urine volume, while other information is only auxiliary. The more unique a piece of information is, the more decisive it may be, and the more decisive information a patient has, the more likely the patient is to be ill. The more frequently a diagnosis and treatment event occurs, the more general the information is; the less frequently it occurs, the more unique and the more important the information is. Likewise, patients with more unique information deserve more attention. For this reason, authoritative ranking scores of patients and diagnosis and treatment events are introduced to distinguish the importance of the diagnosis and treatment events and of the patients.
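The frequency argument above can be made concrete with a short sketch. It only illustrates the reciprocal-frequency idea (rare events weigh more than common ones) and is not formula (4); the function name `inverse_frequency_weights` and the toy event names are assumptions.

```python
from collections import Counter

def inverse_frequency_weights(neighbors):
    """neighbors: {patient: set of events}.

    Returns {event: 1 / number of patients having that event}, so an event
    shared by every patient gets a small weight and a rare event a large one.
    """
    counts = Counter(y for events in neighbors.values() for y in events)
    return {y: 1.0 / n for y, n in counts.items()}

# Hypothetical toy data: 'age' is recorded for everyone, 'creatinine_high' for one patient.
weights = inverse_frequency_weights(
    {"P1": {"age", "creatinine_high"}, "P2": {"age"}, "P3": {"age"}}
)
```

Here the event recorded for a single patient receives the maximum weight of 1, while the event recorded for all three patients receives one third.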

Further, the authoritative ranking scores of the patients and the diagnosis and treatment events are fused with the transition probabilities of the nodes to obtain the fused transition probabilities of the nodes. The fusion is written as an update with an arrow, where $\phi(y_a, p_i)$ denotes the probability of transition from diagnosis and treatment event $y_a$ to patient node i; the left side of the arrow is the transition probability of the node after fusion, and the right side of the arrow is the transition probability of the node before fusion. Before fusion, the value of $\phi(y_a, p_i)$ depends on whether diagnosis and treatment event $y_a$ and patient i have a connecting edge.

An evidence matrix is then introduced, and the first-order similarity of the patients is calculated iteratively. The number of iteration rounds is preset; as long as the current iteration round is smaller than the preset number of rounds, the iterative calculation continues, and the first-order similarity of the patients is finally obtained, as shown in FIG. 3. Formula (3) is further transformed into a matrix representation, i.e. the first-order similarity of the patients is expressed in matrix form:

（5）

where I is the identity matrix; one of the two functions in the formula takes all the elements on the main diagonal of a matrix to form a vector, and the other converts a vector into a diagonal matrix.

The calculation of the patients' first-order similarity matrix can thus be summarized in matrix form: by introducing the evidence matrix E and iterating continuously, the first-order similarity of the patients is obtained.

In this embodiment, extracting patient-centered subgraphs from the heterogeneous graph and calculating the higher-order similarity between two patients according to the meta-paths therein includes: processing the heterogeneous graph with the patient as the center, extracting the K-order subgraph of each patient, and counting the number of occurrences of each meta-path in each K-order subgraph to obtain the importance of each meta-path, wherein the value of K is determined according to the length of the meta-paths; calculating the meta-path-based transition probability between two adjacent patients; and calculating the higher-order similarity between two patients from the meta-path-based transition probability between two adjacent patients and the importance of the meta-paths.

It should be noted that a meta-path is a path defined on the network schema of the graph and has the structure $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which defines a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between $A_1$ and $A_{l+1}$. $A_1$ and $A_{l+1}$ may be called neighbor nodes based on this meta-path. The length of a meta-path is determined by the number of edges it contains; a meta-path of length 2, for example, contains two edges. As shown in FIG. 4, at first only patient 1 and patient 2 can be found to be associated, and patient 1 and patient 3 are not associated; if higher-order implicit associations are mined through a meta-path, however, patient 1 and patient 3 are neighbor nodes based on that meta-path.

It should be understood that different meta-paths account for different proportions of the graph and carry different information. Therefore, in this embodiment, the importance of each meta-path in the heterogeneous graph is calculated and used to assist the calculation of patient similarity.

Specifically, for every patient node in the heterogeneous graph, the K-order subgraph centered on that patient node is extracted, the number of occurrences of each meta-path in each K-order subgraph is counted, and the number of occurrences of each meta-path is divided by the total number of occurrences of all meta-paths to give the importance of that meta-path. The value of K is determined according to the length of the meta-paths; preferably, K is greater than or equal to the length of the longest meta-path.

Further, the importance of the meta path is expressed as:

（6）

where the left-hand side denotes the importance of the k-th meta-path; the numerator is the sum of the numbers of occurrences of that meta-path over all K-order subgraphs; the denominator is the sum of the numbers of occurrences of all meta-paths; the formula also involves the set of defined meta-paths and the number of meta-paths in that set; k denotes the k-th meta-path.
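The ratio described for the meta-path importance (occurrences of one meta-path divided by occurrences of all meta-paths) can be sketched as follows. The per-subgraph counts are assumed to be available already; how meta-path instances are enumerated inside a K-order subgraph is not shown. The function name and the meta-path labels such as `"P-D-P"` are hypothetical.

```python
from collections import Counter

def meta_path_importance(subgraph_counts):
    """subgraph_counts: iterable of {meta_path: occurrences}, one dict per
    patient-centered K-order subgraph. Returns {meta_path: share of all occurrences}."""
    totals = Counter()
    for counts in subgraph_counts:
        totals.update(counts)  # adds the counts of each subgraph
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {path: n / grand_total for path, n in totals.items()}

# Hypothetical counts for two patient-centered subgraphs.
importance = meta_path_importance([
    {"P-D-P": 3, "P-M-D-P": 1},
    {"P-D-P": 2, "P-L-D-P": 2},
])
```

By construction the importances sum to 1, so they can be used directly as weights when several meta-paths contribute to one similarity score.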

In this embodiment, calculating the meta-path-based transition probability between two adjacent patients specifically includes: determining the transition probability from a patient node to the next patient node as the product of the transition probabilities from each node on the meta-path to the next node, taking this transition probability as the weight of the edge, and summing the weights of all first-order edges that leave the patient along the meta-path, so as to obtain the meta-path-based transition probability between two adjacent patients.
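The product-of-steps idea can be illustrated with a path-constrained random walk. This sketch assumes that each step moves uniformly at random to a neighbor of the node type required next by the meta-path; the count-based weights of formula (7) are not reproduced. The function names, the toy adjacency and the convention that a node's first character gives its type are all assumptions for the example.

```python
def node_type(node):
    """Toy convention: the first character of a node id is its type (P, D, ...)."""
    return node[0]

def meta_path_walk(adj, start, meta_path):
    """Probability of reaching each end node from `start` along `meta_path`.

    Each path instance contributes the product of its step probabilities,
    and instances ending at the same node are summed.
    """
    if node_type(start) != meta_path[0]:
        raise ValueError("start node does not match the first meta-path type")
    dist = {start: 1.0}
    for next_type in meta_path[1:]:
        new_dist = {}
        for node, prob in dist.items():
            candidates = [n for n in adj.get(node, ()) if node_type(n) == next_type]
            for n in candidates:
                new_dist[n] = new_dist.get(n, 0.0) + prob / len(candidates)
        dist = new_dist
    return dist

# Hypothetical toy graph: P1 and P2 share D1; P2 and P3 share D2.
adj = {
    "P1": ["D1"], "P2": ["D1", "D2"], "P3": ["D2"],
    "D1": ["P1", "P2"], "D2": ["P2", "P3"],
}
short = meta_path_walk(adj, "P1", ("P", "D", "P"))
longer = meta_path_walk(adj, "P1", ("P", "D", "P", "D", "P"))
```

In this toy graph the length-2 meta-path reaches only P1 and P2 from P1, whereas the length-4 meta-path also reaches P3, which mirrors the higher-order association between patient 1 and patient 3 discussed for FIG. 4.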

Specifically, the meta-path-based transition probability from patient $p_i$ to patient $p_j$ is calculated as:

（7）

where the left-hand side is the meta-path-based transition probability from $p_i$ to $p_j$; L denotes the number of relations in the meta-path; the formula uses the number of occurrences of the relation between patients $p_i$ and $p_j$, and the number of occurrences of that relation for patient $p_i$. The transition probability from patient $p_i$ to $p_j$ is thus determined and taken as the weight of the edge. This formula is asymmetric, i.e. the results calculated from node $p_i$ to $p_j$ and from $p_j$ to $p_i$ differ; therefore, in order to use the end-point information of the propagation path and to ensure symmetry, formula (7) is modified as follows:

（8）

Further, the higher-order similarity between patient $p_i$ and patient $p_j$ is expressed as:

（9）

Further, the similarity between two patients is expressed as:

（10）

where the quantities in the formula are, in turn: the similarity between patient $p_i$ and patient $p_j$; the first-order similarity between patient $p_i$ and patient $p_j$, which is finally obtained through the iterative calculation of formula (3), the transition probabilities of the nodes being fused in during the iteration; the higher-order similarity between patient $p_i$ and patient $p_j$, calculated by formula (9); and two hyperparameters.

Specifically, for a clinical patient, the patient's EHR data are preprocessed to obtain standardized EHR data, which are added to the heterogeneous graph; the similarity between this patient and the other patients is calculated and sorted by magnitude; the first M patients with the greatest similarity are selected; and auxiliary decisions are provided to the clinician by combining the diagnosis and prognosis in the electronic health record data of these M patients. The value of M can be set by the doctor.
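The final ranking step can be sketched as follows. The fusion shown is an assumed weighted sum with hypothetical weights `w1` and `w2`; the exact form of formula (10) is not recoverable from this text. The function names `fuse` and `top_m_similar` are also illustrative.

```python
import heapq

def fuse(first_order, high_order, w1=0.5, w2=0.5):
    """Assumed fusion: weighted sum of the two similarity scores per patient pair."""
    keys = set(first_order) | set(high_order)
    return {k: w1 * first_order.get(k, 0.0) + w2 * high_order.get(k, 0.0) for k in keys}

def top_m_similar(similarity, query, m):
    """Return the m patients most similar to `query`, most similar first."""
    candidates = [
        (score, other)
        for (p, other), score in similarity.items()
        if p == query and other != query
    ]
    return [other for _, other in heapq.nlargest(m, candidates)]

# Hypothetical scores between a new patient Q and three earlier patients.
first = {("Q", "P1"): 0.2, ("Q", "P2"): 0.8, ("Q", "P3"): 0.0}
high = {("Q", "P1"): 0.4, ("Q", "P3"): 0.1}
fused = fuse(first, high)
nearest = top_m_similar(fused, "Q", 2)
```

`heapq.nlargest` avoids sorting the full list when M is much smaller than the number of patients, which is the usual situation when a doctor asks for a handful of similar cases.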

It is worth mentioning that the embodiment of the invention also provides a clinical decision support system based on the similarity of topological features of patients.

Referring to FIG. 5, the clinical decision support system based on patient topological feature similarity of the present invention comprises a data acquisition module, a data preprocessing module, a patient graph structure construction module, a patient similarity calculation module and an auxiliary clinical decision module.

In this embodiment, the data acquisition module is configured to acquire the electronic health record (EHR) data of patients, where the EHR data include basic information, visit record information, diagnostic data during the observation window, laboratory test data, surgical and therapeutic measure data, medication data, and the like.

In this embodiment, the data preprocessing module is configured to preprocess the electronic health record data acquired by the data acquisition module, so as to obtain a diagnosis and treatment event of each patient. The diagnosis and treatment events comprise a basic information set B, a medical examination set L, a medication set M, a medical diagnosis set D and a medical operation set H.

In this embodiment, the patient graph structure construction module is configured to construct the heterogeneous graph of the patients: patients and diagnosis and treatment events are taken as nodes, edges between the nodes are determined according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, edges between different diagnosis and treatment events are determined according to a medical knowledge graph.

In this embodiment, the patient similarity calculation module is configured to calculate the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; to extract patient-centered subgraphs from the heterogeneous graph to calculate the importance of the meta-paths, and to calculate the higher-order similarity between two patients according to the meta-paths between the patients; and to calculate the similarity between two patients according to the first-order similarity and the higher-order similarity.

Further, the patient similarity calculation module includes a first-order similarity calculation module and a higher-order similarity calculation module. The first-order similarity calculation module is configured to calculate the authoritative ranking scores of the patients and the diagnosis and treatment events according to the topological information of the heterogeneous graph; to fuse the authoritative ranking scores of the patients and the diagnosis and treatment events with the transition probabilities of the nodes to obtain the fused transition probabilities of the nodes; and to extract the first-order neighbor nodes of the patient nodes according to the topological information of the constructed heterogeneous graph and iteratively calculate the first-order similarity between two patients by introducing an evidence matrix and combining it with the fused transition probabilities of the nodes. The higher-order similarity calculation module is configured to process the heterogeneous graph with the patient as the center, extract the K-order subgraph of each patient, and count the number of occurrences of each meta-path in each K-order subgraph to obtain the importance of each meta-path, wherein the value of K is determined according to the length of the meta-paths; to calculate the meta-path-based transition probability between two adjacent patients; and to calculate the higher-order similarity between two patients from the meta-path-based transition probability between two adjacent patients and the importance of the meta-paths.

In this embodiment, the auxiliary clinical decision module is configured to sort the similarity between two patients, select the first M patients with the largest similarity, and provide auxiliary decisions for the clinician by combining the electronic health record data of the M patients.

Specifically, for a given clinical patient, the patient's EHR data are preprocessed by the data preprocessing module to obtain standardized EHR data; the patient's EHR data are then built into the heterogeneous graph by the patient graph structure construction module; the first M patients most similar to the patient are obtained by the patient similarity calculation module, and the information of the similar patients is displayed visually; finally, auxiliary decisions are output to the clinician by combining the diagnosis and prognosis in the electronic health record data of these M patients. The value of M can be set by the doctor.

The objects and effects of the present invention will become more apparent by describing in detail the clinical decision support system and method based on similarity of topological features of patients according to the embodiments.

A medical institution intends to provide doctors with additional diagnostic decision support by mining patients' electronic medical record information. The K patients most similar to a given patient are calculated from the EHR data, the similarities and differences between the patients are then analyzed, and effective suggestions on diagnostic decisions, medication and the like are provided to the doctor according to the treatment and prognosis of the similar patients.

Taking acute kidney injury caused by heart failure at this medical institution as an example, the method specifically comprises the following steps:

S1, the data acquisition module extracts the medical information data of 1000 patients of the relevant department from the electronic medical record database.

S2, the data preprocessing module preprocesses the acquired medical information data. First, patient records with missing values, chronic kidney disease, prior nephrectomy, kidney transplantation or preoperative AKI are deleted. If a patient has several records, the diagnoses in the patient's historical records are extracted and, as the patient's past medical history, combined with the patient's basic information through ICD-10-CM codes to form the basic information set B; for ease of reading, the text before each code is retained: B = {age, sex, 'myocardial infarction I42', 'hypertension I10', 'high cholesterol E79', …}. Then, the diagnoses during the current observation window are extracted and added, using ICD-10-CM codes, to the patient's medical diagnosis set D = {'heart failure I50', 'myocardial infarction I42', 'hypertension I10', 'hypercholesterolemia E79', 'coronary heart disease I25.103', 'valvular disease I30', …}. Abnormal-value records exceeding the reference range in the laboratory test data are deleted to form the laboratory test set L = {'systolic pressure', 'diastolic pressure', 'potassium', 'glomerular filtration rate', 'hemoglobin', 'creatinine', 'urea', 'aspartate aminotransferase', 'alanine aminotransferase', 'triglyceride', 'high density lipoprotein cholesterol', 'low density lipoprotein cholesterol', 'brain natriuretic peptide precursor', …}. The treatments and operations performed by the doctor are encoded using CPT to form the medical operation set H = {'coronary fistula repair 02Q00ZZ', 'aortic valve repair 02QF0ZZ', 'aortic valve replacement 02RF48Z', 'heart transplant 02YA0Z0', …}. The patient's medication data are encoded using ATC to form the patient medication set M = {'rilmenidine C02AC06', 'reserpine C02AA02', 'quinidine C01BA01', 'amlodipine and diuretic C08GA02', …}.

S3, the patient graph structure construction module constructs the patient heterogeneous graph: a heterogeneous graph G with 6 node types and 21 edge types is constructed according to the medical knowledge graph SNOMED CT, where the total number of nodes of the 6 types and the total number of edges of the 21 types are, respectively:

（11）

（12）

S4, the patient similarity calculation module calculates the similarity among the patients who have already been treated, and can also calculate the similarity between a patient on a first visit and the earlier patients.

Taking a patient's first visit as an example, the relevant data are first preprocessed, and then a patient node is generated from the patient's record and the edges to the corresponding diagnosis and treatment events are constructed. At this point there are 1001 patient nodes in total. The diagnosis and treatment events and the edges between diagnosis and treatment events constructed in step S3 remain unchanged. Using formula (5), the importance of the different patients and the importance of the different diagnosis and treatment events are calculated from the visit records of all patients. The calculation shows that the brain natriuretic peptide precursor has the highest importance, and that diastolic pressure and systolic pressure have the lowest.

The resulting node importance ranking is fused with the node transition probability, and the first-order similarity of the patients is calculated. The highest patient similarity is defined at initialization; the similarity between every two patients is then calculated, and the calculation result is output after 5 iterations.

The higher-order similarity between patients is computed from the heterogeneous paths: the more two patients are connected via the same paths, the more similar they are. Finally, the similarity between patients is obtained.

S5, the auxiliary clinical decision module outputs the 5 patients most similar to the patient; the diagnosis, medication and prognosis of these 5 patients are analyzed with SPSS Statistics, and advice, including medication advice, is given for the patient.

In summary, the electronic health record data of patients are first collected. The raw electronic health record data are then converted into a computable resource by preprocessing, which standardizes the different types of values in the data and handles missing values and abnormal values. Next, the heterogeneous graph of the patients is constructed, and the associations between different diagnosis and treatment events are built according to the medical knowledge graph. The similarity between patients is then calculated from the topological structure: the co-occurrence characteristics between patients are measured mainly through first-order shared neighbors, the higher-order implicit associations between patients are mined through heterogeneous meta-paths, and the two similarities are fused to obtain the similarity between patients. Finally, the similarities are sorted to obtain the M patients with the highest similarity, and clinical analysis is carried out on the electronic health record data of these M patients to obtain an analysis result that guides doctors in designing treatment plans. The invention requires no expert knowledge base, is compatible with different medical fields, avoids lengthy training and black-box behavior, and helps save time while enhancing generalization and interpretability.

The above embodiments merely illustrate the design concept and features of the present invention and are intended to enable those skilled in the art to understand and implement it; the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications made according to the principles and design ideas of the present invention fall within the scope of the present invention.

Claims

1. A clinical decision support method based on similarity of topological features of a patient, comprising the steps of:

in the step S3, taking patients and diagnosis and treatment events as nodes, determining edges between the nodes according to the diagnosis and treatment events of each patient to construct a bipartite graph of patients and diagnosis and treatment events, and, on the basis of the bipartite graph, determining edges between different diagnosis and treatment events according to the medical knowledge graph so as to construct the heterogeneous graph, specifically comprises: first, taking the patient and the basic information, medical examination, medication, medical diagnosis and medical operation in the diagnosis and treatment event as nodes, and representing different kinds of nodes with different shapes, wherein the numbers of patient nodes, basic information nodes, medical examination nodes, medication nodes, medical diagnosis nodes and medical operation nodes are, respectively, the total number of patients and the numbers of kinds of global basic information, medical examinations, medications, medical diagnoses and medical operations; then traversing the diagnosis and treatment event of each patient and judging whether basic information, medical examination, medication, medical diagnosis or medical operation data exist in it; if so, establishing a connecting edge of the corresponding type, and otherwise establishing no connecting edge; establishing no connecting edge between patient nodes; and finally, constructing association edges between every two nodes among the basic information nodes, medical examination nodes, medication nodes, medical diagnosis nodes and medical operation nodes according to the medical knowledge graph;

S4, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; extracting patient-centered subgraphs from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the higher-order similarity between two patients according to the meta-paths between the patients; and calculating the similarity between two patients according to the first-order similarity and the higher-order similarity;

in the step S4, calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events specifically includes: calculating the authoritative ranking scores of the patients and the diagnosis and treatment events according to the topological information of the heterogeneous graph; fusing the authoritative ranking scores of the patients and the diagnosis and treatment events with the transition probabilities of the nodes to obtain the fused transition probabilities of the nodes; and extracting the first-order neighbor nodes of the patient nodes according to the topological information of the constructed heterogeneous graph, and iteratively calculating the first-order similarity between two patients by introducing an evidence matrix and combining it with the fused transition probabilities of the nodes;

the authoritative ranking score of the patient and the diagnosis and treatment event is expressed as follows:

wherein $\gamma_Y(\cdot)$ denotes the authoritative ranking scores of the different diagnosis and treatment events, $\gamma_Y(a)$ denotes the authoritative ranking score of diagnosis and treatment event a, and $\gamma_Y(i)$ denotes the authoritative ranking score of diagnosis and treatment event i; $\gamma_P(i)$ denotes the authoritative ranking score of the patient; $\alpha$ denotes a hyperparameter; the expression further uses the reciprocal of the number of occurrences of diagnosis and treatment event a across all patients, the number of times all diagnosis and treatment events co-occur with diagnosis and treatment event a, and the number of occurrences of all diagnosis and treatment events of patient $P_i$; $N^Y = N^B + N^L + N^M + N^D + N^H$, where $N^B$, $N^L$, $N^M$, $N^D$, $N^H$ and $N^P$ denote, respectively, the numbers of kinds of global basic information, medical examinations, medications, medical diagnoses and medical operations and the total number of patients; $W_{YY}$ denotes the connecting edges among the nodes of different diagnosis and treatment events; $W_{PY}$ and $W_{YP}$ denote the connecting edges between patient nodes and diagnosis and treatment event nodes, and $W_{PY}$ and $W_{YP}$ are symmetric;

the expression of the first order similarity between the two patients is:

wherein $s_t(p_i, p_j)$ denotes the first-order similarity between patient $i$ and patient $j$ at iteration round $t$; $I(p_i)$ is the set of first-order neighbor nodes of patient $i$ in the diagnosis and treatment event set, and $I(p_j)$ is the set of first-order neighbor nodes of patient $j$ in the diagnosis and treatment event set; $a$ and $b$ are the indices of any diagnosis and treatment event in the sets $I(p_i)$ and $I(p_j)$, with value ranges $[1, I_a(p_i)]$ and $[1, I_b(p_j)]$, respectively; $y_a$ denotes an element of the first-order neighbor nodes of patient $i$, and $y_b$ denotes an element of the first-order neighbor nodes of patient $j$; when $i = j$, $s_t(p_i, p_j)$ is initialized to 1; if either $I(p_i)$ or $I(p_j)$ is empty, $s_t(p_i, p_j) = 0$; $C$ is the decay factor; $V$ is the node set of the bipartite graph; $t$ is the current iteration round; $E(p_i, p_j)$ is the introduced evidence matrix $E$, in which $|I(p_i) \cap I(p_j)|$ denotes the number of commonly connected edges; $\phi(y_a, p_i)$ denotes the probability of transition from diagnosis and treatment event $y_a$ to patient $i$; $\phi(y_b, p_j)$ denotes the probability of transition from diagnosis and treatment event $y_b$ to patient $j$; $P_i$ denotes patient $i$ and $P_j$ denotes patient $j$;
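
For illustration only, the iterative calculation described by these definitions can be sketched as a SimRank-style update on the patient and event bipartite graph. The update rule, the geometric evidence weight (in the style of SimRank++), the event-side update, and all names (`evidence`, `first_order_similarity`, `p_nbrs`, `phi`) are assumptions of this sketch and are not taken from the claim, whose expression is not reproduced in this text.

```python
def evidence(n_common):
    # Assumed evidence weight: sum_{k=1..n} 1/2^k, which grows with the
    # number of shared neighbors. The claim's own formula is not shown.
    return sum(0.5 ** k for k in range(1, n_common + 1))


def first_order_similarity(p_nbrs, phi, C=0.8, rounds=5):
    """p_nbrs: patient -> set of event nodes (first-order neighbors).
    phi[(y, p)]: transition probability from event y to patient p."""
    # Build the reverse adjacency: event -> set of patients.
    y_nbrs = {}
    for p, ys in p_nbrs.items():
        for y in ys:
            y_nbrs.setdefault(y, set()).add(p)
    patients, events = list(p_nbrs), list(y_nbrs)

    # Round 0: a node is fully similar to itself and to nothing else.
    s_p = {(i, j): float(i == j) for i in patients for j in patients}
    s_y = {(a, b): float(a == b) for a in events for b in events}

    for _ in range(rounds):
        new_p = {}
        for i in patients:
            for j in patients:
                if i == j:
                    new_p[i, j] = 1.0
                elif not p_nbrs[i] or not p_nbrs[j]:
                    new_p[i, j] = 0.0  # an empty neighbor set gives similarity 0
                else:
                    total = sum(phi[a, i] * phi[b, j] * s_y[a, b]
                                for a in p_nbrs[i] for b in p_nbrs[j])
                    common = len(p_nbrs[i] & p_nbrs[j])
                    new_p[i, j] = evidence(common) * C * total
        new_y = {}
        for a in events:
            for b in events:
                if a == b:
                    new_y[a, b] = 1.0
                else:
                    # Assumed plain SimRank average on the event side.
                    na, nb = y_nbrs[a], y_nbrs[b]
                    total = sum(s_p[i, j] for i in na for j in nb)
                    new_y[a, b] = C * total / (len(na) * len(nb))
        s_p, s_y = new_p, new_y
    return s_p
```

In this sketch, two patients with no shared diagnosis and treatment event receive an evidence weight of 0 and therefore a first-order similarity of 0, which is why a separate high-order term is needed to relate them.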

in step S4, extracting the patient-centered subgraph from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the high-order similarity between two patients according to the meta-paths of the patients, specifically includes: processing the heterogeneous graph with each patient as the center, extracting the K-order subgraph of the patient, and counting the number of occurrences of each meta-path in each K-order subgraph to obtain the importance of each meta-path, wherein the value of K is determined according to the length of the meta-path; calculating the meta-path-based transition probability between two adjacent patients; calculating the high-order similarity between two adjacent patients based on the meta-path-based transition probability between the two adjacent patients and the importance of the meta-path;

calculating the meta-path-based transition probability between two adjacent patients specifically comprises: determining the transition probability from a patient node to the next patient node by calculating the product of the transition probabilities from each node in the meta-path to its next node, taking this transition probability as the weight of the edge, and summing the weights of all first-order meta-path edges through which the patient passes, so as to obtain the meta-path-based transition probability between two adjacent patients;
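
As a minimal sketch of the product-and-sum calculation described above, the following function walks a meta-path from one patient, multiplies the per-step transition probabilities, and sums the weights of all path instances that end at the same patient. The uniform per-step probability (one over the number of neighbors of the required type), the exclusion of walks that return to the starting patient, and the names `metapath_transition`, `adj` and `node_type` are assumptions made for illustration; the claim does not specify them.

```python
def metapath_transition(adj, node_type, metapath, start):
    """Return {patient: probability} for walks from `start` that follow
    `metapath`, a list of node types such as ["P", "D", "P"].

    adj: node -> list of neighbor nodes; node_type: node -> type label.
    """
    frontier = {start: 1.0}  # probability mass currently on each node
    for next_type in metapath[1:]:
        nxt = {}
        for node, prob in frontier.items():
            # Only neighbors of the type required by the meta-path are valid.
            cands = [n for n in adj.get(node, ()) if node_type[n] == next_type]
            for n in cands:
                # Product along the path (prob / len(cands)), summed over
                # all path instances that reach the same node.
                nxt[n] = nxt.get(n, 0.0) + prob / len(cands)
        frontier = nxt
    frontier.pop(start, None)  # assumed: ignore walks back to the start
    return frontier


# Hypothetical toy graph: three patients linked through two diagnoses.
adj = {
    "p1": ["dx1", "dx2"], "p2": ["dx1"], "p3": ["dx2"],
    "dx1": ["p1", "p2"], "dx2": ["p1", "p3"],
}
node_type = {"p1": "P", "p2": "P", "p3": "P", "dx1": "D", "dx2": "D"}
reach = metapath_transition(adj, node_type, ["P", "D", "P"], "p1")
```
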

The expression of similarity between two patients is:

$$\xi_{ij} = \beta_1 \, s(p_i, p_j) + \beta_2 \, w_{ij}$$

wherein $\xi_{ij}$ is the similarity between patient $P_i$ and patient $P_j$; $s(p_i, p_j)$ is the first-order similarity between patient $P_i$ and patient $P_j$; $w_{ij}$ is the high-order similarity between patient $P_i$ and patient $P_j$; $\beta_1$ and $\beta_2$ are both hyperparameters;
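
The weighted fusion above, together with the selection of the M most similar patients described in the abstract, could be sketched as follows. The default weights of 0.5, the dictionary layout keyed by patient pairs, and the function names `fuse` and `top_m` are illustrative assumptions rather than values or interfaces given by the patent.

```python
def fuse(s, w, beta1=0.5, beta2=0.5):
    """Combine first-order (s) and high-order (w) similarities per pair.

    s, w: dicts mapping (patient_i, patient_j) -> similarity.
    A pair missing from one dict is treated as 0 there (an assumption).
    """
    pairs = set(s) | set(w)
    return {pair: beta1 * s.get(pair, 0.0) + beta2 * w.get(pair, 0.0)
            for pair in pairs}


def top_m(xi, query, m):
    """Return the m patients most similar to `query`, highest first."""
    ranked = sorted(
        ((other, value) for (p, other), value in xi.items()
         if p == query and other != query),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:m]


# Hypothetical similarity values for one query patient.
s = {("p1", "p2"): 0.3, ("p1", "p3"): 0.0}
w = {("p1", "p2"): 0.25, ("p1", "p3"): 0.25}
xi = fuse(s, w)
best = top_m(xi, "p1", 1)
```

Here patient p3 shares no first-order neighbor with p1, yet still receives a nonzero fused score through the high-order term.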

2. The patient topological feature similarity based clinical decision support method according to claim 1, wherein said step S2 comprises the sub-steps of:

3. The patient topological feature similarity based clinical decision support method according to claim 1, wherein the medical knowledge graph comprises UMLS, SNOMED CT and CUMLS.

4. A patient topological feature similarity based clinical decision support system for implementing the patient topological feature similarity based clinical decision support method of any one of claims 1 to 3, the system comprising:

the patient similarity calculation module is used for calculating the first-order similarity between two patients according to the topological information of the constructed bipartite graph of patients and diagnosis and treatment events; extracting the patient-centered subgraph from the heterogeneous graph to calculate the importance of the meta-paths, and calculating the high-order similarity between two patients according to the meta-paths of the patients; and calculating the similarity between two patients according to the first-order similarity and the high-order similarity; and

5. The patient topological feature similarity based clinical decision support system of claim 4, wherein the patient similarity calculation module comprises a first order similarity calculation module and a higher order similarity calculation module;