CN115083616A - Chronic nephropathy subtype mining system based on self-supervision graph clustering - Google Patents


Info

Publication number
CN115083616A
Authority
CN
China
Prior art keywords
node
kidney disease
clustering
chronic kidney
visit
Legal status
Granted
Application number
CN202210980822.5A
Other languages
Chinese (zh)
Other versions
CN115083616B (en)
Inventor
李劲松
池胜强
徐铭鸿
李雪瑶
田雨
周天舒
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Application filed by Zhejiang Lab
Priority to CN202210980822.5A
Publication of CN115083616A
Application granted
Publication of CN115083616B
Priority to JP2023092731A (JP7404581B1)
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a chronic kidney disease subtype mining system based on self-supervised graph clustering, comprising the following modules: a data acquisition module, used to collect structured data from chronic kidney disease diagnosis and treatment records; a data extraction and preprocessing module, used to extract and preprocess the structured data to obtain an entity set and a visit set; a chronic kidney disease subtype mining module, used to construct a chronic kidney disease subtype mining model from the entity set and the visit set; a chronic kidney disease phenotype subtype assessment module, used to evaluate the chronic kidney disease subtype mining model; and a chronic kidney disease subtype prediction module, used to predict the subtype of a patient from the patient's structured data. The invention solves the problem that process mining methods cannot handle the coexistence of multi-granularity information in longitudinal electronic medical record data, such as event information within a single visit and event information across multiple visits.

Description

Chronic nephropathy subtype mining system based on self-supervision graph clustering
Technical Field
The invention relates to the technical field of medical health information, in particular to a chronic kidney disease subtype mining system based on self-supervision graph clustering.
Background
According to clinical guidelines, chronic kidney disease is staged based on the patient's estimated glomerular filtration rate (eGFR) and urinary albumin-to-creatinine ratio (UACR). Although eGFR and UACR can be used to screen for and monitor chronic kidney disease, they alone cannot characterize phenotypic differences in disease between individuals. Chronic kidney disease is a highly heterogeneous disease that is closely related to systemic diseases and conditions such as diabetes, hypertension, autoimmune diseases, genetic predisposition and congenital abnormalities. Individuals with chronic kidney disease differ significantly, and these differences can be described by disease phenotypes such as laboratory tests, medical history, medication history and social factors. Differences in patients' initial phenotypes also lead to different diagnosis and treatment processes and different complications. A rational phenotypic classification of chronic kidney disease should distinguish different patient subpopulations and reveal the disease characteristics and underlying pathology of each, thereby helping to better understand the different mechanisms of disease onset and progression.
Existing methods for classifying chronic kidney disease subtypes are mainly based on cluster analysis of patients' initial, static phenotype data. They use multidimensional data collected at study entry, such as patient demographics, biomarkers and clinical characteristics, and derive a phenotypic classification of chronic kidney disease patients with common clustering algorithms such as hierarchical clustering and consensus clustering. However, chronic kidney disease has a long course and many complications, so diagnosis and treatment processes differ greatly between patients, and clinical process data may carry important information for distinguishing phenotypes. From the diagnosis and treatment process data collected and stored in an electronic medical record system, events such as surgeries, examinations, laboratory tests and medications for a specific patient, together with their times of occurrence, can be extracted. Clustering patients on this process data to study their disease phenotype patterns is important for identifying and characterizing different patient subgroups. Common methods for mining disease diagnosis and treatment process data include: (1) Process mining methods: information is extracted from the event logs generated while a patient is diagnosed and treated and arranged chronologically into diagnosis and treatment event sequences. Different patterns in the clinical event sequences are then mined as different clinical pathways of the disease, and the patients' disease phenotypes are classified accordingly.
Such methods have difficulty exploiting co-occurrence information among events and cannot handle both the co-occurrence relations and the sequential relations of events in longitudinal, multi-visit electronic medical record data. The mined diagnosis and treatment pathways are complex, with poor representativeness and coverage. (2) Tensor decomposition-based methods: the patient, time and phenotype dimensions are combined into a third-order tensor, which is decomposed to mine latent phenotype classes of patients. Such methods only consider phenotype transitions between consecutive visits and cannot capture phenotype evolution over a long course of diagnosis and treatment.
Therefore, we propose a chronic kidney disease subtype mining system based on the self-supervision graph clustering to solve the above technical problem.
Disclosure of Invention
In order to solve the technical problems, the invention provides a chronic kidney disease subtype mining system based on self-supervision graph clustering.
The technical scheme adopted by the invention is as follows:
a chronic kidney disease subtype mining system based on self-supervised graph clustering, comprising:
a data acquisition module: used to collect structured data from chronic kidney disease diagnosis and treatment records;
a data extraction and preprocessing module: used to extract and preprocess the structured data to obtain an entity set and a visit set;
a chronic kidney disease subtype mining module: used to construct a chronic kidney disease subtype mining model from the entity set and the visit set;
a chronic kidney disease phenotype subtype evaluation module: used to evaluate the chronic kidney disease subtype mining model;
a chronic kidney disease subtype prediction module: used to predict the chronic kidney disease subtype of a patient from the patient's structured data.
Further, the structured data includes patient basic information, visit records, diagnoses during an observation window, laboratory tests, medical examinations, surgeries and/or medication data.
Further, the data extraction and preprocessing module is specifically configured to extract the structured data in chronic kidney disease diagnosis and treatment records from the electronic medical record system and to preprocess it. The structured data includes patient basic information, visit records, diagnoses during the observation window, laboratory tests, medical examinations, surgery data and medication data. For laboratory tests, only items outside the normal reference range are considered: each abnormal result is classified as either low or high, and the test name and abnormality category are retained. Medical examination and surgery data are processed with simple natural language processing, retaining the examined body part, the examination type and the surgery name. For medication data, only six drug classes are considered (antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, non-steroidal anti-inflammatory drugs, antiplatelet drugs and steroids); the drugs are mapped to these classes and the drug class is retained. This yields a diagnosis set, a medication set, a surgery set and a test set, together with the number of diagnosis types, medication types, surgery types and test types and the number of visit records. The diagnosis, medication, surgery and test sets are combined into the entity set, and the patients' visit records are combined into the visit set.
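A minimal Python sketch of these preprocessing rules may help make them concrete. It assumes a hypothetical record layout (dict keys `diagnoses`, `labs`, `drugs`, `exams`, `surgeries`), a hypothetical drug-name lookup `DRUG_CLASS`, and that examination part/type pairs have already been extracted by the natural language processing step and are treated as entities. Entity names are prefixed (`dx:`, `lab:`, `rx:`, ...) so that entities from different sets cannot collide. It illustrates the rules, not the patented implementation.

```python
from dataclasses import dataclass, field

# Hypothetical drug-name -> drug-class lookup covering the six retained classes.
DRUG_CLASS = {
    "metformin": "antihyperglycemic",
    "amlodipine": "antihypertensive",
    "atorvastatin": "lipid_regulating",
    "ibuprofen": "nsaid",
    "aspirin": "antiplatelet",
    "prednisone": "steroid",
}


@dataclass
class Visit:
    visit_id: str
    patient_id: str
    time: int                                   # e.g. days since the first visit
    entities: set = field(default_factory=set)  # entity nodes recorded in this visit


def lab_entity(name, value, low, high):
    """Return 'lab:<name>:low' or 'lab:<name>:high' for abnormal results, None if normal."""
    if value < low:
        return f"lab:{name}:low"
    if value > high:
        return f"lab:{name}:high"
    return None


def preprocess(records):
    """Turn raw visit records (hypothetical dict layout) into an entity set and a visit set."""
    visits = []
    for r in records:
        ents = {f"dx:{d}" for d in r.get("diagnoses", [])}
        for name, value, low, high in r.get("labs", []):
            ent = lab_entity(name, value, low, high)
            if ent is not None:                 # normal results are dropped
                ents.add(ent)
        for drug in r.get("drugs", []):
            cls = DRUG_CLASS.get(drug.lower())
            if cls is not None:                 # drugs outside the six classes are dropped
                ents.add(f"rx:{cls}")
        ents.update(f"exam:{part}:{kind}" for part, kind in r.get("exams", []))
        ents.update(f"op:{name}" for name in r.get("surgeries", []))
        visits.append(Visit(r["visit_id"], r["patient_id"], r["time"], ents))
    entity_set = sorted(set().union(*(v.entities for v in visits)))
    return entity_set, visits
```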
Further, the chronic kidney disease subtype mining module specifically comprises:
a visit network construction unit: used to construct a visit network from the visit set and the entity set;
an embedded representation construction unit: used to construct an entity co-occurrence matrix from the entity set, obtain the initial embedded representations of the entity nodes and of the visit nodes through the entity co-occurrence matrix, and combine them into the initial node embedded representations;
a clustering network construction unit: used to construct an adjacency matrix from the relations between nodes in the visit network, and to train the visit node clustering network model based on self-supervised graph clustering with the adjacency matrix and the initial node embedded representations;
a chronic kidney disease subtype mining model construction unit: used to construct the chronic kidney disease subtype mining model through the visit node clustering network model based on self-supervised graph clustering.
Further, the visit network construction unit is specifically used to:
combine the visit set and the entity set into a node set;
construct an edge set from the co-occurrence relations between nodes in the node set;
construct the visit network from the node set and the edge set.
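The visit network construction can be sketched as follows. The text does not state exactly which co-occurrence relations become edges, so this sketch assumes an undirected edge between each visit node and every entity recorded in that visit; the function name `build_visit_network` is hypothetical, and visit IDs are assumed to be distinct from entity names.

```python
import numpy as np


def build_visit_network(visits, entity_set):
    """visits: list of (visit_id, set_of_entity_names); entity_set: list of entity names.

    Returns (nodes, edges, adjacency) with visit nodes first, then entity nodes.
    """
    nodes = [vid for vid, _ in visits] + list(entity_set)
    index = {name: k for k, name in enumerate(nodes)}
    edges = set()
    for vid, ents in visits:
        for ent in ents:
            edges.add((index[vid], index[ent]))  # visit-entity co-occurrence edge
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0              # undirected graph
    return nodes, sorted(edges), adj
```

The adjacency matrix returned here is the same object the clustering network construction unit consumes; entity-entity or visit-visit (temporal) edges could be added in the same loop if the network is meant to include them.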
Further, the embedded representation construction unit is specifically used to:
construct the entity co-occurrence matrix from the entity set;
compute the initial embedded representation of each entity node with the GloVe algorithm based on the entity co-occurrence matrix;
obtain the initial embedded representation of each visit node by averaging the initial embedded representations of all its adjacent entity nodes; the visit node and entity node initial embedded representations together form the initial node embedded representations.
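A sketch of the entity co-occurrence matrix and the visit-node averaging, assuming that two entities co-occur when they are recorded in the same visit (the co-occurrence window is not defined in the text). The entity embedding matrix passed to `visit_embeddings` is assumed to come from GloVe or any other method; both function names are hypothetical.

```python
import numpy as np


def entity_cooccurrence(visits, entity_set):
    """visits: list of entity-name sets. Counts same-visit co-occurrences (symmetric, zero diagonal)."""
    idx = {e: k for k, e in enumerate(entity_set)}
    X = np.zeros((len(entity_set), len(entity_set)))
    for ents in visits:
        ids = sorted(idx[e] for e in ents)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                X[ids[a], ids[b]] += 1
                X[ids[b], ids[a]] += 1
    return X


def visit_embeddings(visits, entity_set, entity_emb):
    """Initial visit-node embedding = mean of the embeddings of its adjacent entity nodes."""
    idx = {e: k for k, e in enumerate(entity_set)}
    return np.stack([entity_emb[[idx[e] for e in sorted(ents)]].mean(axis=0) for ents in visits])
```

A visit with no retained entities has no neighbours and would yield a NaN mean, so such visits would need to be filtered out or given a default vector beforehand.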
Further, the clustering network construction unit is specifically used to:
construct an adjacency matrix from the relations between nodes in the visit network, and input the adjacency matrix and the initial node embedded representations into the visit node clustering network model based on self-supervised graph clustering for graph attention training, obtaining node embedded representations that comprise visit node embedded representations and entity node embedded representations;
reconstruct the visit network from the node embedded representations and calculate a visit network reconstruction error;
input the entity node embedded representations into a neural network decoder for training, take the output of the last decoder layer as the reconstructed entity node embedded representations, and calculate an entity node reconstruction error;
apply a softmax regression to the visit node embedded representations to obtain the probability distribution of each visit node over clusters, and calculate a clustering loss from these probability distributions;
construct the overall loss function of the visit node clustering network model based on self-supervised graph clustering from the visit network reconstruction error, the entity node reconstruction error and the clustering loss.
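The three training signals can be written down as a NumPy sketch. The document names the loss types (cross entropy for the network reconstruction, the L2 norm for the entity reconstruction, KL divergence for the clustering loss) but not their exact form, so the following are assumptions: the network is reconstructed as sigmoid(Z Zᵀ) as in graph autoencoders, the KL target is a DEC-style sharpened distribution derived from the soft assignments, the visit logits come from a hypothetical softmax-regression weight applied to the visit embeddings, and the terms are combined with hypothetical weights `alpha`, `beta`, `gamma`. Only the forward computation is shown; gradients would come from an autodiff framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def network_reconstruction_loss(Z, A, eps=1e-9):
    """Binary cross entropy between adjacency A and the reconstruction sigmoid(Z Z^T)."""
    A_hat = sigmoid(Z @ Z.T)
    return -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))


def entity_reconstruction_loss(H_ent, H_rec):
    """Mean squared L2 distance between entity embeddings and the decoder's reconstructions."""
    return np.mean(np.sum((H_ent - H_rec) ** 2, axis=1))


def clustering_loss(Q, eps=1e-9):
    """KL(P || Q) with a DEC-style sharpened target P (assumed; only 'KL divergence' is given)."""
    weight = Q ** 2 / Q.sum(axis=0)              # square assignments, normalise by cluster size
    P = weight / weight.sum(axis=1, keepdims=True)
    return np.sum(P * np.log((P + eps) / (Q + eps))) / Q.shape[0]


def total_loss(Z, A, H_ent, H_rec, visit_logits, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three terms; visit_logits = visit embeddings @ softmax-regression weights."""
    Q = softmax(visit_logits)                    # soft cluster assignment of each visit node
    return (alpha * network_reconstruction_loss(Z, A)
            + beta * entity_reconstruction_loss(H_ent, H_rec)
            + gamma * clustering_loss(Q))
```

In DEC-style training the target P is usually held fixed for several iterations and treated as a constant when differentiating.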
Further, the chronic kidney disease subtype mining model construction unit is specifically used to:
take the visit node cluster distribution produced by the visit node clustering network model based on self-supervised graph clustering as the category distribution of the visit nodes, select the category with the highest probability as each visit node's category label, and arrange all visit nodes of each patient in chronological order;
arrange the visit nodes to construct an event matrix;
search for frequent events to determine nodes: the frequent events become nodes in an event flow and the remaining events go directly to an end node; each frequent event serves as the start node of the next search round, its corresponding event vectors are extracted and combined into a new event matrix, the first column is removed and the same frequent-event search is performed, and the nodes found in each round are connected to the start node to extend the event flow, until no frequent events remain or the event flow reaches the maximum event flow length; when the loop ends, the chronic kidney disease subtype mining model is obtained.
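The frequent-event search above is naturally recursive: each frequent event in the current first column becomes a child node, and the same search runs on the rows that share that event, with the first column dropped. A minimal sketch, assuming the event matrix is given as ragged per-patient label lists, that "frequency greater than a threshold" means a fraction of the current rows (the text does not say whether the threshold is relative or absolute), and that the event flow is stored as a nested dict; `mine_event_flow` and `END` are hypothetical names.

```python
from collections import Counter

END = "END"  # end node that infrequent events go to


def mine_event_flow(rows, threshold, max_len, depth=0):
    """Build an event flow as a nested dict {event: sub_flow}.

    rows: per-patient label sequences in chronological order (one row of the event matrix each).
    threshold: an event is frequent if it heads more than this fraction of the current rows.
    max_len: maximum event flow length.
    """
    rows = [r for r in rows if r]                # drop patients with no remaining events
    if not rows or depth >= max_len:
        return {}
    counts = Counter(r[0] for r in rows)         # event frequencies in the first column
    frequent = sorted(e for e, c in counts.items() if c / len(rows) > threshold)
    flow = {}
    for event in frequent:
        # rows starting with this event, with the first column removed, form the new matrix
        sub_rows = [r[1:] for r in rows if r[0] == event]
        flow[event] = mine_event_flow(sub_rows, threshold, max_len, depth + 1)
    if len(frequent) < len(counts):
        flow[END] = {}                           # remaining events go directly to the end node
    return flow
```

For rows `[["A", "B"], ["A", "B"], ["A", "C"], ["D"]]` with a threshold of 0.5, event A (3 of 4 rows) is frequent and D goes to the end node; under A, event B (2 of 3 rows) is frequent and C goes to the end node.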
Further, the chronic kidney disease subtype prediction module is specifically used to:
input the patient's preprocessed structured data into the visit node clustering network model based on self-supervised graph clustering for prediction, obtaining the probability distribution of each of the patient's visit nodes;
determine the cluster category of each visit node from its probability distribution and construct a visit event sequence;
input the visit event sequence into the chronic kidney disease subtype mining model, fit it to the nodes of the model in order to obtain an event flow, and determine from the event flow which chronic kidney disease subtype the patient belongs to.
The beneficial effects of the invention are as follows. The invention provides a chronic kidney disease subtype mining system based on self-supervised graph clustering. First, the longitudinal electronic medical record data of a patient's multiple visits is constructed into a visit network containing multi-dimensional diagnosis and treatment event information such as visits, diagnoses, laboratory tests, medical examinations, surgeries and medications. Second, vector representations of the diagnosis and treatment events are obtained from their co-occurrence information. The visits are then clustered with the visit node clustering network model based on self-supervised graph clustering, and each visit is labeled. Next, at the visit level, patients' diagnosis and treatment pathways are mined to obtain different phenotypic subtypes of chronic kidney disease. Finally, a phenotypic subtype assessment method is provided to assess whether clinically interpretable differences exist among the mined subtypes, using a series of comprehensive indicators including patient demographics, medication, complications and survival rates.
The diagnosis, laboratory test, medical examination, surgery, medication and other event information within each visit is used to train the visit node clustering network model based on self-supervised graph clustering, yielding a category label for each visit; in this process, low-level, fine-grained information is aggregated into high-level, coarse-grained summary information. The visit category labels are then used for diagnosis and treatment pathway mining, which solves the problem that process mining methods cannot handle multi-granularity information in longitudinal electronic medical record data, such as event information within a single visit and event information across multiple visits.
Event vector representations obtained from co-occurrence information are used in the graph model. This effectively addresses the difficulty process mining methods have in exploiting event co-occurrence information, and enables full feature mining of the disease using both cross-sectional and longitudinal electronic medical record data.
The self-supervised graph clustering algorithm provided by the invention feeds a patient's multi-visit information into the visit node clustering network model based on self-supervised graph clustering and trains the node embedded representations, so that phenotype evolution information over a long course of diagnosis and treatment can be captured. Different nodes and relations in the visit network are then supervised separately: the node reconstruction error is computed with the L2 norm on the decoder's reconstruction of the lower-level (entity) node embedded representations; the graph relation reconstruction error is computed with cross entropy; and the visit node clustering error is computed with the KL divergence.
Based on the similarity of the category label distributions of visit nodes, similar adjacent events are merged. This optimizes the process mining method, simplifies the mined diagnosis and treatment pathways, and improves their representativeness and coverage.
Drawings
FIG. 1 is a schematic structural diagram of the chronic kidney disease subtype mining system based on self-supervised graph clustering according to the present invention;
FIG. 2 is a functional flow diagram of the chronic kidney disease subtype mining system based on self-supervised graph clustering according to the present invention;
FIG. 3 is a visit network according to an embodiment of the present invention;
FIG. 4 is a co-occurrence matrix according to an embodiment of the present invention;
FIG. 5 is a structural diagram of the visit node clustering network model based on self-supervised graph clustering according to an embodiment of the present invention.
Detailed Description
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a chronic kidney disease subtype mining system based on self-supervised graph clustering comprises:
a data acquisition module: used to collect structured data from chronic kidney disease diagnosis and treatment records;
a data extraction and preprocessing module: used to extract and preprocess the structured data to obtain an entity set and a visit set;
a chronic kidney disease subtype mining module: used to construct a chronic kidney disease subtype mining model from the entity set and the visit set;
a chronic kidney disease phenotype subtype assessment module: used to evaluate the chronic kidney disease subtype mining model;
a chronic kidney disease subtype prediction module: used to predict the chronic kidney disease subtype of a patient from the patient's structured data.
Referring to FIG. 2, the functional process of the chronic kidney disease subtype mining system based on self-supervised graph clustering comprises the following steps:
Step S1: collect structured data from chronic kidney disease diagnosis and treatment records through the data acquisition module to construct a data set; the structured data includes patient basic information, visit records, diagnoses during the observation window, laboratory tests, medical examinations, surgeries and/or medication data;
Step S2: preprocess the structured data through the data extraction and preprocessing module to obtain a visit set and an entity set. The structured data in chronic kidney disease diagnosis and treatment records, comprising patient basic information, visit records, diagnoses during the observation window, laboratory tests, medical examinations, surgery data and medication data, is extracted from the electronic medical record system and preprocessed: for laboratory tests, only items outside the normal reference range are considered, each abnormal result is classified as low or high, and the test name and abnormality category are retained; medical examination and surgery data are processed with simple natural language processing, retaining the examined body part, the examination type and the surgery name; for medication data, only six drug classes are considered (antihyperglycemic, antihypertensive, lipid-regulating, non-steroidal anti-inflammatory, antiplatelet and steroid drugs), the drugs are mapped to these classes and the drug class is retained. This yields a diagnosis set, a medication set, a surgery set and a test set, together with the number of diagnosis, medication, surgery and test types and the number of visit records; the diagnosis, medication, surgery and test sets are combined into the entity set, and the patients' visit records are combined into the visit set.
Step S3: input the visit set and the entity set into the chronic kidney disease subtype mining module, which constructs the chronic kidney disease subtype mining model;
Step S31: construct a visit network from the visit set and the entity set;
Step S311: combine the visit set and the entity set into a node set;
Step S312: construct an edge set from the co-occurrence relations between nodes in the node set;
Step S313: construct the visit network from the node set and the edge set.
Step S32: construct an entity co-occurrence matrix from the entity set, obtain the initial embedded representations of the entity nodes and of the visit nodes through the entity co-occurrence matrix, and combine them into the initial node embedded representations;
Step S321: construct the entity co-occurrence matrix from the entity set;
Step S322: based on the entity co-occurrence matrix, compute the initial embedded representation of each entity node with the GloVe algorithm;
Step S323: obtain the initial embedded representation of each visit node by averaging the initial embedded representations of all its adjacent entity nodes; the visit node and entity node initial embedded representations together form the initial node embedded representations.
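Step S322 names GloVe but gives no training details. As an illustration only, here is a compact NumPy sketch that fits the standard GloVe objective, the weighted least squares sum of f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)² over non-zero co-occurrences with f(x) = min((x / x_max)^α, 1), using full-batch gradient descent instead of the AdaGrad updates of the reference implementation. The function name `glove`, the learning rate, the epoch count and the defaults x_max = 100, α = 0.75 are assumptions, not values from the document.

```python
import numpy as np


def glove(X, dim=8, x_max=100.0, alpha=0.75, lr=0.05, epochs=500, seed=0):
    """Fit GloVe vectors to a co-occurrence matrix X with full-batch gradient descent.

    Returns (embeddings, loss_history); embeddings are the sum of main and context vectors.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.normal(scale=0.1, size=(n, dim))     # main vectors
    Wc = rng.normal(scale=0.1, size=(n, dim))    # context vectors
    b, bc = np.zeros(n), np.zeros(n)             # main and context biases
    i, j = np.nonzero(X)                         # only non-zero co-occurrences enter the loss
    x = X[i, j]
    f = np.minimum((x / x_max) ** alpha, 1.0)    # GloVe weighting function f(X_ij)
    log_x = np.log(x)
    history = []
    for _ in range(epochs):
        diff = np.sum(W[i] * Wc[j], axis=1) + b[i] + bc[j] - log_x
        history.append(0.5 * float(np.sum(f * diff ** 2)))
        g = f * diff                             # derivative of 0.5 * f * diff^2 w.r.t. diff
        gW, gWc = np.zeros_like(W), np.zeros_like(Wc)
        np.add.at(gW, i, g[:, None] * Wc[j])     # accumulate gradients per entity row
        np.add.at(gWc, j, g[:, None] * W[i])
        W -= lr * gW
        Wc -= lr * gWc
        b -= lr * np.bincount(i, weights=g, minlength=n)
        bc -= lr * np.bincount(j, weights=g, minlength=n)
    return W + Wc, history
```

The returned rows are the entity node initial embeddings; averaging them over each visit's entities then gives the visit node initial embeddings of step S323.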
Step S33: construct an adjacency matrix from the relations between nodes in the visit network, and train the visit node clustering network model based on self-supervised graph clustering with the adjacency matrix and the initial node embedded representations;
Step S331: construct the adjacency matrix from the relations between nodes in the visit network, and input the adjacency matrix and the initial node embedded representations into the visit node clustering network model based on self-supervised graph clustering for graph attention training, obtaining node embedded representations that comprise visit node embedded representations and entity node embedded representations;
Step S332: reconstruct the visit network from the node embedded representations and calculate the visit network reconstruction error;
Step S333: input the entity node embedded representations into a neural network decoder for training, take the output of the last decoder layer as the reconstructed entity node embedded representations, and calculate the entity node reconstruction error;
Step S334: apply a softmax regression to the visit node embedded representations to obtain the probability distribution of each visit node, and calculate the clustering loss from these probability distributions;
Step S335: construct the overall loss function of the visit node clustering network model based on self-supervised graph clustering from the visit network reconstruction error, the entity node reconstruction error and the clustering loss.
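Step S331 trains the node embeddings with graph attention. One single-head graph attention layer in the usual GAT formulation could look as follows (forward pass only). The document does not give the number of layers or heads, the activation, or whether self-loops are used, so one head, a LeakyReLU slope of 0.2, self-loops and no output nonlinearity are assumptions; `gat_layer` is a hypothetical name.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(H, A, W, a):
    """Single-head graph attention layer (forward pass only).

    H: (n, f_in) node embeddings, A: (n, n) 0/1 adjacency matrix,
    W: (f_in, f_out) shared linear map, a: (2 * f_out,) attention vector.
    Returns the new node embeddings and the attention matrix.
    """
    Wh = H @ W
    f_out = W.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), split into a source part and a target part
    e = leaky_relu((Wh @ a[:f_out])[:, None] + (Wh @ a[f_out:])[None, :])
    mask = (A + np.eye(A.shape[0])) > 0          # each node attends to its neighbours and itself
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)         # stabilise the softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)   # softmax over each node's neighbourhood
    return att @ Wh, att
```

Applied to the visit network, each visit node aggregates its entity neighbours with learned weights and each entity node aggregates the visits it appears in; the rows of the output split into the visit node and entity node embedded representations used by the losses of steps S332 to S334.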
Step S34: construct the chronic kidney disease subtype mining model through the visit node clustering network model based on self-supervised graph clustering;
Step S341: take the visit node cluster distribution obtained from the visit node clustering network model based on self-supervised graph clustering as the category distribution of the visit nodes, select the category with the highest probability as each visit node's category label, and arrange all visit nodes of each patient in chronological order;
Step S342: for consecutive visit nodes with the same category label, compute the cosine similarity between their category distributions to decide whether to merge them or keep them separate, and arrange the visit nodes to construct an event matrix;
Step S343: search for frequent events to determine nodes, and connect the visit nodes in order to form an event flow. Starting from the first column of the event matrix, events whose frequency of occurrence in the column exceeds a threshold are selected as frequent events and used as nodes in the event flow, and the remaining events go directly to an end node. Each frequent event serves as the start node of the next search round: its corresponding event vectors are extracted and combined into a new event matrix, the first column is removed, and the same frequent-event search is performed. The nodes obtained in each round are connected to the start node to extend the event flow, until no frequent events remain or the event flow reaches the maximum event flow length. When the loop ends, the chronic kidney disease subtype mining model is obtained.
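Steps S341 and S342 can be sketched in plain Python: each visit gets its highest-probability category as a label, visits are sorted by time, and a visit with the same label as the previous visit is merged into it when the cosine similarity of their category distributions reaches a threshold. The similarity threshold of 0.9, comparing each visit with the immediately preceding visit, padding rows with `None`, and the input layout `{patient_id: [(time, probs), ...]}` are all assumptions; `label_sequence` and `event_matrix` are hypothetical names.

```python
import math


def cosine(p, q):
    """Cosine similarity between two category distributions."""
    dot = sum(x * y for x, y in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q)))


def label_sequence(visits, sim_threshold=0.9):
    """visits: list of (time, probs) for one patient; returns the merged label sequence."""
    seq, prev = [], None
    for _, probs in sorted(visits, key=lambda v: v[0]):           # chronological order
        label = max(range(len(probs)), key=probs.__getitem__)     # highest-probability category
        merge = (prev is not None and prev[0] == label
                 and cosine(prev[1], probs) >= sim_threshold)
        if not merge:
            seq.append(label)                                     # keep as a separate event
        prev = (label, probs)
    return seq


def event_matrix(patients, sim_threshold=0.9, pad=None):
    """patients: {patient_id: [(time, probs), ...]}; one row per patient, padded to equal length."""
    rows = [label_sequence(v, sim_threshold) for v in patients.values()]
    width = max((len(r) for r in rows), default=0)
    return [r + [pad] * (width - len(r)) for r in rows]
```

If the rows are fed into a frequent-event search, the padding values should be stripped first so that `None` is not counted as an event.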
Step S4: evaluate the chronic kidney disease subtype mining model through the chronic kidney disease phenotype subtype evaluation module;
Step S5: predict the chronic kidney disease subtype of a patient from the patient's structured data through the chronic kidney disease subtype prediction module;
Step S51: preprocess the patient's structured data and input it into the visit node clustering network model based on self-supervised graph clustering for prediction, obtaining the probability distribution of each of the patient's visit nodes;
Step S52: determine the cluster category of each visit node from its probability distribution and construct a visit event sequence;
Step S53: input the visit event sequence into the chronic kidney disease subtype mining model, fit it to the nodes of the model in order to obtain an event flow, and determine from the event flow which chronic kidney disease subtype the patient belongs to.
Embodiment:
A chronic kidney disease subtype mining system based on self-supervised graph clustering comprises:
a data acquisition module: used to acquire the structured data in chronic kidney disease diagnosis and treatment records to construct a data set; the structured data includes basic patient information, medical records, diagnoses during the observation window, laboratory tests, medical examinations, surgery, and/or medication data;
the data extraction and preprocessing module: used to extract and preprocess the structured data to obtain a visit set and an entity set. Specifically, the module extracts the structured data in chronic kidney disease diagnosis and treatment records from the electronic medical record system, including basic patient information, medical records, diagnoses during the observation window, laboratory tests, medical examinations, surgical data and medication data, and preprocesses the extracted data as follows. For laboratory tests, only abnormal test items (results outside the normal reference range) are considered; each abnormal result is classified as low or high, and the name and category of the abnormal item are retained. Medical examination and surgical data are processed with simple natural language processing techniques, retaining the examined body part, the examination type and the surgery name. For medication data, only six drug classes are considered, namely antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, non-steroidal anti-inflammatory drugs, antiplatelet drugs and steroids; the medications are mapped to these six classes and the drug class is retained. This yields a diagnosis set, a medication set, a surgery set and a test set, together with the numbers of diagnosis types, medication types, surgery types and test types and the number of visit records. The diagnosis set, medication set, surgery set and test set are combined into an entity set, and the patients' visit records are combined into a visit set.
Chronic kidney disease subtype mining module: used to take the visit set and the entity set as input and construct a chronic kidney disease subtype mining model;
a visit network construction unit: used to construct a visit network from the visit set and the entity set;
used to combine the visit set and the entity set into a node set.
The visit set is $V = \{v_1, v_2, \dots, v_n\}$, where $n$ is the number of visits. $D$, $M$, $S$ and $T$ are the diagnosis set, medication set, surgery set and test set respectively, with $D = \{d_1, \dots, d_{n_d}\}$, $M = \{m_1, \dots, m_{n_m}\}$, $S = \{s_1, \dots, s_{n_s}\}$ and $T = \{t_1, \dots, t_{n_t}\}$, where $n_d$, $n_m$, $n_s$ and $n_t$ are the numbers of diagnosis types, medication types, surgery types and test types respectively. $D$, $M$, $S$ and $T$ together form the entity set $E = D \cup M \cup S \cup T$, whose number of entity types is $n_e = n_d + n_m + n_s + n_t$. The entity set and the visit set form the node set $\mathcal{N} = V \cup E$, with $N = n + n_e$ nodes.
used to construct the edge set from the co-occurrence relations between the nodes in the node set.
The entities appearing in the same visit $v_i$ ($i = 1, \dots, n$) form an entity subset $E_i \subseteq E$, where $k_i$ denotes the number of entities in $E_i$. Each entity subset and its corresponding visit form a visit connected subset $C_i = \{v_i\} \cup E_i$. A visit connected subset thus contains one visit node and all entity nodes of that visit; all nodes in a visit connected subset co-occur, so they are connected pairwise to form an edge subset $R_i$. All edge subsets together form the edge set $R = \bigcup_{i=1}^{n} R_i$.
used to construct the visit network $G = (\mathcal{N}, R)$ from the node set and the edge set.
Referring to FIG. 3, in visit $v_1$ the physician records two diagnoses, goiter ($d_1$) and thyroid nodule ($d_2$), performs a partial thyroidectomy ($s_1$) and prescribes levothyroxine sodium tablets ($m_1$). Then $C_1 = \{v_1, d_1, d_2, s_1, m_1\}$ forms a visit connected subset, and these 5 nodes are connected pairwise in the visit network. In visit $v_2$, the physician performs a TSH measurement ($t_1$), then diagnoses hypothyroidism ($d_3$) and prescribes levothyroxine sodium tablets ($m_1$). Then $C_2 = \{v_2, t_1, d_3, m_1\}$ is also a visit connected subset, and these 4 nodes are connected pairwise in the visit network. Because $m_1$ appears in both $C_1$ and $C_2$, $m_1$ is connected in the visit network to all other nodes of both visit connected subsets.
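The edge construction rule (each visit plus its entities forms a clique) can be checked on the FIG. 3 example with a short Python sketch. The node identifiers such as "v1" and "m1" are illustrative labels, and `build_visit_network` is a hypothetical helper name.

```python
from itertools import combinations


def build_visit_network(visits):
    """Build the node set and edge set of the visit network.

    visits maps a visit node id to the entities recorded in that visit. Each
    visit and its entities form a visit connected subset whose nodes are
    linked pairwise; edges are undirected, so shared pairs collapse.
    """
    nodes, edges = set(), set()
    for visit, entities in visits.items():
        subset = [visit, *entities]            # visit connected subset
        nodes.update(subset)
        for a, b in combinations(subset, 2):   # pairwise co-occurrence edges
            edges.add(frozenset((a, b)))
    return nodes, edges


# FIG. 3: v1 = goiter, thyroid nodule, thyroidectomy, levothyroxine;
#         v2 = TSH test, hypothyroidism, levothyroxine.
visits = {"v1": ["d1", "d2", "s1", "m1"], "v2": ["t1", "d3", "m1"]}
nodes, edges = build_visit_network(visits)
print(len(nodes), len(edges))  # 8 16
neighbours_of_m1 = {n for e in edges if "m1" in e for n in e if n != "m1"}
print(sorted(neighbours_of_m1))  # ['d1', 'd2', 'd3', 's1', 't1', 'v1', 'v2']
```

The 5-node subset contributes 10 edges and the 4-node subset 6, and the shared medication node ends up adjacent to every other node of both subsets, as the text describes.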
An embedded representation construction unit: used to construct an entity co-occurrence matrix from the entity set, obtain the entity node initial embedded representations and the visit node initial embedded representations via the entity co-occurrence matrix, and combine them into the initial node embedded representation;
used to construct an entity co-occurrence matrix from the entity set.
The entity set $E$ is used to construct the entity co-occurrence matrix $X$. Referring to FIG. 4, $X$ has dimension $n_e \times n_e$; each row and each column represents one entity in $E$, and $X_{ij}$ represents the co-occurrence information of entities $e_i$ and $e_j$. $X_{ij}$ is calculated as:
$$X_{ij} = \sum_{k=1}^{n} x_{ij}^{(k)}, \qquad x_{ij}^{(k)} = \begin{cases} 1, & e_i \in E_k \text{ and } e_j \in E_k \\ 0, & \text{otherwise} \end{cases}$$
That is, if entities $e_i$ and $e_j$ appear together in visit $v_k$, then $x_{ij}^{(k)}$ equals 1; otherwise it is 0. Here $E_k$ is the entity subset formed by all entities appearing in visit $v_k$. The entity co-occurrence matrix $X$ is symmetric, i.e. $X_{ij} = X_{ji}$, and the co-occurrence information of an entity with itself on the diagonal is set to 0.
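A minimal pure-Python sketch of the co-occurrence count, assuming each visit is given as a list of entity identifiers; `cooccurrence_matrix` and the toy identifiers are hypothetical.

```python
def cooccurrence_matrix(entity_subsets, entities):
    """X[i][j] = number of visits in which entities i and j both appear (i != j)."""
    index = {e: k for k, e in enumerate(entities)}
    n_e = len(entities)
    X = [[0] * n_e for _ in range(n_e)]
    for subset in entity_subsets:               # one entity subset per visit
        present = sorted({index[e] for e in subset})
        for a in present:
            for b in present:
                if a != b:                      # diagonal stays 0 by definition
                    X[a][b] += 1
    return X


entities = ["d1", "d2", "d3", "s1", "m1", "t1"]
subsets = [["d1", "d2", "s1", "m1"], ["t1", "d3", "m1"], ["d2", "m1"]]
X = cooccurrence_matrix(subsets, entities)
print(X[1][4])  # 2: d2 and m1 co-occur in two visits
```

Because every unordered pair is counted in both directions, the result is symmetric by construction.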
The initial embedded representation of each entity node is computed with the GloVe algorithm from the entity co-occurrence matrix.
The relationship between the entity node initial embedded representations and the entity co-occurrence matrix is expressed as:
$$w_i^{\top} w_j + b_i + b_j = \log X_{ij}$$
where $w_i$ and $w_j$ are the entity node initial embedded representations of entities $e_i$ and $e_j$ that are ultimately to be solved, each randomly initialized as a 128-dimensional vector with values between -0.1 and 0.1; the superscript $\top$ denotes transposition; $b_i$ and $b_j$ are the bias terms of the two entity node initial embedded representations, initialized to 0.
An objective function $J$ is constructed from the relationship between the entity co-occurrence matrix and the entity node initial embedded representations:
$$J = \sum_{i=1}^{n_e} \sum_{j=1}^{n_e} f(X_{ij}) \left( w_i^{\top} w_j + b_i + b_j - \log X_{ij} \right)^2$$
$$f(x) = \begin{cases} (x / x_{\max})^{\alpha}, & x < x_{\max} \\ 1, & \text{otherwise} \end{cases}$$
where $x_{\max}$ is the co-occurrence information threshold and $\alpha$ is an exponential parameter.
If two entity nodes never co-occur, i.e. $X_{ij} = 0$, they do not participate in the calculation of the objective function. The objective function is optimized with the AdaDelta gradient descent algorithm until convergence, yielding the entity node initial embedded representation $w_i$ corresponding to each entity $e_i$ in the entity set $E$.
The initial embedded representation of each visit node is obtained by averaging the entity node initial embedded representations of all its adjacent entity nodes; the visit node initial embedded representations and the entity node initial embedded representations form the initial node embedded representation.
For visit node $v_i$, the set of all adjacent entity nodes is its entity subset $E_i$, and the initial embedded representation of the visit node is:
$$h_{v_i} = \frac{1}{k_i} \sum_{e_j \in E_i} w_j$$
where $k_i$ is the number of entity nodes in $E_i$.
The initial node embedded representation is $H^{(0)} = \begin{bmatrix} H_V \\ H_E \end{bmatrix}$, where $H_V$ is the visit node initial embedded representation and $H_E$ is the entity node initial embedded representation.
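A short NumPy sketch of the averaging step and of stacking visit rows above entity rows. The ordering (visits first) and the helper name `visit_initial_embeddings` are assumptions for illustration.

```python
import numpy as np


def visit_initial_embeddings(entity_subsets, entity_index, entity_emb):
    """Each visit's vector is the mean of its entities' initial embeddings."""
    rows = []
    for subset in entity_subsets:
        idx = [entity_index[e] for e in subset]
        rows.append(entity_emb[idx].mean(axis=0))  # average over the visit's entities
    return np.vstack(rows)


entity_index = {"d1": 0, "d2": 1, "m1": 2}
entity_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 2-d vectors
H_V = visit_initial_embeddings([["d1", "m1"], ["d2"]], entity_index, entity_emb)
H0 = np.vstack([H_V, entity_emb])  # visit rows first, then entity rows
# H_V rows: [1.0, 0.5] and [0.0, 1.0]
```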
A clustering network construction unit: used to construct an adjacency matrix from the relationships between nodes in the visit network, and to train the self-supervised graph clustering-based visit node clustering network model with the adjacency matrix and the initial node embedded representation. Referring to FIG. 5, the self-supervised graph clustering-based visit node clustering network model consists of three parts: graph attention, an autoencoder and self-supervision.
Used to construct an adjacency matrix $A$ from the relationships between nodes in the visit network, and to input the adjacency matrix $A$ and the initial node embedded representation $H^{(0)}$ into the self-supervised graph clustering-based visit node clustering network model for $L$ layers of graph attention training. The node embedded representation of layer $l$, $H^{(l)}$, is calculated as:
$$H^{(l)} = \sigma\left( \tilde{A} H^{(l-1)} W^{(l)} \right)$$
where $\sigma$ is the ReLU activation function, $W^{(l)}$ is the graph attention weight of layer $l$, and $\tilde{A}$ is the normalized adjacency matrix obtained from $A + I$, with $I$ the $N \times N$ identity matrix. After $L$ layers of graph attention training, the node embedded representation $Z = H^{(L)}$ is obtained. Like the initial node embedded representation, $Z$ consists of the updated visit node embedded representation $Z_V$ and the entity node embedded representation $Z_E$, i.e. $Z = \begin{bmatrix} Z_V \\ Z_E \end{bmatrix}$.
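The layer rule $H^{(l)} = \mathrm{ReLU}(\tilde{A} H^{(l-1)} W^{(l)})$ can be sketched in NumPy. The text only says $\tilde{A}$ is a normalized adjacency matrix built with the identity matrix; the symmetric normalization $D^{-1/2}(A + I)D^{-1/2}$ used below is a common choice and is an assumption here, as are the helper names and random toy weights.

```python
import numpy as np


def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}, D = degree matrix of A + I."""
    A_self = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_self.sum(axis=1))
    return d_inv_sqrt[:, None] * A_self * d_inv_sqrt[None, :]


def propagate(A, H0, weights):
    """Apply H^(l) = ReLU(A_norm H^(l-1) W^(l)) for each layer weight W^(l)."""
    A_norm = normalize_adjacency(A)
    H = H0
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)
    return H


rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy 3-node network
H0 = rng.normal(size=(3, 4))
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]  # two layers
Z = propagate(A, H0, weights)
print(Z.shape)  # (3, 2)
```

Note that this rule mixes neighbours with fixed normalized weights; learned per-edge attention coefficients would be an extension beyond the formula as written.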
used to reconstruct the visit network from the node embedded representation and calculate the visit network reconstruction error.
The reconstructed adjacency matrix $\hat{A}$ is:
$$\hat{A} = \mathrm{sigmoid}\left( Z Z^{\top} \right)$$
where $Z^{\top}$ is the transpose of $Z$ and $\mathrm{sigmoid}$ is the sigmoid activation function.
The visit network reconstruction error $L_{\mathrm{res}}$ between $A$ and $\hat{A}$ is then calculated: [equation image not recoverable]
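The inner-product decoder $\hat{A} = \mathrm{sigmoid}(Z Z^{\top})$ can be sketched in NumPy as below. The error formula was lost from the source, so the mean squared error used here is a hypothetical stand-in, not the patent's $L_{\mathrm{res}}$.

```python
import numpy as np


def reconstruct_adjacency(Z):
    """A_rec = sigmoid(Z Z^T): edge scores from inner products of node embeddings."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))


def reconstruction_error(A, A_rec):
    """Hypothetical stand-in for L_res: mean squared error between A and A_rec."""
    return float(np.mean((A - A_rec) ** 2))


Z = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
A_rec = reconstruct_adjacency(Z)
print(round(float(A_rec[0, 1]), 3))  # 0.982, i.e. sigmoid(4)
```

Nodes with aligned embeddings get reconstructed edge scores near 1, which is what drives connected visits and entities toward similar representations.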
used to input the entity node embedded representation $Z_E$ into an $L_d$-layer neural network decoder for training. The representation of the nodes in layer $l$ of the decoder, $\hat{H}^{(l)}$, is calculated as:
$$\hat{H}^{(l)} = \phi\left( \hat{H}^{(l-1)} W_d^{(l)} + b^{(l)} \right)$$
where $W_d^{(l)}$ is the network weight of layer $l$ of the decoder, $b^{(l)}$ is the bias, $\phi$ is the activation function of the layer, and the input of the decoder is $\hat{H}^{(0)} = Z_E$. The output of the last decoder layer is taken as the entity node reconstructed embedded representation $\hat{Z}_E$, and the entity node reconstruction error $L_{\mathrm{ae}}$ is calculated: [equation image not recoverable]
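A NumPy sketch of a fully connected decoder of the form above. The activation (ReLU on hidden layers, linear output), the mean squared reconstruction error and the choice of comparison target are all assumptions, since the corresponding formulas were lost; the sketch takes the target as an argument.

```python
import numpy as np


def decode(Z_E, layers):
    """Run Z_E through decoder layers H_l = phi(H_{l-1} W_l + b_l).

    phi is assumed to be ReLU on hidden layers; the last layer is left linear
    (also an assumption) so the output can take negative values.
    """
    H = Z_E
    for i, (W, b) in enumerate(layers):
        H = H @ W + b
        if i < len(layers) - 1:
            H = np.maximum(H, 0.0)
    return H


def entity_reconstruction_error(target, reconstruction):
    """Hypothetical stand-in for L_ae: mean squared reconstruction error."""
    return float(np.mean((target - reconstruction) ** 2))


rng = np.random.default_rng(0)
Z_E = rng.normal(size=(5, 8))
layers = [(rng.normal(size=(8, 16)), np.zeros(16)),
          (rng.normal(size=(16, 8)), np.zeros(8))]
Z_E_rec = decode(Z_E, layers)
err = entity_reconstruction_error(Z_E, Z_E_rec)
```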
used to perform a softmax regression on the visit node embedded representation $Z_V$ to obtain the probability distribution of the visit nodes:
$$Y = \mathrm{softmax}\left( Z_V W_c \right)$$
where $W_c$ has dimension $d \times K$, $d$ is the dimension of the visit node embedded representation, and $K$ is the preset number of cluster centers, i.e. the number of visit node categories; $K$ is chosen empirically by trying 3, 5 and 10 and keeping the value that gives the better result. $y_{ik}$ denotes the probability that the $i$-th sample belongs to the $k$-th class.
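The softmax regression step $Y = \mathrm{softmax}(Z_V W_c)$ can be sketched in NumPy with random toy inputs; `soft_assign` is a hypothetical helper name and the sizes are arbitrary.

```python
import numpy as np


def soft_assign(Z_V, W_c):
    """Y = softmax(Z_V W_c): row i is visit i's probability over K categories."""
    logits = Z_V @ W_c
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(logits)
    return expo / expo.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
K = 5                                  # one of the tried values 3, 5, 10
Z_V = rng.normal(size=(4, 8))          # 4 visit nodes with 8-d embeddings
Y = soft_assign(Z_V, rng.normal(size=(8, K)))
labels = Y.argmax(axis=1)              # most probable category per visit
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged while avoiding overflow.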
used to calculate the clustering loss from the probability distribution of the visit nodes.
For the $i$-th visit sample and the $j$-th cluster, Student's t-distribution is used to measure the similarity between the data representation $y_i$ and the cluster center $\mu_j$, where $y_i$ is the $i$-th row of $Y$, $\mu_j$ is a cluster center initialized by the K-means method on the visit node probability distribution $Y$, and $v$ is the degree of freedom of the Student's t-distribution. $q_{ij}$ is calculated as:
$$q_{ij} = \frac{\left( 1 + \lVert y_i - \mu_j \rVert^2 / v \right)^{-\frac{v+1}{2}}}{\sum_{j'} \left( 1 + \lVert y_i - \mu_{j'} \rVert^2 / v \right)^{-\frac{v+1}{2}}}$$
where $q_{ij}$ is the probability that the $i$-th sample belongs to the $j$-th cluster.
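A NumPy sketch of the Student's t soft assignment, with a plain Lloyd's K-means for the center initialization. Seeding K-means with the first $K$ rows and using $v = 1$ are simplifying assumptions; a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import numpy as np


def kmeans_centers(Y, K, iters=20):
    """Plain Lloyd's K-means, seeded with the first K rows, to initialize mu."""
    mu = Y[:K].copy()
    for _ in range(iters):
        assign = ((Y[:, None, :] - mu[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(K):
            if np.any(assign == j):        # keep the old center if a cluster empties
                mu[j] = Y[assign == j].mean(axis=0)
    return mu


def student_t_assign(Y, mu, v=1.0):
    """q_ij proportional to (1 + ||y_i - mu_j||^2 / v)^(-(v+1)/2), normalized over j."""
    sq_dist = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    kernel = (1.0 + sq_dist / v) ** (-(v + 1.0) / 2.0)
    return kernel / kernel.sum(axis=1, keepdims=True)


Y = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
mu = kmeans_centers(Y, 2)
Q = student_t_assign(Y, mu)
```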
Let $Q = [q_{ij}]$ be the set of cluster distributions of all samples. After the cluster distribution $Q$ is obtained, the target distribution $P$ is calculated. The target distribution $P$ assigns samples with higher confidence, so $P$ can be used to optimize the data distribution and pull the data closer to the cluster centers. $P$ and $Q$ both have dimension $n \times K$. Each element $p_{ij}$ of the target distribution $P$ is calculated as:
$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$$
where $f_j = \sum_{i} q_{ij}$. Because $q_{ij}$ is squared in $P$, $P$ has higher confidence. The clustering loss is calculated as:
$$L_{\mathrm{clu}} = \mathrm{KL}(P \parallel Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
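The target distribution and the KL clustering loss can be sketched in NumPy on a toy $Q$; the function names are hypothetical.

```python
import numpy as np


def target_distribution(Q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    weight = Q ** 2 / Q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def clustering_loss(P, Q):
    """KL(P || Q) = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return float(np.sum(P * np.log(P / Q)))


Q = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
P = target_distribution(Q)
print(P[0, 0] > Q[0, 0])  # True: squaring sharpens confident assignments
```

Dividing by $f_j$ keeps large clusters from dominating the target, while the squaring pushes each row of $P$ toward its most likely cluster.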
used to construct the overall loss function of the self-supervised graph clustering-based visit node clustering network model from the visit network reconstruction error $L_{\mathrm{res}}$, the entity node reconstruction error $L_{\mathrm{ae}}$ and the clustering loss $L_{\mathrm{clu}}$. The overall loss function is:
[equation image not recoverable]
where $\lambda$ is a hyperparameter that adjusts the importance of the different loss terms and is set to 0.1 by default.
The chronic kidney disease subtype mining model construction unit: used to construct the chronic kidney disease subtype mining model from the self-supervised graph clustering-based visit node clustering network model.
Used to take the visit node cluster distribution $Q$ obtained by the self-supervised graph clustering-based visit node clustering network model as the category distribution of the visit nodes, and to select the category with the highest probability in that distribution as the category label of each visit node; the category label of visit node $v_i$ is $c_i = \arg\max_j q_{ij}$. Taking the recording time of the first medical record in a visit as the start time of the visit node and the recording time of the last medical record as its end time, all visit nodes of each patient are arranged in chronological order.
Used to decide whether to merge consecutive visit nodes with the same category label or keep them separate by computing the cosine similarity between their category distributions, and to construct an event matrix from the arranged visit nodes.
For two consecutive visit nodes $v_a$ and $v_b$ with the same category label, the cosine similarity between their category distributions is calculated as:
$$\cos(v_a, v_b) = \frac{q_a \cdot q_b}{\lVert q_a \rVert \, \lVert q_b \rVert}$$
where $q_a$ is the category distribution of event $v_a$.
Two consecutive visit nodes whose cosine similarity is greater than 0.8 are merged into one visit node, whose category distribution is [equation image not recoverable]; otherwise, the two visit nodes are kept separate. For a run of several consecutive visit nodes with the same category label, the cosine similarity check is performed from front to back in the arrangement order to decide whether to merge them or keep them separate.
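A pure-Python sketch of the front-to-back merging rule for one patient's chronologically ordered category distributions. Two choices here are assumptions, because the corresponding details are not recoverable from the source: the merged node's distribution is taken as the element-wise mean, and each new visit is compared with the most recent (possibly already merged) node. Labels are made 1-based so that 0 can later serve as padding.

```python
import math


def cosine(p, q):
    """Cosine similarity between two category distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def merge_visits(dists, threshold=0.8):
    """Merge consecutive same-label visits whose distributions are similar.

    Returns (label, distribution) pairs in chronological order.
    """
    merged = []
    for dist in dists:
        label = 1 + max(range(len(dist)), key=lambda k: dist[k])  # 1-based argmax
        if merged and merged[-1][0] == label and cosine(merged[-1][1], dist) > threshold:
            prev = merged[-1][1]
            merged[-1] = (label, [(a + b) / 2 for a, b in zip(prev, dist)])  # assumed mean
        else:
            merged.append((label, list(dist)))
    return merged


result = merge_visits([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.75, 0.25]])
print([label for label, _ in result])  # [1, 2, 1]
```

Two visits with the same label but dissimilar distributions, such as [0.51, 0.49] and [1.0, 0.0] (cosine about 0.72), stay separate.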
The final visit nodes of each patient are arranged into an event vector $s = (c_1, c_2, \dots, c_{l_{\max}})$, where $l_{\max}$ is the number of visit nodes of the patient with the most visit nodes; event vectors with fewer than $l_{\max}$ nodes are padded with 0. The event vectors of all patients are combined into an event matrix $S$:
$$S = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{bmatrix}$$
where $S$ has dimension $m \times l_{\max}$ and $m$ is the number of patients.
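The padding step can be sketched in a few lines of Python; `event_matrix` is a hypothetical helper and category labels are assumed to start at 1 so that 0 is free for padding.

```python
def event_matrix(patients):
    """Pad each patient's label sequence with 0 up to the longest sequence (l_max)."""
    l_max = max(len(seq) for seq in patients)
    return [list(seq) + [0] * (l_max - len(seq)) for seq in patients]


S = event_matrix([[1, 2, 3], [1, 3], [2]])
# S == [[1, 2, 3], [1, 3, 0], [2, 0, 0]]
```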
Used to search for frequent events to determine the nodes of the event flow: frequent events become nodes in the event flow and the remaining events go directly to an end node. Each frequent event serves as the starting node of the next round of search; the corresponding event vectors are extracted and combined into a new event matrix, its first column is removed, and the same frequent-event search is performed. The nodes obtained in each round are connected to their starting node to extend the event flow, until no frequent events remain or the event flow reaches the maximum event flow length. When the loop ends, the chronic kidney disease subtype mining model is obtained.
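The recursive frequent-event search can be sketched as building a prefix tree over the event matrix. Several details are assumptions, not taken from the source: "frequency" is the share of rows in the current matrix whose first column holds the event, the padding value 0 is never treated as an event, infrequent events are simply not expanded (standing in for the end node), and the tree is a nested dict.

```python
from collections import Counter


def mine_event_flow(matrix, threshold=0.3, max_len=5, prefix=()):
    """Grow an event flow (nested dict: event -> subtree) from an event matrix."""
    if not matrix or not matrix[0] or len(prefix) >= max_len:
        return {}
    counts = Counter(row[0] for row in matrix if row[0] != 0)  # 0 = padding
    tree = {}
    for event, count in counts.items():
        if count / len(matrix) > threshold:                     # frequent event
            # Rows that start with this event, with the first column removed.
            sub = [row[1:] for row in matrix if row[0] == event]
            tree[event] = mine_event_flow(sub, threshold, max_len, prefix + (event,))
        # Infrequent events go to the end node and are not expanded further.
    return tree


S = [[1, 2, 3], [1, 2, 0], [1, 3, 0], [2, 0, 0]]
flow = mine_event_flow(S, threshold=0.4)
print(flow)  # {1: {2: {3: {}}}}
```

Each root-to-leaf path of the resulting tree is one mined event flow; the `max_len` cap plays the role of the maximum event flow length.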
Chronic kidney disease phenotype subtype assessment module: for evaluating the chronic kidney disease subtype mining model;
used to compare the differences between patients of different phenotype subtypes and to test whether the characteristics of the mined subtypes differ statistically, thereby evaluating whether the disease subtypes obtained by the phenotype subtype mining method are clinically meaningful. The specific evaluation protocol is as follows:
compute indicators such as sex, age and glomerular filtration rate for patients of different phenotype subtypes, and use statistical tests to determine whether the clinical manifestations of the subtypes differ.
Count whether the use of important medications, such as recombinant human erythropoietin, metformin, candesartan and pravastatin, differs between patients of different subtypes, and analyze the differences with statistical tests.
Count the number of patients in each subtype with each complication, including heart failure, coronary heart disease, hypertension, diabetes and hyperlipidemia, calculate the proportion of each complication, and test whether these proportions differ between subtypes.
Count the total number of patients in each subtype and the number surviving at different time points, and compare the survival rates of patients in different subtypes; the differences in survival over time are analyzed with the log-rank test.
If more than 50 percent of the compared characteristics differ significantly between the patient groups of different subtypes, the mined subtypes are considered to have good clinical value.
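The text does not name the test used for complication proportions. As one common choice, assumed here, a Pearson chi-square test on a 2x2 table (two subtypes, with or without a complication) can be computed with the standard library alone, since with 1 degree of freedom the p-value is $\mathrm{erfc}(\sqrt{x/2})$. The counts below are made-up illustration values.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows: subtype 1 (a with, b without), subtype 2 (c with, d without).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# Illustrative counts: 30/100 vs 15/100 patients with heart failure.
stat, p = chi_square_2x2(30, 70, 15, 85)
```

Here the statistic is about 6.45 with p of about 0.011; comparisons across more than two subtypes, or the log-rank survival test, would need a statistics library.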
Chronic kidney disease subtype prediction module: used to perform prediction on a patient's structured data;
used to input the preprocessed structured data of the patient into the self-supervised graph clustering-based visit node clustering network model for prediction, obtaining the probability distribution of the patient's visit nodes;
used to determine the cluster category of each visit node from its probability distribution and construct a visit event sequence;
used to input the visit event sequence into the chronic kidney disease subtype mining model, match it in order against the nodes of the model to obtain an event flow, and determine from the event flow which chronic kidney disease subtype the patient belongs to.
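Matching a new patient's visit event sequence against a mined event flow can be sketched as a walk down a nested-dict tree. The tree representation, and the idea that the matched path identifies the subtype, are assumptions for illustration; the source does not specify how paths map to subtype labels.

```python
def match_event_flow(tree, events):
    """Follow a visit event sequence down a mined event flow (nested dict).

    Returns the matched path; the walk stops at the first event with no
    corresponding node, which corresponds to reaching the end node.
    """
    path, node = [], tree
    for event in events:
        if event not in node:
            break
        path.append(event)
        node = node[event]
    return tuple(path)


flow = {1: {2: {3: {}}, 3: {}}, 2: {}}
print(match_event_flow(flow, [1, 2, 5]))  # (1, 2)
```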
The invention provides a visit node clustering network model based on self-supervised graph clustering, in which a decoder is added to the graph attention training to reconstruct the node embedded representations, and a self-supervised loss is added to train the clustering model. The model aggregates low-level, fine-grained information about chronic kidney disease patients into high-level, coarse-grained general information for diagnosis and treatment process mining, which addresses the inability of process mining on longitudinal electronic medical record data to handle multi-granularity information, such as event information within a single visit and event information across multiple visits. The self-supervised graph clustering method fully integrates the multi-dimensional diagnosis and treatment information within a single visit with the temporal information across multiple visits, so that the electronic medical record data are mined from both the cross-sectional and the longitudinal dimensions. Based on the similarity of the event label distributions of visit nodes, similar adjacent events are merged, which optimizes the process mining method, simplifies the mined diagnosis and treatment processes, and improves their representativeness and coverage.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A chronic kidney disease subtype mining system based on self-supervised graph clustering, characterized by comprising:
a data acquisition module: used to collect the structured data in chronic kidney disease diagnosis and treatment records;
a data extraction and preprocessing module: used to extract and preprocess the structured data to obtain an entity set and a visit set;
a chronic kidney disease subtype mining module: used to construct a chronic kidney disease subtype mining model from the entity set and the visit set;
a chronic kidney disease phenotype subtype assessment module: used to evaluate the chronic kidney disease subtype mining model;
a chronic kidney disease subtype prediction module: used to perform prediction on a patient's structured data.
2. The system of claim 1, wherein the structured data comprises basic patient information, medical records, diagnoses during the observation window, laboratory tests, medical examinations, surgery and/or medication data.
3. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 1, wherein the data extraction and preprocessing module is specifically configured to: extract the structured data in chronic kidney disease diagnosis and treatment records from the electronic medical record system, including basic patient information, medical records, diagnoses during the observation window, laboratory tests, medical examinations, surgical data and medication data, and preprocess the extracted structured data; for laboratory tests, consider only abnormal test items according to the normal reference range, classify the results of the abnormal test items as low or high, and retain the names and categories of the abnormal test items; process medical examination and surgical data with simple natural language processing techniques, retaining the examined body part, the examination type and the surgery name; for medication data, consider only six drug classes, namely antihyperglycemic drugs, antihypertensive drugs, lipid-regulating drugs, non-steroidal anti-inflammatory drugs, antiplatelet drugs and steroids, classify the medications into these six classes and retain the drug class; obtain a diagnosis set, a medication set, a surgery set and a test set, together with the numbers of diagnosis types, medication types, surgery types and test types and the number of visit records; combine the diagnosis set, the medication set, the surgery set and the test set into an entity set, and combine the patients' visit records into a visit set.
4. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 1, wherein the chronic kidney disease subtype mining module specifically includes:
a visit network construction unit: used to construct a visit network from the visit set and the entity set;
an embedded representation construction unit: used to construct an entity co-occurrence matrix from the entity set, obtain the entity node initial embedded representations and the visit node initial embedded representations via the entity co-occurrence matrix, and combine them into the initial node embedded representation;
a clustering network construction unit: used to construct an adjacency matrix from the relationships between nodes in the visit network, and to train the self-supervised graph clustering-based visit node clustering network model with the adjacency matrix and the initial node embedded representation;
a chronic kidney disease subtype mining model construction unit: used to construct the chronic kidney disease subtype mining model from the self-supervised graph clustering-based visit node clustering network model.
5. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 4, wherein the visit network construction unit specifically includes:
used to combine the visit set and the entity set into a node set;
used to construct an edge set from the co-occurrence relations between nodes in the node set;
used to construct a visit network from the node set and the edge set.
6. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 4, wherein the embedded representation construction unit specifically includes:
used to construct an entity co-occurrence matrix from the entity set;
used to compute the initial embedded representation of each entity node with the GloVe algorithm based on the entity co-occurrence matrix;
used to obtain the initial embedded representation of each visit node by averaging the entity node initial embedded representations of all its adjacent entity nodes, the visit node initial embedded representations and the entity node initial embedded representations forming the initial node embedded representation.
7. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 4, wherein the clustering network construction unit specifically comprises:
a subunit configured to construct an adjacency matrix from the relationships between nodes in the visit network, input the adjacency matrix and the node initial embedded representation into the self-supervised graph clustering-based visit node clustering network model for graph attention training, and obtain node embedded representations comprising visit node embedded representations and entity node embedded representations;
a subunit configured to reconstruct the visit network from the node embedded representations and calculate the visit network reconstruction error;
a subunit configured to input the entity node embedded representations into a decoder neural network for training, take the output of the last layer of the decoder as the entity node reconstructed embedded representation, and calculate the entity node reconstruction error;
a subunit configured to perform a softmax regression on the visit node embedded representations to obtain the probability distribution of each visit node, and calculate the clustering loss from these probability distributions;
a subunit configured to construct the overall loss function of the self-supervised graph clustering-based visit node clustering network model from the visit network reconstruction error, the entity node reconstruction error and the clustering loss.
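The three loss terms of claim 7 can be sketched in plain Python under several assumptions that the claim does not confirm: the visit network is reconstructed with an inner-product decoder, sigmoid(z_i · z_j), scored by binary cross-entropy; the entity reconstruction error is a mean squared error; and the clustering loss is a DEC-style KL divergence between the softmax distributions Q and a sharpened target P. The graph attention encoder is not shown, and the weights `alpha` and `beta` and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def graph_reconstruction_error(adj, z, eps=1e-9):
    """Binary cross-entropy between adjacency A and the inner-product decoder sigmoid(z_i . z_j)."""
    n = len(adj)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            p = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            loss -= adj[i][j] * math.log(p + eps) + (1 - adj[i][j]) * math.log(1 - p + eps)
    return loss / (n * n)


def entity_reconstruction_error(h, h_rec):
    """Mean squared error between entity embeddings and the decoder's last-layer output."""
    diffs = [(a - b) ** 2 for row, rec in zip(h, h_rec) for a, b in zip(row, rec)]
    return sum(diffs) / len(diffs)


def clustering_loss(q):
    """KL(P || Q) against a sharpened target P (DEC-style; an assumption, not from the claim)."""
    k = len(q[0])
    soft_sizes = [sum(row[j] for row in q) for j in range(k)]  # soft cluster frequencies
    loss = 0.0
    for row in q:
        weights = [row[j] ** 2 / soft_sizes[j] for j in range(k)]
        norm = sum(weights)
        for j in range(k):
            p = weights[j] / norm
            if p > 0:
                loss += p * math.log(p / row[j])
    return loss / len(q)


def total_loss(adj, z, h, h_rec, visit_logits, alpha=1.0, beta=0.1):
    """Overall loss as a weighted sum; alpha and beta are hypothetical weights."""
    q = [softmax(row) for row in visit_logits]  # visit node cluster distributions
    return (graph_reconstruction_error(adj, z)
            + alpha * entity_reconstruction_error(h, h_rec)
            + beta * clustering_loss(q))
```

Squaring Q and dividing by the soft cluster sizes makes P more confident than Q while discouraging one cluster from absorbing every visit. Minimizing KL(P || Q) therefore pushes the model toward firmer visit node assignments without external labels.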
8. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 4, wherein the chronic kidney disease subtype mining model construction unit specifically comprises:
a subunit configured to obtain the visit node cluster distribution from the self-supervised graph clustering-based visit node clustering network model as the category distribution of each visit node, select the category with the highest probability in that distribution as the category label of the visit node, and arrange all visit nodes of each patient in chronological order;
a subunit configured to decide whether to merge consecutive visit nodes that have the same category label or keep them separate, by calculating the cosine similarity between their category distributions, and to construct an event matrix from the arranged visit nodes;
a subunit configured to search for frequent events in order to determine nodes: the frequent events become nodes in the event flow and the remaining events go directly to an end node; each frequent event then serves as the start node of the next search, the corresponding event vectors are extracted and combined into a new event matrix, its first column is removed, and the same frequent event search is performed; the node obtained by each search is connected to its start node to extend the event flow, and this repeats until no frequent event remains or the event flow reaches the maximum event flow length; the chronic kidney disease subtype mining model is obtained when the loop ends.
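A minimal sketch of claim 8 follows, assuming each row of the event matrix is one patient's event sequence (rows may differ in length rather than being padded) and that "frequent" means a first-column event shared by at least `min_support` rows. The similarity threshold, the support and length limits, the `END` sentinel, and every function name are hypothetical; the event flow is represented as a nested dict of event to subtree.

```python
import math
from collections import Counter

END = "END"  # hypothetical sentinel for the end node


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def visits_to_events(dists, threshold=0.9):
    """Turn one patient's time-ordered visit distributions into an event sequence.

    Consecutive visits with the same label are merged when the cosine similarity
    of their distributions reaches `threshold`; otherwise they are kept separately.
    """
    events, prev = [], None
    for d in dists:
        label = max(range(len(d)), key=lambda j: d[j])  # most probable category
        merge = prev is not None and label == events[-1] and cosine(prev, d) >= threshold
        if not merge:
            events.append(label)
        prev = d
    return events


def mine_event_flow(rows, min_support=2, max_len=5, depth=0):
    """Grow an event flow from the event matrix `rows`.

    Frequent first-column events become nodes, and the other sequences go to END.
    Each frequent event starts the next search on its own rows with the first
    column removed, until nothing is frequent or max_len is reached.
    """
    flow = {}
    if depth >= max_len:
        return flow
    counts = Counter(r[0] for r in rows if r)
    frequent = sorted(e for e, c in counts.items() if c >= min_support)
    for event in frequent:
        sub_matrix = [r[1:] for r in rows if r and r[0] == event]
        flow[event] = mine_event_flow(sub_matrix, min_support, max_len, depth + 1)
    if any(not r or r[0] not in frequent for r in rows):
        flow[END] = {}  # infrequent or finished sequences go to the end node
    return flow
```

For example, the event matrix `[[0, 1], [0, 1], [0], [1]]` yields `{0: {1: {END: {}}, END: {}}, END: {}}`: category 0 is frequent at the first step, category 1 is frequent after it, and every other sequence ends.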
9. The chronic kidney disease subtype mining system based on self-supervised graph clustering according to claim 1, wherein the chronic kidney disease subtype prediction module specifically comprises:
a subunit configured to input the preprocessed structured data of a patient into the self-supervised graph clustering-based visit node clustering network model for prediction, obtaining the probability distribution of each of the patient's visit nodes;
a subunit configured to determine the cluster category of each visit node from its probability distribution and construct a visit event sequence;
a subunit configured to input the visit event sequence into the chronic kidney disease subtype mining model, fit it in order to the nodes of the model to obtain an event flow, and determine from that event flow which chronic kidney disease subtype the patient belongs to.
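The prediction step of claim 9 can be sketched as walking a new patient's event sequence down a mined event flow, assuming the flow is stored as a nested dict of event to subtree and that the matched path itself identifies the subtype branch. How branches are mapped to named subtypes is not specified in the claim. `visit_event_sequence`, `match_event_flow` and the `END` sentinel are hypothetical, and in practice the same merging rule used during mining would be applied before matching.

```python
END = "END"  # hypothetical end-node sentinel


def visit_event_sequence(dists):
    """Most probable cluster category of each visit, in time order."""
    return [max(range(len(d)), key=lambda j: d[j]) for d in dists]


def match_event_flow(flow, events):
    """Fit an event sequence to the event flow node by node.

    Returns the matched path; the branch it ends on identifies the subtype.
    """
    path, node = [], flow
    for event in events:
        if event not in node:
            break
        path.append(event)
        node = node[event]
    return tuple(path)


flow = {0: {1: {END: {}}, END: {}}, END: {}}
events = visit_event_sequence([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
print(events, match_event_flow(flow, events))  # prints: [0, 1, 0] (0, 1)
```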
CN202210980822.5A 2022-08-16 2022-08-16 Chronic nephropathy subtype mining system based on self-supervision graph clustering Active CN115083616B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210980822.5A CN115083616B (en) 2022-08-16 2022-08-16 Chronic nephropathy subtype mining system based on self-supervision graph clustering
JP2023092731A JP7404581B1 (en) 2022-08-16 2023-06-05 Chronic nephropathy subtype mining system based on self-supervised graph clustering

Publications (2)

Publication Number Publication Date
CN115083616A true CN115083616A (en) 2022-09-20
CN115083616B CN115083616B (en) 2022-11-08

Family

ID=83244725

Country Status (2)

Country Link
JP (1) JP7404581B1 (en)
CN (1) CN115083616B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108231201A * 2018-01-25 2018-06-29 Huazhong University of Science and Technology Construction method, system and application of a disease data analysis and processing model
CN108417271A * 2018-01-11 2018-08-17 Fudan University Psychotropic drug recommendation method and system based on mental disorder subtype classification
CN109830303A * 2019-02-01 2019-05-31 Shanghai Zhongheng Information Industry Co., Ltd. Clinical data mining analysis and decision support method based on an Internet-integrated medical platform
WO2021096932A1 * 2019-11-13 2021-05-20 Memorial Sloan Kettering Cancer Center Classifier models to predict tissue of origin from targeted tumor DNA sequencing
CN112992370A * 2021-05-06 2021-06-18 West China Hospital of Sichuan University Unsupervised electronic medical record-based medical behavior compliance assessment method
CN113161001A * 2021-05-12 2021-07-23 Northeastern University Process path mining method based on improved LDA
CN114049966A * 2022-01-12 2022-02-15 Computer Network Information Center, Chinese Academy of Sciences Food-borne disease outbreak identification method and system based on link prediction
CN114093445A * 2021-11-18 2022-02-25 Chongqing University of Posts and Telecommunications Patient screening and marking method based on multi-label learning
CN114242194A * 2021-12-07 2022-03-25 Shenzhen Yunying Medical Technology Co., Ltd. Artificial intelligence-based natural language processing device and method for medical image diagnosis reports
CN114639483A * 2022-03-23 2022-06-17 Zhejiang University Electronic medical record retrieval method and device based on graph neural network
CN114664463A * 2022-03-18 2022-06-24 Xiangya Hospital of Central South University Diagnosis assistance system for general practitioners
CN114864107A * 2021-02-03 2022-08-05 Alibaba Group Holding Limited Clinical pathway variation analysis method, device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN109920547A 2019-03-05 2019-06-21 Beijing University of Technology Diabetes prediction model construction method based on electronic health record data mining

Non-Patent Citations (1)

Title
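Gong Xue, Cui Lei: "Research on link prediction based on a medical subject heading co-occurrence network", Journal of Intelligence (Qingbao Zazhi) *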

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116364299A * 2023-03-30 2023-06-30 Zhejiang Lab Disease diagnosis and treatment path clustering method and system based on heterogeneous information network
CN116364299B * 2023-03-30 2024-02-13 Zhejiang Lab Disease diagnosis and treatment path clustering method and system based on heterogeneous information network

Also Published As

Publication number Publication date
JP7404581B1 (en) 2023-12-25
CN115083616B (en) 2022-11-08
JP2024027086A (en) 2024-02-29

Similar Documents

Publication Publication Date Title
Esfahani et al. Cardiovascular disease detection using a new ensemble classifier
CN111261282A (en) Sepsis early prediction method based on machine learning
Mattila et al. A disease state fingerprint for evaluation of Alzheimer's disease
CN108648827A (en) Cardiovascular and cerebrovascular disease Risk Forecast Method and device
CN112201330B (en) Medical quality monitoring and evaluating method combining DRGs tool and Bayesian model
CN108742513A (en) Patients with cerebral apoplexy rehabilitation prediction technique and system
CN111081379A (en) Disease probability decision method and system
CN111081381A (en) Intelligent screening method for critical indexes of prediction of nosocomial fatal gastrointestinal rebleeding
Mounika et al. Prediction of type-2 diabetes using machine learning algorithms
CN115083616B (en) Chronic nephropathy subtype mining system based on self-supervision graph clustering
CN113593708A (en) Sepsis prognosis prediction method based on integrated learning algorithm
CN114023441A (en) Severe AKI early risk assessment model and device based on interpretable machine learning model and development method thereof
Razavi et al. Predicting metastasis in breast cancer: comparing a decision tree with domain experts
Samet et al. Predicting and staging chronic kidney disease using optimized random forest algorithm
CN113128654A (en) Improved random forest model for coronary heart disease pre-diagnosis and pre-diagnosis system thereof
Thelagathoti et al. A population analysis approach using mobility data and correlation networks for depression episodes detection
Kalogiannis et al. Geriatric group analysis by clustering non-linearly embedded multi-sensor data
CN116469570A (en) Malignant tumor complication analysis method based on electronic medical record
Conforti et al. Kernel-based support vector machine classifiers for early detection of myocardial infarction
Thelagathoti et al. A data-driven approach for the analysis of behavioral disorders with a focus on classification and severity estimation
Tolentino et al. CAREdio: Health screening and heart disease prediction system for rural communities in the Philippines
Bose et al. Female Diabetic Prediction in India Using Different Learning Algorithms
Ndirangu et al. Support vector machine based disease diagnostic assistant
CN111028953B (en) Control method for prompting marking of medical data
AU2021102832A4 (en) System & method for automatic health prediction using fuzzy based machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant