CN113990495A - Disease diagnosis prediction system based on graph neural network - Google Patents

Publication number
CN113990495A
Authority
CN
China
Prior art keywords
disease
symptom
patient
graph
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111609275.1A
Other languages
Chinese (zh)
Other versions
CN113990495B (en)
Inventor
李劲松
池胜强
王宇清
田雨
周天舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202111609275.1A (CN113990495B)
Publication of CN113990495A
Application granted
Publication of CN113990495B
Priority to PCT/CN2022/116970 (WO2023124190A1)
Priority to JP2023536567A (JP7459386B2)
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Abstract

The invention discloses a disease diagnosis prediction system based on a graph neural network, comprising a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module and a disease diagnosis model application module. The invention integrates the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network. On this network, both local and global information are learned with a graph convolutional neural network. The disease diagnosis model is trained end to end on knowledge and data simultaneously. In addition to the disease prediction task, the optimization objective includes supervision on the knowledge relations, so that the disease prediction task makes effective use of the knowledge while the knowledge representation is not affected by noise in the data. Because the number of diseases to be predicted is large and some diseases have only a limited number of patients, multi-label hierarchical classification is designed to improve prediction for diseases with few samples.

Description

Disease diagnosis prediction system based on graph neural network
Technical Field
The invention belongs to the technical field of medical health information, and particularly relates to a disease diagnosis and prediction system based on a graph neural network.
Background
In the medical field there are many well-organized knowledge graphs, such as the International Classification of Diseases, DrugBank, and clinical guidelines and consensus statements, which carry hierarchical information and complex association relationships that accord with human cognition. A knowledge graph is a heterogeneous graph network that contains a variety of relationships. Using the expert knowledge in a knowledge graph together with electronic medical record data, and integrating the two for modeling, plays an important role in disease diagnosis and prediction.
Existing methods for predicting diseases with graph neural network models lack an effective way of fusing a medical knowledge graph with electronic medical record data to construct a heterogeneous graph network. The main current approaches are: (1) Data-based graph network modeling: a graph network is constructed from electronic medical record data and diseases are predicted with a graph neural network model; this approach does not make full use of existing sources of medical knowledge. (2) Staged modeling of knowledge representation learning and disease prediction: representation learning is performed on the medical knowledge graph to obtain vector representations of the knowledge, which are then integrated with electronic medical record data for disease prediction; staged training does not yield the knowledge representation best suited to disease prediction. (3) End-to-end modeling that focuses only on the disease prediction task: a medical knowledge graph and electronic medical record data are fused into a heterogeneous graph network and diseases are predicted with a graph neural network model; although this overcomes the shortcomings of the first two approaches, the learned knowledge representation may be affected by noise in the data because the model optimizes only the disease prediction task.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a disease diagnosis and prediction system based on a graph neural network.
The purpose of the invention is realized by the following technical scheme: a disease diagnosis prediction system based on a graph neural network, the system comprising:
(1) a knowledge graph construction module: constructing a disease-symptom knowledge graph based on medical knowledge sources;
(2) a data extraction and preprocessing module: extracting electronic medical record data of patients from the electronic medical record system, wherein the electronic medical record data comprises the patients' disease diagnosis and symptom data and is stored in triple form;
(3) a disease diagnosis model construction module: performing graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, comprising:
constructing a heterogeneous graph network, wherein the heterogeneous graph network comprises a disease-symptom subgraph constructed by extracting disease-symptom relations from the disease-symptom knowledge graph and a patient-symptom subgraph constructed from the patients' disease diagnosis and symptom data in triple form;
constructing a disease diagnosis model, wherein the disease diagnosis model consists of a graph encoder and a graph decoder;
the graph encoder is implemented on the basis of a graph convolutional neural network; its input is the initial node-embedded representations of diseases, symptoms and patients, obtained from a disease-symptom co-occurrence matrix, together with the disease-symptom adjacency matrix and the patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and the node-embedded representations of diseases, symptoms and patients obtained through the node-embedding update operation are input to the graph decoder;
the graph decoder performs multi-task learning using the node-embedded representations, including three parts:
a) multi-label hierarchical classification for patient disease diagnosis prediction: constructing a disease hierarchy from the hierarchical structure of diseases, wherein the hierarchy comprises a disease layer containing the diseases to be diagnosed and predicted and a disease system classification layer obtained from medical knowledge; constructing a multi-label hierarchical classifier, and designing a loss function for the multi-label hierarchical classification;
b) disease contrastive learning: constructing a disease-pair system category discriminator, calculating the distance between the two diseases in a disease pair, and designing a loss function for disease contrastive learning;
c) disease-symptom relationship learning: constructing a disease-symptom relationship learner, calculating the probability that an association exists between the disease and the symptom in a disease-symptom pair, and designing a loss function for disease-symptom relationship learning;
adding the loss function of the multi-label hierarchical classification, the loss function of the disease contrastive learning and the loss function of the disease-symptom relationship learning to obtain the loss function of the disease diagnosis model;
(4) a disease diagnosis model application module: performing disease diagnosis prediction on the input symptoms of a new patient by using the disease diagnosis model.
Further, in the knowledge graph construction module, the disease-symptom knowledge graph comprises two node types, disease and symptom, and one relation type, disease-symptom.
Further, the heterogeneous graph network is constructed from the disease-symptom knowledge graph and the electronic medical record data and comprises three node types: diseases, symptoms and patients. The symptoms are intermediate nodes connecting diseases and patients, and the heterogeneous graph network integrates the relationship subgraph relating diseases and symptoms in the disease-symptom knowledge graph with the relationship subgraph relating patients and symptoms in the electronic medical record data.
Further, the heterogeneous graph network is expressed as a node set together with an edge set [formula images not reproduced]. The node set is composed of the disease set D, the symptom set S and the patient set P, whose sizes are the number of disease types, the number of symptom types and the number of patients, respectively. The edge set is composed of the disease-symptom relationships, stored in a disease-symptom adjacency matrix, and the patient-symptom relationships, stored in a patient-symptom adjacency matrix.
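As an illustration of how the two adjacency matrices of the heterogeneous graph network might be assembled from triples, the following is a minimal sketch. The triple layout `(head, relation, tail)` and the relation names `has_symptom` and `diagnosed_with` are assumptions made for this example; the patent text states only that the data are stored in triple form.

```python
import numpy as np

def build_adjacency(kg_triples, emr_triples):
    """Build disease-symptom and patient-symptom adjacency matrices.

    kg_triples:  (disease, "has_symptom", symptom) from the knowledge graph.
    emr_triples: (patient, "has_symptom", symptom) or
                 (patient, "diagnosed_with", disease) from the medical records.
    The relation names are hypothetical, chosen only for this sketch.
    """
    diseases = sorted({h for h, _, _ in kg_triples} |
                      {t for _, r, t in emr_triples if r == "diagnosed_with"})
    symptoms = sorted({t for _, _, t in kg_triples} |
                      {t for _, r, t in emr_triples if r == "has_symptom"})
    patients = sorted({h for h, _, _ in emr_triples})

    d_idx = {name: i for i, name in enumerate(diseases)}
    s_idx = {name: i for i, name in enumerate(symptoms)}
    p_idx = {name: i for i, name in enumerate(patients)}

    # Disease-symptom edges come from the knowledge graph.
    A_ds = np.zeros((len(diseases), len(symptoms)))
    for d, _, s in kg_triples:
        A_ds[d_idx[d], s_idx[s]] = 1.0

    # Patient-symptom edges come from the electronic medical records.
    A_ps = np.zeros((len(patients), len(symptoms)))
    for p, r, s in emr_triples:
        if r == "has_symptom":
            A_ps[p_idx[p], s_idx[s]] = 1.0

    return A_ds, A_ps, (diseases, symptoms, patients)
```

Symptoms index the columns of both matrices, which reflects their role as the intermediate nodes linking diseases and patients.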
Further, the generating of the node initial embedded representation comprises:
constructing a disease-symptom co-occurrence matrix, in which the element in row i and column j indicates the number of patients in the electronic medical record data who are diagnosed with the i-th disease and in whom the j-th symptom appears;
performing row normalization on the co-occurrence matrix, the initial embedded representation of each disease being the corresponding row of the row-normalized matrix;
performing column normalization on the co-occurrence matrix, the initial embedded representation of each symptom being the corresponding column of the column-normalized matrix;
calculating the initial embedded representation of each patient from a formula that involves the number of symptoms of the patient [formula image not reproduced].
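The steps above can be sketched as follows. Two points are assumptions, because the source formulas are not available: normalization is taken to mean dividing by the row or column sum, and a patient's initial representation is taken to be the average of the initial representations of that patient's symptoms (suggested by the reference to the patient's number of symptoms). All function and variable names are illustrative.

```python
import numpy as np

def initial_embeddings(diagnoses, patient_symptoms, n_diseases, n_symptoms):
    """Return initial embeddings for diseases, symptoms and patients.

    diagnoses:        list of (patient_index, disease_index) pairs.
    patient_symptoms: list whose k-th entry lists the symptom indices of patient k.
    """
    # Co-occurrence counts: patients diagnosed with disease i who show symptom j.
    C = np.zeros((n_diseases, n_symptoms))
    for p, d in diagnoses:
        for s in patient_symptoms[p]:
            C[d, s] += 1.0

    # Assumed sum-to-one normalization; empty rows/columns stay zero.
    row_sums = C.sum(axis=1, keepdims=True)
    col_sums = C.sum(axis=0, keepdims=True)
    C_row = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
    C_col = np.divide(C, col_sums, out=np.zeros_like(C), where=col_sums > 0)

    disease_emb = C_row        # one row per disease, length n_symptoms
    symptom_emb = C_col.T      # one row per symptom, length n_diseases

    # Assumed patient embedding: mean of the patient's symptom embeddings.
    patient_emb = np.zeros((len(patient_symptoms), n_diseases))
    for p, symptoms in enumerate(patient_symptoms):
        if symptoms:
            patient_emb[p] = symptom_emb[symptoms].mean(axis=0)
    return disease_emb, symptom_emb, patient_emb
```

Under these assumptions, disease vectors have one entry per symptom while symptom and patient vectors have one entry per disease, so the three node types start with different dimensions.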
Further, the initial embedded representations of different types of nodes are respectively input into a multi-layer perceptron to obtain the initial embedded representations of the same dimension, and then input into a graph encoder.
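A minimal sketch of this projection step is given below: one small perceptron per node type maps its initial representation to a shared dimension. The hidden size, the ReLU activation and the random initialization are assumptions for illustration; in the described system these weights would be learned together with the rest of the model.

```python
import numpy as np

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Return a two-layer perceptron (forward pass only) with random weights."""
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
    b2 = np.zeros(out_dim)

    def forward(x):
        hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation (assumed)
        return hidden @ W2 + b2

    return forward

def project_to_common_dim(disease_emb, symptom_emb, patient_emb, out_dim=16, seed=0):
    """Project each node type with its own perceptron to the same dimension."""
    rng = np.random.default_rng(seed)
    projected = []
    for emb in (disease_emb, symptom_emb, patient_emb):
        mlp = make_mlp(emb.shape[1], 32, out_dim, rng)
        projected.append(mlp(emb))
    return tuple(projected)
```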
Further, in the graph encoder, the node-embedded representation of each disease, of each symptom and of each patient at a given layer is calculated by a separate update formula [formula images not reproduced]. These formulas involve: an activation function; the disease-symptom weight matrix and the patient-symptom weight matrix of the layer, obtained by training the disease diagnosis model; the node-embedded representations of the disease, the symptom and the patient on which the update operates; the set of symptom nodes adjacent to the disease; the set of disease nodes adjacent to the symptom; the set of patient nodes adjacent to the symptom; and the set of symptom nodes adjacent to the patient.
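Because the update formulas themselves are not available, the following is only a sketch of one plausible layer of this kind, not the patented formula. It assumes mean aggregation over neighbours, a ReLU activation, and that the same disease-symptom and patient-symptom weight matrices are used in both directions of each relation; embeddings are assumed to share one dimension already.

```python
import numpy as np

def mean_aggregate(adj, h):
    """For each row of adj, average the rows of h belonging to its neighbours."""
    deg = adj.sum(axis=1, keepdims=True)
    out = np.zeros((adj.shape[0], h.shape[1]))
    return np.divide(adj @ h, deg, out=out, where=deg > 0)

def encoder_layer(h_d, h_s, h_p, A_ds, A_ps, W_ds, W_ps):
    """One message-passing step over the disease-symptom-patient graph.

    A_ds: (n_diseases, n_symptoms) adjacency; A_ps: (n_patients, n_symptoms).
    W_ds, W_ps: square weight matrices for the two relation types.
    """
    relu = lambda x: np.maximum(x, 0.0)
    # Diseases receive messages from adjacent symptoms.
    new_d = relu(mean_aggregate(A_ds, h_s) @ W_ds)
    # Symptoms receive messages from adjacent diseases and adjacent patients.
    new_s = relu(mean_aggregate(A_ds.T, h_d) @ W_ds +
                 mean_aggregate(A_ps.T, h_p) @ W_ps)
    # Patients receive messages from adjacent symptoms.
    new_p = relu(mean_aggregate(A_ps, h_s) @ W_ps)
    return new_d, new_s, new_p
```

Stacking several such layers lets a patient node receive information from diseases two hops away through shared symptoms, which is how the symptom nodes act as intermediaries.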
Further, in the graph decoder, the multi-label hierarchical classification for patient disease diagnosis prediction comprises:
constructing the disease hierarchy, which consists of a disease layer containing the disease types and a disease system classification layer containing the disease system categories [formula images not reproduced];
constructing a multi-label hierarchical classifier comprising multiple binary classifiers; the node-embedded representation of the patient is input into each binary classifier to obtain one prediction probability per classifier, wherein for one group of binary classifiers the corresponding label is the disease system category of the patient, and for the other group the corresponding label is the disease diagnosis of the patient, each classifier having its own model parameters;
calculating the probability that a patient has a given disease [formula images not reproduced] from the probability, predicted by the binary classifier of that disease, that the patient has the disease and, for the disease system category to which the disease belongs, the probability, predicted by the binary classifier of that category, that the patient has a disease of that category;
calculating the loss function of the multi-label hierarchical classification [formula images not reproduced], which involves the true label indicating whether the patient has each disease, the true label of the disease system category corresponding to the patient's disease diagnosis, an L1 norm, and a similarity between pairs of diseases; the similarity between two diseases is calculated [formula image not reproduced] from the true-label distributions of the two diseases, that is, from the true labels indicating whether each patient has the one disease and the other.
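The hierarchical prediction can be sketched as follows, under stated assumptions: each binary classifier is a logistic regression on the patient embedding, and the final probability of a disease is the product of the disease classifier's output and the output of the classifier for the disease's system category. The loss shown contains only the two cross-entropy terms; the similarity-weighted L1 term mentioned in the text is omitted because its exact form cannot be recovered. All names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_disease_probs(h_p, W_dis, W_sys, disease_to_system):
    """Combine disease-level and system-level binary classifiers.

    h_p:   (n_patients, dim) patient embeddings.
    W_dis: (dim, n_diseases) weights, one column per disease classifier.
    W_sys: (dim, n_systems) weights, one column per system-category classifier.
    disease_to_system: system-category index for each disease.
    """
    q_dis = sigmoid(h_p @ W_dis)
    q_sys = sigmoid(h_p @ W_sys)
    # Assumed combination: P(disease) = P(disease | classifier) * P(its category).
    p_dis = q_dis * q_sys[:, disease_to_system]
    return p_dis, q_sys

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy between labels y and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def hierarchical_loss(y_dis, y_sys, p_dis, q_sys):
    """Cross-entropy on disease labels plus cross-entropy on category labels."""
    return binary_cross_entropy(y_dis, p_dis) + binary_cross_entropy(y_sys, q_sys)
```

Tying a rare disease's probability to its system category lets the category classifier, which sees many more positive examples, support the prediction for diseases with few samples.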
Further, in the graph decoder, the disease contrastive learning comprises:
combining the diseases in the disease set D in pairs to obtain a disease pair set DD; for any disease pair in DD, a disease pair label is assigned according to whether the two diseases belong to the same disease system category or to different disease system categories [label values given in formula images not reproduced];
constructing a disease-pair system category discriminator, inputting the node-embedded representations of the two diseases of a disease pair into the discriminator, and calculating the distance between the two diseases using the L2 norm [formula image not reproduced];
calculating the loss function for disease contrastive learning [formula image not reproduced], wherein m is the lower bound on the distance between the embedded representations of diseases in different disease system categories.
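A margin-based contrastive loss of this general kind can be sketched as below. The specific form used here (squared distance for same-category pairs, squared hinge on the margin m for different-category pairs, label 1 meaning "same category") is the standard contrastive loss and is an assumption; the patent's own formula is not available.

```python
import numpy as np

def contrastive_loss(h_d, pairs, same_system, margin):
    """Mean contrastive loss over disease pairs.

    h_d:         (n_diseases, dim) disease embeddings.
    pairs:       list of (i, j) disease index pairs.
    same_system: 1 if the pair shares a system category, else 0 (assumed coding).
    margin:      lower bound m on the distance between different-category pairs.
    """
    i, j = np.asarray(pairs).T
    dist = np.linalg.norm(h_d[i] - h_d[j], axis=1)  # L2 distance per pair
    y = np.asarray(same_system, dtype=float)
    # Pull same-category pairs together; push others at least `margin` apart.
    per_pair = y * dist ** 2 + (1.0 - y) * np.maximum(0.0, margin - dist) ** 2
    return float(per_pair.mean())
```

Pairs from different categories that are already farther apart than the margin contribute nothing, so the loss only acts on pairs that violate the lower bound.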
Further, in the graph decoder, the disease-symptom relationship learning comprises:
selecting a disease from the disease set D and a symptom from the symptom set S to form each pair, thereby obtaining a disease-symptom pair set DS; for any disease-symptom pair in DS, a disease-symptom pair label is assigned according to whether or not an association between the disease and the symptom exists in the disease-symptom knowledge graph [label values given in formula images not reproduced];
constructing a disease-symptom relationship learner, inputting the node-embedded representations of the disease and the symptom of a pair into the learner, and calculating the probability that the disease is associated with the symptom using a sigmoid function [formula image not reproduced];
calculating the loss function for disease-symptom relationship learning [formula image not reproduced].
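One simple way to realize such a relationship learner is sketched below. It assumes the score of a disease-symptom pair is the dot product of the two embeddings passed through a sigmoid, and that the loss is binary cross-entropy against the knowledge-graph adjacency matrix; only the use of a sigmoid is stated in the text, the rest is an assumption.

```python
import numpy as np

def relation_probs(h_d, h_s):
    """Probability of an association for every disease-symptom pair."""
    scores = h_d @ h_s.T                  # assumed dot-product scoring
    return 1.0 / (1.0 + np.exp(-scores))  # sigmoid

def relation_loss(h_d, h_s, A_ds, eps=1e-12):
    """Binary cross-entropy between predicted links and knowledge-graph links.

    A_ds: (n_diseases, n_symptoms) matrix with 1 where the knowledge graph
          records an association and 0 elsewhere.
    """
    p = np.clip(relation_probs(h_d, h_s), eps, 1.0 - eps)
    return float(-np.mean(A_ds * np.log(p) + (1.0 - A_ds) * np.log(1.0 - p)))
```

Under this sketch, the overall training objective would be the sum of this loss, the contrastive loss and the hierarchical classification loss, as described for the disease diagnosis model.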
The beneficial effects of the invention are as follows. The invention integrates the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network. On this network, both local and global information are learned with a graph convolutional neural network. The disease diagnosis model is trained end to end on knowledge and data simultaneously. In addition to the disease prediction task, the optimization objective includes supervision on the knowledge relations (the disease contrastive learning part and the disease-symptom relationship learning part), so that the disease prediction task makes effective use of the knowledge while the knowledge representation is not affected by noise in the data. Because the number of diseases to be predicted is large and some diseases have only a limited number of patients, multi-label hierarchical classification is designed to improve prediction for diseases with few samples.
Drawings
FIG. 1 is a diagram of a disease diagnosis prediction system based on a graph neural network according to an embodiment of the present invention;
FIG. 2 is a diagram of a heterogeneous graph network structure according to an embodiment of the present invention;
FIG. 3 is a diagram of a disease diagnosis model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a disease hierarchy provided by an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
The embodiment of the invention provides a disease diagnosis and prediction system based on a graph neural network, which comprises a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module and a disease diagnosis model application module, wherein the implementation process of each module is explained in detail below, as shown in fig. 1.
Knowledge graph construction module: a disease-symptom knowledge graph is constructed based on SNOMED-CT, HPO and other medical knowledge sources; it comprises two node types, disease and symptom, and one relation type, disease-symptom.
Data extraction and preprocessing module: electronic medical record data of patients, including disease diagnosis and symptom data, are extracted from the electronic medical record system and stored in triple form.
Disease diagnosis model construction module: graph neural network learning and predictive modeling are performed on the disease-symptom knowledge graph and the electronic medical record data.
Disease diagnosis model application module: disease diagnosis prediction is performed on the input symptoms of a new patient by using the disease diagnosis model.
The specific function of the disease diagnosis model construction module is as follows. Given the disease set D, the symptom set S and the patient set P, whose sizes are the number of disease types, the number of symptom types and the number of patients, respectively [set definitions in formula images not reproduced], disease diagnosis prediction is treated as a multi-label classification problem: given a patient's symptoms, the disease diagnosis model predicts the patient's disease diagnoses.
The implementation of the disease diagnosis model comprises:
(1) heterogeneous graph network construction
A heterogeneous graph network containing three node types (diseases, symptoms and patients) is constructed from the disease-symptom knowledge graph and the electronic medical record data, wherein the symptoms are intermediate nodes connecting diseases and patients. The heterogeneous graph network integrates the relationship subgraph relating diseases and symptoms in the disease-symptom knowledge graph with the relationship subgraph relating patients and symptoms in the electronic medical record data; that is, it includes a disease-symptom subgraph and a patient-symptom subgraph. The heterogeneous graph network can be expressed as a node set together with an edge set [formula images not reproduced], wherein the node set is composed of the disease set D, the symptom set S and the patient set P, and the edge set is composed of the disease-symptom relationships and the patient-symptom relationships, the former stored in a disease-symptom adjacency matrix and the latter in a patient-symptom adjacency matrix.
FIG. 2 is an example of a heterogeneous graph network structure, including 4 patients, 4 diseases, 4 symptoms, and the patient-symptom and disease-symptom relationships among them.
(2) Subgraph construction
Disease-symptom subgraph: the disease-symptom relationships are extracted from the disease-symptom knowledge graph to construct the disease-symptom subgraph.
Patient-symptom subgraph: the patient-symptom subgraph is constructed from the patients' disease diagnosis and symptom data stored in triple form.
(3) Disease diagnosis model structure
FIG. 3 is a structural example of the disease diagnosis model. The initial node-embedded representations of diseases, symptoms and patients are obtained from the disease-symptom co-occurrence matrix. These initial representations and the adjacency matrices are the inputs of the disease diagnosis model, which is composed of a graph encoder and a graph decoder. The generation of the initial node-embedded representations, the graph encoder and the graph decoder are described in (4)-(6).
(4) Generation of the initial node-embedded representations
First, a disease-symptom co-occurrence matrix is constructed, in which the element in row i and column j indicates the number of patients in the electronic medical record data who are diagnosed with the i-th disease and in whom the j-th symptom appears. Then, row normalization is performed on the co-occurrence matrix, and the initial embedded representation of each disease is the corresponding row of the row-normalized matrix; column normalization is performed on the co-occurrence matrix, and the initial embedded representation of each symptom is the corresponding column of the column-normalized matrix. Finally, the initial embedded representation of each patient is calculated from a formula that involves the number of symptoms of the patient [formula image not reproduced].
(5) Graph encoder
First, the initial embedded representations of the different node types are each input into a multi-layer perceptron to obtain representations of the same dimension, which are then input into the graph encoder. The graph encoder is implemented based on a graph convolutional neural network.
In the graph encoder, nodes of different types pass information along the connecting edges of the graph, so that each node integrates information from nodes of the other types.
For a disease node, the node embedded representation at a given layer is calculated as follows:
[Equation: layer-wise node embedded representation of a disease node]
For a symptom node, the node embedded representation at that layer is calculated as follows:
[Equation: layer-wise node embedded representation of a symptom node]
For a patient node, the node embedded representation at that layer is calculated as follows:
[Equation: layer-wise node embedded representation of a patient node]
In these formulas, an activation function is applied; the disease-symptom association weight matrix and the patient-symptom association weight matrix of each layer are obtained by training the disease diagnosis model; and the inputs are the node embedded representations of the disease, symptom and patient nodes at the preceding layer. The formulas use four neighbor sets: the set of symptom nodes adjacent to a disease node, the set of disease nodes adjacent to a symptom node, the set of patient nodes adjacent to a symptom node, and the set of symptom nodes adjacent to a patient node. The first two sets are obtained from the disease-symptom adjacency matrix, and the last two from the patient-symptom adjacency matrix.
Repeating this node embedded representation update once for each layer of the graph encoder yields disease, symptom and patient node embedded representations that capture the association relationships among the three node types.
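One layer of such an encoder can be sketched as below. The neighbor structure follows the text: diseases and patients aggregate from adjacent symptoms, and symptoms aggregate from both adjacent diseases and adjacent patients, with one weight matrix per relation type. The aggregation rule (mean over neighbors), the ReLU activation and all names are assumptions made for illustration; the original formulas are not recoverable from the text.

```python
import numpy as np

def mean_aggregate(adj, feats):
    """Average the feature rows of each node's neighbors (zero if no neighbors)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    weights = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    return weights @ feats

def relu(x):
    return np.maximum(x, 0.0)

def encoder_layer(h_d, h_s, h_p, a_ds, a_ps, w_ds, w_ps):
    """One hypothetical message-passing layer over the heterogeneous graph.

    h_d, h_s, h_p: previous-layer embeddings (rows) of diseases, symptoms, patients.
    a_ds: (n_diseases, n_symptoms) disease-symptom adjacency matrix.
    a_ps: (n_patients, n_symptoms) patient-symptom adjacency matrix.
    w_ds, w_ps: (k, k) weights for the disease-symptom and patient-symptom relations.
    """
    a_ds = np.asarray(a_ds, dtype=float)
    a_ps = np.asarray(a_ps, dtype=float)
    new_d = relu(mean_aggregate(a_ds, h_s) @ w_ds)
    # Symptoms are the intermediate nodes, so they receive both message types.
    new_s = relu(mean_aggregate(a_ds.T, h_d) @ w_ds + mean_aggregate(a_ps.T, h_p) @ w_ps)
    new_p = relu(mean_aggregate(a_ps, h_s) @ w_ps)
    return new_d, new_s, new_p

# Tiny example: 1 disease, 2 symptoms, 1 patient, embedding size 2.
new_d, new_s, new_p = encoder_layer(
    h_d=np.array([[2.0, 0.0]]),
    h_s=np.eye(2),
    h_p=np.array([[0.0, 4.0]]),
    a_ds=np.array([[1.0, 1.0]]),
    a_ps=np.array([[1.0, 0.0]]),
    w_ds=np.eye(2),
    w_ps=np.eye(2),
)
```

Stacking this function once per encoder layer, each with its own pair of weight matrices, corresponds to the repeated update described above.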
(6) Graph decoder
The node embedded representations produced by the graph encoder are input into the graph decoder. In the graph decoder, multi-task learning is performed using the node embedded representations.
First, multi-label hierarchical classification for patient disease diagnosis prediction is performed.
A disease hierarchical relationship is first constructed from the hierarchical structure of diseases, as shown in Fig. 4. One layer consists of the diseases in the disease set D, that is, the diseases to be diagnosed and predicted, and its size is the number of disease types defined above. The other layer is the systematic classification of diseases based on medical knowledge, and its size is the number of disease system classifications.
Next, a multi-label hierarchical classifier comprising a set of binary classifiers is constructed. The node embedded representation of the patient is input into each of the binary classifiers to obtain one prediction probability per classifier. For one group of binary classifiers, the corresponding label is the disease system classification of the patient; for the other group, the corresponding label is the disease diagnosis of the patient. Each binary classifier has its own model parameters.
Then, the probability that the patient has a given disease is calculated:
[Equation: probability that the patient has the disease]
This probability is computed from two quantities: the probability, predicted by the binary classifier for the disease, that the patient has that disease; and the probability, predicted by the binary classifier for the system classification to which the disease belongs, that the patient has a disease of that system classification.
Finally, the loss function of the multi-label hierarchical classification is calculated as follows:
[Equations: loss function of the multi-label hierarchical classification, four expressions]
The loss uses the true label indicating whether the patient has each disease, the true label of the disease system classification corresponding to the patient's disease diagnosis, the L1 norm, and the similarity between pairs of diseases. The similarity between two diseases is calculated as follows:
[Equation: similarity between two diseases]
where the similarity is computed from the true label distributions of the two diseases, each of which is formed from the true labels indicating whether each patient has the respective disease.
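The way the two classifier groups combine can be illustrated with a short sketch. It assumes, purely for illustration, that each binary classifier is a logistic regression on the patient embedding and that the final disease probability is the product of the disease-level probability and the probability of the system classification that the disease belongs to. The text only states that both probabilities enter the formula, so the product rule and all names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_disease_probs(patient_emb, w_sys, b_sys, w_dis, b_dis, disease_to_system):
    """Combine system-level and disease-level binary classifiers for one patient.

    patient_emb: (k,) patient node embedding from the encoder.
    w_sys, b_sys: (k, n_systems) and (n_systems,) system-classifier parameters.
    w_dis, b_dis: (k, n_diseases) and (n_diseases,) disease-classifier parameters.
    disease_to_system: (n_diseases,) index of each disease's system classification.
    """
    p_sys = sigmoid(patient_emb @ w_sys + b_sys)   # one probability per system class
    p_dis = sigmoid(patient_emb @ w_dis + b_dis)   # one probability per disease
    # Assumption: a disease is predicted only as strongly as its parent class allows.
    return p_dis * p_sys[disease_to_system]

k, n_systems, n_diseases = 4, 2, 3
probs = hierarchical_disease_probs(
    patient_emb=np.ones(k),
    w_sys=np.zeros((k, n_systems)), b_sys=np.zeros(n_systems),
    w_dis=np.zeros((k, n_diseases)), b_dis=np.zeros(n_diseases),
    disease_to_system=np.array([0, 0, 1]),
)
```

With untrained (all-zero) parameters every classifier outputs 0.5, so each combined probability is 0.25.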
Second, disease contrast learning is performed.
First, the diseases in the disease set D are combined in pairs to obtain the disease pair set DD, together with the number of disease pairs. Each disease pair in DD is given a label that takes one value if the two diseases belong to the same system classification and another value if they belong to different system classifications.
Then, a disease pair system category discriminator is constructed. The node embedded representations of the two diseases in a disease pair are input into the discriminator, and the distance between the two diseases is calculated:
[Equation: distance between the two diseases in a disease pair]
where the distance is based on the L2 norm.
Finally, the loss function of disease contrast learning is calculated as follows:
[Equation: loss function of disease contrast learning]
where m is the lower bound on the distance between the embedded representations of diseases from different system classifications.
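A margin-based loss of this kind can be sketched as follows. The sketch uses the standard contrastive loss, which pulls same-class pairs together and pushes different-class pairs at least a margin m apart; whether the patent's formula has exactly this form (squared terms, averaging over pairs, label 1 for "same system classification") is an assumption, as are the function and argument names.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_system, margin):
    """Standard contrastive loss over disease pairs (assumed form).

    emb_a, emb_b: (n_pairs, k) embeddings of the first and second disease of each pair.
    same_system: (n_pairs,) 1 if both diseases share a system classification, else 0.
    margin: lower bound m on the distance between different-class embeddings.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same_system, dtype=float)
    dist = np.linalg.norm(emb_a - emb_b, axis=1)              # L2 distance per pair
    pull = same * dist ** 2                                    # same class: shrink distance
    push = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2  # different class: enforce margin
    return float(np.mean(pull + push))

loss = contrastive_loss(
    emb_a=[[0.0, 0.0], [0.0, 0.0]],
    emb_b=[[3.0, 4.0], [0.0, 1.0]],
    same_system=[1, 0],
    margin=2.0,
)
```

In this example the first pair is in the same class at distance 5 (penalty 25) and the second pair is in different classes at distance 1, inside the margin of 2 (penalty 1), giving a mean loss of 13.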
Third, disease-symptom relationship learning is performed.
First, one disease from the disease set D and one symptom from the symptom set S are paired to obtain the disease-symptom pair set DS, together with the number of disease-symptom pairs. Each disease-symptom pair in DS is given a label that takes one value if the disease and the symptom are associated in the disease-symptom knowledge graph and another value if they are not.
Then, a disease-symptom relationship learner is constructed. The node embedded representations of the disease and the symptom in a pair are input into the learner, and the probability that the disease and the symptom are associated is calculated:
[Equation: probability that the disease and the symptom are associated]
where the sigmoid function is applied.
Finally, the loss function of disease-symptom relationship learning is calculated as follows:
[Equation: loss function of disease-symptom relationship learning]
The loss function of the disease diagnosis model is the sum of the loss function of the multi-label hierarchical classification, the loss function of disease contrast learning and the loss function of disease-symptom relationship learning:
[Equation: loss function of the disease diagnosis model]
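The relationship learner and the combined objective can be sketched as below. The text states that a sigmoid turns the two embeddings into an association probability and that the three task losses are added; the inner-product score, the binary cross-entropy loss and the label convention (1 for associated, 0 otherwise) are assumptions chosen because they are the most common pairing with a sigmoid output. All names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relation_probability(h_disease, h_symptom):
    """Assumed scorer: sigmoid of the inner product of the paired embeddings."""
    h_disease = np.asarray(h_disease, dtype=float)
    h_symptom = np.asarray(h_symptom, dtype=float)
    return sigmoid(np.sum(h_disease * h_symptom, axis=1))

def binary_cross_entropy(prob, label, eps=1e-12):
    """Mean binary cross-entropy over disease-symptom pairs (assumed loss form)."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    label = np.asarray(label, dtype=float)
    return float(-np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

def total_loss(loss_classification, loss_contrast, loss_relation):
    """Unweighted sum of the three task losses, as the text describes."""
    return loss_classification + loss_contrast + loss_relation

prob = relation_probability([[0.0, 0.0]], [[1.0, 1.0]])   # orthogonal-to-zero pair
rel_loss = binary_cross_entropy(prob, [1])
combined = total_loss(1.0, 2.0, rel_loss)
```

Because the sum is unweighted, the three losses need comparable scales for no single task to dominate training; adding per-task weights would be a departure from what the text describes.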
The foregoing is only a preferred embodiment of the present invention. Although the present invention has been disclosed by way of preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from its scope. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present invention.

Claims (10)

1. A disease diagnosis prediction system based on a graph neural network, comprising:
(1) a knowledge graph construction module: constructing a disease-symptom knowledge graph based on medical knowledge sources;
(2) a data extraction and preprocessing module: extracting electronic medical record data of patients from an electronic medical record system, wherein the electronic medical record data comprises disease diagnosis and symptom data of the patients and is stored in triple form;
(3) a disease diagnosis model building module: performing graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, comprising:
constructing a heterogeneous graph network, wherein the heterogeneous graph network comprises a disease-symptom subgraph constructed by extracting disease-symptom relations from a disease-symptom knowledge graph and a patient-symptom subgraph constructed by utilizing patient disease diagnosis and symptom data in a triple form;
constructing a disease diagnosis model, wherein the disease diagnosis model consists of a graph encoder and a graph decoder;
the graph encoder is implemented as a graph convolutional neural network, the inputs of the graph encoder are the node initial embedded representations of diseases, symptoms and patients obtained from a disease-symptom co-occurrence matrix, together with a disease-symptom adjacency matrix and a patient-symptom adjacency matrix, nodes of different types pass information along connecting edges, and the node embedded representations of diseases, symptoms and patients obtained through node embedded representation update operations are input into the graph decoder;
the graph decoder performs multi-task learning using node-embedded representations, including three parts:
a) multi-label hierarchical classification for patient disease diagnosis prediction: constructing a disease hierarchical relationship from the hierarchical structure of diseases, wherein the disease hierarchical relationship comprises a layer of diseases to be diagnosed and predicted and a disease system classification layer obtained from medical knowledge; constructing a multi-label hierarchical classifier, and designing a loss function of the multi-label hierarchical classification;
b) disease contrast learning: constructing a disease pair system category discriminator, calculating the distance between the two diseases in a disease pair, and designing a loss function of the disease contrast learning;
c) disease-symptom relationship learning: constructing a disease-symptom relationship learner, calculating the probability that the disease and the symptom in a disease-symptom pair are associated, and designing a loss function of the disease-symptom relationship learning;
adding the loss function of the multi-label hierarchical classification, the loss function of the disease contrast learning and the loss function of the disease-symptom relation learning to obtain a loss function of a disease diagnosis model;
(4) a disease diagnosis model application module: performing disease diagnosis prediction on the input symptoms of a new patient using the disease diagnosis model.
2. The graph neural network-based disease diagnosis prediction system of claim 1, wherein in the knowledge graph construction module, the disease-symptom knowledge graph comprises two node types, disease and symptom, and disease-symptom relationships.
3. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the heterogeneous graph network is constructed based on the disease-symptom knowledge graph and the electronic medical record data and comprises three node types: disease, symptom and patient, wherein symptom nodes are intermediate nodes connecting diseases and patients, and the heterogeneous graph network integrates the disease-symptom relationship subgraph of the disease-symptom knowledge graph and the patient-symptom relationship subgraph of the electronic medical record data.
4. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the heterogeneous graph network is expressed as:
[Equation: heterogeneous graph network defined by a node set and an edge set]
wherein the node set comprises D, S and P, which are the disease set, the symptom set and the patient set, respectively, whose sizes are the number of disease types, the number of symptom types and the number of patients, respectively; and the edge set comprises the disease-symptom relationships, stored in a disease-symptom adjacency matrix, and the patient-symptom relationships, stored in a patient-symptom adjacency matrix.
5. The graph neural network-based disease diagnosis prediction system of claim 4, wherein the generation of the node initial embedded representations comprises:
constructing a disease-symptom co-occurrence matrix, wherein the element in a given row and column of the matrix is the number of patients in the electronic medical record data who are diagnosed with the disease of that row and who present the symptom of that column;
performing row normalization on the co-occurrence matrix, wherein the initial embedded representation of a disease is the corresponding row of the row-normalized matrix;
performing column normalization on the co-occurrence matrix, wherein the initial embedded representation of a symptom is the corresponding column of the column-normalized matrix;
calculating the initial embedded representation of each patient as follows:
[Equation: initial embedded representation of the patient]
wherein the count appearing in the formula is the number of symptoms of the patient.
6. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the initial embedded representations of the different node types are each input into a multi-layer perceptron to obtain initial embedded representations of the same dimension, which are then input into the graph encoder.
7. The graph neural network-based disease diagnosis prediction system of claim 5, wherein in the graph encoder:
for a disease node, the node embedded representation at a given layer is calculated as follows:
[Equation: layer-wise node embedded representation of a disease node]
for a symptom node, the node embedded representation at that layer is calculated as follows:
[Equation: layer-wise node embedded representation of a symptom node]
for a patient node, the node embedded representation at that layer is calculated as follows:
[Equation: layer-wise node embedded representation of a patient node]
wherein an activation function is applied; the disease-symptom association weight matrix and the patient-symptom association weight matrix of each layer are obtained by training the disease diagnosis model; the inputs are the node embedded representations of the disease, the symptom and the patient at the preceding layer; and the formulas use the set of symptom nodes adjacent to the disease, the set of disease nodes adjacent to the symptom, the set of patient nodes adjacent to the symptom, and the set of symptom nodes adjacent to the patient.
8. The graph neural network-based disease diagnosis prediction system of claim 7, wherein in the graph decoder, the multi-label hierarchical classification for patient disease diagnosis prediction comprises:
constructing a disease hierarchical relationship, wherein the number of disease types in the disease layer, the disease system classifications of the disease system classification layer and the number of disease system classifications are recorded;
constructing a multi-label hierarchical classifier comprising a set of binary classifiers; inputting the node embedded representation of the patient into each of the binary classifiers to obtain one prediction probability per classifier, wherein for one group of binary classifiers the corresponding label is the disease system classification of the patient, for the other group of binary classifiers the corresponding label is the disease diagnosis of the patient, and each binary classifier has its own model parameters;
calculating the probability that the patient has a given disease:
[Equation: probability that the patient has the disease]
wherein the probability is computed from the probability, predicted by the binary classifier for the disease, that the patient has that disease, and from the probability, predicted by the binary classifier for the system classification to which the disease belongs, that the patient has a disease of that system classification;
calculating the loss function of the multi-label hierarchical classification as follows:
[Equations: loss function of the multi-label hierarchical classification, four expressions]
wherein the loss uses the true label indicating whether the patient has each disease, the true label of the disease system classification corresponding to the patient's disease diagnosis, the L1 norm, and the similarity between pairs of diseases, the similarity between two diseases being calculated as follows:
[Equation: similarity between two diseases]
wherein the similarity is computed from the true label distributions of the two diseases, each of which is formed from the true labels indicating whether each patient has the respective disease.
9. The graph neural network-based disease diagnosis prediction system of claim 7, wherein in the graph decoder, the disease contrast learning comprises:
combining the diseases in the disease set D in pairs to obtain a disease pair set DD, together with the number of disease pairs; for any disease pair in DD, assigning a disease pair label that takes one value if the two diseases belong to the same system classification and another value if the two diseases belong to different system classifications;
constructing a disease pair system category discriminator, inputting the node embedded representations of the two diseases in a disease pair into the discriminator, and calculating the distance between the two diseases:
[Equation: distance between the two diseases in a disease pair]
wherein the distance is based on the L2 norm;
calculating the loss function of the disease contrast learning as follows:
[Equation: loss function of disease contrast learning]
wherein m is the lower bound on the distance between the embedded representations of diseases from different system classifications.
10. The graph neural network-based disease diagnosis prediction system of claim 7, wherein in the graph decoder, the disease-symptom relationship learning comprises:
pairing one disease from the disease set D with one symptom from the symptom set S to obtain a disease-symptom pair set DS, together with the number of disease-symptom pairs; for any disease-symptom pair in DS, assigning a disease-symptom pair label that takes one value if the disease and the symptom are associated in the disease-symptom knowledge graph and another value if they are not;
constructing a disease-symptom relationship learner, inputting the node embedded representations of the disease and the symptom in a disease-symptom pair into the learner, and calculating the probability that the disease and the symptom are associated:
[Equation: probability that the disease and the symptom are associated]
wherein the sigmoid function is applied;
calculating the loss function of the disease-symptom relationship learning as follows:
[Equation: loss function of disease-symptom relationship learning]
CN202111609275.1A 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network Active CN113990495B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111609275.1A CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network
PCT/CN2022/116970 WO2023124190A1 (en) 2021-12-27 2022-09-05 Graph neural network-based disease diagnosis and prediction system
JP2023536567A JP7459386B2 (en) 2021-12-27 2022-09-05 Disease diagnosis prediction system based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111609275.1A CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network

Publications (2)

Publication Number Publication Date
CN113990495A true CN113990495A (en) 2022-01-28
CN113990495B CN113990495B (en) 2022-04-29

Family

ID=79734519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111609275.1A Active CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network

Country Status (3)

Country Link
JP (1) JP7459386B2 (en)
CN (1) CN113990495B (en)
WO (1) WO2023124190A1 (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030207278A1 (en) * 2002-04-25 2003-11-06 Javed Khan Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
WO2015126268A1 (en) * 2014-02-18 2015-08-27 Instytut Biochemii I Biofizyki Polskiej Akademii Nauk An electrochemical biosensor for the detection of protein s100b
US20150356272A1 (en) * 2014-06-10 2015-12-10 Taipei Medical University Prescription analysis system and method for applying probabilistic model based on medical big data
US20190155993A1 (en) * 2017-11-20 2019-05-23 ThinkGenetic Inc. Method and System Supporting Disease Diagnosis
CN108154928A (en) * 2017-12-27 2018-06-12 北京嘉和美康信息技术有限公司 Disease diagnosis method and device
CN108198620A (en) * 2018-01-12 2018-06-22 洛阳飞来石软件开发有限公司 Intelligent auxiliary diagnosis system for skin diseases based on deep learning
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 Disease prediction method based on automatic extraction of medical expert knowledge
CN109784387A (en) * 2018-12-29 2019-05-21 天津南大通用数据技术股份有限公司 Multi-level progressive classification method and system based on neural network and Bayesian model
CN110277165A (en) * 2019-06-27 2019-09-24 清华大学 Aided diagnosis method, device, equipment and storage medium based on graph neural network
CN111370127A (en) * 2020-01-14 2020-07-03 之江实验室 Decision support system for early diagnosis of chronic nephropathy in cross-department based on knowledge graph
CN111382272A (en) * 2020-03-09 2020-07-07 西南交通大学 Electronic medical record ICD automatic coding method based on knowledge graph
CN111834012A (en) * 2020-07-14 2020-10-27 中国中医科学院中医药信息研究所 Traditional Chinese medicine syndrome diagnosis method and device based on deep learning and attention mechanism
CN112037912A (en) * 2020-09-09 2020-12-04 平安科技(深圳)有限公司 Triage model training method, device and equipment based on medical knowledge graph
CN112263220A (en) * 2020-10-23 2021-01-26 北京文通图像识别技术研究中心有限公司 Endocrine disease intelligent diagnosis system
CN113409892A (en) * 2021-05-13 2021-09-17 西安电子科技大学 miRNA-disease association relation prediction method based on graph neural network
CN113434626A (en) * 2021-08-27 2021-09-24 之江实验室 Multi-center medical diagnosis knowledge graph representation learning method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SARAH PARISOT ET AL: "Disease prediction using graph convolutional networks: Application to Autism Spectrum Disorder and Alzheimer's disease", 《MEDICAL IMAGE ANALYSIS》 *
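孙振超 (Sun Zhenchao): "Research on disease-assisted diagnosis methods based on heterogeneous information networks", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Medicine and Health Sciences》 *
张驰名 (Zhang Chiming) et al.: "Diagnosis method for common chest diseases based on deep learning", 《Computer Engineering》 *
王永天 (Wang Yongtian): "Research on ontology-based methods for mining disease molecular markers", 《Wanfang Dissertations》 *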

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124190A1 (en) * 2021-12-27 2023-07-06 之江实验室 Graph neural network-based disease diagnosis and prediction system
CN114496283A (en) * 2022-02-15 2022-05-13 山东大学 Disease prediction system based on path reasoning, storage medium and equipment
CN114496234B (en) * 2022-04-18 2022-07-19 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN114496234A (en) * 2022-04-18 2022-05-13 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN114898879A (en) * 2022-05-10 2022-08-12 电子科技大学 Chronic disease risk prediction method based on graph representation learning
CN114664452A (en) * 2022-05-20 2022-06-24 之江实验室 General multi-disease prediction system based on causal verification data generation
WO2023221739A1 (en) * 2022-05-20 2023-11-23 之江实验室 General multi-disease prediction system based on causal check data generation
CN115019923A (en) * 2022-07-11 2022-09-06 中南大学 Electronic medical record data pre-training method based on contrastive learning
CN115019923B (en) * 2022-07-11 2023-04-28 中南大学 Electronic medical record data pre-training method based on contrastive learning
CN115359870A (en) * 2022-10-20 2022-11-18 之江实验室 Disease diagnosis and treatment process abnormity identification system based on hierarchical graph neural network
CN115424724A (en) * 2022-11-04 2022-12-02 之江实验室 Lung cancer lymph node metastasis auxiliary diagnosis system for multi-modal image forest
CN115862848B (en) * 2023-02-15 2023-05-30 之江实验室 Disease prediction system and device based on clinical data screening and medical knowledge graph
CN116072298B (en) * 2023-04-06 2023-08-15 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116072298A (en) * 2023-04-06 2023-05-05 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116646072A (en) * 2023-05-18 2023-08-25 肇庆医学高等专科学校 Training method and device for prostate diagnosis neural network model
CN116631641A (en) * 2023-07-21 2023-08-22 之江实验室 Disease prediction device integrating self-adaptive similar patient graphs
CN116631641B (en) * 2023-07-21 2023-12-22 之江实验室 Disease prediction device integrating self-adaptive similar patient graphs
CN116936108A (en) * 2023-09-19 2023-10-24 之江实验室 Unbalanced data-oriented disease prediction system
CN116936108B (en) * 2023-09-19 2024-01-02 之江实验室 Unbalanced data-oriented disease prediction system
CN117012374A (en) * 2023-10-07 2023-11-07 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning

Also Published As

Publication number Publication date
JP2024503980A (en) 2024-01-30
CN113990495B (en) 2022-04-29
WO2023124190A1 (en) 2023-07-06
JP7459386B2 (en) 2024-04-01

Similar Documents

Publication Publication Date Title
CN113990495B (en) Disease diagnosis prediction system based on graph neural network
Sullivan Understanding from machine learning models
CN113822494B (en) Risk prediction method, device, equipment and storage medium
Ming et al. Rulematrix: Visualizing and understanding classifiers with rules
Li et al. A survey of data-driven and knowledge-aware explainable ai
Pham et al. Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels
Zheng et al. The fusion of deep learning and fuzzy systems: A state-of-the-art survey
CN111382272B (en) Electronic medical record ICD automatic coding method based on knowledge graph
Ghorbanali et al. Ensemble transfer learning-based multimodal sentiment analysis using weighted convolutional neural networks
CN110968701A (en) Relationship map establishing method, device and equipment for graph neural network
CN113553440B (en) Medical entity relationship extraction method based on hierarchical reasoning
CN111476315A (en) Image multi-label identification method based on statistical correlation and graph convolution technology
Ibrahim et al. Explainable convolutional neural networks: A taxonomy, review, and future directions
Leevy et al. Investigating the relationship between time and predictive model maintenance
CN112069825B (en) Entity relation joint extraction method for alert condition record data
Ezugwu et al. Machine learning research trends in Africa: a 30 years overview with bibliometric analysis review
CN111143573B (en) Method for predicting knowledge-graph target node based on user feedback information
Wang et al. BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition
Abu et al. Approaches Of Deep Learning In Persuading The Contemporary Society For The Adoption Of New Trend Of AI Systems: A Review
CN114428864A (en) Knowledge graph construction method and device, electronic equipment and medium
Vergara et al. A Schematic Review of Knowledge Reasoning Approaches Based on the Knowledge Graph
CN116662554B (en) Infectious disease aspect emotion classification method based on heterogeneous graph convolution neural network
de Oliveira Producing Decisions and Explanations: A Joint Approach Towards Explainable CNNs
Fujita et al. Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices: 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Kitakyushu, Japan, September 22-25, 2020, Proceedings
Deng et al. Deep multiple instance learning for forecasting stock trends using financial news

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant