CN113990495B - Disease diagnosis prediction system based on graph neural network - Google Patents

Publication number
CN113990495B
Authority
CN
China
Prior art keywords
disease
symptom
patient
graph
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111609275.1A
Other languages
Chinese (zh)
Other versions
CN113990495A (en)
Inventor
李劲松
池胜强
王宇清
田雨
周天舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202111609275.1A
Publication of CN113990495A
Application granted
Publication of CN113990495B
Priority to PCT/CN2022/116970 (WO2023124190A1)
Priority to JP2023536567A (JP7459386B2)
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Abstract

The invention discloses a disease diagnosis prediction system based on a graph neural network, comprising a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module, and a disease diagnosis model application module. The invention fuses the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network, and uses a graph convolutional neural network to learn both the local and the global information of that network. The disease diagnosis model is trained end to end on knowledge and data simultaneously. In addition to the disease prediction task, the model's optimization objective includes supervision on the knowledge relationships, so that the disease prediction task makes effective use of the knowledge and the knowledge representation is not distorted by noise in the data. Because the number of diseases to be predicted is large and some diseases have only a limited number of patients, a multi-label hierarchical classification is designed to improve prediction for diseases with few samples.

Description

Disease diagnosis prediction system based on graph neural network
Technical Field
The invention belongs to the technical field of medical health information, and particularly relates to a disease diagnosis and prediction system based on a graph neural network.
Background
In the medical field there are many well-organized knowledge graphs, such as the International Classification of Diseases, DrugBank, and clinical guidelines and consensus statements, which carry hierarchical information and complex association relationships that agree with human cognition. A knowledge graph is a heterogeneous graph network containing multiple kinds of relationships. Making joint use of the expert knowledge in knowledge graphs and of electronic medical record data, and fusing knowledge and data for modeling, plays an important role in disease diagnosis prediction.
Existing methods for predicting diseases with graph neural network models lack an effective way to fuse a medical knowledge graph with electronic medical record data into a heterogeneous graph network. The main current methods are: (1) Data-based graph network modeling: a graph network is constructed from the electronic medical record data, and diseases are predicted with a graph neural network model; this method does not make full use of existing medical knowledge sources. (2) Staged modeling of knowledge representation learning and disease prediction: representation learning is performed on the medical knowledge graph to obtain vector representations of the knowledge, which are then integrated into the electronic medical record data for disease prediction; staged training does not yield the knowledge representation best suited to disease prediction. (3) End-to-end modeling that focuses only on the disease prediction task: the medical knowledge graph and the electronic medical record data are fused into a heterogeneous graph network, and diseases are predicted with a graph neural network model; although this method overcomes the shortcomings of the first two, the model optimizes only the disease prediction task, so the learned knowledge representation may be affected by noise in the data.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a disease diagnosis and prediction system based on a graph neural network.
The purpose of the invention is achieved by the following technical scheme: a disease diagnosis prediction system based on a graph neural network, the system comprising:
(1) a knowledge graph construction module: constructing a disease-symptom knowledge graph based on medical knowledge sources;
(2) a data extraction and preprocessing module: extracting electronic medical record data of patients from the electronic medical record system, wherein the electronic medical record data comprises the disease diagnoses and symptom data of the patients and is stored in triple form;
(3) a disease diagnosis model construction module: performing graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, comprising:
constructing a heterogeneous graph network, wherein the heterogeneous graph network comprises a disease-symptom subgraph, constructed by extracting disease-symptom relationships from the disease-symptom knowledge graph, and a patient-symptom subgraph, constructed from the patient disease diagnosis and symptom data in triple form;
constructing a disease diagnosis model, wherein the disease diagnosis model consists of a graph encoder and a graph decoder;
the graph encoder is implemented on the basis of a graph convolutional neural network; its input is the initial node embedded representations of the diseases, symptoms, and patients, obtained using the disease-symptom co-occurrence matrix, together with the disease-symptom adjacency matrix and the patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and the node embedded representations of the diseases, symptoms, and patients obtained through the node embedded representation update operation are input into the graph decoder;
the graph decoder performs multi-task learning using the node embedded representations, including three parts:
a) multi-label hierarchical classification for patient disease diagnosis prediction: constructing a disease hierarchical relationship from the disease hierarchy, wherein the disease hierarchical relationship comprises a disease layer to be diagnosed and predicted and a disease system classification layer obtained from medical knowledge; constructing a multi-label hierarchical classifier, and designing a loss function for the multi-label hierarchical classification;
b) disease contrastive learning: constructing a disease-pair system category discriminator, calculating the distance between the two diseases in a disease pair, and designing a loss function for the disease contrastive learning;
c) disease-symptom relationship learning: constructing a disease-symptom relationship learner, calculating the probability that an association exists between the disease and the symptom in a disease-symptom pair, and designing a loss function for the disease-symptom relationship learning;
adding the loss function of the multi-label hierarchical classification, the loss function of the disease contrastive learning, and the loss function of the disease-symptom relationship learning to obtain the loss function of the disease diagnosis model;
(4) a disease diagnosis model application module: performing disease diagnosis prediction on the input symptoms of a new patient using the disease diagnosis model.
Further, in the knowledge graph construction module, the disease-symptom knowledge graph comprises two node types, disease and symptom, and one relationship type, disease-symptom.
Further, the heterogeneous graph network is constructed from the disease-symptom knowledge graph and the electronic medical record data and comprises three node types: diseases, symptoms, and patients. Symptoms are intermediate nodes connecting diseases and patients. The heterogeneous graph network integrates the relationship subgraph relating diseases and symptoms in the disease-symptom knowledge graph with the relationship subgraph relating patients and symptoms in the electronic medical record data.
Further, the heterogeneous graph network is expressed in terms of a node set and an edge set. The node set consists of the disease set D, the symptom set S, and the patient set P, whose sizes are the number of disease types, the number of symptom types, and the number of patients, respectively. The edge set is defined over a relation set R, which includes the disease-symptom relationship and the patient-symptom relationship. The disease-symptom relationships are stored in a disease-symptom adjacency matrix and the patient-symptom relationships are stored in a patient-symptom adjacency matrix.
Further, the generation of the node initial embedded representations comprises:
constructing a disease-symptom co-occurrence matrix, in which the entry in row i and column j is the number of patients in the electronic medical record data who are diagnosed with the i-th disease and exhibit the j-th symptom;
row-normalizing the co-occurrence matrix, the initial embedded representation of the i-th disease being the i-th row of the row-normalized matrix;
column-normalizing the co-occurrence matrix, the initial embedded representation of the i-th symptom being the i-th column of the column-normalized matrix;
calculating the initial embedded representation of each patient with a formula (an image in the original publication) that involves the number of symptoms of the patient.
Further, the initial embedded representations of the different node types are each input into a multi-layer perceptron to obtain initial embedded representations of the same dimension, which are then input into the graph encoder.
Further, in the graph encoder, the node embedded representation at each layer is computed separately for each disease node, each symptom node, and each patient node, by formulas that appear as images in the original publication. These formulas use: an activation function; the disease-symptom weight matrix and the patient-symptom weight matrix of the layer, both obtained by training the disease diagnosis model; the node embedded representations of the disease, symptom, and patient nodes at the previous layer; and four neighbor sets, namely the set of symptom nodes adjacent to a disease, the set of disease nodes adjacent to a symptom, the set of patient nodes adjacent to a symptom, and the set of symptom nodes adjacent to a patient.
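The update formulas themselves are not legible in this text, so the following is only a minimal sketch of one plausible layer, assuming mean aggregation over each neighbor set, one weight matrix per relationship type, and ReLU as the activation function. The names `matvec`, `mean_vector`, and `update_node` and the toy values are hypothetical and are not taken from the patent.

```python
def matvec(weight, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def mean_vector(vectors, dim):
    """Element-wise mean of neighbor embeddings; zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def update_node(messages, out_dim):
    """Assumed update rule: sum the relation-specific transformed neighbor
    means, then apply ReLU. `messages` is a list of (weight, neighbors) pairs,
    one pair per relationship type touching the node."""
    total = [0.0] * out_dim
    for weight, neighbors in messages:
        in_dim = len(weight[0])
        transformed = matvec(weight, mean_vector(neighbors, in_dim))
        total = [t + m for t, m in zip(total, transformed)]
    return [max(0.0, t) for t in total]


# Toy previous-layer embeddings (dimension 2) and per-relation weights.
W_ds = [[1.0, 0.0], [0.0, 1.0]]    # disease-symptom weight matrix
W_ps = [[0.5, 0.0], [0.0, -1.0]]   # patient-symptom weight matrix

# A disease node only receives messages from its adjacent symptoms.
h_disease = update_node([(W_ds, [[1.0, 2.0], [3.0, 4.0]])], 2)
# A symptom node receives messages from adjacent diseases and adjacent patients.
h_symptom = update_node(
    [(W_ds, [[1.0, 1.0]]), (W_ps, [[2.0, 2.0], [4.0, 0.0]])], 2
)
# A patient node only receives messages from its adjacent symptoms.
h_patient = update_node([(W_ps, [[2.0, -3.0]])], 2)
```

Symptom nodes are the only ones with two incoming relationship types, which is why their update sums two messages; this mirrors the role of symptoms as intermediate nodes between diseases and patients.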
Further, in the graph decoder, the multi-label hierarchical classification for patient disease diagnosis prediction comprises:
constructing the disease hierarchy, which consists of a disease layer, containing the disease categories to be diagnosed and predicted, and a disease system classification layer, containing the disease system classifications;
constructing a multi-label hierarchical classifier made up of binary classifiers, and inputting the node embedded representation of the patient into each binary classifier to obtain one predicted probability per classifier; one group of binary classifiers is labeled with the patient's disease system classifications, and the other group, with its own model parameters, is labeled with the patient's disease diagnoses;
calculating the probability that the patient has a given disease from two predictions: the probability, predicted by the binary classifier for that disease, that the patient has the disease, and the probability, predicted by the binary classifier for the system classification to which the disease belongs, that the patient has a disease of that system classification; the combining formula appears as an image in the original publication;
calculating the loss function of the multi-label hierarchical classification, whose formulas appear as images in the original publication and are defined in terms of: the true label of whether the patient has a given disease; the true label of the disease system classification corresponding to the patient's disease diagnosis; the L1 norm; and the similarity between two diseases, which is calculated from the distributions of the true labels of the two diseases, that is, from the true labels of whether each patient has each of the two diseases.
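As an illustration of how a disease-level prediction and a system-level prediction might be combined, here is a minimal sketch. It assumes logistic binary classifiers and assumes the two probabilities are multiplied; the combining formula is not legible in this text, so the product is an assumption, and the function names and toy parameters are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_classifier(weights, bias, embedding):
    """Logistic binary classifier applied to a patient node embedding."""
    return sigmoid(sum(w * x for w, x in zip(weights, embedding)) + bias)


def hierarchical_probabilities(embedding, disease_params, system_params,
                               disease_to_system):
    """Assumed combination: P(disease) = P_disease_classifier * P_system_classifier,
    where the system classifier is the one for the disease's system classification."""
    system_probs = [binary_classifier(w, b, embedding) for w, b in system_params]
    disease_probs = [binary_classifier(w, b, embedding) for w, b in disease_params]
    return [
        disease_probs[k] * system_probs[disease_to_system[k]]
        for k in range(len(disease_probs))
    ]


# Toy setup: 3 diseases, 2 system classifications, embedding dimension 2.
patient_embedding = [1.0, -1.0]
disease_params = [([0.5, 0.0], 0.0), ([0.0, 0.5], 0.0), ([1.0, 1.0], 0.0)]
system_params = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
disease_to_system = [0, 0, 1]   # diseases 0 and 1 share system classification 0

probs = hierarchical_probabilities(
    patient_embedding, disease_params, system_params, disease_to_system
)
```

Under this assumption a rare disease can borrow signal from its system classification, whose classifier sees the pooled patients of all diseases in that classification.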
Further, in the graph decoder, the disease contrastive learning comprises:
combining the diseases in the disease set D in pairs to obtain a disease pair set DD; for any disease pair in DD, the disease pair label takes one value if the two diseases belong to the same disease system classification and another value if they belong to different disease system classifications;
constructing a disease-pair system category discriminator, inputting the node embedded representations of the two diseases of a disease pair into the discriminator, and calculating the distance between the two diseases using the L2 norm;
calculating the loss function of the disease contrastive learning, whose formula appears as an image in the original publication, wherein m is the lower bound on the distance between the embedded representations of diseases belonging to different disease system classifications.
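The loss formula is not legible in this text. A minimal sketch of a standard margin-based contrastive loss that fits the description (L2 distance between disease embeddings, margin m for pairs from different system classifications) is given below; the squared-distance form and the averaging over pairs are assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def l2_distance(a, b):
    """Euclidean (L2) distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(embeddings, system_class, margin):
    """Assumed form: same-classification pairs are pulled together (d^2),
    different-classification pairs are pushed at least `margin` apart
    (max(0, margin - d)^2); the result is averaged over all unordered pairs."""
    pairs = list(combinations(range(len(embeddings)), 2))
    total = 0.0
    for i, j in pairs:
        d = l2_distance(embeddings[i], embeddings[j])
        if system_class[i] == system_class[j]:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
    return total / len(pairs)


# Toy data: three diseases, the first two in the same system classification.
disease_embeddings = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
disease_system_class = [0, 0, 1]
loss = contrastive_loss(disease_embeddings, disease_system_class, margin=6.0)
```

In this toy case the pair (0, 1) contributes 25 because it is in the same classification but 5 apart, the pair (0, 2) contributes nothing because it already exceeds the margin, and the pair (1, 2) contributes 1.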
Further, in the graph decoder, the disease-symptom relationship learning comprises:
selecting one disease from the disease set D and one symptom from the symptom set S to form each disease-symptom pair, thereby obtaining a disease-symptom pair set DS; for any disease-symptom pair in DS, the disease-symptom pair label takes one value if an association between the disease and the symptom exists in the disease-symptom knowledge graph and another value if no association exists;
constructing a disease-symptom relationship learner, inputting the node embedded representations of the disease and the symptom of a disease-symptom pair into the learner, and calculating the probability that the disease and the symptom are associated, using a sigmoid function;
calculating the loss function of the disease-symptom relationship learning, whose formula appears as an image in the original publication.
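One way such a relationship learner could be realized is sketched below. The source only states that a sigmoid function is involved; scoring a pair by the dot product of the two embeddings and using binary cross-entropy as the loss are assumptions, and all names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relation_probability(disease_vec, symptom_vec):
    """Assumed scorer: sigmoid of the dot product of the two embeddings."""
    return sigmoid(sum(d * s for d, s in zip(disease_vec, symptom_vec)))


def relation_loss(disease_vecs, symptom_vecs, adjacency, eps=1e-12):
    """Assumed loss: mean binary cross-entropy over every disease-symptom pair,
    with label 1 where the knowledge graph contains the relationship."""
    total, count = 0.0, 0
    for i, d in enumerate(disease_vecs):
        for j, s in enumerate(symptom_vecs):
            p = relation_probability(d, s)
            y = adjacency[i][j]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
            count += 1
    return total / count


# Toy data: 2 diseases, 2 symptoms, embedding dimension 2.
disease_vecs = [[2.0, 0.0], [0.0, 2.0]]
symptom_vecs = [[2.0, 0.0], [-2.0, 0.0]]
knowledge_adjacency = [[1, 0], [0, 0]]   # only disease 0 and symptom 0 are related
loss_value = relation_loss(disease_vecs, symptom_vecs, knowledge_adjacency)
```

Because the labels come from the knowledge graph rather than from patient records, this term supervises the disease and symptom embeddings independently of noise in the electronic medical record data.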
The beneficial effects of the invention are as follows. The invention fuses the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network, and uses a graph convolutional neural network to learn both the local and the global information of that network. The disease diagnosis model is trained end to end on knowledge and data simultaneously. In addition to the disease prediction task, the model's optimization objective includes supervision on the knowledge relationships (the disease contrastive learning part and the disease-symptom relationship learning part), so that the disease prediction task makes effective use of the knowledge and the knowledge representation is not distorted by noise in the data. Because the number of diseases to be predicted is large and some diseases have only a limited number of patients, a multi-label hierarchical classification is designed to improve prediction for diseases with few samples.
Drawings
FIG. 1 is a structural diagram of a disease diagnosis prediction system based on a graph neural network according to an embodiment of the present invention;
FIG. 2 is a diagram of a heterogeneous graph network structure according to an embodiment of the present invention;
FIG. 3 is a diagram of a disease diagnosis model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a disease hierarchy according to an embodiment of the present invention.
Detailed Description
To make the above objects, features, and advantages of the present invention easier to understand, embodiments of the invention are described in detail below with reference to the accompanying drawings.
Numerous specific details are set forth in the following description to provide a thorough understanding of the present invention. The invention may, however, be practiced in ways other than those described here, and such variations will be readily apparent to those of ordinary skill in the art without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
As shown in FIG. 1, the embodiment of the invention provides a disease diagnosis prediction system based on a graph neural network, comprising a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module, and a disease diagnosis model application module. The implementation of each module is explained in detail below.
Knowledge graph construction module: a disease-symptom knowledge graph is constructed based on medical knowledge sources such as SNOMED-CT and HPO; it comprises two node types, disease and symptom, and one relationship type, disease-symptom.
Data extraction and preprocessing module: electronic medical record data of patients, including their disease diagnoses and symptom data, are extracted from the electronic medical record system and stored in triple form.
Disease diagnosis model construction module: graph neural network learning and predictive modeling are performed on the disease-symptom knowledge graph and the electronic medical record data.
Disease diagnosis model application module: the disease diagnosis model is used to perform disease diagnosis prediction on the input symptoms of a new patient.
The specific function of the disease diagnosis model construction module is as follows. Given a disease set D, a symptom set S, and a patient set P, whose sizes are the number of disease types, the number of symptom types, and the number of patients, respectively, disease diagnosis prediction is treated as a multi-label classification problem: given a patient's symptoms, the disease diagnosis model predicts the patient's disease diagnoses.
The implementation of the disease diagnosis model comprises:
(1) Heterogeneous graph network construction
A heterogeneous graph network containing three node types (diseases, symptoms, and patients) is constructed from the disease-symptom knowledge graph and the electronic medical record data, with symptoms acting as intermediate nodes connecting diseases and patients. The heterogeneous graph network integrates the relationship subgraph relating diseases and symptoms in the disease-symptom knowledge graph with the relationship subgraph relating patients and symptoms in the electronic medical record data; that is, it contains a disease-symptom subgraph and a patient-symptom subgraph. The heterogeneous graph network can be expressed in terms of a node set and an edge set: the node set consists of the disease set, the symptom set, and the patient set, and the edge set is defined over a relation set R that includes the disease-symptom relationship and the patient-symptom relationship. The disease-symptom relationships are stored in a disease-symptom adjacency matrix and the patient-symptom relationships are stored in a patient-symptom adjacency matrix.
FIG. 2 shows an example of the heterogeneous graph network structure, containing 4 patients, 4 diseases, and 4 symptoms, together with the patient-symptom relationships and the disease-symptom relationships.
(2) Subgraph construction
Disease-symptom subgraph: the disease-symptom relationships are extracted from the disease-symptom knowledge graph to construct the disease-symptom subgraph.
Patient-symptom subgraph: the patient-symptom subgraph is constructed from the patient disease diagnosis and symptom data stored in triple form.
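A minimal sketch of how triples could be turned into the two adjacency matrices is shown below. The triple layout `(head, relation, tail)`, the relation names, and the identifiers are hypothetical; the source only states that the data are stored in triple form and that the relationships are stored in adjacency matrices.

```python
def build_adjacency(triples, relation, row_ids, col_ids):
    """Return a 0/1 adjacency matrix (list of lists) for one relationship type.

    Rows follow `row_ids` and columns follow `col_ids`; triples with another
    relation name or with unknown identifiers are ignored.
    """
    row_index = {name: i for i, name in enumerate(row_ids)}
    col_index = {name: j for j, name in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for head, rel, tail in triples:
        if rel == relation and head in row_index and tail in col_index:
            matrix[row_index[head]][col_index[tail]] = 1
    return matrix


# Hypothetical toy data.
diseases = ["d1", "d2"]
symptoms = ["s1", "s2", "s3"]
patients = ["p1", "p2"]

# Triples extracted from the knowledge graph.
kg_triples = [
    ("d1", "disease_symptom", "s1"),
    ("d1", "disease_symptom", "s2"),
    ("d2", "disease_symptom", "s3"),
]
# Triples extracted from the electronic medical records.
emr_triples = [
    ("p1", "patient_symptom", "s1"),
    ("p2", "patient_symptom", "s2"),
    ("p2", "patient_symptom", "s3"),
    ("p1", "patient_diagnosis", "d1"),   # diagnosis triples are labels, not edges
]

A_ds = build_adjacency(kg_triples, "disease_symptom", diseases, symptoms)
A_ps = build_adjacency(emr_triples, "patient_symptom", patients, symptoms)
```

Keeping the diagnosis triples out of the graph edges matches the described structure, in which patients connect to diseases only through symptom nodes and the diagnoses serve as prediction targets.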
(3) Disease diagnosis model structure
FIG. 3 shows a structural example of the disease diagnosis model. The initial node embedded representations of the diseases, symptoms, and patients are obtained using the disease-symptom co-occurrence matrix. The initial node embedded representations and the adjacency matrices are the inputs of the disease diagnosis model, which is composed of a graph encoder and a graph decoder. The generation of the initial node embedded representations, the graph encoder, and the graph decoder are described in (4) to (6).
(4) Generation of the node initial embedded representations
First, a disease-symptom co-occurrence matrix is constructed, in which the entry in row i and column j is the number of patients in the electronic medical record data who are diagnosed with the i-th disease and exhibit the j-th symptom. Next, the co-occurrence matrix is row-normalized, and the initial embedded representation of the i-th disease is the i-th row of the row-normalized matrix; the co-occurrence matrix is also column-normalized, and the initial embedded representation of the i-th symptom is the i-th column of the column-normalized matrix. Finally, the initial embedded representation of each patient is calculated with a formula (an image in the original publication) that involves the number of symptoms of the patient.
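The steps above can be sketched as follows. Normalizing by row and column sums, and taking the patient embedding as the mean of the initial embeddings of the patient's symptoms, are assumptions (the patient formula is not legible in this text; the source only says it involves the patient's number of symptoms). All names and the toy data are hypothetical.

```python
def cooccurrence(diagnoses, patient_symptoms, n_diseases, n_symptoms):
    """M[i][j] = number of patients diagnosed with disease i who show symptom j."""
    M = [[0] * n_symptoms for _ in range(n_diseases)]
    for patient, disease_ids in diagnoses.items():
        for i in disease_ids:
            for j in patient_symptoms.get(patient, ()):
                M[i][j] += 1
    return M


def row_normalize(M):
    """Divide each row by its sum (rows that sum to zero stay zero)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def col_normalize(M):
    """Divide each column by its sum (columns that sum to zero stay zero)."""
    n_rows, n_cols = len(M), len(M[0])
    totals = [sum(M[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [M[i][j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


def patient_embedding(symptom_ids, symptom_vecs):
    """Assumed: mean of the initial embeddings of the patient's symptoms."""
    dim = len(symptom_vecs[0])
    if not symptom_ids:
        return [0.0] * dim
    return [
        sum(symptom_vecs[j][k] for j in symptom_ids) / len(symptom_ids)
        for k in range(dim)
    ]


# Toy data: 2 diseases, 3 symptoms, 2 patients (indices are hypothetical).
diagnoses = {"p1": [0], "p2": [0, 1]}
patient_symptoms = {"p1": [0, 1], "p2": [1, 2]}

M = cooccurrence(diagnoses, patient_symptoms, n_diseases=2, n_symptoms=3)
disease_vecs = row_normalize(M)                      # one row per disease
M_col = col_normalize(M)
symptom_vecs = [[M_col[i][j] for i in range(2)] for j in range(3)]  # one column per symptom
p1_vec = patient_embedding(patient_symptoms["p1"], symptom_vecs)
```

Disease vectors have one component per symptom and symptom vectors one component per disease, so the three node types start in spaces of different dimension; this is why a per-type multi-layer perceptron is applied before the graph encoder.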
(5) Picture coder
Firstly, the initial embedded representations of different types of nodes are respectively input into a multi-layer perceptron to obtain the initial embedded representations of the same dimension, and then the initial embedded representations are input into a graph encoder. The graph encoder is implemented based on a graph convolution neural network.
In the graph encoder, nodes of different types pass information along the connecting edges of the graph, so that each node integrates information from nodes of the other types. For a disease node, the node-embedded representation at a given layer is calculated by the following formula:
[Formula image: layer-wise update of the disease node-embedded representation]
For a symptom node, the node-embedded representation at that layer is calculated by the following formula:
[Formula image: layer-wise update of the symptom node-embedded representation]
For a patient node, the node-embedded representation at that layer is calculated by the following formula:
[Formula image: layer-wise update of the patient node-embedded representation]
These formulas use an activation function; a disease-symptom association weight matrix and a patient-symptom association weight matrix for that layer, both obtained by training the disease diagnosis model; and the node-embedded representations of the disease node, the symptom node, and the patient node at the preceding layer. They also use four neighbor sets: the set of symptom nodes adjacent to the disease node, the set of disease nodes adjacent to the symptom node, the set of patient nodes adjacent to the symptom node, and the set of symptom nodes adjacent to the patient node. The first two sets are obtained from the disease-symptom adjacency matrix, and the last two from the patient-symptom adjacency matrix. Repeating the node-embedded representation update as many times as the graph encoder has layers yields disease, symptom, and patient node-embedded representations that can sufficiently capture the association relationships.
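The message passing described above can be sketched as follows. Because the update formulas survive only as images, this is an assumed variant rather than the patented one: neighbors are combined by mean aggregation, ReLU is used as the activation, and the names `mean_aggregate`, `encoder_layer`, and `encode` are hypothetical.

```python
import numpy as np

def mean_aggregate(adj, h):
    """Average neighbor embeddings; adj[u, v] = 1 when u and v are connected.
    Nodes without neighbors receive a zero vector."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj @ h, deg,
                     out=np.zeros((adj.shape[0], h.shape[1])), where=deg > 0)

def encoder_layer(h_d, h_s, h_p, adj_ds, adj_ps, w_ds, w_ps):
    """One update of disease, symptom, and patient embeddings.
    adj_ds: (n_diseases, n_symptoms), adj_ps: (n_patients, n_symptoms)."""
    relu = lambda x: np.maximum(x, 0.0)
    # Diseases receive messages from adjacent symptoms.
    new_d = relu(mean_aggregate(adj_ds, h_s) @ w_ds)
    # Symptoms receive messages from adjacent diseases and adjacent patients.
    new_s = relu(mean_aggregate(adj_ds.T, h_d) @ w_ds
                 + mean_aggregate(adj_ps.T, h_p) @ w_ps)
    # Patients receive messages from adjacent symptoms.
    new_p = relu(mean_aggregate(adj_ps, h_s) @ w_ps)
    return new_d, new_s, new_p

def encode(h_d, h_s, h_p, adj_ds, adj_ps, weights):
    """Apply one layer per (w_ds, w_ps) pair in weights."""
    for w_ds, w_ps in weights:
        h_d, h_s, h_p = encoder_layer(h_d, h_s, h_p, adj_ds, adj_ps, w_ds, w_ps)
    return h_d, h_s, h_p
```

Symptom nodes sit between diseases and patients, so after two layers a patient embedding already reflects the diseases linked to that patient's symptoms.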
(6) Graph decoder
The node-embedded representations obtained by the graph encoder are input into the graph decoder. In the graph decoder, multi-task learning is performed using the node-embedded representations.
First, multi-label hierarchical classification for patient disease diagnosis prediction is performed.
To begin, a disease hierarchy is constructed from the hierarchical structure of diseases, as shown in Fig. 4. One layer consists of the diseases in the disease set D, that is, the diseases to be diagnosed and predicted, with the disease types as described above. The other layer is the disease system classification based on medical knowledge, which consists of a number of disease system classes.
Next, a multi-label hierarchical classifier comprising a plurality of binary classifiers is constructed. The node-embedded representation of a patient is input into each of the binary classifiers to obtain one prediction probability per binary classifier. The labels of one group of binary classifiers are the patient's disease system classes; the labels of the other group are the patient's disease diagnoses. Each binary classifier has corresponding model parameters.
Then, the probability that the patient has a given disease is calculated:
[Formula image: probability that the patient has the disease]
The formula combines two quantities. The first is the probability, predicted by the binary classifier of that disease, that the patient has the disease. The second, supposing the disease belongs to a given disease system class, is the probability, predicted by the binary classifier of that class, that the patient has a disease of that disease system class.
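One plausible way to combine the two predictions is sketched below. The product rule, the logistic form of each binary classifier, and the names `hierarchical_probabilities` and `parent_of` are assumptions for illustration; the original combination formula is only available as an image.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_probabilities(h_patient, w_system, w_disease, parent_of):
    """h_patient: (n_patients, dim) patient node embeddings.
    w_system:  (dim, n_systems) weights, one column per system-class classifier.
    w_disease: (dim, n_diseases) weights, one column per disease classifier.
    parent_of[k]: index of the disease system class that disease k belongs to."""
    p_system = sigmoid(h_patient @ w_system)     # system-class probabilities
    p_disease = sigmoid(h_patient @ w_disease)   # per-disease probabilities
    # ASSUMPTION: the final probability of disease k is the product of the
    # disease classifier's output and its parent system class's output.
    p_final = p_disease * p_system[:, np.asarray(parent_of)]
    return p_final, p_system, p_disease
```

With this rule a disease can only receive a high final probability when its system class is also predicted as likely, which keeps the two levels of the hierarchy consistent.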
Finally, the loss function of the multi-label hierarchical classification is calculated by the following formulas:
[Formula images: four expressions defining the loss function of the multi-label hierarchical classification]
These formulas use the true label indicating whether the patient has the given disease; the true label of the disease system class corresponding to the patient's disease diagnosis; the L1 norm; and the similarity between two diseases, which is calculated by the following formula:
[Formula image: similarity between two diseases]
The similarity is computed from the true-label distributions of the two diseases, which are formed from the true labels indicating whether each patient has the first disease and the second disease, respectively.
Second, disease contrast learning is performed.
To begin, the diseases in the disease set D are combined in pairs to obtain a disease pair set DD. For any disease pair in DD, the disease pair label takes one value if the two diseases belong to the same disease system class and another value if they belong to different disease system classes.
Then, a disease-pair system class discriminator is constructed. The node-embedded representations of the two diseases in a disease pair are input into the discriminator, and the distance between the two diseases is calculated using the L2 norm:
[Formula image: distance between the two diseases of a pair]
Finally, the loss function of disease contrast learning is calculated by the following formula:
[Formula image: loss function of disease contrast learning]
where m is the lower bound on the distance between the embedded representations of diseases in different disease system classes.
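A margin-based contrastive loss of the kind described above can be sketched as follows. The squared-distance form, the label convention (1 for the same disease system class, 0 otherwise), the use of the L2 norm of the embedding difference as the distance, and the helper names are assumptions, since the patent's own formula is only available as an image.

```python
import itertools
import numpy as np

def build_disease_pairs(system_of):
    """system_of[k] is the disease system class of disease k (hypothetical input).
    Returns index arrays for every unordered pair and a 0/1 label per pair."""
    pairs = list(itertools.combinations(range(len(system_of)), 2))
    left = np.array([a for a, _ in pairs])
    right = np.array([b for _, b in pairs])
    # ASSUMED label convention: 1 = same system class, 0 = different classes.
    same = np.array([int(system_of[a] == system_of[b]) for a, b in pairs])
    return left, right, same

def contrastive_loss(h_a, h_b, same_class, margin):
    """h_a, h_b: (n_pairs, dim) embeddings of the two diseases in each pair."""
    dist = np.linalg.norm(h_a - h_b, axis=1)  # L2 distance per pair
    # Pairs from the same class are pulled together.
    pull = same_class * dist ** 2
    # Pairs from different classes are pushed apart until they are at least
    # `margin` away from each other; beyond that they contribute nothing.
    push = (1 - same_class) * np.maximum(margin - dist, 0.0) ** 2
    return float(np.mean(pull + push)), dist
```

Here `margin` plays the role of m: once two diseases from different system classes are at least that far apart, the pair no longer adds to the loss.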
Third, disease-symptom relationship learning is performed.
To begin, one disease from the disease set D and one symptom from the symptom set S are selected at a time to obtain a disease-symptom pair set DS. For any disease-symptom pair in DS, the pair label takes one value if the disease and the symptom are associated in the disease-symptom knowledge graph and another value if they are not.
Then, a disease-symptom relationship learner is constructed. The node-embedded representations of the disease and the symptom in a pair are input into the learner, which calculates the probability that the disease and the symptom are associated using the sigmoid function:
[Formula image: probability that the disease and the symptom of a pair are associated]
Finally, the loss function of disease-symptom relationship learning is calculated by the following formula:
[Formula image: loss function of disease-symptom relationship learning]
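The relationship learner can be illustrated with the sketch below. The text confirms only that a sigmoid is applied; scoring a pair by the inner product of the two embeddings, using binary cross-entropy as the loss, treating label 1 as "associated in the knowledge graph", and the function names are assumptions.

```python
import numpy as np

def relation_probability(h_disease, h_symptom):
    """h_disease, h_symptom: (n_pairs, dim) embeddings of each pair's members.
    ASSUMPTION: the score is the inner product of the two embeddings."""
    score = np.sum(h_disease * h_symptom, axis=1)
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid

def relation_loss(prob, label, eps=1e-12):
    """Binary cross-entropy (ASSUMED loss form).
    label is 1 for associated pairs and 0 otherwise."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(prob)
                          + (1 - label) * np.log(1.0 - prob)))
```

Training on this objective encourages the embeddings of a disease and its known symptoms to score highly together, which injects the knowledge graph into the learned representations.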
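The loss function of the disease diagnosis model is obtained by adding the loss function of the multi-label hierarchical classification, the loss function of disease contrast learning, and the loss function of disease-symptom relationship learning:
[Formula image: loss function of the disease diagnosis model]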
The foregoing is only a preferred embodiment of the present invention. Although the present invention has been disclosed by way of preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from the scope of that technical solution. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present invention.

Claims (9)

1. A disease diagnosis prediction system based on a graph neural network, comprising:
(1) a knowledge graph construction module: constructing a disease-symptom knowledge graph based on medical knowledge sources;
(2) a data extraction and preprocessing module: extracting electronic medical record data of patients from an electronic medical record system, wherein the electronic medical record data comprises disease diagnosis and symptom data of the patients and is stored in triple form;
(3) a disease diagnosis model construction module: performing graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, comprising:
constructing a heterogeneous graph network, wherein the heterogeneous graph network comprises a disease-symptom subgraph constructed by extracting disease-symptom relationships from the disease-symptom knowledge graph and a patient-symptom subgraph constructed from the patient disease diagnosis and symptom data in triple form;
constructing a disease diagnosis model, wherein the disease diagnosis model consists of a graph encoder and a graph decoder;
the graph encoder is implemented based on a graph convolutional neural network; its input is the initial node-embedded representations of diseases, symptoms, and patients obtained from a disease-symptom co-occurrence matrix, a disease-symptom adjacency matrix, and a patient-symptom adjacency matrix; nodes of different types pass information along connecting edges; the node-embedded representations of diseases, symptoms, and patients are obtained through node-embedded representation update operations and are input into the graph decoder;
the graph decoder performs multi-task learning using the node-embedded representations, comprising three parts:
a) multi-label hierarchical classification for patient disease diagnosis prediction:
constructing a disease hierarchy from the hierarchical structure of diseases, wherein the disease hierarchy comprises a disease layer, consisting of the disease types to be diagnosed and predicted, and a disease system classification layer, obtained according to medical knowledge and consisting of a number of disease system classes;
constructing a multi-label hierarchical classifier comprising a plurality of binary classifiers; inputting the node-embedded representation of a patient into each of the binary classifiers to obtain one prediction probability per binary classifier, wherein the labels of one group of binary classifiers are the patient's disease system classes, the labels of the other group are the patient's disease diagnoses, and each binary classifier has corresponding model parameters;
calculating the probability that the patient has a given disease by a formula that combines the probability, predicted by the binary classifier of that disease, that the patient has the disease with, supposing the disease belongs to a given disease system class, the probability, predicted by the binary classifier of that class, that the patient has a disease of that disease system class:
[Formula image: probability that the patient has the disease]
calculating the loss function of the multi-label hierarchical classification by the following formulas:
[Formula images: four expressions defining the loss function of the multi-label hierarchical classification]
wherein the formulas use the number of patients; the true label indicating whether the patient has the given disease; the true label of the disease system class corresponding to the patient's disease diagnosis; the L1 norm; and the similarity between two diseases, which is calculated by the following formula:
[Formula image: similarity between two diseases]
wherein the similarity is computed from the true-label distributions of the two diseases, which are formed from the true labels indicating whether each patient has the first disease and the second disease, respectively;
b) disease contrast learning: constructing a disease-pair system class discriminator, calculating the distance between the two diseases in a disease pair, and designing a loss function for disease contrast learning;
c) disease-symptom relationship learning: constructing a disease-symptom relationship learner, calculating the probability that the disease and the symptom in a disease-symptom pair are associated, and designing a loss function for disease-symptom relationship learning;
adding the loss function of the multi-label hierarchical classification, the loss function of disease contrast learning, and the loss function of disease-symptom relationship learning to obtain the loss function of the disease diagnosis model;
(4) a disease diagnosis model application module: performing disease diagnosis prediction on the input symptoms of a new patient by using the disease diagnosis model.
2. The graph neural network-based disease diagnosis prediction system of claim 1, wherein, in the knowledge graph construction module, the disease-symptom knowledge graph comprises two node types, disease and symptom, and the disease-symptom relationships between them.
3. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the heterogeneous graph network is constructed based on the disease-symptom knowledge graph and the electronic medical record data and comprises three node types: disease, symptom, and patient, wherein symptom nodes are intermediate nodes connecting diseases and patients, and the heterogeneous graph network integrates the disease-symptom relationship subgraph from the disease-symptom knowledge graph and the patient-symptom relationship subgraph from the electronic medical record data.
4. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the heterogeneous graph network is expressed as:
[Formula image: definition of the heterogeneous graph network]
wherein the node set is formed from D, S, and P, which are the disease set, the symptom set, and the patient set, respectively, with the number of disease types, the number of symptom types, and the number of patients each denoted by its own symbol; the edge set is defined over a relationship set R that includes the disease-symptom relationship and the patient-symptom relationship; the disease-symptom relationships are stored in a disease-symptom adjacency matrix and the patient-symptom relationships are stored in a patient-symptom adjacency matrix.
5. The graph neural network-based disease diagnosis prediction system of claim 4, wherein the generation of the initial node-embedded representations comprises:
constructing a disease-symptom co-occurrence matrix, in which the entry in a given row and column is the number of patients in the electronic medical record data who are diagnosed with the disease of that row and present the symptom of that column;
row-normalizing the co-occurrence matrix to obtain a matrix whose i-th row is the initial embedded representation of the i-th disease;
column-normalizing the co-occurrence matrix to obtain a matrix whose i-th column is the initial embedded representation of the i-th symptom;
calculating the initial embedded representation of each patient by the following formula:
[Formula image: initial embedded representation of the patient]
wherein the formula uses the number of symptoms of the patient.
6. The graph neural network-based disease diagnosis prediction system of claim 1, wherein the initial embedded representations of the different node types are each input into a multi-layer perceptron to obtain initial embedded representations of the same dimension, which are then input into the graph encoder.
7. The graph neural network-based disease diagnosis prediction system of claim 5, wherein, in the graph encoder, the node-embedded representation of a disease at a given layer is calculated by the following formula:
[Formula image: layer-wise update of the disease node-embedded representation]
the node-embedded representation of a symptom at that layer is calculated by the following formula:
[Formula image: layer-wise update of the symptom node-embedded representation]
the node-embedded representation of a patient at that layer is calculated by the following formula:
[Formula image: layer-wise update of the patient node-embedded representation]
wherein the formulas use an activation function; a disease-symptom association weight matrix and a patient-symptom association weight matrix of that layer, obtained by training the disease diagnosis model; the node-embedded representations of the disease, the symptom, and the patient at the preceding layer; the set of symptom nodes adjacent to the disease; the set of disease nodes adjacent to the symptom; the set of patient nodes adjacent to the symptom; and the set of symptom nodes adjacent to the patient.
8. The graph neural network-based disease diagnosis prediction system of claim 7, wherein, in the graph decoder, the disease contrast learning comprises:
combining the diseases in the disease set D in pairs to obtain a disease pair set DD; for any disease pair in DD, the disease pair label takes one value if the two diseases belong to the same disease system class and another value if they belong to different disease system classes;
constructing a disease-pair system class discriminator, inputting the node-embedded representations of the two diseases in a disease pair into the discriminator, and calculating the distance between the two diseases using the L2 norm:
[Formula image: distance between the two diseases of a pair]
calculating the loss function of disease contrast learning by the following formula:
[Formula image: loss function of disease contrast learning]
wherein m is the lower bound on the distance between the embedded representations of diseases in different disease system classes.
9. The graph neural network-based disease diagnosis prediction system of claim 7, wherein, in the graph decoder, the disease-symptom relationship learning comprises:
selecting one disease from the disease set D and one symptom from the symptom set S at a time to obtain a disease-symptom pair set DS; for any disease-symptom pair in DS, the pair label takes one value if the disease and the symptom are associated in the disease-symptom knowledge graph and another value if they are not;
constructing a disease-symptom relationship learner, inputting the node-embedded representations of the disease and the symptom in a pair into the learner, and calculating the probability that the disease and the symptom are associated using the sigmoid function:
[Formula image: probability that the disease and the symptom of a pair are associated]
calculating the loss function of disease-symptom relationship learning by the following formula:
[Formula image: loss function of disease-symptom relationship learning]
CN202111609275.1A 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network Active CN113990495B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111609275.1A CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network
PCT/CN2022/116970 WO2023124190A1 (en) 2021-12-27 2022-09-05 Graph neural network-based disease diagnosis and prediction system
JP2023536567A JP7459386B2 (en) 2021-12-27 2022-09-05 Disease diagnosis prediction system based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111609275.1A CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network

Publications (2)

Publication Number Publication Date
CN113990495A CN113990495A (en) 2022-01-28
CN113990495B true CN113990495B (en) 2022-04-29

Family

ID=79734519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111609275.1A Active CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network

Country Status (3)

Country Link
JP (1) JP7459386B2 (en)
CN (1) CN113990495B (en)
WO (1) WO2023124190A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990495B (en) * 2021-12-27 2022-04-29 之江实验室 Disease diagnosis prediction system based on graph neural network
CN114496283A (en) * 2022-02-15 2022-05-13 山东大学 Disease prediction system based on path reasoning, storage medium and equipment
CN114496234B (en) * 2022-04-18 2022-07-19 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN114898879B (en) * 2022-05-10 2023-04-21 电子科技大学 Chronic disease risk prediction method based on graph representation learning
CN114664452B (en) * 2022-05-20 2022-09-23 之江实验室 General multi-disease prediction system based on causal verification data generation
CN115019923B (en) * 2022-07-11 2023-04-28 中南大学 Electronic medical record data pre-training method based on contrast learning
CN115359870B (en) * 2022-10-20 2023-03-24 之江实验室 Disease diagnosis and treatment process abnormity identification system based on hierarchical graph neural network
CN115424724B (en) * 2022-11-04 2023-01-24 之江实验室 Lung cancer lymph node metastasis auxiliary diagnosis system for multi-modal forest
CN115862848B (en) * 2023-02-15 2023-05-30 之江实验室 Disease prediction system and device based on clinical data screening and medical knowledge graph
CN116072298B (en) * 2023-04-06 2023-08-15 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116646072A (en) * 2023-05-18 2023-08-25 肇庆医学高等专科学校 Training method and device for prostate diagnosis neural network model
CN116562266B (en) * 2023-07-10 2023-09-15 中国医学科学院北京协和医院 Text analysis method, computer device, and computer-readable storage medium
CN116631641B (en) * 2023-07-21 2023-12-22 之江实验室 Disease prediction device integrating self-adaptive similar patient diagrams
CN116936108B (en) * 2023-09-19 2024-01-02 之江实验室 Unbalanced data-oriented disease prediction system
CN117010494B (en) * 2023-09-27 2024-01-05 之江实验室 Medical data generation method and system based on causal expression learning
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning
CN117235487B (en) * 2023-10-12 2024-03-12 北京大学第三医院(北京大学第三临床医学院) Feature extraction method and system for predicting hospitalization event of asthma patient
CN117409911B (en) * 2023-10-13 2024-05-07 四川大学 Electronic medical record representation learning method based on multi-view contrast learning
CN117438023B (en) * 2023-10-31 2024-04-26 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117894422A (en) * 2024-03-18 2024-04-16 攀枝花学院 ICU severe monitoring-based data visualization method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154928A (en) * 2017-12-27 2018-06-12 北京嘉和美康信息技术有限公司 A kind of methods for the diagnosis of diseases and device
CN109036553A (en) * 2018-08-01 2018-12-18 北京理工大学 A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
CN110277165A (en) * 2019-06-27 2019-09-24 清华大学 Aided diagnosis method, device, equipment and storage medium based on figure neural network
CN111370127A (en) * 2020-01-14 2020-07-03 之江实验室 Decision support system for early diagnosis of chronic nephropathy in cross-department based on knowledge graph
CN111382272A (en) * 2020-03-09 2020-07-07 西南交通大学 Electronic medical record ICD automatic coding method based on knowledge graph
CN111834012A (en) * 2020-07-14 2020-10-27 中国中医科学院中医药信息研究所 Traditional Chinese medicine syndrome diagnosis method and device based on deep learning and attention mechanism
CN112037912A (en) * 2020-09-09 2020-12-04 平安科技(深圳)有限公司 Triage model training method, device and equipment based on medical knowledge map
CN112263220A (en) * 2020-10-23 2021-01-26 北京文通图像识别技术研究中心有限公司 Endocrine disease intelligent diagnosis system
CN113409892A (en) * 2021-05-13 2021-09-17 西安电子科技大学 miRNA-disease association relation prediction method based on graph neural network
CN113434626A (en) * 2021-08-27 2021-09-24 之江实验室 Multi-center medical diagnosis knowledge map representation learning method and system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774143B2 (en) * 2002-04-25 2010-08-10 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
US20130268290A1 (en) * 2012-04-02 2013-10-10 David Jackson Systems and methods for disease knowledge modeling
PL407244A1 (en) * 2014-02-18 2015-08-31 Instytut Biochemii I Biofizyki Polskiej Akademii Nauk Electrochemical bio-sensor for detecting S100B protein
US20150356272A1 (en) * 2014-06-10 2015-12-10 Taipei Medical University Prescription analysis system and method for applying probabilistic model based on medical big data
US20190155993A1 (en) * 2017-11-20 2019-05-23 ThinkGenetic Inc. Method and System Supporting Disease Diagnosis
CN108198620B (en) * 2018-01-12 2022-03-22 洛阳飞来石软件开发有限公司 Skin disease intelligent auxiliary diagnosis system based on deep learning
US11636949B2 (en) * 2018-08-10 2023-04-25 Kahun Medical Ltd. Hybrid knowledge graph for healthcare applications
CN109784387A (en) * 2018-12-29 2019-05-21 天津南大通用数据技术股份有限公司 Multi-level progressive classification method and system based on neural network and Bayesian model
CN111666477B (en) 2020-06-19 2023-10-20 腾讯科技(深圳)有限公司 Data processing method, device, intelligent equipment and medium
CN111914562B (en) 2020-08-21 2022-10-14 腾讯科技(深圳)有限公司 Electronic information analysis method, device, equipment and readable storage medium
CN113674856B (en) 2021-04-15 2023-12-12 腾讯科技(深圳)有限公司 Medical data processing method, device, equipment and medium based on artificial intelligence
CN113656589B (en) 2021-04-19 2023-07-04 腾讯科技(深圳)有限公司 Object attribute determining method, device, computer equipment and storage medium
CN113643821B (en) * 2021-10-13 2022-02-11 Zhejiang University Multi-center knowledge graph joint decision support method and system
CN113990495B (en) * 2021-12-27 2022-04-29 Zhejiang Lab Disease diagnosis prediction system based on graph neural network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154928A (en) * 2017-12-27 2018-06-12 Beijing Jiahe Meikang Information Technology Co., Ltd. Disease diagnosis method and device
CN109036553A (en) * 2018-08-01 2018-12-18 Beijing Institute of Technology Disease prediction method based on automatically extracted medical expert knowledge
CN110277165A (en) * 2019-06-27 2019-09-24 Tsinghua University Auxiliary diagnosis method, device, equipment and storage medium based on graph neural network
CN111370127A (en) * 2020-01-14 2020-07-03 Zhejiang Lab Knowledge graph-based cross-department decision support system for early diagnosis of chronic kidney disease
CN111382272A (en) * 2020-03-09 2020-07-07 Southwest Jiaotong University Electronic medical record ICD automatic coding method based on knowledge graph
CN111834012A (en) * 2020-07-14 2020-10-27 Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences Traditional Chinese medicine syndrome diagnosis method and device based on deep learning and attention mechanism
CN112037912A (en) * 2020-09-09 2020-12-04 Ping An Technology (Shenzhen) Co., Ltd. Triage model training method, device and equipment based on medical knowledge graph
CN112263220A (en) * 2020-10-23 2021-01-26 Beijing Wentong Image Recognition Technology Research Center Co., Ltd. Intelligent diagnosis system for endocrine diseases
CN113409892A (en) * 2021-05-13 2021-09-17 Xidian University miRNA-disease association relation prediction method based on graph neural network
CN113434626A (en) * 2021-08-27 2021-09-24 Zhejiang Lab Multi-center medical diagnosis knowledge graph representation learning method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Disease prediction using graph convolutional networks: Application to Autism Spectrum Disorder and Alzheimer's disease; Sarah Parisot et al.; Medical Image Analysis; 20180831; Vol. 48; full text *
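Research on auxiliary disease diagnosis methods based on heterogeneous information networks; Sun Zhenchao; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Medicine and Health Sciences; 20211215; Vol. 2021 (No. 12); pp. 17-31 *
Research on ontology-based methods for mining disease molecular markers; Wang Yongtian; Wanfang Dissertations; 20211202; full text *
Deep learning-based diagnosis method for common chest diseases; Zhang Chiming et al.; Computer Engineering; 20200731; Vol. 46 (No. 7); full text *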

Also Published As

Publication number Publication date
CN113990495A (en) 2022-01-28
WO2023124190A1 (en) 2023-07-06
JP7459386B2 (en) 2024-04-01
JP2024503980A (en) 2024-01-30

Similar Documents

Publication Publication Date Title
CN113990495B (en) Disease diagnosis prediction system based on graph neural network
Sullivan Understanding from machine learning models
Pham et al. Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels
Buhrmester et al. Analysis of explainers of black box deep neural networks for computer vision: A survey
Ming et al. Rulematrix: Visualizing and understanding classifiers with rules
Zheng et al. The fusion of deep learning and fuzzy systems: A state-of-the-art survey
CN111382272B (en) Electronic medical record ICD automatic coding method based on knowledge graph
Heidari et al. The COVID-19 epidemic analysis and diagnosis using deep learning: A systematic literature review and future directions
CN113553440B (en) Medical entity relationship extraction method based on hierarchical reasoning
Geetha et al. Fuzzy case-based reasoning approach for finding COVID-19 patients priority in hospitals at source shortage period
Nagahisarchoghaei et al. An empirical survey on explainable ai technologies: Recent trends, use-cases, and categories from technical and application perspectives
CN114743037A (en) Deep medical image clustering method based on multi-scale structure learning
Ezugwu et al. Machine learning research trends in Africa: a 30 years overview with bibliometric analysis review
CN112069825B (en) Entity relation joint extraction method for alert condition record data
Haggag et al. A computer-aided diagnostic system for diabetic retinopathy based on local and global extracted features
CN117457192A (en) Intelligent remote diagnosis method and system
CN111143573B (en) Method for predicting knowledge-graph target node based on user feedback information
Wang et al. BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition
Fujita et al. Advances and Trends in Artificial Intelligence. From Theory to Practice: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, Kuala Lumpur, Malaysia, July 26–29, 2021, Proceedings, Part II
Abu et al. Approaches Of Deep Learning In Persuading The Contemporary Society For The Adoption Of New Trend Of AI Systems: A Review
CN114428864A (en) Knowledge graph construction method and device, electronic equipment and medium
CN114429822A (en) Medical record quality inspection method and device and storage medium
Vergara et al. A Schematic Review of Knowledge Reasoning Approaches Based on the Knowledge Graph
CN116662554B (en) Infectious disease aspect emotion classification method based on heterogeneous graph convolution neural network
de Oliveira Producing Decisions and Explanations: A Joint Approach Towards Explainable CNNs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant