WO2023124190A1 - Disease diagnosis and prediction system based on a graph neural network - Google Patents

Disease diagnosis and prediction system based on a graph neural network

Info

Publication number
WO2023124190A1
Authority
WO
WIPO (PCT)
Prior art keywords
disease
symptom
patient
graph
relationship
Application number
PCT/CN2022/116970
Other languages
English (en)
French (fr)
Inventor
李劲松
池胜强
王宇清
田雨
周天舒
Original Assignee
之江实验室
Application filed by 之江实验室
Priority to JP2023536567A (patent JP7459386B2)
Publication of WO2023124190A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Definitions

  • The invention belongs to the technical field of medical and health information, and in particular relates to a disease diagnosis and prediction system based on a graph neural network.
  • A knowledge graph is a heterogeneous graph network that contains multiple relationships. Using the expert knowledge in a knowledge graph together with electronic medical record data, and integrating knowledge and data into one model for disease diagnosis and prediction, is therefore important.
  • Existing methods for disease prediction based on graph neural network models lack an effective way to fuse medical knowledge graphs with electronic medical record data into a heterogeneous graph network.
  • The main existing methods are: (1) Data-based graph network modeling: a graph network is constructed from electronic medical record data and a graph neural network model is used for disease prediction; this method does not make full use of existing medical knowledge sources. (2) Staged modeling of knowledge representation learning and disease prediction: representation learning is performed on a medical knowledge graph to obtain vector representations of knowledge, which are then merged into electronic medical record data for disease prediction; staged training cannot obtain the knowledge representation best suited to disease prediction.
  • To address the shortcomings of the prior art, the present invention proposes a disease diagnosis and prediction system based on a graph neural network.
  • A disease diagnosis and prediction system based on a graph neural network, comprising:
  • (1) Knowledge graph construction module: builds a disease-symptom knowledge graph based on medical knowledge sources;
  • (2) Data extraction and preprocessing module: extracts patient electronic medical record data from the electronic medical record system, including patient disease diagnosis and symptom data, and saves it in triple form;
  • a heterogeneous graph network, which includes a disease-symptom subgraph constructed by extracting the disease-symptom relationships from the disease-symptom knowledge graph, and a patient-symptom subgraph constructed from the triple-form patient disease diagnosis and symptom data;
  • the disease diagnosis model is composed of two parts: a graph encoder and a graph decoder;
  • the graph encoder is implemented with a graph convolutional neural network; its inputs are the initial embedding representations of the disease, symptom, and patient nodes obtained from the disease-symptom co-occurrence matrix, together with the disease-symptom adjacency matrix and the patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and the node embedding update operation yields the disease, symptom, and patient node embedding representations, which are input to the graph decoder;
  • the graph decoder uses the node embedding representations for multi-task learning and consists of three parts:
  • a) Multi-label hierarchical classification for patient disease diagnosis prediction: use the hierarchical structure of diseases to construct the disease hierarchy, which includes the disease layer to be diagnosed and predicted and the disease system classification layer derived from medical knowledge; build a multi-label hierarchical classifier and design a loss function for multi-label hierarchical classification;
  • b) Disease contrastive learning: construct a disease-pair system-category discriminator, calculate the distance between the two diseases in a disease pair, and design a loss function for disease contrastive learning;
  • c) Disease-symptom relationship learning: construct a disease-symptom relationship learner, calculate the probability that the disease and symptom in a disease-symptom pair are related, and design a loss function for disease-symptom relationship learning;
  • (4) Disease diagnosis model application module: uses the disease diagnosis model to predict disease diagnoses from the input symptoms of new patients.
  • The disease-symptom knowledge graph includes two node types, disease and symptom, and one relationship type, disease-symptom.
  • The heterogeneous graph network is constructed from the disease-symptom knowledge graph and electronic medical record data and includes three node types: disease, symptom and patient. The symptom is the intermediate node connecting disease and patient. The heterogeneous graph network integrates the disease- and symptom-related relationship subgraph of the disease-symptom knowledge graph with the patient- and symptom-related relationship subgraph of the electronic medical record data.
  • The heterogeneous graph network $G$ is expressed as $G=(V,E)$, with:
  • node set $V=\{v_i \mid v_i \in D\cup S\cup P\}$, where $D$, $S$, $P$ are the given disease set, symptom set and patient set, and $N_D$, $N_S$, $N_P$ denote the number of disease types, the number of symptom types, and the number of patients respectively;
  • edge set $E=\{(v_i, r, v_j) \mid r\in R,\ v_i, v_j\in V\}$, where the set $R$ includes the disease-symptom relationship $r_{DS}$ and the patient-symptom relationship $r_{PS}$; the disease-symptom relationship is stored in the disease-symptom adjacency matrix, and the patient-symptom relationship is stored in the patient-symptom adjacency matrix.
  • The generation of the initial node embedding representations includes:
  • constructing a disease-symptom co-occurrence matrix $M$ whose entry $M_{ij}$ is the number of occurrences of symptom $S_j$ among patients diagnosed with disease $D_i$ in the electronic medical record data.
  • The initial embedding representations of the different node types are each input into a multi-layer perceptron to obtain initial embedding representations of the same dimension, which are then input into the graph encoder.
  • In the graph encoder, a layer-$l$ node embedding representation is computed for each disease, symptom, and patient node, where:
  • $\sigma$ is the activation function;
  • $N_S(D_i)$ represents the set of symptom nodes adjacent to disease $D_i$;
  • $N_D(S_i)$ represents the set of disease nodes adjacent to symptom $S_i$;
  • $N_P(S_i)$ represents the set of patient nodes adjacent to symptom $S_i$;
  • $N_S(P_i)$ represents the set of symptom nodes adjacent to patient $P_i$.
  • The multi-label hierarchical classification for patient disease diagnosis prediction includes:
  • the number of disease types in the disease layer is denoted $N_D$;
  • the disease system classification layer is denoted $SD_i$, indexed up to the number of disease system classifications;
  • $\alpha_{ab}=\cos(dist_a, dist_b)$ is the similarity between disease $a$ and disease $b$,
  • where $dist_a$ and $dist_b$ denote the true label distributions of disease $a$ and disease $b$, which are formed from the true labels of each patient $P_i$ for disease $a$ and disease $b$ respectively.
  • The disease contrastive learning includes:
  • $m$ is the lower bound of the distance between the embedding representations of different disease system categories.
  • The disease-symptom relationship learning includes:
  • $\mathrm{sigmoid}(\cdot)$ denotes the sigmoid function.
  • The beneficial effects of the present invention are as follows. The invention integrates the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network.
  • On the heterogeneous graph network, a graph convolutional neural network is used to learn both the local and the global information of the network.
  • The disease diagnosis model can be trained end-to-end on knowledge and data simultaneously.
  • In the model's optimization objective, in addition to the disease prediction task, supervision on knowledge relationships is added (the disease contrastive learning part and the disease-symptom relationship learning part), so that the disease prediction task can make effective use of knowledge and the knowledge representation is not affected by data noise.
  • To address the large number of diseases to be predicted and the limited number of patients for some diseases, a multi-label hierarchical classification is designed to improve prediction for diseases with few samples.
  • Fig. 1 is a structural diagram of a disease diagnosis and prediction system based on a graph neural network provided by an embodiment of the present invention.
  • Fig. 2 is a structural diagram of a heterogeneous graph network provided by an embodiment of the present invention.
  • Fig. 3 is a structural diagram of a disease diagnosis model provided by an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of the hierarchical structure of diseases provided by an embodiment of the present invention.
  • An embodiment of the present invention provides a disease diagnosis and prediction system based on a graph neural network.
  • The system includes a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module, and a disease diagnosis model application module. The implementation of each module is described in detail below.
  • Knowledge graph construction module: builds a disease-symptom knowledge graph based on medical knowledge sources such as SNOMED-CT and HPO.
  • The disease-symptom knowledge graph includes two node types, disease and symptom, and one relationship type, disease-symptom.
  • Data extraction and preprocessing module: extracts the patient's electronic medical record data from the electronic medical record system, including the patient's disease diagnosis and symptom data, and saves it in triple form.
  • Disease diagnosis model construction module: performs graph neural network learning and predictive modeling on the disease-symptom knowledge graph and electronic medical record data.
  • Disease diagnosis model application module: uses the disease diagnosis model to predict disease diagnoses from the input symptoms of new patients.
  • The specific function of the disease diagnosis model construction module is as follows: given a disease set $D$, a symptom set $S$ and a patient set $P$, where $N_D$, $N_S$ and $N_P$ denote the number of disease types, the number of symptom types, and the number of patients respectively, disease diagnosis prediction is treated as a multi-label classification problem; that is, given a patient's symptoms, the disease diagnosis model predicts the patient's disease diagnosis.
  • The implementation of the disease diagnosis model includes:
  • Using the disease-symptom knowledge graph and electronic medical record data, a heterogeneous graph network $G$ containing three node types (disease, symptom and patient) is constructed, in which the symptom is the intermediate node connecting disease and patient.
  • The heterogeneous graph network integrates the disease- and symptom-related relationship subgraph of the disease-symptom knowledge graph with the patient- and symptom-related relationship subgraph of the electronic medical record data; it comprises the disease-symptom subgraph $G_{DS}$ and the patient-symptom subgraph $G_{PS}$.
  • The heterogeneous graph network $G$ can be expressed as $G=(V,E)$, with:
  • node set $V=\{v_i \mid v_i \in D\cup S\cup P\}$;
  • edge set $E=\{(v_i, r, v_j) \mid r\in R,\ v_i, v_j\in V\}$;
  • the set $R$ includes the disease-symptom relationship $r_{DS}$ and the patient-symptom relationship $r_{PS}$; the disease-symptom relationship is stored in the disease-symptom adjacency matrix, and the patient-symptom relationship is stored in the patient-symptom adjacency matrix.
  • Fig. 2 is an example of a heterogeneous graph network structure, with 4 patients $P_1, P_2, P_3, P_4$, 4 diseases $D_1, D_2, D_3, D_4$, 4 symptoms $S_1, S_2, S_3, S_4$, and the patient-symptom and disease-symptom relationships.
  • Disease-symptom subgraph $G_{DS}$: the disease-symptom relationships are extracted from the disease-symptom knowledge graph to construct the disease-symptom subgraph.
  • Patient-symptom subgraph $G_{PS}$: the triple-form patient disease diagnosis and symptom data are used to construct the patient-symptom subgraph.
  • Fig. 3 is an example of the structure of a disease diagnosis model.
  • The disease-symptom co-occurrence matrix is used to obtain the initial node embedding representations of diseases, symptoms and patients.
  • The initial node embedding representations and the adjacency matrices are the inputs of the disease diagnosis model.
  • The disease diagnosis model consists of two parts: a graph encoder and a graph decoder. See (4)-(6) for the specific steps of generating the initial node embedding representations, the graph encoder, and the graph decoder.
  • First, a disease-symptom co-occurrence matrix $M$ is constructed. Its entry in row $i$ and column $j$, $M_{ij}$, is the number of occurrences of symptom $S_j$ among patients diagnosed with disease $D_i$ in the electronic medical record data. Next, $M$ is row-normalized to obtain $M_D$, and the initial embedding of disease $D_i$ is the $i$-th row of $M_D$; $M$ is column-normalized to obtain $M_S$, and the initial embedding of symptom $S_i$ is the $i$-th column of $M_S$. Then the initial embedding representation of patient $P_i$ is computed.
  • The initial embedding representations of the different node types are each input into a multi-layer perceptron to obtain initial embedding representations of the same dimension, which are then input into the graph encoder.
  • The graph encoder is implemented with a graph convolutional neural network.
  • In the graph encoder, nodes of different types integrate information from other node types by passing information along the connecting edges of the graph.
  • For each disease $D_i$, symptom $S_i$ and patient $P_i$, the layer-$l$ node embedding representation is computed, where:
  • $\sigma$ is the activation function;
  • $N_S(D_i)$ represents the set of symptom nodes adjacent to disease node $D_i$;
  • $N_D(S_i)$ represents the set of disease nodes adjacent to symptom node $S_i$;
  • $N_P(S_i)$ represents the set of patient nodes adjacent to symptom node $S_i$;
  • $N_S(P_i)$ represents the set of symptom nodes adjacent to patient node $P_i$.
  • $N_S(D_i)$ and $N_D(S_i)$ are obtained from the disease-symptom adjacency matrix;
  • $N_P(S_i)$ and $N_S(P_i)$ are obtained from the patient-symptom adjacency matrix.
  • The node embedding representations obtained by the graph encoder are input into the graph decoder, where they are used for multi-task learning.
  • The $L_D$ layer contains the diseases in the disease set $D$, that is, the diseases to be diagnosed and predicted. The $L_{SD}$ layer is the systematic classification of diseases according to medical knowledge, with its own number of disease system classifications.
  • The node embedding representation of patient $P_i$ is input into each of the $N_{clf}$ binary classifiers to obtain $N_{clf}$ predicted probabilities, denoted $prob_c$, $c=1,2,\dots,N_{clf}$.
  • Some of the classifiers take the patient's disease system classification as their label; the others take the patient's disease diagnosis as their label.
  • $\alpha_{ab}=\cos(dist_a, dist_b)$ is the similarity between disease $a$ and disease $b$,
  • where $dist_a$ and $dist_b$ denote the true label distributions of disease $a$ and disease $b$, which are formed from the true labels of each patient $P_i$ for disease $a$ and disease $b$ respectively.
  • The diseases in the disease set $D$ are combined in pairs to obtain a disease pair set $DD$; the number of disease pairs is $N_{DD}$.
  • Each disease pair is labeled according to whether its two diseases belong to the same systematic classification or to different ones.
  • $m$ is the lower bound of the distance between the embedding representations of different disease system categories.
  • One disease and one symptom are selected from the disease set $D$ and the symptom set $S$ respectively to obtain a set $DS$ of disease-symptom pairs; the number of disease-symptom pairs is $N_{DS}$.
  • Each disease-symptom pair is labeled according to whether or not the relationship exists in the disease-symptom knowledge graph.
  • $\mathrm{sigmoid}(\cdot)$ denotes the sigmoid function.
  • The loss function $L$ of the disease diagnosis model is defined as $L = L_{clf} + L_{dis\text{-}dis} + L_{dis\text{-}symp}$.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

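The present invention discloses a disease diagnosis and prediction system based on a graph neural network. The system comprises a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module, and a disease diagnosis model application module. The invention integrates the expert knowledge in a knowledge graph with electronic medical record data to construct a heterogeneous graph network. On this network, a graph convolutional neural network is used to learn both local and global information. The disease diagnosis model can be trained end-to-end on knowledge and data simultaneously. In the model's optimization objective, in addition to the disease prediction task, supervision on knowledge relationships is added, so that the disease prediction task can make effective use of knowledge and the knowledge representation is not affected by data noise. To address the large number of diseases to be predicted and the limited number of patients for some diseases, a multi-label hierarchical classification is designed to improve prediction for diseases with few samples.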

Description

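Disease diagnosis and prediction system based on a graph neural network
Technical Field
The present invention belongs to the technical field of medical and health information, and in particular relates to a disease diagnosis and prediction system based on a graph neural network.
Background Art
The healthcare field has many well-organized knowledge graphs, such as the International Classification of Diseases, DrugBank, and clinical guidelines and consensus statements, which carry hierarchical information and complex associations that match human cognition. A knowledge graph is a heterogeneous graph network containing multiple kinds of relationships. Using the expert knowledge in knowledge graphs together with electronic medical record data, and integrating knowledge and data into one model for disease diagnosis and prediction, is therefore of great importance.
Existing methods for disease prediction based on graph neural network models lack an effective way to fuse medical knowledge graphs with electronic medical record data into a heterogeneous graph network. The main methods at present are: (1) Data-based graph network modeling: a graph network is built from electronic medical record data, and a graph neural network model is used for disease prediction; this method does not make full use of existing medical knowledge sources. (2) Staged modeling of knowledge representation learning and disease prediction: representation learning is performed on a medical knowledge graph to obtain vector representations of knowledge, which are then merged into electronic medical record data for disease prediction; staged training cannot obtain the knowledge representation best suited to disease prediction. (3) End-to-end modeling that focuses only on the disease prediction task: the medical knowledge graph and electronic medical record data are fused into a heterogeneous graph network, and a graph neural network model is used for disease prediction; although this method overcomes the shortcomings of the two methods above, the model optimizes only the disease prediction task, so the learned knowledge representation may be affected by noise in the data.
Summary of the Invention
To address the shortcomings of the prior art, the present invention proposes a disease diagnosis and prediction system based on a graph neural network.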
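The object of the present invention is achieved through the following technical solution: a disease diagnosis and prediction system based on a graph neural network, comprising:
(1) a knowledge graph construction module, which builds a disease-symptom knowledge graph from medical knowledge sources;
(2) a data extraction and preprocessing module, which extracts patients' electronic medical record data from the electronic medical record system, including patient disease diagnosis and symptom data, and stores it in triple form;
(3) a disease diagnosis model construction module, which performs graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, including:
constructing a heterogeneous graph network, which comprises a disease-symptom subgraph built by extracting disease-symptom relationships from the disease-symptom knowledge graph, and a patient-symptom subgraph built from the triple-form patient disease diagnosis and symptom data;
constructing a disease diagnosis model, which consists of two parts, a graph encoder and a graph decoder;
the graph encoder is implemented with a graph convolutional neural network; its inputs are the initial node embedding representations of diseases, symptoms and patients obtained from the disease-symptom co-occurrence matrix, together with the disease-symptom adjacency matrix and the patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and the node embedding update operation yields the disease, symptom and patient node embedding representations, which are input to the graph decoder;
the graph decoder uses the node embedding representations for multi-task learning and comprises three parts:
a) multi-label hierarchical classification for patient disease diagnosis prediction: the hierarchical structure of diseases is used to construct the disease hierarchy, which includes the disease layer to be diagnosed and predicted and the disease system classification layer derived from medical knowledge; a multi-label hierarchical classifier is built, and a loss function for multi-label hierarchical classification is designed;
b) disease contrastive learning: a disease-pair system-category discriminator is built, the distance between the two diseases in a disease pair is calculated, and a loss function for disease contrastive learning is designed;
c) disease-symptom relationship learning: a disease-symptom relationship learner is built, the probability that the disease and the symptom in a disease-symptom pair are related is calculated, and a loss function for disease-symptom relationship learning is designed;
the loss function of the disease diagnosis model is obtained by summing the loss function of the multi-label hierarchical classification, the loss function of the disease contrastive learning, and the loss function of the disease-symptom relationship learning;
(4) a disease diagnosis model application module, which uses the disease diagnosis model to predict disease diagnoses from the input symptoms of new patients.
Further, in the knowledge graph construction module, the disease-symptom knowledge graph includes two node types, disease and symptom, and one relationship type, disease-symptom.
Further, the heterogeneous graph network is built from the disease-symptom knowledge graph and electronic medical record data and contains three node types: disease, symptom and patient. The symptom is the intermediate node connecting disease and patient. The heterogeneous graph network integrates the disease- and symptom-related relationship subgraph of the disease-symptom knowledge graph with the patient- and symptom-related relationship subgraph of the electronic medical record data.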
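Further, the heterogeneous graph network $G$ is expressed as:
$G=(V,E)$
where the node set is $V=\{v_i \mid v_i\in D\cup S\cup P\}$; $D$, $S$ and $P$ are the given disease set, symptom set and patient set,
$D=\{D_1,\dots,D_{N_D}\}$, $S=\{S_1,\dots,S_{N_S}\}$, $P=\{P_1,\dots,P_{N_P}\}$,
and $N_D$, $N_S$, $N_P$ denote the number of disease types, the number of symptom types, and the number of patients respectively; the edge set is $E=\{(v_i,r,v_j)\mid r\in R,\ v_i,v_j\in V\}$, and the set $R$ includes the disease-symptom relationship $r_{DS}$ and the patient-symptom relationship $r_{PS}$; the disease-symptom relationship is stored in the disease-symptom adjacency matrix, and the patient-symptom relationship is stored in the patient-symptom adjacency matrix.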
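Further, generating the initial node embedding representations includes:
constructing a disease-symptom co-occurrence matrix
$M\in\mathbb{R}^{N_D\times N_S}$
whose entry in row $i$ and column $j$, denoted $M_{ij}$, is the number of occurrences of symptom $S_j$ among patients diagnosed with disease $D_i$ in the electronic medical record data;
row-normalizing $M$ to obtain $M_D$; the initial embedding of disease $D_i$ is
Figure PCTCN2022116970-appb-000003
i.e., the $i$-th row of $M_D$;
column-normalizing $M$ to obtain $M_S$; the initial embedding of symptom $S_i$ is
Figure PCTCN2022116970-appb-000004
i.e., the $i$-th column of $M_S$;
computing the initial embedding representation of patient $P_i$,
Figure PCTCN2022116970-appb-000005
with the following formula:
Figure PCTCN2022116970-appb-000006
where
Figure PCTCN2022116970-appb-000007
is the number of symptoms of patient $P_i$.
Further, the initial node embedding representations of the different node types are each input into a multi-layer perceptron to obtain initial embedding representations of the same dimension, which are then input into the graph encoder.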
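Further, in the graph encoder, for disease $D_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000008
is computed as:
Figure PCTCN2022116970-appb-000009
For symptom $S_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000010
is computed as:
Figure PCTCN2022116970-appb-000011
For patient $P_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000012
is computed as:
Figure PCTCN2022116970-appb-000013
where $\sigma$ is the activation function;
Figure PCTCN2022116970-appb-000014
are, respectively, the disease-symptom association weight matrix and the patient-symptom association weight matrix obtained by training layer $l$ of the disease diagnosis model;
Figure PCTCN2022116970-appb-000015
are, respectively, the node embedding representations of disease $D_i$, symptom $S_i$ and patient $P_i$ at layer $l-1$; $N_S(D_i)$ denotes the set of symptom nodes adjacent to disease $D_i$, $N_D(S_i)$ the set of disease nodes adjacent to symptom $S_i$, $N_P(S_i)$ the set of patient nodes adjacent to symptom $S_i$, and $N_S(P_i)$ the set of symptom nodes adjacent to patient $P_i$.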
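Further, in the graph decoder, the multi-label hierarchical classification for patient disease diagnosis prediction includes:
constructing the disease hierarchy, in which the number of disease types in the disease layer is denoted $N_D$ and the disease system classification layer is denoted
Figure PCTCN2022116970-appb-000016
Figure PCTCN2022116970-appb-000017
Figure PCTCN2022116970-appb-000018
is the number of disease system classifications;
constructing a multi-label hierarchical classifier containing $N_{clf}$ binary classifiers, denoted $clf_c$, $c=1,2,\dots,N_{clf}$,
Figure PCTCN2022116970-appb-000019
inputting the node embedding representation of patient $P_i$ into each of the $N_{clf}$ binary classifiers to obtain $N_{clf}$ predicted probabilities, denoted $prob_c$, $c=1,2,\dots,N_{clf}$, where the binary classifiers
Figure PCTCN2022116970-appb-000020
take the patient's disease system classification as their label, and the binary classifiers
Figure PCTCN2022116970-appb-000021
take the patient's disease diagnosis as their label, with model parameters $w_c$, $c=1,2,\dots,N_D$;
computing the probability that patient $P_i$ has disease $D_j$:
Figure PCTCN2022116970-appb-000022
where
Figure PCTCN2022116970-appb-000023
Figure PCTCN2022116970-appb-000024
is the probability, predicted by binary classifier
Figure PCTCN2022116970-appb-000025
that the patient has disease $D_j$; supposing that the system classification of disease $D_j$ is $SD_c$,
Figure PCTCN2022116970-appb-000026
is the probability, predicted by binary classifier $clf_c$, that the patient has disease system classification $SD_c$;
computing the loss function $L_{clf}$ of the multi-label hierarchical classification as follows:
$L_{clf}=L_{p\text{-}diag}+L_{diag}+L_{sparse}$
Figure PCTCN2022116970-appb-000027
Figure PCTCN2022116970-appb-000028
Figure PCTCN2022116970-appb-000029
where
Figure PCTCN2022116970-appb-000030
is the true label of patient $P_i$ having disease $D_j$,
Figure PCTCN2022116970-appb-000031
is the true label of the disease system classification corresponding to the disease diagnosis of patient $P_i$, $\|\cdot\|_1$ denotes the L1 norm, and $\alpha_{ab}$ is the similarity between disease $a$ and disease $b$, computed as:
$\alpha_{ab}=\cos(dist_a, dist_b)$
where $dist_a$ and $dist_b$ denote the true label distributions of disease $a$ and disease $b$ respectively,
Figure PCTCN2022116970-appb-000032
Figure PCTCN2022116970-appb-000033
Figure PCTCN2022116970-appb-000034
denote the true labels of patient $P_i$ having disease $a$ and disease $b$ respectively.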
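Further, in the graph decoder, the disease contrastive learning includes:
combining the diseases in the disease set $D$ in pairs to obtain the disease pair set $DD$, with $N_{DD}$ disease pairs; for any disease pair $DD_i$ in $DD$, if the two diseases belong to the same system classification, the disease pair label is
Figure PCTCN2022116970-appb-000035
and if the two diseases belong to different system classifications, then
Figure PCTCN2022116970-appb-000036
constructing a disease-pair system-category discriminator $clf_{discri}$, inputting the node embedding representations $e_{i1}, e_{i2}$ of the two diseases in disease pair $DD_i$ into $clf_{discri}$, and computing the distance between the two diseases
Figure PCTCN2022116970-appb-000037
Figure PCTCN2022116970-appb-000038
where $\|\cdot\|_2$ denotes the L2 norm;
computing the loss function $L_{dis\text{-}dis}$ of the disease contrastive learning as follows:
Figure PCTCN2022116970-appb-000039
where $m$ is the lower bound of the distance between the embedding representations of different disease system categories.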
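Further, in the graph decoder, the disease-symptom relationship learning includes:
selecting one disease and one symptom from the disease set $D$ and the symptom set $S$ respectively to obtain the disease-symptom pair set $DS$, with $N_{DS}$ disease-symptom pairs; for any disease-symptom pair $DS_i$ in $DS$, if the disease and symptom are related in the disease-symptom knowledge graph, the disease-symptom pair label is
Figure PCTCN2022116970-appb-000040
and if they are not related, then
Figure PCTCN2022116970-appb-000041
constructing a disease-symptom relationship learner $clf_{rel}$, inputting the node embedding representations $e_{id}, e_{is}$ of the disease and symptom in $DS_i$ into $clf_{rel}$, and computing the probability that the disease and symptom in $DS_i$ are related
Figure PCTCN2022116970-appb-000042
Figure PCTCN2022116970-appb-000043
where $\mathrm{sigmoid}(\cdot)$ denotes the sigmoid function;
computing the loss function $L_{dis\text{-}symp}$ of the disease-symptom relationship learning as follows:
Figure PCTCN2022116970-appb-000044
The beneficial effects of the present invention are as follows. The invention integrates the expert knowledge in the knowledge graph with electronic medical record data to construct a heterogeneous graph network. On the heterogeneous graph network, a graph convolutional neural network is used to learn both the local and the global information of the network. The disease diagnosis model can be trained end-to-end on knowledge and data simultaneously. In the model's optimization objective, in addition to the disease prediction task, supervision on knowledge relationships is added (the disease contrastive learning part and the disease-symptom relationship learning part), so that the disease prediction task can make effective use of knowledge and the knowledge representation is not affected by data noise. To address the large number of diseases to be predicted and the limited number of patients for some diseases, a multi-label hierarchical classification is designed to improve prediction for diseases with few samples.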
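Brief Description of the Drawings
Fig. 1 is a structural diagram of the disease diagnosis and prediction system based on a graph neural network provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the heterogeneous graph network provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the disease diagnosis model provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the hierarchical structure of diseases provided by an embodiment of the present invention.
Detailed Description of Embodiments
To make the above objects, features and advantages of the present invention easier to understand, specific embodiments of the invention are described in detail below with reference to the drawings.
Many specific details are set out in the following description to aid a full understanding of the invention. The invention can, however, be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance; the invention is therefore not limited by the specific embodiments disclosed below.
An embodiment of the present invention provides a disease diagnosis and prediction system based on a graph neural network. As shown in Fig. 1, the system includes a knowledge graph construction module, a data extraction and preprocessing module, a disease diagnosis model construction module, and a disease diagnosis model application module. The implementation of each module is described in detail below.
Knowledge graph construction module: builds a disease-symptom knowledge graph from medical knowledge sources such as SNOMED-CT and HPO; the disease-symptom knowledge graph includes two node types, disease and symptom, and one relationship type, disease-symptom.
Data extraction and preprocessing module: extracts patients' electronic medical record data from the electronic medical record system, including patient disease diagnosis and symptom data, and stores it in triple form.
Disease diagnosis model construction module: performs graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data.
Disease diagnosis model application module: uses the disease diagnosis model to predict disease diagnoses from the input symptoms of new patients.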
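The specific function of the disease diagnosis model construction module is as follows: given a disease set
$D=\{D_1,\dots,D_{N_D}\}$,
a symptom set
$S=\{S_1,\dots,S_{N_S}\}$
and a patient set
$P=\{P_1,\dots,P_{N_P}\}$,
where $N_D$, $N_S$ and $N_P$ denote the number of disease types, the number of symptom types, and the number of patients respectively, disease diagnosis prediction is treated as a multi-label classification problem: given a patient's symptoms, the disease diagnosis model predicts the patient's disease diagnosis.
The implementation of the disease diagnosis model includes:
(1) Construction of the heterogeneous graph network
Using the disease-symptom knowledge graph and electronic medical record data, a heterogeneous graph network $G$ containing three node types (disease, symptom and patient) is constructed, in which the symptom is the intermediate node connecting disease and patient. The heterogeneous graph network integrates the disease- and symptom-related relationship subgraph of the disease-symptom knowledge graph with the patient- and symptom-related relationship subgraph of the electronic medical record data; it comprises the disease-symptom subgraph $G_{DS}$ and the patient-symptom subgraph $G_{PS}$.
The heterogeneous graph network $G$ can be expressed as:
$G=(V,E)$
where the node set is $V=\{v_i \mid v_i\in D\cup S\cup P\}$ and the edge set is $E=\{(v_i,r,v_j)\mid r\in R,\ v_i,v_j\in V\}$; the set $R$ includes the disease-symptom relationship $r_{DS}$ and the patient-symptom relationship $r_{PS}$; the disease-symptom relationship is stored in the disease-symptom adjacency matrix, and the patient-symptom relationship is stored in the patient-symptom adjacency matrix.
Fig. 2 shows an example heterogeneous graph network structure with 4 patients $P_1, P_2, P_3, P_4$, 4 diseases $D_1, D_2, D_3, D_4$, 4 symptoms $S_1, S_2, S_3, S_4$, and the patient-symptom and disease-symptom relationships.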
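As a rough illustration of how the two adjacency matrices might be filled from triples, the following Python sketch builds a disease-symptom matrix and a patient-symptom matrix for a toy graph of the same size as the Fig. 2 example. The triple layout `(head, relation, tail)`, the relation names, the helper `build_adjacency`, and every edge listed are assumptions made for illustration; the source does not specify the edges of Fig. 2 or a storage format beyond "triple form".

```python
import numpy as np

# Toy node lists matching the sizes in the Fig. 2 example.
diseases = ["D1", "D2", "D3", "D4"]
symptoms = ["S1", "S2", "S3", "S4"]
patients = ["P1", "P2", "P3", "P4"]

# Hypothetical triples (head, relation, tail); the edges are invented for illustration.
triples = [
    ("D1", "r_DS", "S1"), ("D1", "r_DS", "S2"),
    ("D2", "r_DS", "S2"), ("D3", "r_DS", "S3"), ("D4", "r_DS", "S4"),
    ("P1", "r_PS", "S1"), ("P2", "r_PS", "S2"),
    ("P3", "r_PS", "S3"), ("P4", "r_PS", "S3"), ("P4", "r_PS", "S4"),
]

def build_adjacency(triples, relation, heads, tails):
    """Return a len(heads) x len(tails) 0/1 adjacency matrix for one relation type."""
    h_idx = {h: i for i, h in enumerate(heads)}
    t_idx = {t: j for j, t in enumerate(tails)}
    adj = np.zeros((len(heads), len(tails)), dtype=np.int64)
    for head, rel, tail in triples:
        if rel == relation:
            adj[h_idx[head], t_idx[tail]] = 1
    return adj

A_DS = build_adjacency(triples, "r_DS", diseases, symptoms)  # disease-symptom adjacency
A_PS = build_adjacency(triples, "r_PS", patients, symptoms)  # patient-symptom adjacency
```

Keeping one matrix per relation type mirrors the text, where the symptom nodes are the only link between disease nodes and patient nodes.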
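(2) Subgraph construction
Disease-symptom subgraph $G_{DS}$: the disease-symptom relationships are extracted from the disease-symptom knowledge graph to build the disease-symptom subgraph.
Patient-symptom subgraph $G_{PS}$: the triple-form patient disease diagnosis and symptom data are used to build the patient-symptom subgraph.
(3) Structure of the disease diagnosis model
Fig. 3 shows an example of the structure of the disease diagnosis model. The disease-symptom co-occurrence matrix is used to obtain the initial node embedding representations of diseases, symptoms and patients. The initial node embedding representations and the adjacency matrices are the inputs of the disease diagnosis model. The disease diagnosis model consists of two parts, a graph encoder and a graph decoder. The specific steps for generating the initial node embedding representations, for the graph encoder, and for the graph decoder are given in (4)-(6).
(4) Generation of the initial node embedding representations
First, a disease-symptom co-occurrence matrix is constructed:
$M\in\mathbb{R}^{N_D\times N_S}$
The entry in row $i$ and column $j$ of $M$, denoted $M_{ij}$, is the number of occurrences of symptom $S_j$ among patients diagnosed with disease $D_i$ in the electronic medical record data. Next, $M$ is row-normalized to obtain $M_D$, and the initial embedding of disease $D_i$ is
Figure PCTCN2022116970-appb-000049
i.e., the $i$-th row of $M_D$; $M$ is column-normalized to obtain $M_S$, and the initial embedding of symptom $S_i$ is
Figure PCTCN2022116970-appb-000050
i.e., the $i$-th column of $M_S$. Then the initial embedding representation of patient $P_i$,
Figure PCTCN2022116970-appb-000051
is computed with the following formula:
Figure PCTCN2022116970-appb-000052
where
Figure PCTCN2022116970-appb-000053
is the number of symptoms of patient $P_i$.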
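The initialization steps can be sketched in Python as follows. Two points are assumptions, since the formula images are not reproduced here: "normalization" is taken to mean dividing by the row or column sum, and the patient embedding is taken to be the mean of the initial embeddings of that patient's symptoms (which is at least consistent with the formula being normalized by the patient's symptom count). The function name `initial_embeddings` and the toy numbers are hypothetical.

```python
import numpy as np

def initial_embeddings(M, A_PS):
    """M: N_D x N_S co-occurrence counts; A_PS: N_P x N_S 0/1 patient-symptom adjacency."""
    M = np.asarray(M, dtype=float)
    A_PS = np.asarray(A_PS, dtype=float)
    row_sums = M.sum(axis=1, keepdims=True)
    col_sums = M.sum(axis=0, keepdims=True)
    # ASSUMPTION: sum-to-one normalisation; all-zero rows or columns stay zero.
    M_D = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)
    M_S = np.divide(M, col_sums, out=np.zeros_like(M), where=col_sums > 0)
    e_D = M_D      # row i: initial embedding of disease D_i (length N_S)
    e_S = M_S.T    # row i: column i of M_S, initial embedding of symptom S_i (length N_D)
    # ASSUMPTION: patient embedding = mean of the embeddings of the patient's symptoms.
    n_sym = A_PS.sum(axis=1, keepdims=True)
    e_P = np.divide(A_PS @ e_S, n_sym,
                    out=np.zeros((A_PS.shape[0], e_S.shape[1])), where=n_sym > 0)
    return e_D, e_S, e_P

# Toy example: 2 diseases, 2 symptoms, 2 patients.
M_toy = [[2, 0], [1, 1]]
A_PS_toy = [[1, 1], [0, 1]]
e_D, e_S, e_P = initial_embeddings(M_toy, A_PS_toy)
```

Under these assumptions disease embeddings have length $N_S$ while symptom and patient embeddings have length $N_D$, which is why the text passes each node type through its own multi-layer perceptron before the graph encoder.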
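(5) Graph encoder
First, the initial node embedding representations of the different node types are each input into a multi-layer perceptron to obtain initial embedding representations of the same dimension, which are then input into the graph encoder. The graph encoder is implemented with a graph convolutional neural network.
In the graph encoder, nodes of different types integrate information from other node types by passing information along the connecting edges of the graph. For disease $D_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000054
is computed as:
Figure PCTCN2022116970-appb-000055
For symptom $S_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000056
is computed as:
Figure PCTCN2022116970-appb-000057
For patient $P_i$, the layer-$l$ node embedding representation
Figure PCTCN2022116970-appb-000058
is computed as:
Figure PCTCN2022116970-appb-000059
where $\sigma$ is the activation function;
Figure PCTCN2022116970-appb-000060
are, respectively, the disease-symptom association weight matrix and the patient-symptom association weight matrix obtained by training layer $l$ of the disease diagnosis model;
Figure PCTCN2022116970-appb-000061
are, respectively, the node embedding representations of disease node $D_i$, symptom node $S_i$ and patient node $P_i$ at layer $l-1$; the total number of layers of the graph encoder is $L_N$. $N_S(D_i)$ denotes the set of symptom nodes adjacent to disease node $D_i$, $N_D(S_i)$ the set of disease nodes adjacent to symptom node $S_i$, $N_P(S_i)$ the set of patient nodes adjacent to symptom node $S_i$, and $N_S(P_i)$ the set of symptom nodes adjacent to patient node $P_i$. $N_S(D_i)$ and $N_D(S_i)$ are obtained from the disease-symptom adjacency matrix; $N_P(S_i)$ and $N_S(P_i)$ are obtained from the patient-symptom adjacency matrix. Repeating the above node embedding update operation $L_N$ times yields disease, symptom and patient node embedding representations that fully capture the associations.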
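Because the update formulas themselves are only available as images, the sketch below is not the patent's formula. It shows one common form of heterogeneous graph convolution that fits the description: each node averages the previous-layer embeddings of its neighbours, applies a relation-specific weight matrix (`W_DS` or `W_PS`), and passes the result through an activation. The mean aggregation, the ReLU activation, the absence of a self-connection term, and all names and toy values are assumptions.

```python
import numpy as np

def row_normalize(A):
    """Divide each row by its sum so that aggregation is a mean over neighbours."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

def encoder_layer(h_D, h_S, h_P, A_DS, A_PS, W_DS, W_PS):
    """One hypothetical update of the disease, symptom and patient embeddings."""
    relu = lambda x: np.maximum(x, 0.0)
    # Diseases aggregate from adjacent symptoms N_S(D_i).
    new_D = relu(row_normalize(A_DS) @ h_S @ W_DS)
    # Symptoms aggregate from adjacent diseases N_D(S_i) and adjacent patients N_P(S_i).
    new_S = relu(row_normalize(A_DS.T) @ h_D @ W_DS
                 + row_normalize(A_PS.T) @ h_P @ W_PS)
    # Patients aggregate from adjacent symptoms N_S(P_i).
    new_P = relu(row_normalize(A_PS) @ h_S @ W_PS)
    return new_D, new_S, new_P

rng = np.random.default_rng(0)
A_DS = np.array([[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
A_PS = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]])
dim, L_N = 3, 2  # embedding size after the MLP projection, number of encoder layers
h_D, h_S, h_P = (rng.normal(size=(4, dim)) for _ in range(3))
for _ in range(L_N):
    # Each layer has its own pair of relation-specific weight matrices.
    W_DS, W_PS = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
    h_D, h_S, h_P = encoder_layer(h_D, h_S, h_P, A_DS, A_PS, W_DS, W_PS)
```

With $L_N = 2$, a patient's embedding already depends on diseases two hops away through shared symptoms, which is the mechanism by which knowledge-graph edges can influence patient representations.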
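(6) Graph decoder
The node embedding representations obtained by the graph encoder are input into the graph decoder, where they are used for multi-task learning.
First, multi-label hierarchical classification for patient disease diagnosis prediction is performed.
To begin, the hierarchical structure of diseases is used to construct the disease hierarchy; an example is shown in Fig. 4. The $L_D$ layer contains the diseases in the disease set $D$, that is, the diseases to be diagnosed and predicted, and as stated above the number of disease types is $N_D$; the $L_{SD}$ layer is the system classification of diseases according to medical knowledge, denoted
Figure PCTCN2022116970-appb-000062
Figure PCTCN2022116970-appb-000063
is the number of disease system classifications in the $L_{SD}$ layer.
Next, a multi-label hierarchical classifier containing $N_{clf}$ binary classifiers is constructed; the $N_{clf}$ binary classifiers are denoted $clf_c$, $c=1,2,\dots,N_{clf}$. The node embedding representation of patient $P_i$ is input into each of the $N_{clf}$ binary classifiers to obtain $N_{clf}$ predicted probabilities, denoted $prob_c$, $c=1,2,\dots,N_{clf}$. Here,
Figure PCTCN2022116970-appb-000064
the classifiers
Figure PCTCN2022116970-appb-000065
take the patient's disease system classification as their label, and the classifiers
Figure PCTCN2022116970-appb-000066
take the patient's disease diagnosis as their label, with model parameters $w_c$, $c=1,2,\dots,N_D$.
Then the probability that patient $P_i$ has disease $D_j$ is computed:
Figure PCTCN2022116970-appb-000067
where
Figure PCTCN2022116970-appb-000068
Figure PCTCN2022116970-appb-000069
is the probability, predicted by classifier
Figure PCTCN2022116970-appb-000070
that the patient has disease $D_j$; supposing that the system classification of disease $D_j$ is $SD_c$,
Figure PCTCN2022116970-appb-000071
is the probability, predicted by classifier $clf_c$, that the patient has disease system classification $SD_c$.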
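The exact way the two classifier outputs are combined is given only as a formula image. One plausible reading, sketched below purely as an assumption, is that the probability of disease $D_j$ is the product of the disease-level classifier's probability and the probability of the system classification $SD_c$ that $D_j$ belongs to. The function name, the index layout of `system_of_disease`, and the numbers are hypothetical.

```python
def disease_probability(prob_disease, prob_system, system_of_disease):
    """Combine disease-level and system-level classifier outputs for one patient.

    prob_disease[j]      : disease-level classifier probability for disease D_j
    prob_system[c]       : system-level classifier probability for classification SD_c
    system_of_disease[j] : index c of the system classification that D_j belongs to
    ASSUMPTION: the combined probability is the product of the two levels.
    """
    return [p_d * prob_system[system_of_disease[j]]
            for j, p_d in enumerate(prob_disease)]

# Toy example: three diseases, two system classifications.
combined = disease_probability(
    prob_disease=[0.8, 0.5, 0.9],
    prob_system=[0.5, 0.2],
    system_of_disease=[0, 0, 1],
)
```

Under this reading, a rare disease with few training patients still benefits from the system-level classifier, which is trained on all patients in that system classification.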
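Finally, the loss function $L_{clf}$ of the multi-label hierarchical classification is computed as follows:
$L_{clf}=L_{p\text{-}diag}+L_{diag}+L_{sparse}$
Figure PCTCN2022116970-appb-000072
Figure PCTCN2022116970-appb-000073
Figure PCTCN2022116970-appb-000074
where
Figure PCTCN2022116970-appb-000075
is the true label of patient $P_i$ having disease $D_j$,
Figure PCTCN2022116970-appb-000076
is the true label of the system classification corresponding to the disease diagnosis of patient $P_i$, $\|\cdot\|_1$ denotes the L1 norm, and $\alpha_{ab}$ is the similarity between disease $a$ and disease $b$, computed as:
$\alpha_{ab}=\cos(dist_a, dist_b)$
where $dist_a$ and $dist_b$ denote the true label distributions of disease $a$ and disease $b$ respectively,
Figure PCTCN2022116970-appb-000077
Figure PCTCN2022116970-appb-000078
Figure PCTCN2022116970-appb-000079
denote the true labels of patient $P_i$ having disease $a$ and disease $b$ respectively.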
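The similarity $\alpha_{ab}$ is a cosine between two label distributions. The sketch below assumes that $dist_a$ is the vector of true 0/1 labels for disease $a$ across all patients (one column of the patient-by-disease label matrix); that layout, the helper names, and the toy labels are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def label_distribution(labels, j):
    """Column j of the label matrix: the true labels of every patient for disease j."""
    return [row[j] for row in labels]

# Hypothetical labels: rows are patients P_1..P_4, columns are diseases 0..2.
labels = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
alpha_01 = cosine_similarity(label_distribution(labels, 0), label_distribution(labels, 1))
alpha_02 = cosine_similarity(label_distribution(labels, 0), label_distribution(labels, 2))
```

Diseases 0 and 1 co-occur in two of the four toy patients, so their similarity is high, while diseases 0 and 2 never co-occur and get similarity 0.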
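Second, disease contrastive learning is performed.
To begin, the diseases in the disease set $D$ are combined in pairs to obtain the disease pair set $DD$, with $N_{DD}$ disease pairs. For any disease pair $DD_i$ in $DD$, if the two diseases belong to the same system classification, the disease pair label is
Figure PCTCN2022116970-appb-000080
and if the two diseases belong to different system classifications, then
Figure PCTCN2022116970-appb-000081
Next, a disease-pair system-category discriminator $clf_{discri}$ is constructed. The node embedding representations $e_{i1}, e_{i2}$ of the two diseases in disease pair $DD_i$ are input into $clf_{discri}$, and the distance between the two diseases is computed
Figure PCTCN2022116970-appb-000082
Figure PCTCN2022116970-appb-000083
where $\|\cdot\|_2$ denotes the L2 norm.
Finally, the loss function $L_{dis\text{-}dis}$ of the disease contrastive learning is computed as follows:
Figure PCTCN2022116970-appb-000084
where $m$ is the lower bound of the distance between the embedding representations of different disease system categories.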
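The loss formula is not reproduced in this text, but an L2 distance combined with a lower bound $m$ for pairs from different categories matches the standard margin-based contrastive loss, so that is what the sketch below implements as a stand-in. The label convention (1 for the same system classification, 0 for different ones), the squared terms, and the averaging over pairs are all assumptions.

```python
import math

def l2_distance(e1, e2):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

def contrastive_loss(pairs, labels, margin):
    """Standard margin-based contrastive loss, averaged over disease pairs.

    ASSUMPTION: label 1 = same system classification (pull together),
                label 0 = different classifications (push apart to at least `margin`).
    """
    total = 0.0
    for (e1, e2), y in zip(pairs, labels):
        d = l2_distance(e1, e2)
        total += y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
    return total / len(pairs)

# Toy example: one same-category pair at distance 5, one different-category pair at distance 1.
pairs = [([0.0, 0.0], [3.0, 4.0]), ([0.0, 0.0], [0.0, 1.0])]
loss = contrastive_loss(pairs, labels=[1, 0], margin=2.0)
```

In this form, different-category pairs that are already farther apart than $m$ contribute nothing, so the term only acts on pairs that violate the lower bound.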
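Third, disease-symptom relationship learning is performed.
First, one disease and one symptom are selected from the disease set D and the symptom set S, respectively, to obtain the set of disease-symptom pairs DS, which contains N_DS pairs. For any disease-symptom pair DS_i in DS, if the disease and the symptom are associated in the disease-symptom knowledge graph, the disease-symptom pair label is
Figure PCTCN2022116970-appb-000085
If no association exists, then
Figure PCTCN2022116970-appb-000086
Next, a disease-symptom relationship learner clf_rel is built. The node embeddings e_id and e_is of the disease and the symptom in DS_i are fed into clf_rel, and the probability that the disease and the symptom in pair DS_i are associated is computed:
Figure PCTCN2022116970-appb-000087
Figure PCTCN2022116970-appb-000088
where sigmoid(·) denotes the sigmoid function.
Finally, the loss function L_dis-symp of disease-symptom relationship learning is computed as follows:
Figure PCTCN2022116970-appb-000089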
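The relationship learner can be illustrated with a small sketch. The scoring function and the loss appear only in the figures above, so the code assumes a dot-product score passed through the sigmoid and a mean binary cross-entropy loss, with label 1 for associated pairs and 0 otherwise. These choices and the function names are assumptions, not the patent's confirmed formulas.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def relation_prob(e_d, e_s):
    """Association probability for one pair (assumed score: dot product of embeddings)."""
    return sigmoid(sum(a * b for a, b in zip(e_d, e_s)))


def relation_loss(pairs, labels, eps=1e-12):
    """Mean binary cross-entropy over disease-symptom pairs.

    pairs  : list of (e_id, e_is) embedding tuples
    labels : 1 if the pair is linked in the knowledge graph, else 0 (assumed encoding)
    eps    : small constant to avoid log(0)
    """
    total = 0.0
    for (e_d, e_s), y in zip(pairs, labels):
        p = relation_prob(e_d, e_s)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(pairs)
```

Training on this loss encourages the embeddings of a disease and its known symptoms to align, so that the knowledge graph structure is preserved in the embedding space.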
The loss function L of the disease diagnosis model is defined as follows:
L = L_clf + L_dis-dis + L_dis-symp
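The above are merely preferred embodiments of the present invention. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the invention.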

Claims (9)

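  1. A disease diagnosis and prediction system based on a graph neural network, characterized by comprising:
    (1) a knowledge graph construction module, which constructs a disease-symptom knowledge graph from medical knowledge sources;
    (2) a data extraction and preprocessing module, which extracts patient electronic medical record data, including patient disease diagnoses and symptom data, from an electronic medical record system and stores them as triples;
    (3) a disease diagnosis model construction module, which performs graph neural network learning and predictive modeling on the disease-symptom knowledge graph and the electronic medical record data, including:
    constructing a heterogeneous graph network, the heterogeneous graph network comprising a disease-symptom subgraph built from disease-symptom relations extracted from the disease-symptom knowledge graph, and a patient-symptom subgraph built from the patient disease diagnosis and symptom data in triple form;
    constructing a disease diagnosis model, the disease diagnosis model consisting of a graph encoder and a graph decoder;
    the graph encoder is implemented with a graph convolutional neural network; its inputs are the initial node embeddings of diseases, symptoms, and patients obtained from a disease-symptom co-occurrence matrix, together with a disease-symptom adjacency matrix and a patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and a node-embedding update operation yields the disease, symptom, and patient node embeddings, which are input to the graph decoder;
    the graph decoder uses the node embeddings for multi-task learning, comprising three parts:
    a) multi-label hierarchical classification for patient disease diagnosis prediction: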
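    a disease hierarchy is built from the hierarchical structure of diseases, comprising a disease layer containing the diseases to be diagnosed and predicted and a disease system classification layer obtained from medical knowledge; the number of disease types in the disease layer is denoted N_D, and the disease system classification layer is denoted SD_i
    Figure PCTCN2022116970-appb-100001
    is the number of disease system categories;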
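    a multi-label hierarchical classifier comprising N_clf binary classifiers is built, the N_clf binary classifiers being denoted clf_c, c=1,2,...,N_clf
    Figure PCTCN2022116970-appb-100002
    the node embedding of patient P_i is fed into each of the N_clf binary classifiers, yielding N_clf predicted probabilities, denoted prob_c, c=1,2,...,N_clf, wherein the labels of the binary classifiers clf_c
    Figure PCTCN2022116970-appb-100003
    are the patient's disease system categories; the labels of the binary classifiers clf_c
    Figure PCTCN2022116970-appb-100004
    are the patient's disease diagnoses, with corresponding model parameters w_c, c=1,2,...,N_D;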
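    the probability that patient P_i has disease D_j is computed:
    Figure PCTCN2022116970-appb-100005
    where
    Figure PCTCN2022116970-appb-100006
    is the probability, predicted by binary classifier
    Figure PCTCN2022116970-appb-100007
    that the patient has disease D_j; assuming the system category of disease D_j is SD_c,
    Figure PCTCN2022116970-appb-100008
    is the probability, predicted by binary classifier clf_c, that the patient has a disease in system category SD_c;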
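    the loss function L_clf of the multi-label hierarchical classification is computed as follows:
    L_clf = L_p-diag + L_diag + L_sparse
    Figure PCTCN2022116970-appb-100009
    Figure PCTCN2022116970-appb-100010
    Figure PCTCN2022116970-appb-100011
    where N_P denotes the number of patients,
    Figure PCTCN2022116970-appb-100012
    is the true label of patient P_i having disease D_j,
    Figure PCTCN2022116970-appb-100013
    is the true label of the disease system category corresponding to the disease diagnosis of patient P_i, ||·||_1 denotes the L1 norm, and α_ab is the similarity between disease a and disease b, computed as follows:
    α_ab = cos(dist_a, dist_b)
    where dist_a and dist_b denote the true label distributions of disease a and disease b, respectively, and
    Figure PCTCN2022116970-appb-100014
    Figure PCTCN2022116970-appb-100015
    denote the true labels of patient P_i having disease a and disease b, respectively;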
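    b) disease contrastive learning: a disease-pair system-category discriminator is built, the distance between the two diseases of a disease pair is computed, and a loss function for disease contrastive learning is designed;
    c) disease-symptom relationship learning: a disease-symptom relationship learner is built, the probability that the disease and the symptom of a disease-symptom pair are associated is computed, and a loss function for disease-symptom relationship learning is designed;
    the loss function of the disease diagnosis model is obtained by summing the loss function of the multi-label hierarchical classification, the loss function of the disease contrastive learning, and the loss function of the disease-symptom relationship learning;
    (4) a disease diagnosis model application module, which uses the disease diagnosis model to predict the disease diagnosis from the input symptoms of a new patient.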
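  2. The disease diagnosis and prediction system based on a graph neural network according to claim 1, characterized in that, in the knowledge graph construction module, the disease-symptom knowledge graph comprises two node types, disease and symptom, and one relation, disease-symptom.
  3. The disease diagnosis and prediction system based on a graph neural network according to claim 1, characterized in that the heterogeneous graph network is built from the disease-symptom knowledge graph and the electronic medical record data and contains three node types, namely disease, symptom, and patient, wherein symptoms are the intermediate nodes connecting diseases and patients, and the heterogeneous graph network integrates the relation subgraph concerning diseases and symptoms from the disease-symptom knowledge graph with the relation subgraph concerning patients and symptoms from the electronic medical record data.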
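  4. The disease diagnosis and prediction system based on a graph neural network according to claim 1, characterized in that the heterogeneous graph network G is expressed as:
    G=(V,E)
    where the node set V={v_i|v_i∈{D∪S∪P}}, and D, S, and P are the given disease set, symptom set, and patient set, respectively,
    Figure PCTCN2022116970-appb-100016
    N_D, N_S, and N_P denote the number of disease types, the number of symptom types, and the number of patients, respectively; the edge set E={(v_i,r,v_j)|r∈R,v_i,v_j∈V}, and the set R comprises the disease-symptom relation r_DS and the patient-symptom relation r_PS, the disease-symptom relations being stored in a disease-symptom adjacency matrix and the patient-symptom relations being stored in a patient-symptom adjacency matrix.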
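  5. The disease diagnosis and prediction system based on a graph neural network according to claim 4, characterized in that generating the initial node embeddings comprises:
    constructing a disease-symptom co-occurrence matrix
    Figure PCTCN2022116970-appb-100017
    the entry in row i and column j of matrix M is denoted M_ij and represents the number of occurrences of symptom S_j among patients diagnosed with disease D_i in the electronic medical record data;
    normalizing M by rows to obtain M_D, the initial embedding of disease D_i being
    Figure PCTCN2022116970-appb-100018
    that is, the i-th row of M_D;
    normalizing M by columns to obtain M_S, the initial embedding of symptom S_i being
    Figure PCTCN2022116970-appb-100019
    that is, the i-th column of M_S;
    computing the initial embedding of patient P_i
    Figure PCTCN2022116970-appb-100020
    according to the following formula:
    Figure PCTCN2022116970-appb-100021
    where
    Figure PCTCN2022116970-appb-100022
    is the number of symptoms of patient P_i.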
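  6. The disease diagnosis and prediction system based on a graph neural network according to claim 1, characterized in that the initial node embeddings of the different node types are each fed into a multilayer perceptron to obtain initial embeddings of the same dimension, which are then input to the graph encoder.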
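  7. The disease diagnosis and prediction system based on a graph neural network according to claim 5, characterized in that, in the graph encoder, for disease D_i, the node embedding at layer l
    Figure PCTCN2022116970-appb-100023
    is computed as:
    Figure PCTCN2022116970-appb-100024
    for symptom S_i, the node embedding at layer l
    Figure PCTCN2022116970-appb-100025
    is computed as:
    Figure PCTCN2022116970-appb-100026
    for patient P_i, the node embedding at layer l
    Figure PCTCN2022116970-appb-100027
    is computed as:
    Figure PCTCN2022116970-appb-100028
    where σ is the activation function,
    Figure PCTCN2022116970-appb-100029
    are, respectively, the disease-symptom association weight matrix and the patient-symptom association weight matrix of layer l learned when training the disease diagnosis model;
    Figure PCTCN2022116970-appb-100030
    are, respectively, the node embeddings of disease D_i, symptom S_i, and patient P_i at layer l-1; N_S(D_i) denotes the set of symptom nodes adjacent to disease D_i, N_D(S_i) denotes the set of disease nodes adjacent to symptom S_i, N_P(S_i) denotes the set of patient nodes adjacent to symptom S_i, and N_S(P_i) denotes the set of symptom nodes adjacent to patient P_i.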
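  8. The disease diagnosis and prediction system based on a graph neural network according to claim 7, characterized in that, in the graph decoder, the disease contrastive learning comprises:
    combining the diseases in the disease set D pairwise to obtain the set of disease pairs DD, which contains N_DD pairs; for any disease pair DD_i in DD, if the two diseases belong to the same system category, the disease-pair label is
    Figure PCTCN2022116970-appb-100031
    if the two diseases belong to different system categories, then
    Figure PCTCN2022116970-appb-100032
    building a disease-pair system-category discriminator clf_discri, feeding the node embeddings e_i1 and e_i2 of the two diseases in pair DD_i into clf_discri, and computing the distance between the two diseases
    Figure PCTCN2022116970-appb-100033
    Figure PCTCN2022116970-appb-100034
    where ||·||_2 denotes the L2 norm;
    computing the loss function L_dis-dis of disease contrastive learning as follows:
    Figure PCTCN2022116970-appb-100035
    where m is the lower bound on the distance between the embeddings of diseases from different system categories.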
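  9. The disease diagnosis and prediction system based on a graph neural network according to claim 7, characterized in that, in the graph decoder, the disease-symptom relationship learning comprises:
    selecting one disease and one symptom from the disease set D and the symptom set S, respectively, to obtain the set of disease-symptom pairs DS, which contains N_DS pairs; for any disease-symptom pair DS_i in DS, if the disease and the symptom are associated in the disease-symptom knowledge graph, the disease-symptom pair label is
    Figure PCTCN2022116970-appb-100036
    if no association exists, then
    Figure PCTCN2022116970-appb-100037
    building a disease-symptom relationship learner clf_rel, feeding the node embeddings e_id and e_is of the disease and the symptom in DS_i into clf_rel, and computing the probability that the disease and the symptom in DS_i are associated
    Figure PCTCN2022116970-appb-100038
    Figure PCTCN2022116970-appb-100039
    where sigmoid(·) denotes the sigmoid function;
    computing the loss function L_dis-symp of disease-symptom relationship learning as follows:
    Figure PCTCN2022116970-appb-100040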
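As an illustration of the initial node embeddings recited in claim 5, the following minimal sketch builds them from a small co-occurrence matrix. It rests on two assumptions that the text does not confirm: normalization divides each row (or column) by its sum, and the patient embedding is the mean of the initial embeddings of that patient's symptoms (the actual patient formula appears only in a figure). The function names are hypothetical.

```python
def row_normalize(M):
    """Divide each row by its sum (assumed normalization); zero rows stay zero."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def col_normalize(M):
    """Divide each column by its sum (assumed normalization); zero columns stay zero."""
    sums = [sum(col) for col in zip(*M)]
    return [[x / s if s else 0.0 for x, s in zip(row, sums)] for row in M]


def initial_embeddings(M, patient_symptoms):
    """Return (disease, symptom, patient) initial embeddings.

    M                : N_D x N_S co-occurrence counts, M[i][j] for disease i, symptom j
    patient_symptoms : for each patient, the list of symptom indices observed
    """
    disease_emb = row_normalize(M)                         # row i of M_D
    symptom_emb = [list(c) for c in zip(*col_normalize(M))]  # column i of M_S
    n_diseases = len(M)
    patient_emb = []
    for symptoms in patient_symptoms:
        if not symptoms:
            patient_emb.append([0.0] * n_diseases)
            continue
        # Assumption: average the embeddings of the patient's symptoms.
        patient_emb.append([
            sum(symptom_emb[s][k] for s in symptoms) / len(symptoms)
            for k in range(n_diseases)
        ])
    return disease_emb, symptom_emb, patient_emb
```

Note that disease embeddings have dimension N_S while symptom and patient embeddings have dimension N_D, which is consistent with claim 6 mapping each node type through its own multilayer perceptron to a common dimension.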
PCT/CN2022/116970 2021-12-27 2022-09-05 Disease diagnosis and prediction system based on graph neural network WO2023124190A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023536567A JP7459386B2 (ja) 2021-12-27 2022-09-05 Disease diagnosis and prediction system based on graph neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111609275.1 2021-12-27
CN202111609275.1A CN113990495B (zh) 2021-12-27 2021-12-27 Disease diagnosis and prediction system based on graph neural network

Publications (1)

Publication Number Publication Date
WO2023124190A1 (zh) 2023-07-06

Family

ID=79734519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116970 WO2023124190A1 (zh) 2021-12-27 2022-09-05 Disease diagnosis and prediction system based on graph neural network

Country Status (3)

Country Link
JP (1) JP7459386B2 (zh)
CN (1) CN113990495B (zh)
WO (1) WO2023124190A1 (zh)

Also Published As

Publication number Publication date
CN113990495A (zh) 2022-01-28
CN113990495B (zh) 2022-04-29
JP2024503980A (ja) 2024-01-30
JP7459386B2 (ja) 2024-04-01

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2023536567

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913469

Country of ref document: EP

Kind code of ref document: A1