CN115641956A - Phenotype analysis method for disease prediction - Google Patents

Phenotype analysis method for disease prediction

Info

Publication number
CN115641956A
Authority
CN
China
Prior art keywords
disease
common
rare
database
phenotypic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211320189.3A
Other languages
Chinese (zh)
Inventor
王飞
徐勇军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Xiamen Data Intelligence Research Institute
Original Assignee
Zhongke Xiamen Data Intelligence Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Xiamen Data Intelligence Research Institute
Priority to CN202211320189.3A
Publication of CN115641956A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the technical field of disease prediction and discloses a phenotype analysis method for disease prediction. The method comprises: constructing databases; constructing a 'rare disease-common disease' common database by measuring the phenotype similarity between rare diseases and common diseases while referring to the differences between their gene data; processing patient data to find the optimal combination of phenotypic characteristics; calculating disease matching scores; calculating the cross-entropy loss of the phenotypic feature matching model used for disease prediction; taking the weighted sum of the classification loss function of the 'rare disease-common disease' common database and the cross-entropy loss of the phenotypic feature matching model as the total loss function of the disease prediction model; and outputting a prediction result. Effective phenotypic characteristics are extracted with a graph convolutional network and used to compare rare disease data with common disease data, and disease matching differences are extracted from the commonalities between rare and common diseases to predict the disease. This addresses the problem that such diseases are easily confused and avoids misdiagnosis and missed diagnosis to a certain extent.

Description

Phenotype analysis method for disease prediction
Technical Field
The invention relates to the technical field of disease prediction, in particular to a phenotype analysis method for disease prediction.
Background
At present, disease prediction mostly relies on deep learning methods, with computer technology assisting the diagnosis of the prediction results. To improve the accuracy of disease prediction and diagnosis, however, complex multi-modal medical data must be fully exploited to extract the effective information hidden in it. Such data comprises medical imaging data and the corresponding non-imaging phenotypic characteristics, such as the patient's age, height and bodily functions, and is difficult to process with traditional deep learning methods. Not every phenotypic characteristic contributes to disease prediction, so screening out the phenotypic characteristics that negatively affect the prediction result by an effective method can improve the accuracy of a disease prediction model. This is of particular diagnostic significance in the prediction of rare diseases, because rare diseases and common diseases have a great deal in common.
For general diagnosis, an adaptive multi-layer aggregation graph convolutional network can be used to predict diseases. In that approach, an encoder is designed to automatically select the optimal combination of phenotypic characteristics; a multi-layer aggregation graph convolutional network with multiple aggregation modes is introduced to select advantageous structural information for each node; and a population graph structure is designed according to the spatial distribution and text similarity of the phenotypic characteristics, so that each effective phenotypic characteristic contributes positively to the prediction result. The optimal phenotypic characteristic information can thus be searched automatically from each layer of the disease prediction model, which improves diagnostic accuracy to a certain extent. For rare diseases, however, no classification detection among diseases is performed during disease classification. In particular, rare diseases and common diseases are highly similar in their phenotypic characteristics, so the difference between a rare disease and the corresponding common disease is difficult to distinguish, which easily leads to misdiagnosis and missed diagnosis of rare diseases. How to extract effective phenotypic information for diagnosing rare diseases is therefore of great significance for medical disease prediction.
Disclosure of Invention
The invention aims to provide a phenotype analysis method for disease prediction in which effective phenotypic characteristics are extracted based on a graph convolutional network and used to compare rare disease data with common disease data. At the same time, rare disease data and common disease data are fused to construct a 'rare disease-common disease' common database, the patient's phenotypic characteristics are matched against the database, and disease matching differences are extracted from the commonalities between rare and common diseases, which effectively solves the problems described in the background.
In order to achieve the purpose, the invention provides the following technical scheme:
A phenotype analysis method for disease prediction, comprising the following analysis steps:
step one: constructing databases, including constructing a rare disease database, a common disease database and a patient information database;
step two: database comparison and fusion, namely constructing a 'rare disease-common disease' common database by measuring the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics while referring to the difference between rare disease gene data and common disease gene data;
step three: patient information data processing, defining the set of patient phenotypic characteristics $H$ as $K = \{K_h\}$, including the patient's personal basic information, the patient's family genetic medical history and the patient's physical signs, and using the adjacency matrix
Figure BDA0003909994180000021
to search for suitable phenotypic characteristics and calculate the corresponding phenotypic characteristic selection scores, and then calculating the edge weights to obtain the optimal combination of phenotypic characteristics;
step four: inputting the optimal combination of the patient's phenotypic characteristics into the rare disease database, the common disease database and the 'rare disease-common disease' common database, calculating the disease matching score of the optimal combination in each database, and outputting $Z = \{Z_i\} \in \mathbb{R}^{n \times p}$, where $Z$ represents the matching scores of all data and $Z_i$ represents the matching score of node $i$ in the database;
step five: calculating cross entropy loss for phenotypic feature matching models for disease prediction
Figure BDA0003909994180000022
where $Y_{ij}$ represents the label information of the data;
step six: taking the weighted sum of the classification loss function $L_{H-C}$ of the 'rare disease-common disease' common database and the cross-entropy loss $L_w$ of the phenotypic feature matching model as the total loss function of the disease prediction model, where a smaller total loss value indicates a more accurate prediction result;
step seven: outputting the prediction result.
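Steps five and six can be illustrated with a minimal sketch. The exact cross-entropy expression is given only as a figure in the source, so the code below assumes the standard softmax cross-entropy over the matching scores $Z$ and one-hot labels $Y$. The mixing weight `weight_hc`, the use of a convex combination, and the placeholder value for $L_{H-C}$ are all hypothetical and are not specified by the patent.

```python
import numpy as np

def cross_entropy_loss(scores, labels):
    """Mean softmax cross-entropy (assumed form of L_w).

    scores: (n, p) matching scores Z; labels: (n, p) one-hot label matrix Y.
    """
    # Numerically stable log-softmax over the p classes.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

def total_loss(loss_hc, loss_w, weight_hc=0.5):
    """Weighted sum of L_{H-C} and L_w; weight_hc is a hypothetical hyperparameter."""
    return weight_hc * loss_hc + (1.0 - weight_hc) * loss_w

# Toy data: two patients, three candidate diseases.
Z = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
Y = np.array([[1, 0, 0], [0, 1, 0]])
L_w = cross_entropy_loss(Z, Y)
# 0.8 stands in for a classification loss L_{H-C} computed elsewhere.
L_total = total_loss(loss_hc=0.8, loss_w=L_w, weight_hc=0.4)
```

A lower `L_total` then corresponds to the source's statement that a smaller total loss indicates a more accurate prediction.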
As a further aspect of the invention: in step one, the rare disease database comprises rare disease categories, rare disease genetic information, rare disease phenotypic characteristics and rare disease gene data, and different phenotypes and gene sequences are mapped to the corresponding rare disease entries by combining an existing rare disease knowledge base with the corresponding rare disease cases; the common disease database comprises common disease categories, common disease genetic information, common disease phenotypic characteristics and common disease gene data, and different phenotypes and gene sequences are added to the corresponding common disease entries by combining common disease pathogenic genes with clinical medical cases; and the patient information database comprises the patient's disease symptoms and medical examination data.
As a further aspect of the invention: in step two, the samples in the rare disease database are up-sampled on the basis of a clustering algorithm when the data is classified, which improves the accuracy of data classification and reduces the negative influence on classification caused by the crossing and overlapping between the disease categories of the rare disease database.
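The up-sampling idea can be sketched as follows. The source does not name the clustering algorithm or the sampling rule, so this example substitutes plain random oversampling with replacement and treats the category labels as already given. The function name `upsample_to_balance` is hypothetical.

```python
import numpy as np

def upsample_to_balance(features, labels, seed=0):
    """Randomly oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    index_parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Draw the missing samples with replacement from the same class.
        extra = rng.choice(idx, size=target - count, replace=True)
        index_parts.append(np.concatenate([idx, extra]))
    order = np.concatenate(index_parts)
    return features[order], labels[order]

# Toy data: class 0 has four samples, classes 1 and 2 have one each.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 2])
Xb, yb = upsample_to_balance(X, y)
```

After the call every class has as many samples as the largest original class, which is one simple way to reduce the imbalance between sparse rare disease categories.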
As a further aspect of the invention: in step two, the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics is defined as sim, and for a rare disease $x$ and a common disease $y$ the phenotype similarity is
Figure BDA0003909994180000031
The phenotype similarity $\mathrm{sim}(x, y)$ of rare disease $x$ and common disease $y$ is used as prior information for detecting gene differences, and the difference between the rare disease gene data and the common disease gene data is $\mathrm{var}(x, y)_n = \alpha \cdot W_p \cdot \mathrm{var}(x, y)_{n-1} + (1 - \alpha) \cdot \mathrm{sim}(x, y)$, where $\alpha$ represents a weight value and $W_p$ represents the gene interaction network.
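The recursion for $\mathrm{var}(x, y)_n$ can be iterated to a fixed point, as in the sketch below. The source does not state the shapes involved, so this example assumes that the difference is a vector with one entry per gene, that $W_p$ is a column-normalized gene-by-gene matrix, and that the scalar $\mathrm{sim}(x, y)$ is broadcast to every gene as the prior. The function name, the tolerance and the toy network are hypothetical.

```python
import numpy as np

def propagate_difference(w_p, prior, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate var_n = alpha * W_p @ var_{n-1} + (1 - alpha) * prior."""
    var = prior.copy()
    for _ in range(max_iter):
        nxt = alpha * (w_p @ var) + (1.0 - alpha) * prior
        if np.abs(nxt - var).max() < tol:
            return nxt
        var = nxt
    return var

# Toy gene interaction network with three genes; every column sums to 1.
w_p = np.array([[0.0, 0.5, 0.5],
                [1.0, 0.0, 0.5],
                [0.0, 0.5, 0.0]])
sim_xy = 0.8                   # assumed phenotype similarity of one disease pair
prior = np.full(3, sim_xy)     # the same prior applied to every gene
alpha = 0.5
result = propagate_difference(w_p, prior, alpha=alpha)
```

With $0 < \alpha < 1$ and a column-normalized $W_p$ the update is a contraction, so the loop settles on a vector that satisfies the recursion up to the chosen tolerance.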
As a further aspect of the invention: the classification loss function of the 'rare disease-common disease' common database is
Figure BDA0003909994180000032
where $N$ represents the number of samples in the database that participate in the classification,
Figure BDA0003909994180000033
is the loss used to verify the correlation between disease $x$ and disease $y$,
Figure BDA0003909994180000034
is the matching degree of disease $x$ and disease $y$, $\gamma$ represents the influence factor in the optimization process, $d$ is the dimension of the feature vectors of disease $x$ and disease $y$, $C_x$ and $C_y$ represent the $d$-dimensional covariance matrices of the feature vectors of disease $x$ and disease $y$, and $\phi$ is an input initial constant.
As a further aspect of the invention: in the adjacency matrix of step three,
Figure BDA0003909994180000041
$\alpha_h$ is the phenotype selection score of phenotypic characteristic $K_h$, and $\gamma$ is evaluated on the phenotypic characteristic $K_h$ of the two nodes $v, w \in H$,
Figure BDA0003909994180000042
where
Figure BDA0003909994180000043
is the number of samples for which phenotypic characteristic $K_h$ meets the requirement.
As a still further scheme of the invention: when K is h When it is a non-quantitative phenotypic trait, define
Figure BDA0003909994180000044
As a function of the threshold value theta,
Figure BDA0003909994180000045
Figure BDA0003909994180000046
is characterized by the phenotypic characteristic K h The number of satisfactory samples in the p and u categories in (1) and γ =1, when Kh is a quantitative phenotypic characteristic, define
Figure BDA0003909994180000047
As a function of the threshold value delta,
Figure BDA0003909994180000048
Figure BDA0003909994180000049
is characterized by the phenotypic characteristic K h The number of samples in the p category in (1) defines the enclosed space D [ alpha, beta ]]∈{K h },
Figure BDA00039099941800000410
Is characterized by the phenotypic characteristic K h Of the p-class in (1) does not belong to the satisfactory number of samples of the closed section D, and
Figure BDA0003909994180000051
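Because the adjacency matrix and the $\gamma$ functions survive only as figure placeholders, the following is a simplified stand-in rather than the patented formula. It assumes an edge weight of the form $A(v, w) = \sum_h \alpha_h \, \gamma(K_h(v), K_h(w))$, with $\gamma$ equal to 1 when two non-quantitative values match and, for quantitative values, equal to 1 when they differ by less than a threshold $\delta$. All function names and the toy data are hypothetical.

```python
import numpy as np

def gamma_categorical(a, b):
    """Assumed gamma for non-quantitative features: 1 if the values match."""
    return 1.0 if a == b else 0.0

def gamma_quantitative(a, b, delta):
    """Assumed gamma for quantitative features: 1 if within threshold delta."""
    return 1.0 if abs(a - b) < delta else 0.0

def build_adjacency(categorical, quantitative, alpha_cat, alpha_quant, delta):
    """Symmetric edge weights summed over features, scaled by selection scores."""
    n = categorical.shape[0]
    adj = np.zeros((n, n))
    for v in range(n):
        for w in range(v + 1, n):
            weight = 0.0
            for h, a_h in enumerate(alpha_cat):
                weight += a_h * gamma_categorical(categorical[v, h], categorical[w, h])
            for h, a_h in enumerate(alpha_quant):
                weight += a_h * gamma_quantitative(
                    quantitative[v, h], quantitative[w, h], delta[h])
            adj[v, w] = adj[w, v] = weight
    return adj

# Three patients: one categorical feature and one quantitative feature (age).
categorical = np.array([[0], [0], [1]])
quantitative = np.array([[30.0], [33.0], [60.0]])
adj = build_adjacency(categorical, quantitative,
                      alpha_cat=[0.75], alpha_quant=[0.25], delta=[5.0])
```

In this toy case only patients 0 and 1 share both features, so they receive the only non-zero edge.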
Compared with the prior art, the invention has the following beneficial effects:
Effective phenotypic characteristics are extracted based on a graph convolutional network and used to compare rare disease data with common disease data. Rare disease data and common disease data are fused to construct a 'rare disease-common disease' common database, the patient's phenotypic characteristics are matched against the database, and disease matching differences are extracted from the commonalities between rare and common diseases to predict the disease. This addresses the problem that such diseases are easily confused and avoids misdiagnosis and missed diagnosis of rare diseases to a certain extent.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a phenotype analysis method for disease prediction.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Example 1:
Referring to FIG. 1, in an embodiment of the present invention, a phenotype analysis method for disease prediction includes the following steps:
step one: constructing databases, including constructing a rare disease database, a common disease database and a patient information database;
step two: database comparison and fusion, namely constructing a 'rare disease-common disease' common database by measuring the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics while referring to the difference between rare disease gene data and common disease gene data; the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics is defined as sim, and for a rare disease $x$ and a common disease $y$ the phenotype similarity is
Figure BDA0003909994180000061
The phenotype similarity $\mathrm{sim}(x, y)$ of rare disease $x$ and common disease $y$ is used as prior information for detecting gene differences, and the difference between the rare disease gene data and the common disease gene data is $\mathrm{var}(x, y)_n = \alpha \cdot W_p \cdot \mathrm{var}(x, y)_{n-1} + (1 - \alpha) \cdot \mathrm{sim}(x, y)$, where $\alpha$ represents a weight value and $W_p$ represents the gene interaction network; the classification loss function of the 'rare disease-common disease' common database is
Figure BDA0003909994180000062
where $N$ represents the number of samples in the database that participate in the classification,
Figure BDA0003909994180000063
is the loss used to verify the correlation between disease $x$ and disease $y$,
Figure BDA0003909994180000064
is the matching degree of disease $x$ and disease $y$, $\gamma$ represents the influence factor in the optimization process, $d$ is the dimension of the feature vectors of disease $x$ and disease $y$, $C_x$ and $C_y$ represent the $d$-dimensional covariance matrices of the feature vectors of disease $x$ and disease $y$, and $\phi$ is an input initial constant;
step three: patient information data processing, defining the set of patient phenotypic characteristics $H$ as $K = \{K_h\}$, using the adjacency matrix to search for suitable phenotypic characteristics and calculate the corresponding phenotypic characteristic selection scores, and then calculating the edge weights to obtain the optimal combination of phenotypic characteristics, where $\alpha_h$ is the phenotype selection score of phenotypic characteristic $K_h$, and $\gamma$ is evaluated on the phenotypic characteristic $K_h$ of the two nodes $v, w \in H$,
Figure BDA0003909994180000071
wherein
Figure BDA0003909994180000072
is the number of samples for which phenotypic characteristic $K_h$ meets the requirement. When $K_h$ is a non-quantitative phenotypic characteristic, define
Figure BDA0003909994180000073
as a function of the threshold $\theta$,
Figure BDA0003909994180000074
Figure BDA0003909994180000075
denotes the number of samples in categories $p$ and $u$ of phenotypic characteristic $K_h$ that meet the requirement, and $\gamma = 1$. When $K_h$ is a quantitative phenotypic characteristic, define
Figure BDA0003909994180000076
as a function of the threshold $\delta$,
Figure BDA0003909994180000077
Figure BDA0003909994180000078
denotes the number of samples in category $p$ of phenotypic characteristic $K_h$. Define the closed interval $D[\alpha, \beta] \in \{K_h\}$;
Figure BDA0003909994180000079
is the number of samples in category $p$ of phenotypic characteristic $K_h$ that meet the requirement but do not belong to the closed interval $D$, and
Figure BDA00039099941800000710
step four: inputting the optimal combination of the patient's phenotypic characteristics into the rare disease database, the common disease database and the 'rare disease-common disease' common database, calculating the disease matching score of the optimal combination in each database, and outputting $Z = \{Z_i\} \in \mathbb{R}^{n \times p}$, where $Z$ represents the matching scores of all data and $Z_i$ represents the matching score of node $i$ in the database;
step five: calculating cross entropy loss for phenotypic feature matching models for disease prediction
Figure BDA00039099941800000711
where $Y_{ij}$ represents the label information of the data;
step six: taking the weighted sum of the classification loss function $L_{H-C}$ of the 'rare disease-common disease' common database and the cross-entropy loss $L_w$ of the phenotypic feature matching model as the total loss function of the disease prediction model, where a smaller total loss value indicates a more accurate prediction result;
step seven: outputting the prediction result.
By adopting the above technical scheme: effective phenotypic characteristics are extracted based on a graph convolutional network and used to compare rare disease data with common disease data; rare disease data and common disease data are fused to construct a 'rare disease-common disease' common database; the patient's phenotypic characteristics are matched against the database; and disease matching differences are extracted from the commonalities between rare and common diseases to predict the disease. This addresses the problem that such diseases are easily confused and avoids misdiagnosis and missed diagnosis of rare diseases to a certain extent.
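The matching step of this example, in which the patient's selected phenotypic characteristics are scored against each of the three databases, can be sketched as below. The source does not say how the matching score is computed, so cosine similarity between binary phenotype vectors is an assumption made purely for illustration. The database contents and the names `match_scores`, `databases` and `best` are hypothetical.

```python
import numpy as np

def match_scores(patient, entries):
    """Cosine similarity between one patient vector and every database entry."""
    norms = np.linalg.norm(entries, axis=1) * np.linalg.norm(patient)
    # Guard against all-zero vectors so the division is always defined.
    safe = np.where(norms == 0.0, 1.0, norms)
    return entries @ patient / safe

# Each row is a disease entry described by three binary phenotype flags.
databases = {
    "rare": np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]),
    "common": np.array([[1.0, 1.0, 0.0]]),
    "rare_common": np.array([[1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]),
}
patient = np.array([1.0, 0.0, 1.0])

# One score vector per database, then the best-matching entry in each.
scores = {name: match_scores(patient, entries) for name, entries in databases.items()}
best = {name: int(np.argmax(s)) for name, s in scores.items()}
```

Comparing the best score from the common database with those from the rare and shared databases is one way to surface the matching differences that the method relies on.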
Example 2:
Referring to FIG. 1, in an embodiment of the present invention, a phenotype analysis method for disease prediction includes the following steps:
step one: constructing databases, including constructing a rare disease database, a common disease database and a patient information database, wherein the rare disease database comprises rare disease categories, rare disease genetic information, rare disease phenotypic characteristics and rare disease gene data, and different phenotypes and gene sequences are mapped to the corresponding rare disease entries by combining an existing rare disease knowledge base with the corresponding rare disease cases; the common disease database comprises common disease categories, common disease genetic information, common disease phenotypic characteristics and common disease gene data, and different phenotypes and gene sequences are added to the corresponding common disease entries by combining common disease pathogenic genes with clinical medical cases; and the patient information database comprises the patient's disease symptoms and medical examination data;
step two: database comparison and fusion, namely constructing a 'rare disease-common disease' common database by measuring the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics while referring to the difference between rare disease gene data and common disease gene data; the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics is defined as sim, and for a rare disease $x$ and a common disease $y$ the phenotype similarity is
Figure BDA0003909994180000081
The phenotype similarity $\mathrm{sim}(x, y)$ of rare disease $x$ and common disease $y$ is used as prior information for detecting gene differences, and the difference between the rare disease gene data and the common disease gene data is $\mathrm{var}(x, y)_n = \alpha \cdot W_p \cdot \mathrm{var}(x, y)_{n-1} + (1 - \alpha) \cdot \mathrm{sim}(x, y)$, where $\alpha$ represents a weight value and $W_p$ represents the gene interaction network; the classification loss function of the 'rare disease-common disease' common database is
Figure BDA0003909994180000091
where $N$ represents the number of samples in the database that participate in the classification,
Figure BDA0003909994180000092
is the loss used to verify the correlation between disease $x$ and disease $y$,
Figure BDA0003909994180000093
is the matching degree of disease $x$ and disease $y$, $\gamma$ represents the influence factor in the optimization process, $d$ is the dimension of the feature vectors of disease $x$ and disease $y$, $C_x$ and $C_y$ represent the $d$-dimensional covariance matrices of the feature vectors of disease $x$ and disease $y$, and $\phi$ is an input initial constant;
step three: patient information data processing, defining the set of patient phenotypic characteristics $H$ as $K = \{K_h\}$, including the patient's personal basic information, the patient's family genetic medical history and the patient's physical signs, and using the adjacency matrix
Figure BDA0003909994180000094
to search for suitable phenotypic characteristics and calculate the corresponding phenotypic characteristic selection scores, and then calculating the edge weights to obtain the optimal combination of phenotypic characteristics, where $\alpha_h$ is the phenotype selection score of phenotypic characteristic $K_h$, and $\gamma$ is evaluated on the phenotypic characteristic $K_h$ of the two nodes $v, w \in H$,
Figure BDA0003909994180000095
wherein
Figure BDA0003909994180000096
is the number of samples for which phenotypic characteristic $K_h$ meets the requirement. When $K_h$ is a non-quantitative phenotypic characteristic, define
Figure BDA0003909994180000097
as a function of the threshold $\theta$,
Figure BDA0003909994180000101
Figure BDA0003909994180000102
denotes the number of samples in categories $p$ and $u$ of phenotypic characteristic $K_h$ that meet the requirement, and $\gamma = 1$. When $K_h$ is a quantitative phenotypic characteristic, define
Figure BDA0003909994180000103
as a function of the threshold $\delta$,
Figure BDA0003909994180000104
Figure BDA0003909994180000105
denotes the number of samples in category $p$ of phenotypic characteristic $K_h$. Define the closed interval $D[\alpha, \beta] \in \{K_h\}$;
Figure BDA0003909994180000106
is the number of samples in category $p$ of phenotypic characteristic $K_h$ that meet the requirement but do not belong to the closed interval $D$, and
Figure BDA0003909994180000107
step four: inputting the optimal combination of the patient's phenotypic characteristics into the rare disease database, the common disease database and the 'rare disease-common disease' common database, calculating the disease matching score of the optimal combination in each database, and outputting $Z = \{Z_i\} \in \mathbb{R}^{n \times p}$, where $Z$ represents the matching scores of all data and $Z_i$ represents the matching score of node $i$ in the database;
step five: calculating cross entropy loss for phenotypic feature matching models for disease prediction
Figure BDA0003909994180000108
where $Y_{ij}$ represents the label information of the data;
step six: taking the weighted sum of the classification loss function $L_{H-C}$ of the 'rare disease-common disease' common database and the cross-entropy loss $L_w$ of the phenotypic feature matching model as the total loss function of the disease prediction model, where a smaller total loss value indicates a more accurate prediction result;
step seven: outputting the prediction result and evaluating it, the accuracy of the prediction result being verified by checking it back against the patient's phenotypic characteristics. Because the rare disease database has few samples spread over widely distributed disease categories, the error rate or accuracy of disease classification may change to some extent as the disease data in the database grows. During disease classification, the number of rare disease samples and the number of common disease samples are therefore kept as balanced as possible, so that classification performance does not become unstable. Let the sample set in the rare disease data be $\{H_i\}$ and the sample set in the common disease data be $\{C_i\}$; the matching degree of the validation samples $H_i$ and $C_i$ is
Figure BDA0003909994180000111
where $d$ represents the feature vector dimension of samples $H_i$ and $C_i$, and the improved classification loss function is
Figure BDA0003909994180000112
where $N$ represents the total number of samples, $\varepsilon$ is the adjustment coefficient, and $P_i$ represents the predicted probability of the sample feature.
By adopting the above technical scheme: the prediction result is verified in reverse, the databases are fused to extract the phenotypic characteristics, data balancing is added to the data classification process, and the loss function of the verification model is improved to optimize that model. Comparing the verification result with the patient's phenotypic characteristics gives the accuracy of the prediction result; balancing the classified data avoids the negative influence of excessive differences in sample size; the accuracy of the prediction result is thereby improved to a certain extent, and the problem that rare diseases and common diseases are easily confused is addressed.
The above description covers only the preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (7)

1. A phenotype analysis method for disease prediction, comprising the following analysis steps:
step one: constructing databases, including constructing a rare disease database, a common disease database and a patient information database;
step two: database comparison and fusion, namely constructing a 'rare disease-common disease' common database by measuring the phenotype similarity between rare disease phenotypic characteristics and common disease phenotypic characteristics while referring to the difference between rare disease gene data and common disease gene data;
step three: patient information data processing, defining the set of patient phenotypic characteristics $H$ as $K = \{K_h\}$, including the patient's personal basic information, the patient's family genetic medical history and the patient's physical signs, and using the adjacency matrix
Figure FDA0003909994170000011
to search for suitable phenotypic characteristics and calculate the corresponding phenotypic characteristic selection scores, and then calculating the edge weights to obtain the optimal combination of phenotypic characteristics;
step four: inputting the optimal combination of the patient's phenotypic characteristics into the rare disease database, the common disease database and the 'rare disease-common disease' common database, calculating the disease matching score of the optimal combination in each database, and outputting $Z = \{Z_i\} \in \mathbb{R}^{n \times p}$, where $Z$ represents the matching scores of all data and $Z_i$ represents the matching score of node $i$ in the database;
step five: calculating cross entropy loss for phenotypic feature matching models for disease prediction
Figure FDA0003909994170000012
where $Y_{ij}$ represents the label information of the data;
step six: taking the weighted sum of the classification loss function $L_{H-C}$ of the 'rare disease-common disease' common database and the cross-entropy loss $L_w$ of the phenotypic feature matching model as the total loss function of the disease prediction model, where a smaller total loss value indicates a more accurate prediction result;
step seven: outputting the prediction result.
2. The phenotype analysis method for disease prediction according to claim 1, wherein in step one the rare disease database comprises rare disease categories, rare disease genetic information, rare disease phenotypic characteristics and rare disease gene data, and different phenotypes and gene sequences are mapped to the corresponding rare disease entries by combining an existing rare disease knowledge base with the corresponding rare disease cases; the common disease database comprises common disease categories, common disease genetic information, common disease phenotypic characteristics and common disease gene data, and different phenotypes and gene sequences are added to the corresponding common disease entries by combining common disease pathogenic genes with clinical medical cases; and the patient information database comprises the patient's disease symptoms and medical examination data.
3. The phenotype analysis method for disease prediction according to claim 1, wherein in step two the samples in the rare disease database are up-sampled on the basis of a clustering algorithm when the data is classified, which improves the accuracy of data classification and reduces the negative influence on classification caused by the crossing and overlapping between the disease categories of the rare disease database.
4. The method for phenotypic analysis according to claim 1, wherein the phenotypic similarity between the phenotypic characteristics of rare diseases and common diseases is defined as sim, and has phenotypic similarity for x and y
Figure FDA0003909994170000021
The phenotypic similarity sim(x, y) between the rare disease x and the common disease y is used as prior information for detecting gene differences, and the difference between the rare disease gene data and the common disease gene data is var(x, y)_n = α * W_p * var(x, y)_(n-1) + (1 - α) * sim(x, y), where α represents a weight value and W_p represents the gene interaction network.
5. The phenotype analysis method for disease prediction according to claim 4, wherein the classification loss function in the 'rare disease-common disease' common classification database is
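The recurrence for var(x, y)_n can be sketched as a simple fixed-point iteration. The claim does not state the shapes of the quantities involved, so this sketch assumes that var is a vector with one entry per gene, that W_p is a square matrix over the same genes, and that sim(x, y) is a scalar prior added to every entry. The function name `propagate_difference` and the parameters `tol` and `max_iter` are hypothetical.

```python
def propagate_difference(W_p, var0, sim_xy, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate var_n = alpha * W_p * var_(n-1) + (1 - alpha) * sim(x, y).

    W_p    : square matrix (list of rows) standing in for the gene interaction network
    var0   : initial per-gene difference vector (assumed shape)
    sim_xy : scalar phenotypic similarity used as prior information
    Stops when the largest entry-wise change falls below tol,
    or after max_iter iterations.
    """
    var = list(var0)
    for _ in range(max_iter):
        new = [
            alpha * sum(w * v for w, v in zip(row, var)) + (1.0 - alpha) * sim_xy
            for row in W_p
        ]
        if max(abs(a - b) for a, b in zip(new, var)) < tol:
            return new
        var = new
    return var
```

With α below 1 and a row-normalized W_p, each iteration shrinks the change between successive vectors by at least a factor of α, so the loop terminates well before `max_iter` for typical tolerances.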
Figure FDA0003909994170000022
N represents the number of samples in the database that participate in the classification,
Figure FDA0003909994170000023
is used to verify the correlation loss between disease x and disease y,
Figure FDA0003909994170000024
is the degree of matching between disease x and disease y, υ represents an influence factor in the optimization process, d is the dimension of the feature vectors of disease x and disease y, and C_x and C_y represent the d-dimensional covariance matrices of the feature vectors of disease x and disease y, which are input initial constants.
6. The phenotype analysis method for disease prediction according to claim 1, wherein the adjacency matrix in step three is
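The matching term between disease x and disease y is given only as a figure in the source, so its exact form cannot be recovered here. Purely as an illustration of how the dimension d and the covariance matrices C_x and C_y could enter such a term, the sketch below assumes a squared Frobenius distance between the two covariance matrices scaled by 1/(4 * d^2). That form, and the names `covariance` and `covariance_distance`, are assumptions and not the patent's formula.

```python
def covariance(X):
    """Sample covariance matrix (d x d) of feature vectors X (n rows, d columns)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def covariance_distance(X, Y):
    """Assumed matching term: ||C_x - C_y||_F^2 / (4 * d^2).

    X and Y hold the feature vectors of disease x and disease y.
    """
    d = len(X[0])
    C_x, C_y = covariance(X), covariance(Y)
    squared = sum(
        (C_x[i][j] - C_y[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared / (4 * d * d)
```

Under this assumed form, two diseases whose feature vectors have identical covariance structure give a distance of zero, and larger values indicate a poorer match.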
Figure FDA0003909994170000031
where α_h is the phenotype selection score of phenotype K_h, and γ is defined on the phenotypic characteristic K_h of two nodes v, w ∈ H,
Figure FDA0003909994170000032
wherein
Figure FDA0003909994170000033
is the number of samples for which the phenotypic characteristic K_h meets the requirement.
7. The phenotype analysis method for disease prediction according to claim 5, wherein, when K_h is a non-quantitative phenotypic characteristic, define
Figure FDA0003909994170000034
as a function of the threshold θ,
Figure FDA0003909994170000035
Figure FDA0003909994170000036
represents the number of samples in categories p and u for which the phenotypic characteristic K_h meets the requirement, and γ = 1; when K_h is a quantitative phenotypic characteristic, define
Figure FDA0003909994170000037
as a function of the threshold δ,
Figure FDA0003909994170000038
Figure FDA0003909994170000039
is the number of samples in category p of the phenotypic characteristic K_h; define the closed interval D[α, β] ∈ {K_h},
Figure FDA00039099941700000310
is the number of samples in category p for which the phenotypic characteristic K_h meets the requirement but which do not belong to the closed interval D, and
Figure FDA00039099941700000311
CN202211320189.3A 2022-10-26 2022-10-26 Phenotype analysis method for disease prediction Pending CN115641956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211320189.3A CN115641956A (en) 2022-10-26 2022-10-26 Phenotype analysis method for disease prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211320189.3A CN115641956A (en) 2022-10-26 2022-10-26 Phenotype analysis method for disease prediction

Publications (1)

Publication Number Publication Date
CN115641956A true CN115641956A (en) 2023-01-24

Family

ID=84947234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211320189.3A Pending CN115641956A (en) 2022-10-26 2022-10-26 Phenotype analysis method for disease prediction

Country Status (1)

Country Link
CN (1) CN115641956A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228759A (en) * 2023-05-08 2023-06-06 浙江大学滨江研究院 Computer-aided diagnosis system and apparatus for renal cell carcinoma type
CN116246701A (en) * 2023-02-13 2023-06-09 广州金域医学检验中心有限公司 Data analysis device, medium and equipment based on phenotype term and variant gene
CN116343913A (en) * 2023-03-15 2023-06-27 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network
CN116796046A (en) * 2023-08-29 2023-09-22 武汉大学人民医院(湖北省人民医院) Case retrieval method and device based on rare characteristics

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246701A (en) * 2023-02-13 2023-06-09 广州金域医学检验中心有限公司 Data analysis device, medium and equipment based on phenotype term and variant gene
CN116246701B (en) * 2023-02-13 2024-03-22 广州金域医学检验中心有限公司 Data analysis device, medium and equipment based on phenotype term and variant gene
CN116343913A (en) * 2023-03-15 2023-06-27 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network
CN116343913B (en) * 2023-03-15 2023-11-14 昆明市延安医院 Analysis method for predicting potential pathogenic mechanism of single-gene genetic disease based on phenotype semantic association gene cluster regulation network
CN116228759A (en) * 2023-05-08 2023-06-06 浙江大学滨江研究院 Computer-aided diagnosis system and apparatus for renal cell carcinoma type
CN116796046A (en) * 2023-08-29 2023-09-22 武汉大学人民医院(湖北省人民医院) Case retrieval method and device based on rare characteristics
CN116796046B (en) * 2023-08-29 2023-11-10 武汉大学人民医院(湖北省人民医院) Case retrieval method and device based on rare characteristics

Similar Documents

Publication Publication Date Title
CN115641956A (en) Phenotype analysis method for disease prediction
CN109935336B (en) Intelligent auxiliary diagnosis system for respiratory diseases of children
CN108564117B (en) SVM-based poverty and life assisting identification method
CN110853756B (en) Esophagus cancer risk prediction method based on SOM neural network and SVM
CN113113130A (en) Tumor individualized diagnosis and treatment scheme recommendation method
CN114093425A (en) lncRNA and disease association prediction method fusing heterogeneous network and graph neural network
CN112927757A (en) Gastric cancer biomarker identification method based on gene expression and DNA methylation data
CN114841280A (en) Prediction classification method, system, medium, equipment and terminal for complex diseases
Prayogo et al. Classification of pneumonia from X-ray images using siamese convolutional network
Moteghaed et al. Biomarker discovery based on hybrid optimization algorithm and artificial neural networks on microarray data for cancer classification
Alkaragole et al. Comparison of data mining techniques for predicting diabetes or prediabetes by risk factors
CN113643756A (en) Protein interaction site prediction method based on deep learning
CN110010204B (en) Fusion network and multi-scoring strategy based prognostic biomarker identification method
CN112215259A (en) Gene selection method and apparatus
CN116259415A (en) Patient medicine taking compliance prediction method based on machine learning
CN113764034B (en) Method, device, equipment and medium for predicting potential BGC in genome sequence
Kramer et al. Analysis of medical data using community detection on inferred networks
Bidgoli et al. Evolutionary computation in action: Hyperdimensional deep embedding spaces of gigapixel pathology images
CN114596253A (en) Alzheimer's disease identification method based on brain imaging genome features
CN117195027A (en) Cluster weighted clustering integration method based on member selection
CN113284627A (en) Medication recommendation method based on patient characterization learning
CN116956138A (en) Image gene fusion classification method based on multi-mode learning
Kumari et al. A hybrid rough set shuffled frog leaping knowledge inference system for diagnosis of lung cancer disease
CN113838519B (en) Gene selection method and system based on adaptive gene interaction regularization elastic network model
Chellamuthu et al. Data mining and machine learning approaches in breast cancer biomedical research

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination