CN117476096B - Mental disease biomarker prediction method, system and storage medium - Google Patents

Info

Publication number
CN117476096B
Authority
CN
China
Prior art keywords
protein
data
mental
pqtl
biomarker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311830150.0A
Other languages
Chinese (zh)
Other versions
CN117476096A (en
Inventor
张程程
李勇军
袁诚松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University
Priority to CN202311830150.0A
Publication of CN117476096A
Application granted
Publication of CN117476096B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30: Drug targeting using structural data; Docking or binding prediction
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30: Data warehousing; Computing architectures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Artificial Intelligence (AREA)
  • Medicinal Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Data Mining & Analysis (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention belongs to the technical field of mental disease diagnosis and treatment, and particularly relates to a mental disease biomarker prediction method, system and storage medium. The method comprises the following steps: acquiring pQTL data and GWAS data of the mental disease; analyzing the pQTL data and the GWAS data with a two-sample Mendelian randomization method to obtain a preliminary screening result of causal proteins; analyzing the preliminary screening result with Steiger filtering analysis and co-localization analysis to obtain candidate protein biomarkers from which horizontal pleiotropy has been excluded; and verifying the specificity of the candidate protein biomarkers with a phenome-wide Mendelian randomization (Phe-MR) method, the proteins that pass verification being the specific protein biomarkers. The invention can predict biomarkers and drug targets of diagnostic and prognostic value for mental diseases from pQTL data and GWAS data, and has good application prospects.

Description

Mental disease biomarker prediction method, system and storage medium
Technical Field
The invention belongs to the technical field of mental disease diagnosis and treatment, and particularly relates to a mental disease biomarker prediction method, system and storage medium.
Background
Drug therapy is one of the main means of treating mental diseases. Medication can relieve symptoms, improve the patient's condition and improve quality of life. For example, bipolar disorder (BD) is a polygenic hereditary disease characterized by high rates of recurrence, disability, suicide and comorbidity; it seriously impairs the psychosocial function of patients and increases their risk of death. BD ranks seventeenth worldwide among common disabling diseases. Current therapeutic drugs for BD include mood stabilizers, second-generation antipsychotics and antidepressants. However, current drugs for mental diseases generally show large differences in individual treatment response and low response rates, and provide only symptomatic relief during episodes. Identifying the pathogenic genes of mental diseases and finding new therapeutic drug targets is therefore an important research topic.
Taking BD as an example, current genetic research on BD drug targets mainly comprises the following approaches:
(1) Genome-wide association studies (GWAS) have been widely used to identify pathogenic genes for various diseases, particularly neuropsychiatric disorders. The Psychiatric Genomics Consortium (PGC) has published a number of BD risk genes identified by GWAS, such as CACNA1C, ODZ, ANK3 and SYNE1. However, the loci identified by GWAS are located in non-coding regions, their pathogenic mechanisms cannot be verified further, and linkage disequilibrium exists among them, so it is very difficult to locate pathogenic genes and drug targets with GWAS alone.
(2) Transcriptome-wide association studies (TWAS) build on GWAS to identify disease-related mRNAs and their gene loci, such as NEK4, LMAN2L and PBX4, and analyze the genetic mechanism of disease at the transcriptional level, which overcomes some of the shortcomings of GWAS. However, the preferred drug target is a protein rather than an mRNA, and the mRNA levels identified by TWAS are not linearly related to protein expression, so it is difficult to identify drug targets from mRNA.
(3) Proteome-wide association studies (PWAS) can identify disease-associated proteins and their gene loci, such as ADD3 and XPNPEP3, and the results are more closely related to drug targets. However, PWAS cannot establish a causal relationship between a protein and a disease, so the identified proteins cannot be used directly as effective targets for drug development.
Existing methods for predicting mental disease biomarkers therefore still have shortcomings, and it is difficult to obtain predictions of biomarkers and drug targets that have diagnostic and prognostic value. There remains a need in the art for new methods that can predict new mental disease biomarkers and therapeutic targets.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a mental disease biomarker prediction method, system and storage medium.
A method of mental disease biomarker prediction comprising the steps of:
step 1, acquiring pQTL data and GWAS data of the mental disease;
step 2, analyzing the pQTL data and the GWAS data with a two-sample Mendelian randomization method to obtain a preliminary screening result of causal proteins;
step 3, analyzing the preliminary screening result with Steiger filtering analysis and co-localization analysis to obtain candidate protein biomarkers from which horizontal pleiotropy has been excluded;
step 4, verifying the specificity of the candidate protein biomarkers with a phenome-wide Mendelian randomization (Phe-MR) method, the proteins that pass verification being the specific protein biomarkers;
step 2 comprises the following steps:
step 2.1, taking the SNPs in the pQTL data that are strongly correlated with proteins in a tissue as instrumental variables, the proteins as exposures, and the GWAS data as the outcome; and evaluating the causal relationship between the proteins and the mental disease based on two-sample Mendelian randomization design theory;
step 2.2, correcting the P threshold of the results with the Bonferroni method, and screening out the proteins that have a significant causal relationship with the mental disease, together with their gene loci, to obtain the preliminary screening result of causal proteins;
after the correction in step 2.2, the P threshold is 8.22×10⁻⁵ for brain proteins, 8.16×10⁻⁵ for plasma proteins and 2.34×10⁻⁴ for cerebrospinal fluid proteins; when the P value of a protein is less than the corresponding P threshold, the protein has a significant causal relationship with the mental disease.
Preferably, the mental disorder is bipolar disorder and the pQTL data is pQTL data of at least one tissue in brain, cerebrospinal fluid or plasma.
Preferably, the pQTL data is screened according to the following criteria:
(1) the association between a protein and its corresponding pQTL satisfies P < 5×10⁻⁸;
(2) one genetic locus encodes only one protein;
(3) the gene loci are in linkage equilibrium with one another;
(4) the pQTL is a cis-pQTL;
(5) the F statistic of the pQTL data is > 10.
Preferably, in step 2.1, the evaluation method is selected as follows: if a protein corresponds to one SNP, the Wald ratio method is used; if a protein corresponds to two or more SNPs, the inverse variance weighted (IVW) method is used.
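The selection rule between the Wald ratio and inverse variance weighted methods can be illustrated with a minimal Python sketch. This is not the patented implementation (the embodiment uses R with the TwoSampleMR package); the function names, the fixed-effect form of IVW and the first-order standard error of the Wald ratio are assumptions made for illustration.

```python
import math

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP estimate: SNP-outcome effect divided by SNP-exposure effect."""
    estimate = beta_out / beta_exp
    # First-order standard error; ignores uncertainty in beta_exp (assumption).
    se = abs(se_out / beta_exp)
    return estimate, se

def ivw(betas_exp, betas_out, ses_out):
    """Fixed-effect inverse-variance-weighted mean of the per-SNP ratios."""
    weights = [(bx / so) ** 2 for bx, so in zip(betas_exp, ses_out)]
    ratios = [by / bx for bx, by in zip(betas_exp, betas_out)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se

def two_sided_p(estimate, se):
    """Two-sided P value from a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))

def mr_estimate(betas_exp, betas_out, ses_out):
    """Apply the rule: one SNP -> Wald ratio; two or more SNPs -> IVW."""
    if len(betas_exp) == 1:
        est, se = wald_ratio(betas_exp[0], betas_out[0], ses_out[0])
        method = "Wald ratio"
    else:
        est, se = ivw(betas_exp, betas_out, ses_out)
        method = "IVW"
    return method, est, se, two_sided_p(est, se)
```

Here `betas_exp` are SNP-protein effects from the pQTL data, and `betas_out` and `ses_out` are SNP-disease effects and standard errors from the GWAS data, all aligned to the same effect allele.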
Preferably, step 3 comprises the steps of:
step 3.1, using Steiger filtering analysis to verify whether the causal direction between each single SNP and the mental disease in the preliminary screening result is correct, and excluding proteins whose causal direction is inconsistent;
step 3.2, using co-localization analysis to calculate, for a given genomic region associated with both traits, the posterior probability that the protein and the mental disease are driven by the same SNP, and excluding proteins whose posterior probability is less than 0.80.
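The directionality check of step 3.1 can be sketched as follows. This is a simplified illustration under stated assumptions: the variance explained by a SNP is approximated from its t-statistic, the two correlations are compared with a Fisher z-test for independent samples, and the function names and the 0.05 significance level are hypothetical. For a case-control outcome, a liability-scale variance explained would normally be used instead.

```python
import math

def r2_from_summary(beta, se, n):
    """Approximate variance explained by one SNP from its t-statistic (assumption)."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)

def steiger_direction(beta_exp, se_exp, n_exp, beta_out, se_out, n_out, alpha=0.05):
    """Keep a SNP only if it explains significantly more variance in the
    protein (exposure) than in the disease (outcome)."""
    r_exp = math.sqrt(r2_from_summary(beta_exp, se_exp, n_exp))
    r_out = math.sqrt(r2_from_summary(beta_out, se_out, n_out))
    # Fisher z-test for the difference between two independent correlations.
    z = (math.atanh(r_exp) - math.atanh(r_out)) / math.sqrt(
        1.0 / (n_exp - 3) + 1.0 / (n_out - 3)
    )
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "r2_exposure": r_exp ** 2,
        "r2_outcome": r_out ** 2,
        "p": p,
        "keep": r_exp > r_out and p < alpha,
    }
```

A SNP for which `keep` is false would be treated as an invalid or reversed instrument and its protein excluded before co-localization analysis.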
Preferably, step 4 comprises the steps of:
step 4.1, taking the candidate protein biomarkers obtained in step 3 as exposures, the SNPs corresponding to the candidate protein biomarkers as instrumental variables, and common diseases and their GWAS data as outcomes; and performing Phe-MR analysis with a two-sample Mendelian randomization strategy to clarify the causal relationship between the selected candidate protein biomarkers and the common diseases;
step 4.2, correcting the P threshold of the results with the Bonferroni method, and screening out and removing the proteins, together with their gene loci, that have a significant causal relationship with common diseases other than the mental disease;
the corrected P threshold in step 4.2 is 1.60×10⁻⁵; when the P value of a protein is less than the P threshold, the protein has a significant causal relationship with at least one of the common non-mental diseases.
Preferably, the method further comprises the step of evaluating the prediction results, wherein the evaluation comprises sensitivity analysis and external verification.
The invention also provides a mental disease biomarker prediction system, comprising:
the input module is used for inputting pQTL data and GWAS data of the mental diseases;
the prediction module is used for predicting the biomarker according to the mental disease biomarker prediction method;
and the output module is used for outputting the prediction result.
The present invention also provides a computer-readable storage medium having stored thereon: a computer program for implementing the above mental disorder biomarker prediction method.
In the invention, a biomarker is a protein that is highly related to disease onset. Such a protein can act as the intermediate link through which the corresponding risk gene influences the occurrence of the disease, and can therefore serve as a potential drug target for diagnosing and treating the disease.
By adopting the technical scheme of the invention, the following beneficial effects can be obtained:
1. The method predicts causal relationships for mental diseases (such as bipolar disorder) based on protein quantitative trait locus (pQTL) data. It locates protein biomarkers and drug targets of mental diseases at the same time as it locates pathogenic gene loci, and thus provides a direction for developing therapeutic drugs for mental diseases.
2. In a preferred embodiment, the invention for the first time locates, for bipolar disorder, more accurate and more convincing risk genes and protein biomarkers in plasma, cerebrospinal fluid and brain tissue, which provides new candidate drug targets for drug development. Applying the method to further plasma, cerebrospinal fluid and brain tissue pQTL data sets is expected to reveal more drug targets for bipolar disorder.
3. In a preferred embodiment, the invention optimizes the parameters of the scheme (e.g., P thresholds and data screening criteria) for the characteristics of mental diseases, and thereby obtains biomarker and drug target predictions of greater diagnostic and prognostic value.
4. The method can be built into a model that is updated by automatically collecting data and that continuously predicts objective biomarkers of mental diseases from new data. It is easy to use and maintain, which favors wide adoption.
It should be apparent that, in light of the foregoing, various modifications, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
The above-described aspects of the present invention will be described in further detail below with reference to specific embodiments in the form of examples. It should not be understood that the scope of the above subject matter of the present invention is limited to the following examples only. All techniques implemented based on the above description of the invention are within the scope of the invention.
Drawings
Fig. 1 is a schematic flow chart of a mental disease biomarker prediction method of example 1.
Detailed Description
It should be noted that, in the embodiments, algorithms of steps such as data acquisition, transmission, storage, and processing, which are not specifically described, and hardware structures, circuit connections, and the like, which are not specifically described may be implemented through the disclosure of the prior art.
Example 1 mental disease biomarker prediction method
This embodiment provides a method for predicting biomarkers of a mental disease, taking bipolar disorder (BD) as the example mental disease. A schematic flow chart of the prediction method is shown in Fig. 1. The specific steps are as follows:
step 1, pQTL data and GWAS data of mental diseases are obtained:
1.1 acquisition and screening of three tissue pQTL data:
Considering how closely different tissues are associated with the onset of BD, this embodiment selects three tissues, namely brain tissue, plasma and cerebrospinal fluid, and uses the corresponding pQTL data to screen for drug targets. The sources of the pQTL data are as follows:
(1) the brain tissue pQTL data come from the human brain pQTL dataset published by the Wingo team, the most widely used of the existing brain tissue pQTL datasets. It comprises genomic data and dorsolateral prefrontal cortex (DLPFC) proteomic data covering 8356 proteins from 376 individuals of European ancestry, and reports significant cis associations between 1475 proteins and genetic variation at gene loci (Nature Genetics, 2021, 53(2): 143-6);
(2) the plasma pQTL data come from the pQTL database published by the Zheng team, which integrates five previously published GWAS results (Nature Genetics, 2020, 52(10): 1122-31);
(3) the cerebrospinal fluid pQTL data come from the pQTL dataset published by the Yang team (Nature Neuroscience, 2021, 24(9): 1302-12).
To obtain qualified pQTL data, this embodiment screens the three datasets according to the following criteria:
(1) the protein is strongly associated with its corresponding pQTL (P < 5×10⁻⁸);
(2) one genetic locus encodes only one protein;
(3) the gene loci are in linkage equilibrium with one another (window of 1000 kb, r² < 0.001);
(4) the pQTL is a cis-pQTL, i.e., the SNP affecting the protein is located within 500 kb upstream or downstream of the transcription start site of the gene;
(5) the F statistic of the pQTL data is greater than 10, indicating a robust instrument.
After screening according to the above criteria, this embodiment finally includes 608 brain proteins with 616 corresponding cis-pQTLs, 612 plasma proteins with 616 corresponding pQTLs, and 214 cerebrospinal fluid proteins with 233 corresponding pQTLs (Table 1).
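The five screening criteria can be expressed as a simple filter. The sketch below is illustrative only: the record field names are hypothetical, criterion (2) is read as "each SNP is associated with exactly one protein", the linkage check of criterion (3) is assumed to have been precomputed against a reference panel and stored as a flag, and the F statistic is approximated by the squared t-statistic of a single SNP.

```python
P_MAX = 5e-8             # criterion (1): genome-wide significance
CIS_WINDOW_BP = 500_000  # criterion (4): distance to transcription start site
F_MIN = 10.0             # criterion (5): instrument strength

def f_statistic(beta, se):
    """Single-SNP F statistic approximated as the squared t-statistic (assumption)."""
    return (beta / se) ** 2

def screen_pqtls(records):
    """Return the pQTL records that satisfy all five criteria."""
    # Criterion (2): collect the set of proteins each SNP is associated with.
    proteins_by_snp = {}
    for rec in records:
        proteins_by_snp.setdefault(rec["snp"], set()).add(rec["protein"])

    kept = []
    for rec in records:
        is_cis = rec["same_chrom"] and abs(rec["snp_pos"] - rec["tss_pos"]) <= CIS_WINDOW_BP
        if (rec["p"] < P_MAX
                and len(proteins_by_snp[rec["snp"]]) == 1
                and rec["ld_independent"]   # criterion (3), precomputed flag
                and is_cis
                and f_statistic(rec["beta"], rec["se"]) > F_MIN):
            kept.append(rec)
    return kept

# Hypothetical records for illustration only.
example = [
    {"snp": "rs_a", "protein": "PROT_A", "p": 1e-12, "beta": 0.4, "se": 0.05,
     "same_chrom": True, "snp_pos": 1_200_000, "tss_pos": 1_000_000, "ld_independent": True},
    {"snp": "rs_b", "protein": "PROT_B", "p": 1e-12, "beta": 0.4, "se": 0.05,
     "same_chrom": False, "snp_pos": 1_200_000, "tss_pos": 1_000_000, "ld_independent": True},
    {"snp": "rs_c", "protein": "PROT_C", "p": 1e-5, "beta": 0.2, "se": 0.05,
     "same_chrom": True, "snp_pos": 900_000, "tss_pos": 1_000_000, "ld_independent": True},
]
qualified = screen_pqtls(example)
```

In this example only the first record passes: the second is a trans-pQTL and the third fails the significance threshold.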
1.2. GWAS data acquisition for BD:
This embodiment includes three sets of BD-related GWAS data, from the report of the Mullins team (Nature Genetics, 2021, 53(6): 817-29), the UK Biobank database (PLoS Medicine, 2015, 12(3): e1001779) and the FinnGen database (Nature, 2023, 613(7944): 508-18). For building the prediction model, the Mullins team dataset was selected; it collects 57 BD cohorts from Australia, North America and Europe, and comprises GWAS data for 41,917 BD patients and 371,539 healthy controls in total. For the model evaluation step, the UK Biobank and FinnGen databases were selected; they provide GWAS data for 2,242 BD patients and 125,870 healthy controls in the UK, and for 1,216 BD patients and 375,961 healthy controls in Finland, respectively (Table 2).
Table 1: Characteristics of the pQTL data for brain tissue, plasma and cerebrospinal fluid
Table 2: Characteristics of the genome-wide association study (GWAS) data for bipolar disorder
Step 2, analyzing the pQTL data and the GWAS data with a two-sample Mendelian randomization method to obtain a preliminary screening result of causal proteins:
This step integrates the pQTL data with the GWAS data published by the Mullins team and predicts the causal relationship between proteins and BD by MR analysis, thereby achieving the preliminary screening of proteins.
The specific process is as follows:
(1) The SNPs in the pQTL data that are strongly correlated with the proteins of the three tissues are the instrumental variables; the corresponding proteins are the exposures; and the BD GWAS data of the Mullins team are the outcome. Based on two-sample Mendelian randomization design theory, the causal relationships between BD and brain proteins, plasma proteins or cerebrospinal fluid proteins are evaluated separately using these data. The evaluation method is chosen as follows: if a protein corresponds to one SNP, the Wald ratio method is used; if a protein corresponds to two or more SNPs, the inverse variance weighted (IVW) method is used. This yields the causal estimate between each protein and BD and its P value.
(2) The P threshold of the results is corrected with the Bonferroni method, giving the following corrected thresholds: brain proteins, P < 8.22×10⁻⁵; plasma proteins, P < 8.16×10⁻⁵; cerebrospinal fluid proteins, P < 2.34×10⁻⁴. Comparing the P value of each protein's association with BD against the corresponding threshold determines whether its causal relationship with BD is significant (a protein whose P value is below the corresponding threshold has a significant causal relationship with BD), so that the proteins with a significant causal relationship with BD, and their gene loci, can be screened out.
The above procedure was implemented in R using the TwoSampleMR package. The run yields a series of proteins and risk genes with a significant causal relationship with BD, which are carried forward into the co-localization analysis and phenome-wide Mendelian randomization steps.
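The corrected thresholds in (2) are consistent with dividing a family-wise significance level of 0.05 by the number of proteins tested in each tissue (608, 612 and 214). The text does not state the significance level, so 0.05 is an assumption in the sketch below, and the function names are hypothetical. Note that 0.05/612 is approximately 8.17×10⁻⁵, which the text reports as 8.16×10⁻⁵.

```python
ALPHA = 0.05  # assumed family-wise significance level

# Number of proteins retained per tissue after pQTL screening (from the text).
proteins_tested = {"brain": 608, "plasma": 612, "cerebrospinal fluid": 214}

def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test P threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests

thresholds = {tissue: bonferroni_threshold(n) for tissue, n in proteins_tested.items()}

def is_significant(p_value, tissue):
    """A protein is retained when its MR P value is below the tissue threshold."""
    return p_value < thresholds[tissue]

for tissue, value in thresholds.items():
    print(f"{tissue}: P < {value:.2e}")
```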
Step 3, analyzing the preliminary screening result with Steiger filtering analysis and co-localization analysis to obtain candidate protein biomarkers from which horizontal pleiotropy has been excluded.
Because human genes overlap in function, a gene locus screened in step 2 may have multiple regulatory functions, i.e., horizontal pleiotropy, which confounds the causal relationship between the target protein and BD. To exclude this, this embodiment uses Steiger filtering analysis and co-localization analysis.
The specific process is as follows:
(1) Steiger filtering analysis verifies whether the causal direction between each single SNP and BD is correct. The method tests whether the correlation between each SNP and its corresponding candidate protein is significantly greater than the correlation between that SNP and BD. If it is, the SNP is considered to act in one direction, through the corresponding protein, i.e., the instrumental variable is valid; otherwise, the SNP is considered an invalid or reversed instrumental variable and its results are removed. The SNP-protein pairs that pass the test are then integrated with the BD GWAS data and subjected to co-localization analysis.
(2) Co-localization analysis calculates the associations of the protein biomarker and of BD with each variant in the target gene locus and its surrounding region. Specifically, for a genomic region associated with both traits, it calculates the posterior probability that the protein and BD are driven by the same SNP. If the posterior probability for a single variant is > 0.80, the corresponding protein biomarker and BD are driven by the same variant. If the posterior probability is < 0.80, they are not driven by the same variant, i.e., the gene locus may bypass the corresponding protein and cause BD by other routes. A posterior probability greater than 0.80 supports a significant causal relationship between the candidate protein biomarker and BD.
Running Steiger filtering before co-localization analysis improves the accuracy and efficiency of the co-localization analysis, better excludes horizontal pleiotropy and verifies the aforementioned causal relationship.
These steps were also implemented in R, using the coloc package; the results further confirm the significant causal relationships among the corresponding variables. The corresponding proteins are carried forward into the phenome-wide Mendelian randomization step.
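The posterior probability used in the co-localization step can be illustrated with a simplified approximate-Bayes-factor calculation over the SNPs of one region. This sketch is not the implementation used in the embodiment: the prior probabilities (`p1`, `p2`, `p12`), the prior effect standard deviations and all function names are assumptions chosen for illustration, and the calculation assumes at most one causal variant per trait in the region and at least two SNPs.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_abf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (normal prior on the effect)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(protein_stats, disease_stats,
                     p1=1e-4, p2=1e-4, p12=1e-5,
                     sd_protein=0.15, sd_disease=0.2):
    """protein_stats and disease_stats are lists of (beta, se) for the same SNPs.
    Returns posterior probabilities [H0, H1, H2, H3, H4], where H4 is
    'both traits are driven by the same variant'."""
    l1 = [log_abf(b, s, sd_protein) for b, s in protein_stats]
    l2 = [log_abf(b, s, sd_disease) for b, s in disease_stats]
    n = len(l1)
    same = [a + b for a, b in zip(l1, l2)]
    different = [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]
    log_h = [
        0.0,                                                     # H0: no association
        math.log(p1) + log_sum_exp(l1),                          # H1: protein only
        math.log(p2) + log_sum_exp(l2),                          # H2: disease only
        math.log(p1) + math.log(p2) + log_sum_exp(different),    # H3: distinct variants
        math.log(p12) + log_sum_exp(same),                       # H4: shared variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]

def passes_colocalization(protein_stats, disease_stats, cutoff=0.80):
    """Apply the 0.80 cutoff from the text to the shared-variant posterior."""
    return coloc_posteriors(protein_stats, disease_stats)[4] > cutoff
```

A protein whose region fails `passes_colocalization` would be excluded as a candidate, matching the 0.80 rule described in (2).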
Step 4, verifying the specificity of the candidate protein biomarkers with a phenome-wide Mendelian randomization (Phe-MR) method; the proteins that pass verification are the specific protein biomarkers, i.e., the predicted drug targets.
This embodiment uses the Phe-MR method to analyze and exclude effects of the identified proteins and drug targets on other common diseases, ensuring the specificity and safety of the BD drug targets. Steps 2 to 4 together constitute the prediction model.
The specific process is as follows:
(1) Based on the disease summary data of UK Biobank, the proteins retained after co-localization analysis are taken as exposures, 783 common diseases and their GWAS data as outcomes, and the SNPs corresponding to the proteins as instrumental variables. Phe-MR analysis is performed with a two-sample Mendelian randomization strategy to clarify the causal relationship between the selected proteins and other common diseases. The evaluation method is the Wald ratio method. The run yields the causal estimate between each protein and each common disease and its P value.
(2) As in the preliminary screening step, the P threshold of the Phe-MR results is corrected with the Bonferroni correction, giving a corrected threshold of 1.60×10⁻⁵. This threshold determines whether each protein's association with a common disease is significant (a protein whose P value is below the threshold has a significant causal relationship with at least one of the common non-mental diseases), and thus gives the corresponding effects.
The above verification was implemented in R using the TwoSampleMR package. With the instrumental variables, exposures and outcomes as input, the run yields the proteins and risk genes that have a significant causal relationship with other common diseases.
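The specificity decision that follows the Phe-MR run can be sketched as a filter over the per-disease P values. The data layout, function name, protein labels and disease labels below are hypothetical; only the 1.60×10⁻⁵ threshold comes from the text.

```python
PHE_MR_THRESHOLD = 1.60e-5  # corrected threshold stated in the text

def specific_biomarkers(phe_mr_p_values, threshold=PHE_MR_THRESHOLD):
    """phe_mr_p_values maps protein -> {disease: P value from Phe-MR}.
    Returns (kept, flagged): proteins with no significant off-target
    association, and proteins with the diseases they are associated with."""
    kept, flagged = [], {}
    for protein, by_disease in phe_mr_p_values.items():
        hits = sorted(d for d, p in by_disease.items() if p < threshold)
        if hits:
            flagged[protein] = hits
        else:
            kept.append(protein)
    return kept, flagged

# Hypothetical input for illustration only.
results = {
    "PROT_A": {"disease_1": 0.20, "disease_2": 3.0e-4},
    "PROT_B": {"disease_1": 2.0e-7, "disease_2": 0.51},
}
kept, flagged = specific_biomarkers(results)
```

In this example `PROT_A` is retained as a specific candidate, while `PROT_B` is flagged because of its association with `disease_1`.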
Step 5, evaluating the prediction result
After the prediction model is built, the BD causal proteins screened by the model need further verification, and the reliability and robustness of the model need to be evaluated. This embodiment uses two verification methods, sensitivity analysis and external validation. The former can exclude some bias risks inherent in Mendelian randomization; the latter verifies whether the proteins screened by the model still have a causal relationship with BD in other GWAS databases, which strengthens the evidence for them as drug targets.
The specific process is as follows:
(1) Sensitivity analysis
In the above study, the Mendelian randomization input data are assumed to satisfy three preconditions: (1) the relevance assumption: the SNPs are strongly associated with the exposure; (2) the independence assumption: the SNPs are independent of confounders; (3) the exclusion restriction assumption: the SNPs affect the outcome only through the exposure. The causal effect of the exposure on the outcome is inferred on the basis of these assumptions. In some cases, however, the causal estimates of an MR analysis may be biased, which affects the results. Sensitivity analysis is therefore needed to evaluate whether the MR results are robust, whether the conclusions are reliable and whether potential bias is present. This embodiment mainly uses a heterogeneity test, leave-one-out analysis and a pleiotropy test.
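Two of the sensitivity checks named above, the heterogeneity test and leave-one-out analysis, can be sketched for a protein with several instruments. This is an illustration under assumptions: heterogeneity is summarized by Cochran's Q around a fixed-effect IVW estimate, the function names are hypothetical, and the P value of Q (a chi-square tail probability with the returned degrees of freedom) and the pleiotropy test are left out.

```python
def ivw_estimate(betas_exp, betas_out, ses_out):
    """Fixed-effect inverse-variance-weighted mean of the per-SNP ratios."""
    weights = [(bx / so) ** 2 for bx, so in zip(betas_exp, ses_out)]
    ratios = [by / bx for bx, by in zip(betas_exp, betas_out)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def cochran_q(betas_exp, betas_out, ses_out):
    """Cochran's Q: weighted squared deviation of per-SNP ratios from the IVW
    estimate. Returns (Q, degrees of freedom)."""
    est = ivw_estimate(betas_exp, betas_out, ses_out)
    q = sum(((bx / so) ** 2) * (by / bx - est) ** 2
            for bx, by, so in zip(betas_exp, betas_out, ses_out))
    return q, len(betas_exp) - 1

def leave_one_out(betas_exp, betas_out, ses_out):
    """IVW estimate recomputed with each SNP removed in turn (needs >= 2 SNPs)."""
    estimates = []
    for i in range(len(betas_exp)):
        keep = [j for j in range(len(betas_exp)) if j != i]
        estimates.append(ivw_estimate([betas_exp[j] for j in keep],
                                      [betas_out[j] for j in keep],
                                      [ses_out[j] for j in keep]))
    return estimates
```

A large Q relative to its degrees of freedom, or a leave-one-out estimate that shifts sharply when one SNP is dropped, suggests that a single instrument is driving the result.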
(2) External authentication
Because the steps are carried out in the process of initially screening the bipolar disorder protein biomarker and the drug target, only pQTL data and the same group of bipolar disorder GWAS data are utilized for carrying out integrated analysis aiming at human brain protein, plasma protein and cerebrospinal fluid protein, and the results are obtained. To further verify the significance of the association and causal relationship between the as-screened protein biomarkers and bipolar affective disorder, this example introduced the bipolar affective disorder GWAS data in the UK Biobank and finngan database, and performed MR analysis based on the two-sample mendelian randomization design theory.
(2.1) This step is based on the two-sample Mendelian randomization design. As in the preliminary screening step, the proteins of each of the three included tissues are taken as exposure factors, BD as the outcome factor, and the screened pQTL data as instrumental variables. Based on the input instrumental variables, exposure factors, and outcome data, the causal relationship between brain, plasma, or cerebrospinal fluid proteins and BD is evaluated using one of two methods, the Wald ratio method or the inverse variance weighted (IVW) method: if an exposure factor (a protein) corresponds to a single pQTL, the Wald ratio method is used; if it corresponds to two or more pQTLs, the inverse variance weighted method is used.
(2.2) From the results of the run, the association between each single instrumental variable and BD and its P value are obtained, and significance is judged with P = 0.05 as the criterion, yielding the protein biomarkers and gene loci significantly associated with bipolar disorder.
(2.3) The risk gene results from external validation are integrated with the preliminarily screened risk gene results and with the results of the co-localization analysis and the phenome-wide Mendelian randomization analysis. Comprehensive analysis yields the genes that have a significant causal relationship with BD, together with their corresponding brain, plasma, or cerebrospinal fluid proteins, which are taken as protein biomarkers causally related to BD.
The external validation step is likewise implemented in R using the TwoSampleMR package: the instrumental variables, exposure factors, and outcome variables are input, and the run yields the risk genes with a significant causal relationship.
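The method-selection rule of step (2.1) and the P = 0.05 criterion of step (2.2) can be sketched as follows. This is a plain Python illustration, not the TwoSampleMR code used in the embodiment. It assumes each instrument is a tuple of SNP-exposure effect, SNP-outcome effect, and outcome standard error, uses a first-order standard error for the Wald ratio, a fixed-effect IVW estimate, and a normal approximation for the P value. All function names are hypothetical.

```python
import math

def two_sided_p(beta, se):
    """Two-sided P value from a normal approximation to beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def wald_ratio(bx, by, se_y):
    """Single-instrument estimate; the SE ignores uncertainty in bx."""
    return by / bx, se_y / abs(bx)

def ivw(instruments):
    """Fixed-effect inverse variance weighted estimate over several instruments."""
    weights = [bx * bx / (se_y * se_y) for bx, _, se_y in instruments]
    ratios = [by / bx for bx, by, _ in instruments]
    beta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def estimate(instruments, alpha=0.05):
    """Wald ratio for one pQTL, IVW for two or more, as described in step (2.1)."""
    if not instruments:
        raise ValueError("at least one instrument is required")
    if len(instruments) == 1:
        method = "Wald ratio"
        beta, se = wald_ratio(*instruments[0])
    else:
        method = "IVW"
        beta, se = ivw(instruments)
    p = two_sided_p(beta, se)
    return {"method": method, "beta": beta, "se": se,
            "p": p, "significant": p < alpha}
```

Calling `estimate([(0.2, 0.1, 0.02)])` takes the Wald ratio branch, while passing two or more tuples takes the IVW branch; the input values here are made up for illustration only.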
The experimental results of BD drug target prediction according to the above method are as follows:
1. Following the preliminary screening steps for candidate risk genes and protein biomarkers, the P value of the association between each SNP in the brain, plasma, and cerebrospinal fluid protein pQTL databases and bipolar disorder was obtained. Using the Bonferroni-corrected P thresholds as the criterion (brain protein: P < 8.22 × 10^-5; plasma protein: P < 8.16 × 10^-5; cerebrospinal fluid protein: P < 2.34 × 10^-4), 16 proteins and their corresponding gene loci were preliminarily screened from the three pQTL databases as candidate protein biomarkers: dynamin 3 (DNM3), multiple C2 and transmembrane domain containing 1 (MCTP1), ATP binding cassette subfamily B member 8 (ABCB8), deafness, autosomal dominant 5 (DFNA5), and peptide deformylase (PDF) among brain proteins; cathepsin F (CTSF), LDL receptor related protein 8 (LRP8), interleukin 36 alpha (IL36A), frizzled related protein (FRZB), and agouti related neuropeptide (AGRP) among cerebrospinal fluid proteins; and lectin, mannose binding 2 like (LMAN2L), C-X3-C motif chemokine ligand 1 (CX3CL1), peptidase inhibitor 3 (PI3), neural cell adhesion molecule 1 (NCAM1), TIMP metallopeptidase inhibitor 4 (TIMP4), and inter-alpha-trypsin inhibitor heavy chain 1 (ITIH1) among plasma proteins. The screening results are shown in Table 3.
Table 3: Preliminarily screened candidate risk genes for bipolar affective disorder
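The tissue-specific screening against Bonferroni-corrected thresholds can be illustrated with a short Python sketch. The three thresholds are those reported in the text. The number of tests behind each threshold is not stated in this section, so `bonferroni_threshold` takes it as a parameter. The protein names and P values in the example are placeholders, not results from the study.

```python
# Corrected P thresholds as reported in the text, keyed by tissue.
P_THRESHOLDS = {"brain": 8.22e-5, "plasma": 8.16e-5, "csf": 2.34e-4}

def bonferroni_threshold(alpha, n_tests):
    """Bonferroni correction: divide the significance level by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def screen(results, thresholds=P_THRESHOLDS):
    """Keep (protein, tissue) pairs whose P value is below that tissue's threshold."""
    return [(protein, tissue)
            for protein, tissue, p in results
            if p < thresholds[tissue]]

# Hypothetical MR results: placeholder protein names and P values.
example = [
    ("PROT_A", "brain", 3.0e-5),
    ("PROT_B", "brain", 9.0e-5),
    ("PROT_C", "csf", 1.0e-4),
]
selected = screen(example)
```

Keeping the thresholds in one mapping makes the per-tissue comparison explicit: the same P value can pass in cerebrospinal fluid and fail in brain because the thresholds differ.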
2. Following the Steiger filtering and co-localization analysis steps, the 16 preliminarily screened candidate risk genes for bipolar disorder were checked for horizontal pleiotropy. The genes that passed were: DFNA5, MCTP1, DNM3, and PDF, expressed in brain tissue; CTSF, AGRP, FRZB, and LRP8, expressed in cerebrospinal fluid; and LMAN2L, NCAM1, CX3CL1, and TIMP4, expressed in plasma.
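A simplified Python sketch of this filtering stage follows. It assumes the directionality check compares the variance explained by a SNP in the exposure (protein) with that in the outcome (disease), with r² approximated from the effect estimate, its standard error, and the sample size; a full Steiger test additionally assesses whether that difference is statistically significant. The co-localization posterior probability is taken as a precomputed input and compared with the 0.80 cut-off stated in claim 5. Field names, sample sizes, and values are hypothetical.

```python
def r_squared(beta, se, n):
    """Approximate variance explained by one SNP from its t statistic and sample size."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)

def steiger_direction_ok(exposure, outcome):
    """True if the SNP explains more variance in the exposure than in the outcome."""
    return r_squared(*exposure) > r_squared(*outcome)

def passes_filters(candidate, pp_threshold=0.80):
    """Apply the directionality check, then the co-localization cut-off."""
    if not steiger_direction_ok(candidate["exposure"], candidate["outcome"]):
        return False
    return candidate["coloc_pp"] >= pp_threshold

# Hypothetical candidate: (beta, se, n) for exposure and outcome, plus a
# precomputed posterior probability of a shared causal variant.
candidate = {
    "exposure": (0.5, 0.05, 1000),
    "outcome": (0.02, 0.01, 50000),
    "coloc_pp": 0.91,
}
kept = passes_filters(candidate)
```

A candidate is dropped as soon as either check fails, which mirrors the order of steps 3.1 and 3.2 in the claims.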
3. The association of the above genes with 783 common diseases was examined by the Phe-MR analysis step. The results show that the LRP8 gene target in cerebrospinal fluid has an effect on the risk of onset of schizophrenia and other neuropsychiatric disorders, with a P value of 9.28 × 10^-6. No other gene was found to be associated with the risk of onset of any common disease.
Based on these results, the present embodiment draws on the ROSMAP brain protein pQTL database, the plasma protein pQTL database published by the Zheng team, the cerebrospinal fluid protein pQTL database published by the Yang team, the bipolar affective disorder GWAS data published by the Mullins team, and the bipolar affective disorder GWAS data in UK Biobank and FinnGen. Through the above steps, proteins and gene loci significantly associated with bipolar affective disorder can be identified, and the causal relationship between protein biomarkers and bipolar affective disorder can be determined. Because changes in the levels of these protein biomarkers can influence the risk of onset of bipolar disorder, they can be regarded as drug targets for bipolar disorder, laying a solid foundation for deeper drug target research and drug development.
In the prior art, many studies obtained their results from GWAS data alone and did not use pQTL data (Fudan University. A multivariate Mendelian randomization method for inferring causal relationships between images and phenotypes: CN202310034302.X [P]. 2023-05-09; Xi'an Jiaotong University. A method for predicting complex disease- and phenotype-associated metabolites based on Mendelian randomization: CN202010139836.5 [P]. 2020-06-26). Many studies have used PWAS or TWAS methods; however, such methods still suffer from problems such as weak causal inference and insufficiently linear association, and they differ from the method of the present invention, in which GWAS and pQTL data are fused directly and analyzed by Mendelian randomization (Proteome-wide Association Study Provides Insights Into the Genetic Component of Protein Abundance in Psychiatric Disorders. Biol Psychiatry. 2021 Dec 1;90(11):781-789. doi: 10.1016/j.biopsych.2021.06.022. Epub 2021 Jul 6. PMID: 34454697; Identification of novel proteins for lacunar stroke by integrating genome-wide association data and human brain proteomes. BMC Med. 2022 Jun 23;20(1):211. doi: 10.1186/s12916-022-02408-y. PMID: 35733147; PMCID: PMC9219149). Many studies also fail to consider sources of error, or the combination of methods, the system of methods, or the screening methods they apply to the same errors differ from those of the present invention, for example by lacking co-localization analysis, exclusion of horizontal pleiotropy, or Steiger filtering analysis (Fudan University. A multivariate Mendelian randomization method for inferring causal relationships between images and phenotypes: CN202310034302.X [P]. 2023-05-09; Xi'an Jiaotong University. A method for predicting complex disease- and phenotype-associated metabolites based on Mendelian randomization: CN202010139836.5 [P]. 2020-06-26; Soochow University. A method for integrated analysis of plasma proteome, genome, and obesity-related traits: CN202111539524.4 [P]. 2022-03-15).
In summary, the focus of many existing techniques differs from that of the present invention. The difficulty in building the integrated prediction model used in the present invention lies in selecting data reasonably, accounting for errors comprehensively, and combining the methods appropriately.
Example 2 Mental disease biomarker prediction system
This embodiment provides a system for implementing the mental disease biomarker predictive method of embodiment 1, comprising:
the input module is used for inputting pQTL data and GWAS data of the mental diseases;
a prediction module for predicting a drug target according to the mental disease biomarker prediction method described in embodiment 1;
and the output module is used for outputting the prediction result.
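The three-module structure can be sketched in Python as below. This is only an illustration of how the modules pass data to one another: the prediction step is supplied as a callable standing in for the method of Example 1, and the record layout, function names, and output format are assumptions rather than details from the patent.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Predictor = Callable[[List[Record], List[Record]], List[str]]

def input_module(pqtl: List[Record], gwas: List[Record]) -> Dict[str, List[Record]]:
    """Accept the two inputs named in the text and reject empty data sets."""
    if not pqtl or not gwas:
        raise ValueError("both pQTL data and GWAS data are required")
    return {"pqtl": pqtl, "gwas": gwas}

def prediction_module(data: Dict[str, List[Record]], predict: Predictor) -> List[str]:
    """Delegate to a predictor that stands in for the method of Example 1."""
    return predict(data["pqtl"], data["gwas"])

def output_module(biomarkers: List[str]) -> str:
    """Format the predicted biomarkers, one per line, in sorted order."""
    return "\n".join(sorted(biomarkers))

def run_system(pqtl: List[Record], gwas: List[Record], predict: Predictor) -> str:
    """Chain the input, prediction, and output modules."""
    data = input_module(pqtl, gwas)
    return output_module(prediction_module(data, predict))
```

Passing the predictor in as a parameter keeps the module boundaries visible and lets the same pipeline be exercised with a stub during testing.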
As the above embodiments show, the present invention provides a mental disease biomarker prediction method that can predict, from pQTL data and GWAS data, biomarkers and drug targets with diagnostic and prognostic value for mental diseases, and has good application prospects.

Claims (9)

1. A method for predicting a biomarker for mental disease, comprising the steps of:
step 1, obtaining pQTL data and GWAS data of a mental disease;
step 2, analyzing the pQTL data and the GWAS data by a two-sample Mendelian randomization method to obtain a preliminary screening result of causal proteins;
step 3, analyzing the preliminary screening result by Steiger filtering analysis and co-localization analysis to obtain candidate protein biomarkers from which horizontal pleiotropy has been excluded;
step 4, verifying the specificity of the candidate protein biomarkers by a phenome-wide Mendelian randomization method, and obtaining the protein biomarkers that pass verification as specific;
step 2 comprises the following steps:
step 2.1, taking SNPs in the pQTL data that are strongly correlated with proteins in tissue as instrumental variables, the proteins as exposure factors, and the GWAS data as the outcome factor; and evaluating the causal relationship between the proteins and the mental disease based on the two-sample Mendelian randomization design;
step 2.2, correcting the P threshold of the results by the Bonferroni method, and screening the proteins having a significant causal relationship with the mental disease and their gene loci to obtain the preliminary screening result of causal proteins;
after the correction in step 2.2, the P threshold for brain proteins is 8.22 × 10^-5, the P threshold for plasma proteins is 8.16 × 10^-5, and the P threshold for cerebrospinal fluid proteins is 2.34 × 10^-4; when the P value of a protein is less than the corresponding P threshold, the protein has a significant causal relationship with the mental disease.
2. A method of mental disease biomarker prediction according to claim 1, wherein: the mental disease is bipolar disorder, and the pQTL data is pQTL data of at least one tissue in brain, cerebrospinal fluid or blood plasma.
3. The method for predicting mental disease biomarkers according to claim 1, wherein in step 2.1, said SNPs with strong correlation are screened according to the following criteria:
(1) the association of a protein with its corresponding pQTL satisfies P < 5 × 10^-8;
(2) one gene locus encodes only one protein;
(3) the gene loci are in linkage equilibrium with one another;
(4) the pQTL is a cis-pQTL;
(5) the F statistic of the pQTL data is > 10.
4. A method of predicting a mental disorder biomarker according to claim 3, wherein: in step 2.1, the evaluation method is selected in the following manner: if 1 protein corresponds to 1 SNP data, the Wald Ratio method is used; if 1 protein corresponds to two or more SNP data, an inverse variance weighting method is used.
5. A method of mental disease biomarker prediction according to claim 1, wherein: step 3 comprises the following steps:
step 3.1, using Steiger filtering analysis to verify whether the direction of the causal relationship between each single SNP and the mental disease in the preliminary screening result is correct, and excluding proteins whose causal direction is inconsistent;
step 3.2, using co-localization analysis to calculate the posterior probability that the protein and the mental disease, both being associated with a given genomic region, are driven by the same SNP, and excluding proteins whose posterior probability is less than 0.80.
6. A method of mental disease biomarker prediction according to claim 1, wherein: step 4 comprises the following steps:
step 4.1, taking the candidate protein biomarkers obtained in step 3 as exposure factors, the SNPs corresponding to the candidate protein biomarkers as instrumental variables, and common diseases and their GWAS data as outcome factors; and performing Phe-MR analysis using the two-sample Mendelian randomization strategy to elucidate the causal relationship between the selected candidate protein biomarkers and the common diseases;
step 4.2, correcting the P threshold of the results by the Bonferroni method, and screening out and removing the proteins, and their gene loci, that have a significant causal relationship with common diseases other than the mental disease;
the corrected P threshold in step 4.2 is 1.60 × 10^-5; when the P value of a protein is less than the P threshold, the protein has a significant causal relationship with at least one common disease other than the mental disease.
7. A method of mental disease biomarker prediction according to claim 1, wherein: the method further comprises the step of evaluating the prediction results, wherein the evaluation comprises sensitivity analysis and external verification.
8. A mental disorder biomarker prediction system, comprising:
the input module is used for inputting pQTL data and GWAS data of the mental diseases;
a prediction module for performing a prediction of a biomarker according to the mental disease biomarker prediction method of any of claims 1 to 7;
and the output module is used for outputting the prediction result.
9. A computer-readable storage medium having stored thereon: a computer program for implementing the mental disorder biomarker prediction method according to any of claims 1 to 7.
CN202311830150.0A 2023-12-28 2023-12-28 Mental disease biomarker prediction method, system and storage medium Active CN117476096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311830150.0A CN117476096B (en) 2023-12-28 2023-12-28 Mental disease biomarker prediction method, system and storage medium

Publications (2)

Publication Number Publication Date
CN117476096A CN117476096A (en) 2024-01-30
CN117476096B true CN117476096B (en) 2024-03-08

Family

ID=89638363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311830150.0A Active CN117476096B (en) 2023-12-28 2023-12-28 Mental disease biomarker prediction method, system and storage medium

Country Status (1)

Country Link
CN (1) CN117476096B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110349623A (en) * 2019-01-17 2019-10-18 Harbin Institute of Technology Method for screening senile dementia pathogenic genes and loci based on improved Mendelian randomization
CN114188029A (en) * 2021-12-15 2022-03-15 Soochow University Method for integrated analysis of plasma proteome, genome and obesity related traits
CN115948540A (en) * 2023-01-17 2023-04-11 Hangzhou Medical College Early marker for knee osteoarthritis and application thereof
CN115960930A (en) * 2022-07-21 2023-04-14 Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology Use of PLAU and PSMA4 targets in prevention and treatment of aortic aneurysms
WO2023168499A1 (en) * 2022-03-08 2023-09-14 PolygenRx Pty Ltd A method of precision treatment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230111145A1 (en) * 2021-10-12 2023-04-13 Washington University Methods for detecting proteins associated with ad

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
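Identification of novel proteins for lacunar stroke by integrating genome-wide association data and human brain proteomes; Zhang Chengcheng et al.; BMC Medicine; 20220630; Vol. 20 (No. 211); 1-11 *
Xiao-Jing Gu et al. Expanding causal genes for Parkinson's disease via multi-omics analysis. npj Parkinson's Disease. 2023, (No. 9), 1-8. *
A Mendelian randomization study of the causal relationship between branched-chain amino acids and peripheral atherosclerosis; Fu Yuanyuan et al.; Shanghai Journal of Preventive Medicine; 20230620; Vol. 35 (No. 6); 536-541 *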

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant