CN107341366A - Method for predicting complex disease susceptibility loci using machine learning - Google Patents

Publication number
CN107341366A
CN107341366A CN201710592222.0A CN201710592222A CN107341366A CN 107341366 A CN107341366 A CN 107341366A CN 201710592222 A CN201710592222 A CN 201710592222A CN 107341366 A CN107341366 A CN 107341366A
Authority
CN
China
Prior art keywords
machine learning
complex disease
model
susceptibility loci
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710592222.0A
Other languages
Chinese (zh)
Inventor
董珊珊
杨铁林
姚石
陈霄
陈一霄
郭燕
张钰洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-07-19
Filing date
2017-07-19
Publication date
2017-11-10
Application filed by Xian Jiaotong University
Priority to CN201710592222.0A
Publication of CN107341366A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for predicting complex disease susceptibility loci using machine learning, comprising the following steps: (1) collecting known complex disease susceptibility loci as the positive set of a machine learning model, inferring from the positive set loci unrelated to the complex disease as the negative set, and annotating both sets with epigenetic regulatory elements; (2) building an epigenetic regulation model of the complex disease using machine learning; (3) using the model to predict all loci across the whole genome, the final predictions being the potential susceptibility loci of the complex disease. The method combines epigenetic information with genomic DNA information, extracts epigenetic regulatory element features by machine learning, and then predicts complex disease susceptibility loci genome-wide. The loci found can significantly increase the explained heritability and provide potential targets for subsequent drug design and disease detection.

Description

Method for predicting complex disease susceptibility loci using machine learning
Technical field
The present invention relates to the technical field of complex disease susceptibility locus prediction, and in particular to a screening method for predicting complex disease susceptibility loci using machine learning.
Background
In recent years, genome-wide association studies (GWAS) have become the most popular and effective approach for uncovering complex disease susceptibility loci (single nucleotide polymorphisms, SNPs). Using this approach, more than two thousand papers have been published in high-level international journals, and nearly ten thousand complex disease susceptibility loci have been identified. Although GWAS have been fruitful, they still fall far short of scientists' expectation of finding most disease susceptibility loci. For a specific complex disease, the reported susceptibility loci cumulatively explain less than 15% of the heritable variation of the disease; a large number of unknown genetic factors, the so-called "missing heritability", remain to be discovered. This problem is common to all complex disease genetics research and reflects insufficient use and mining of existing data resources. To find unknown genetic pathogenic factors, there is an urgent need for practical new methods and tools that mine human genome data deeply and systematically. The results would help reveal the pathogenesis of complex diseases and support the design and development of targeted drugs, early clinical screening, and individualized prevention and treatment.
The genome carries two classes of genetic information: DNA sequence information and epigenetic information. Research findings in epigenetics have already been applied to the study and treatment of some diseases, so it is highly desirable to include epigenetic information when predicting disease susceptibility loci. Existing methods for predicting complex disease susceptibility loci from genomic epigenetic regulatory element features are varied, and most of them predict genetic variants in exonic regions or at specific loci. However, polymorphisms in noncoding regions can also affect the expression of downstream genes and thereby shed light on the pathogenesis of complex diseases. It is therefore necessary to screen loci across the whole genome to find those related to complex diseases. Multiple databases now provide genome-wide epigenetic information, but hundreds of millions of genetic markers and multidimensional element information pose a huge challenge for predicting genetic loci. Machine learning is an interdisciplinary field that has emerged over roughly the past twenty years, and research at the intersection of biology and machine learning is increasingly active as a way to make full and effective use of biological data. It is therefore very necessary to predict complex disease susceptibility loci genome-wide using machine learning based on genomic epigenetic regulatory element features.
Summary of the invention
To overcome the shortcomings of the prior art, the object of the invention is to provide a method that uses machine learning, in combination with epigenetic regulatory element features, to predict genetic markers of complex disease susceptibility. The method combines epigenetic information with genomic DNA information, extracts epigenetic regulatory element features by machine learning, and then predicts complex disease susceptibility loci genome-wide. It can significantly increase the explained heritability and provides potential targets for subsequent drug design and disease detection.
To achieve the above object, the technical solution of the invention is as follows:
A method for predicting complex disease susceptibility loci using machine learning, comprising the following steps:
P1: Collect known complex disease susceptibility loci as the positive set of a machine learning model, infer from the positive set loci unrelated to the complex disease as the negative set, and annotate both sets with epigenetic regulatory elements;
P2: Build an epigenetic regulation model of the complex disease using machine learning;
P3: Using the model, predict all loci across the whole genome; the final predictions are the potential susceptibility loci of the complex disease.
Step P1 specifically includes:
P11: Collect the known susceptibility SNPs of a given disease from the public databases GWAS Catalog and PheGenI and from the related literature in PubMed, and use genotype data published by the 1000 Genomes Project to compute the SNPs in high linkage disequilibrium with the known susceptibility loci; together these form the positive set;
P12: For the negative set, screen genome-wide for SNPs that meet the following conditions: a. located within a certain distance of a SNP in the positive set; b. minor allele frequency differing from that of the corresponding SNP in the positive set by less than 0.05; c. independent of all SNPs in the positive set (r² < 0.1);
P13: Obtain all epigenetic regulatory element information for the genome from the UCSC and Roadmap databases, including transcription factor binding sites, histone modification sites, and chromatin segmentation states; obtain expression quantitative trait locus (eQTL) information for the relevant tissues from the GTEx database; obtain sequence conservation features from the ANNOVAR database. Each type of regulatory element is saved as a text file;
P14: Using the epigenetic regulatory element information obtained, annotate the SNPs in the positive and negative sets according to their physical positions in the genome. The matching rule is: if the position of a SNP overlaps the interval of a regulatory element, the SNP is considered to be annotated by that element.
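The overlap rule in P14 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each regulatory element is supplied as BED-style half-open intervals (chrom, start, end) with 0-based coordinates, and the element names, SNP identifiers, and function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping half-open (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def overlaps(index, chrom, pos):
    """True if position `pos` falls inside any merged interval [start, end)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]

def annotate(snps, elements):
    """Return feature names and a {snp_id: [0/1, ...]} annotation matrix."""
    names = list(elements)
    indexes = [merge_intervals(elements[name]) for name in names]
    matrix = {
        snp_id: [int(overlaps(ix, chrom, pos)) for ix in indexes]
        for snp_id, (chrom, pos) in snps.items()
    }
    return names, matrix

# Hypothetical toy data: two regulatory element tracks and four SNPs.
elements = {
    "H3K27ac": [("chr1", 100, 200), ("chr1", 150, 300)],
    "TFBS": [("chr2", 50, 60)],
}
snps = {
    "rs1": ("chr1", 250),
    "rs2": ("chr2", 60),   # just past the end of the TFBS interval
    "rs3": ("chr1", 300),  # just past the end of the merged H3K27ac interval
    "rs4": ("chr2", 55),
}
feature_names, matrix = annotate(snps, elements)
```

Each row of `matrix` is the binary feature vector of one SNP, which is the form of annotation result that step P2 takes as input under this reading.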
Step P2 specifically includes:
P21: For the annotation results above, compute the correlations between regulatory elements using the corrplot package in R and randomly remove highly correlated regulatory elements. Then randomly divide the annotation results into a training set and a test set, with the training set making up 80% and the test set 20% of the full set; this step uses 5-fold cross-validation;
P22: Build models on the training-set annotation matrix obtained in P21 using different machine learning algorithms, and assess the reliability of the models on the test set. The evaluation metrics are sensitivity, specificity, precision, accuracy, and F1 score, calculated as follows:
Sensitivity = TP / P = TP / (TP + FN)
Specificity = TN / N = TN / (TN + FP)
Precision = TP / P' = TP / (TP + FP)
Accuracy = (TP + TN) / (P + N)
F1 = 2 × TP / (2 × TP + FP + FN)
where TP is the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives; P = TP + FN and N = TN + FP are the numbers of actual positives and negatives, and P' = TP + FP is the number of predicted positives;
P23: Based on the model evaluation metrics in P22, optimize the model by element feature selection. The specific steps are as follows: obtain from the model the importance ranking of the regulatory elements; build multiple feature subsets according to element importance, with the number of features in the subsets increasing from 1 to the maximum; determine the optimal subset of the model according to the evaluation metrics, and use it to predict new complex disease susceptibility loci.
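The metrics of P22 and the subset search of P23 can be sketched together. This is an illustration under stated assumptions rather than the patent's code: `evaluate` is a hypothetical callback standing in for "train a model on this feature subset and score it on the test set", the importance values are made up, and choosing F1 as the selection criterion is an assumption, since the text only says the subset is chosen according to the evaluation metrics.

```python
def metrics(tp, fn, tn, fp):
    """Evaluation metrics from P22, computed from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

def select_best_subset(importance, evaluate, key="f1"):
    """Score nested top-k feature subsets (k = 1..max) and return the best one."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        score = evaluate(subset)[key]
        if score > best_score:  # strict ">" keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score

def toy_evaluate(subset):
    """Hypothetical stand-in for model training and testing.

    Pretends the first two features add true positives and every
    further feature only adds false positives.
    """
    tp = 40 + 5 * min(len(subset), 2)
    fp = 5 + 3 * max(len(subset) - 2, 0)
    return metrics(tp=tp, fn=50 - tp, tn=100 - fp, fp=fp)

importance = {"e1": 0.5, "e2": 0.3, "e3": 0.2}  # made-up importance scores
best_subset, best_score = select_best_subset(importance, toy_evaluate)
```

In practice the importance ranking would come from the fitted model itself (for example, a random forest), and `evaluate` would retrain on each subset; the loop structure stays the same.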
Step P3 specifically includes:
P31: Take the optimal feature subset of the machine learning model obtained in step P2, and annotate all loci across the whole genome with the regulatory elements contained in that subset;
P32: Using the optimal model, predict all loci across the whole genome; the loci whose regulatory element annotations resemble those of the positive set are the potential susceptibility loci of the complex disease.
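Steps P31 and P32 amount to streaming every locus through annotation and scoring. The sketch below shows that shape only; `annotate_site` and `predict_proba` are hypothetical callbacks (the source does not specify a model interface), and the 0.5 threshold is an assumption.

```python
def predict_genome_wide(sites, annotate_site, predict_proba, threshold=0.5):
    """Yield (site_id, score) for sites scored at or above `threshold`.

    Written as a generator so that genome-scale input does not have to
    be held in memory at once.
    """
    for site_id, chrom, pos in sites:
        features = annotate_site(chrom, pos)  # P31: annotate with selected elements
        score = predict_proba(features)       # P32: score with the trained model
        if score >= threshold:
            yield site_id, score

# Hypothetical toy stand-ins for the selected features and the trained model.
selected = ["H3K27ac", "eQTL"]
toy_annotations = {("chr1", 100): [1, 1], ("chr1", 200): [1, 0]}

def annotate_site(chrom, pos):
    return toy_annotations.get((chrom, pos), [0] * len(selected))

def predict_proba(features):
    return sum(features) / len(features)

sites = [("rs1", "chr1", 100), ("rs2", "chr1", 200), ("rs3", "chr2", 50)]
hits = list(predict_genome_wide(sites, annotate_site, predict_proba))
```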
The screening method of the invention, which predicts complex disease susceptibility loci using machine learning based on genomic epigenetic regulatory element features, is applicable to a wide range of complex diseases, for example various cancers, endocrine system diseases, cardiovascular diseases, metabolic diseases, and immune diseases. The invention provides a method that uses machine learning, in combination with epigenetic regulatory element features, to predict genetic markers of complex disease susceptibility: it combines epigenetic information with genomic DNA information, extracts epigenetic regulatory element features by machine learning, and then predicts complex disease susceptibility loci genome-wide. It can significantly increase the explained heritability and provides potential targets for subsequent drug design and disease detection.
Brief description of the drawings
Fig. 1 is a flow chart of the screening method provided by the invention for predicting complex disease susceptibility loci using machine learning.
Detailed description of embodiments
The invention is described in further detail below with reference to the accompanying drawing.
Example: Taking the complex disease type II diabetes as an example, the method of the invention is used to predict type II diabetes susceptibility loci, as described in detail below.
As shown in Fig. 1, the invention provides a screening method that predicts complex disease susceptibility loci using machine learning based on genomic epigenetic regulatory element features, comprising the following steps P1 to P3.
P1: Collect known type II diabetes susceptibility loci as the positive set of the machine learning model, and annotate them with epigenetic regulatory elements.
Specifically: 65 known type II diabetes susceptibility SNPs were collected from the public databases GWAS Catalog and PheGenI and from the related literature in PubMed as the positive set. Genotype data published by the 1000 Genomes Project were then used to compute the SNPs in high linkage disequilibrium with these 65 susceptibility loci as a supplement to the positive set, 1769 in total. SNPs meeting the conditions described in P12 were screened as the negative set. Epigenetic regulatory element information related to type II diabetes was obtained from the UCSC, Roadmap, GTEx, and ANNOVAR databases; after removal of highly correlated elements, it comprised 33 types of DNase hypersensitive sites, 202 types of transcription factor binding sites, 315 types of histone modification sites, 639 types of chromatin segmentation states, 17 types of expression quantitative trait locus (eQTL) information, and 1 sequence conservation feature. This information was used to annotate the SNPs in the positive and negative sets.
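The three negative-set conditions of P12 referred to above can be sketched as a filter. Several things here are assumptions and not taken from the source: r² is approximated as the squared Pearson correlation of genotype dosages (0/1/2), only same-chromosome pairs are tested for linkage, the 100 kb distance stands in for the unspecified "certain distance", and all identifiers and genotypes are made up.

```python
from statistics import mean

def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return 0.0 if v1 == 0 or v2 == 0 else cov * cov / (v1 * v2)

def maf(genotypes):
    """Minor allele frequency from per-individual allele dosages."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)

def pick_negatives(candidates, positives, max_dist=100_000,
                   max_maf_diff=0.05, max_r2=0.1):
    """Return candidate SNP ids satisfying conditions a, b and c of P12.

    candidates / positives: {snp_id: (chrom, pos, genotypes)}.
    """
    chosen = []
    for cid, (cchrom, cpos, cgeno) in candidates.items():
        same_chrom = [(ppos, pgeno) for pchrom, ppos, pgeno
                      in positives.values() if pchrom == cchrom]
        # c. independent of all positive SNPs (same-chromosome simplification)
        if any(r_squared(cgeno, pgeno) >= max_r2 for _, pgeno in same_chrom):
            continue
        # a. near some positive SNP, and b. MAF close to that SNP's MAF
        if any(abs(cpos - ppos) <= max_dist
               and abs(maf(cgeno) - maf(pgeno)) < max_maf_diff
               for ppos, pgeno in same_chrom):
            chosen.append(cid)
    return chosen

# Hypothetical toy data for eight individuals.
positives = {"rsP": ("chr1", 1000, [0, 1, 2, 1, 0, 1, 2, 1])}
candidates = {
    "c1": ("chr1", 5000, [1, 0, 1, 2, 1, 2, 1, 0]),    # passes a, b and c
    "c2": ("chr1", 2000, [0, 1, 2, 1, 0, 1, 2, 1]),    # in perfect LD with rsP
    "c3": ("chr1", 3000, [0, 0, 0, 1, 0, 0, 0, 0]),    # MAF too different
    "c4": ("chr1", 900000, [1, 0, 1, 2, 1, 2, 1, 0]),  # too far away
}
negatives = pick_negatives(candidates, positives)
```

A production pipeline would more likely obtain r² and allele frequencies from a dedicated LD tool run on the 1000 Genomes genotypes; the filter logic is what this sketch is meant to show.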
P2: Build the epigenetic regulation model of type II diabetes using machine learning.
Specifically: the annotation results were randomly divided into a training set and a test set, with the training set making up 80% and the test set 20% of the full set; this step used 5-fold cross-validation, and the model was optimized by comparing multiple models and by element feature selection. Among the type II diabetes prediction models, the random forest model performed best when using 60 regulatory elements, with a sensitivity of 0.9736, a specificity of 0.9852, and an F1 score of 0.9213.
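One reading of the 80%/20% split combined with 5-fold cross-validation is that the data are cut into five folds and each fold serves once as the 20% test set. A minimal sketch of that reading follows; the function name and seed are hypothetical, and the split is not stratified, which may matter when positive and negative sets are imbalanced.

```python
import random

def k_fold_splits(sample_ids, k=5, seed=0):
    """Yield (train_ids, test_ids); each test fold holds about 1/k of the data."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # reproducible random assignment
    folds = [ids[i::k] for i in range(k)]     # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

# With 100 samples, every split is 80 training and 20 test samples.
splits = list(k_fold_splits(range(100)))
```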
P3: Using the model, predict all loci across the whole genome; the final predictions are the potential susceptibility loci of type II diabetes.
Specifically: all loci across the whole genome were annotated and predicted using the 60 regulatory elements of the optimal model from P2; the loci whose regulatory element annotations resemble those of the positive set are the potential susceptibility loci of type II diabetes.
Experimental results: For type II diabetes, the invention predicted 15204 potential susceptibility loci in total. Heritability estimation on the type II diabetes data (phs000867.v1.p1) from the dbGaP database showed that the predicted loci significantly increase the explained heritability (P < 0.05). Pathway analysis of the genes affected by the newly predicted susceptibility loci showed that these genes are significantly enriched in pathways such as type I diabetes mellitus, antigen processing and presentation, graft-versus-host disease, allograft rejection, and cytokine-cytokine receptor interaction. All of these pathways have been reported to be related to the occurrence of type II diabetes. This shows that the screening method, which predicts complex disease susceptibility loci using machine learning based on genomic epigenetic regulatory element features, is feasible.
The specific embodiments described above explain the object, technical solution, and beneficial effects of the invention in further detail. However, the scope of protection of the invention is not limited thereto. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. A screening method for predicting complex disease susceptibility loci using machine learning, characterized by comprising the following steps:
P1: Collect known complex disease susceptibility loci as the positive set of a machine learning model, infer from the positive set loci unrelated to the complex disease as the negative set, and annotate both sets with epigenetic regulatory elements;
P2: Build an epigenetic regulation model of the complex disease using machine learning;
P3: Using the model, predict all loci across the whole genome; the final predictions are the potential susceptibility loci of the complex disease.
2. The screening method for predicting complex disease susceptibility loci using machine learning according to claim 1, characterized in that step P1 specifically comprises the following steps:
P11: Collect the known susceptibility SNPs of a given complex disease from the public databases GWAS Catalog and PheGenI and from the related literature in PubMed, and use genotype data published by the 1000 Genomes Project to compute the SNPs in high linkage disequilibrium with the known susceptibility loci; together these form the positive set;
P12: For the negative set, screen genome-wide for SNPs that meet the following conditions: a. located within a certain distance of a SNP in the positive set; b. minor allele frequency differing from that of the corresponding SNP in the positive set by less than 0.05; c. independent of all SNPs in the positive set (r² < 0.1); after selection, the ratio of the positive set to the negative set is 1:20;
P13: Obtain all epigenetic regulatory element information for the genome from the UCSC and Roadmap databases, including transcription factor binding sites, histone modification sites, and chromatin segmentation states; obtain expression quantitative trait locus (eQTL) information for the relevant tissues from the GTEx database; obtain sequence conservation features from the ANNOVAR database; each type of regulatory element is saved as a text file;
P14: Using the epigenetic regulatory element information obtained, annotate the SNPs in the positive and negative sets according to their physical positions in the genome; the matching rule is that if the position of a SNP overlaps the interval of a regulatory element, the SNP is considered to be annotated by that element.
3. The screening method for predicting complex disease susceptibility loci using machine learning according to claim 1, characterized in that step P2 specifically comprises the following steps:
P21: For the annotation results above, compute the correlations between regulatory elements using the corrplot package in R and randomly remove highly correlated regulatory elements; then randomly divide the annotation results into a training set and a test set, with the training set making up 80% and the test set 20% of the full set; this step uses 5-fold cross-validation;
P22: Build models on the training-set annotation matrix obtained in P21 using different machine learning algorithms, the machine learning methods including but not limited to random forest, decision tree, and support vector machine; and assess the reliability of the models on the test set, the evaluation metrics being sensitivity, specificity, precision, accuracy, and F1 score, calculated as follows:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
F1 = 2 × TP / (2 × TP + FP + FN)
where TP is the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives;
P23: Based on the model evaluation metrics in P22, optimize the model by element feature selection, with the following specific steps: obtain from the model the importance ranking of the regulatory elements; build multiple feature subsets according to element importance, with the number of features in the subsets decreasing stepwise from the maximum to 1; determine the optimal subset of the model according to the evaluation metrics, and use it to predict new complex disease susceptibility loci.
4. The screening method for predicting complex disease susceptibility loci using machine learning according to claim 1, characterized in that step P3 specifically comprises the following:
P31: Take the optimal feature subset of the machine learning model obtained in step P2, and annotate all loci across the whole genome with the regulatory elements contained in that subset;
P32: Using the optimal model, predict all loci across the whole genome; the loci whose regulatory element annotations resemble those of the positive set are the potential susceptibility loci of the complex disease.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710592222.0A 2017-07-19 2017-07-19 Method for predicting complex disease susceptibility loci using machine learning

Publications (1)

Publication Number Publication Date
CN107341366A (en) 2017-11-10

Family

ID=60215979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710592222.0A Pending CN107341366A (en) 2017-07-19 2017-07-19 A kind of method that complex disease susceptibility loci is predicted using machine learning

Country Status (1)

Country Link
CN (1) CN107341366A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682456A (en) * 2016-12-30 2017-05-17 西安交通大学 Method for exploring complex disease susceptibility genes based on characteristics of genome epigenetic regulation elements

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAO S ET AL.: "Regulatory element-based prediction identifies new susceptibility regulatory variants for osteoporosis", 《HUMAN GENETICS》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171110
