CN107247873A - Differential methylation site recognition method - Google Patents

Info

Publication number
CN107247873A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710419211.2A
Other languages
Chinese (zh)
Other versions
CN107247873B (en)
Inventor
凡时财
宋应
邹见效
何建
徐红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Publication of CN107247873A
Application granted
Publication of CN107247873B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a method for identifying differential methylation sites. Using a classification approach, it recasts the identification of differential methylation sites as a feature selection problem: finding the sites that contribute significantly to classification, which are taken to be the differential methylation sites. Specifically, 450K methylation chip data obtained from a public database are first preprocessed: the data are normalized to eliminate within-group error, batch effects are removed to eliminate between-group error, and sites with small variance are removed. Next, a random forest model is built to obtain the contribution value of each site to classification. Finally, a site whose contribution value is greater than 0 is regarded as a differential methylation site. The differential methylation sites obtained by this method can have better class discrimination performance and provide more accurate results for cancer diagnosis.

Description

Method for identifying differential methylation sites
Technical field
The invention belongs to the technical field of DNA methylation identification and, more specifically, relates to a method for identifying differential methylation sites.
Background art
As the most typical epigenetic phenomenon of the human genome, DNA methylation plays a key role in many important physiological activities. Its methylation state is closely related to the occurrence of various diseases, particularly cancer. Specifically, not all methylation sites are associated with cancer; only certain specific methylation sites are. In this document, these specific methylation sites are called differential methylation sites.
At present, algorithms for identifying differential methylation sites generally use statistical methods: for example, fastDMA uses analysis of variance, and ChAMP uses linear regression combined with a t hypothesis test. However, traditional statistical methods can only roughly find sites that differ significantly in the statistical sense, so they report many differential methylation sites, not all of which are useful for cancer diagnosis.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a method for identifying differential methylation sites that finds them from the perspective of feature selection, so that the sites obtained have good class discrimination performance.
To achieve the above object, the method for identifying differential methylation sites of the invention is characterized by comprising the following steps:
(1) Obtain N groups of 450K methylation chip data samples for any one type of cancer from a public cancer genome database;
(2) Preprocess the N groups of 450K methylation chip data samples
(2.1) Normalize all 450K methylation chip data samples using the SWAN algorithm to eliminate the within-group error of each group of 450K methylation chip data samples;
(2.2) Normalize each group of 450K methylation chip data samples using the ComBat algorithm to eliminate the between-group error of the individual groups of 450K methylation chip data samples;
(2.3) Compute the variance of each group of 450K methylation chip data samples along the site dimension, then remove the low-variance sites accounting for 1/3 of all sites, obtaining N groups of standard 450K methylation chip data samples;
(3) Store the N groups of standard 450K methylation chip data samples in a data set data, which has f features;
(4) Build a random forest classification model and find the differential methylation sites
(4.1) Build the random forest classification model
Partition the data set data along the site dimension into s subsets, each of which randomly selects m features from the f features;
Then partition each of the s subsets along the sample dimension into t subsets, finally obtaining s*t subsets;
Finally, build a decision tree on each of the s*t subsets, forming s*t trees and thereby obtaining the random forest classification model;
(4.2) Compute the importance of each of the m features
The importance $RI_{g_\theta}$ of feature $g_\theta$ is computed as:
$$RI_{g_\theta} = \sum_{\tau=1}^{st} (wAcc)^u \sum_{n_{g_\theta}(\tau)} IG\big(n_{g_\theta}(\tau)\big) \left( \frac{\mathrm{num.in.}\,n_{g_\theta}(\tau)}{\mathrm{num.in.}\,\tau} \right)^v$$
where $n_{g_\theta}(\tau)$ denotes the nodes in the τ-th tree that are split by feature $g_\theta$; $IG(n_{g_\theta}(\tau))$ denotes the information gain ratio at one such node; $\mathrm{num.in.}\,n_{g_\theta}(\tau)$ denotes the number of samples at node $n_{g_\theta}(\tau)$ in the τ-th tree; $\mathrm{num.in.}\,\tau$ denotes the number of samples at the root node of the τ-th tree; and u and v are two balance factors;
wAcc denotes the weighted accuracy obtained when a given tree, after training, is tested on the samples that were not used to train that tree. It is computed as:
$$wAcc = \frac{1}{c} \sum_{x=1}^{c} \frac{n_{xx}}{n_{x1} + n_{x2} + \dots + n_{xc}}$$
where c is the number of sample classes and $n_{xy}$ is the number of samples belonging to class x that are assigned to class y in prediction;
(4.3) Determine whether the importance of each feature is greater than 0; if it is, the site corresponding to that feature is a differential methylation site.
The object of the invention is achieved as follows:
Using a classification approach, the method for identifying differential methylation sites of the invention recasts the identification of differential methylation sites as a feature selection problem: finding the sites that contribute significantly to classification, which are taken to be the differential methylation sites. Specifically, 450K methylation chip data obtained from a public database are first preprocessed: the data are normalized to eliminate within-group error, batch effects are removed to eliminate between-group error, and sites with small variance are removed. Next, a random forest model is built to obtain the contribution value of each site to classification. Finally, a site whose contribution value is greater than 0 is regarded as a differential methylation site. The differential methylation sites obtained by this method can have better class discrimination performance and provide more accurate results for cancer diagnosis.
In addition, the method for identifying differential methylation sites of the invention has the following beneficial effects:
(1) The invention uses a classification approach to find differential methylation sites and selects an evaluation index that better reflects the influence of a feature on classification;
(2) Partitioning the data set into multiple data subsets handles high-dimensional, small-sample data better and makes the model more stable;
(3) Compared with other methods, the invention obtains fewer and more representative differential methylation sites, and performs better in the cross-validation comparison.
Brief description of the drawings
Fig. 1 is a flow chart of the method for identifying differential methylation sites of the invention;
Fig. 2 is a plot of the features ranked by RI;
Fig. 3 compares the cross-validation results of the two feature selection algorithms on ESCA.
Detailed description
Embodiments of the invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Embodiment
For ease of description, the technical terms that appear in the embodiment are explained first:
fastDMA (Fast Differential Methylation Analysis): a fast differential methylation analysis method;
ChAMP (The Chip Analysis Methylation Pipeline): a methylation chip analysis pipeline;
TCGA (The Cancer Genome Atlas): the cancer genome atlas;
SWAN (Subset-quantile Within Array Normalization): a subset-quantile normalization method;
ComBat (Empirical Bayes methods): an empirical Bayes method;
ESCA (esophageal carcinoma): esophageal cancer.
Fig. 1 is a flow chart of the method for identifying differential methylation sites of the invention.
In this embodiment, as shown in Fig. 1, the method for identifying differential methylation sites of the invention comprises the following steps:
S1. Obtain N groups of 450K methylation chip data samples for any one type of cancer from the public cancer genome database TCGA, to be used for finding the differential methylation sites related to that cancer;
In this embodiment, 201 groups of ESCA 450K methylation chip data samples are obtained from TCGA, comprising 379785 sites (features).
S2. Preprocess the 201 groups of 450K methylation chip data samples
S2.1. 450K methylation chip data are acquired with two different probe types (Type-I and Type-II), which yields two types of site data with different distributions. This strongly affects the subsequent differential methylation site analysis, so the distributions of the two data types must be adjusted to make them similar; this process is called data normalization;
In this embodiment, all 450K methylation chip data samples are normalized using the SWAN algorithm to eliminate the within-group error of each group of 450K methylation chip data samples;
S2.2. Among the 201 groups of 450K methylation chip data samples, some samples vary because of factors unrelated to biology and differ significantly from the standard samples. Therefore, each group of 450K methylation chip data samples is normalized using the ComBat algorithm to eliminate the between-group error of the individual groups of 450K methylation chip data samples;
S2.3. Compute the variance of each group of 450K methylation chip data samples along the site dimension, then remove the low-variance sites accounting for 1/3 of all sites, obtaining 201 groups of standard 450K methylation chip data samples with 267252 sites remaining.
S3. Store the 201 groups of standard 450K methylation chip data samples in the data set data, which has 267252 features.
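The variance filter of step S2.3 can be illustrated with a short sketch. It is not the inventors' implementation: the function name `drop_low_variance_sites`, the list-of-lists layout (one row per sample, one column per site) and the use of the population variance are assumptions made for illustration. The SWAN and ComBat normalization steps are assumed to have been applied beforehand and are not reproduced here.

```python
from statistics import pvariance


def drop_low_variance_sites(data, drop_fraction=1 / 3):
    """Remove the lowest-variance sites.

    data: list of samples, each a list of methylation values (one per site).
    Returns (filtered_data, kept_site_indices).
    """
    n_sites = len(data[0])
    # Variance of each site across all samples (the "site dimension").
    variances = [pvariance([sample[j] for sample in data]) for j in range(n_sites)]
    n_drop = int(n_sites * drop_fraction)
    # Site indices ordered from smallest to largest variance.
    order = sorted(range(n_sites), key=lambda j: variances[j])
    # Keep everything except the n_drop lowest-variance sites, in original order.
    kept = sorted(order[n_drop:])
    filtered = [[sample[j] for j in kept] for sample in data]
    return filtered, kept
```

Returning the kept indices alongside the filtered matrix makes it possible to map the features of the data set data back to the original chip sites later.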
S4. Build the random forest classification model
Partition the data set data along the site dimension into s subsets, each of which randomly selects m features from the f features;
Then partition each of the s subsets along the sample dimension into t subsets, finally obtaining s*t subsets, each of which contains 66% of the standard 450K methylation chip data samples;
Finally, build a decision tree on each of the s*t subsets, forming s*t trees and thereby obtaining the random forest classification model;
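The two-level partition described in step S4 can be sketched as follows. The sketch makes assumptions the text does not state: features and samples are drawn at random without replacement, and each subset is represented only by index lists. The function name `build_subsets` and the `seed` parameter are hypothetical.

```python
import random


def build_subsets(n_samples, n_features, s, m, t, sample_fraction=0.66, seed=0):
    """Return s*t (sample_indices, feature_indices) pairs, one per tree."""
    rng = random.Random(seed)
    n_keep = max(1, round(n_samples * sample_fraction))
    subsets = []
    for _ in range(s):
        # Site dimension: m of the f features, drawn at random.
        features = sorted(rng.sample(range(n_features), m))
        for _ in range(t):
            # Sample dimension: a random 66% of the samples trains one tree;
            # the remaining samples are left out and can be used to test it.
            samples = sorted(rng.sample(range(n_samples), n_keep))
            subsets.append((samples, features))
    return subsets
```

Under these assumptions, the t subsets derived from one feature subset share the same m features and differ only in which samples they contain.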
The specific method of building a decision tree on each of the s*t subsets to form s*t trees is as follows:
The construction of a decision tree T is described for one subset D of the s*t subsets; the other subsets are handled in the same way.
Let A be the set of sites (the feature set) of subset D and ε the threshold for terminating tree construction. Decision tree T is built as follows:
1) If all samples in subset D belong to the same class $C_k$, make T a single-node tree, take $C_k$ as the class of the node, and return T;
2) If $A = \varnothing$, make T a single-node tree, take the class $C_k$ with the most instances in D as the class of the node, and return T;
3) Otherwise, compute the information gain ratio of each feature in A with respect to D using the formulas below, and select the feature $A_g$ with the largest information gain ratio;
The information gain ratio is computed as:
$$\mathrm{GainRatio}(D, A) = \frac{H(D) - H(D \mid A)}{H_A(D)}$$
where H(D) is the empirical entropy of subset D, computed as:
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
where |D| is the sample size of subset D, i.e. its number of samples; there are K classes $C_k$, k = 1, 2, ..., K; and $|C_k|$ is the number of samples belonging to class $C_k$;
H(D|A) is the empirical conditional entropy of subset D given feature A, computed as:
$$H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$$
where $D_i$ is one of the n subsets into which D is divided according to feature A, and $|D_i|$ is the number of samples in $D_i$;
$H_A(D)$ is the entropy of subset D with respect to the values of feature A, computed as:
$$H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}$$
4) If the information gain ratio of $A_g$ is less than the threshold ε, make T a single-node tree, take the class $C_k$ with the most samples in D as the class of the node, and return T;
5) Otherwise, for each possible value $a_j$ of $A_g$, divide D into non-empty subsets $D_i$ according to $A_g = a_j$, label each child node with the class having the most samples in $D_i$, form the tree T from the node and its child nodes, and return T;
6) For each child node r, take $D_i$ as the training set and $A - \{A_g\}$ as the feature set, recursively apply steps 1) to 5) to obtain the subtree $T_r$, and return $T_r$.
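The feature-selection criterion used in step 3) can be illustrated with a minimal sketch of the information gain ratio for a single feature. It assumes the feature takes discrete values (continuous methylation values would first have to be binned or thresholded, which the text does not specify), and the function names `entropy` and `gain_ratio` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    # Group the class labels by feature value: these groups are the subsets D_i.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Empirical conditional entropy H(D|A).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    # Entropy of D with respect to the feature values, H_A(D).
    split_entropy = entropy(feature_values)
    if split_entropy == 0:
        # A feature with a single value cannot split D; treat its ratio as 0.
        return 0.0
    return (entropy(labels) - conditional) / split_entropy
```

A feature that separates the classes perfectly into two equal halves scores 1.0, while a feature that is independent of the class scores 0.0, so step 4) stops splitting once the best ratio falls below ε.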
S5. Compute the importance of each of the m features using the random forest classification model
The importance $RI_{g_\theta}$ of feature $g_\theta$ is computed as:
$$RI_{g_\theta} = \sum_{\tau=1}^{st} (wAcc)^u \sum_{n_{g_\theta}(\tau)} IG\big(n_{g_\theta}(\tau)\big) \left( \frac{\mathrm{num.in.}\,n_{g_\theta}(\tau)}{\mathrm{num.in.}\,\tau} \right)^v$$
where $n_{g_\theta}(\tau)$ denotes the nodes in the τ-th tree that are split by feature $g_\theta$; $IG(n_{g_\theta}(\tau))$ denotes the information gain ratio at one such node; $\mathrm{num.in.}\,n_{g_\theta}(\tau)$ denotes the number of samples at node $n_{g_\theta}(\tau)$ in the τ-th tree; $\mathrm{num.in.}\,\tau$ denotes the number of samples at the root node of the τ-th tree; and u and v are two balance factors;
wAcc denotes the weighted accuracy obtained when a given tree, after training, is tested on the samples that were not used to train that tree. It is computed as:
$$wAcc = \frac{1}{c} \sum_{x=1}^{c} \frac{n_{xx}}{n_{x1} + n_{x2} + \dots + n_{xc}}$$
where c is the number of sample classes and $n_{xy}$ is the number of samples belonging to class x that are assigned to class y in prediction;
Using the above formulas, the importance of every feature can be calculated;
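The importance calculation of step S5 can be sketched as below. The sketch assumes that each trained tree has already been summarized as a dictionary holding its wAcc, its root-node sample count, and one (feature, gain ratio, node sample count) triple per split; this data layout and the names `weighted_accuracy` and `relative_importance` are hypothetical. Treating a class with no held-out samples as contributing 0 to wAcc is also an assumption, since the text does not cover that case.

```python
def weighted_accuracy(confusion):
    """wAcc from a confusion matrix.

    confusion[x][y] = number of class-x samples predicted as class y.
    """
    c = len(confusion)
    total = 0.0
    for x in range(c):
        row_sum = sum(confusion[x])
        if row_sum:  # assumption: an empty class contributes 0
            total += confusion[x][x] / row_sum
    return total / c


def relative_importance(trees, u=1.0, v=1.0):
    """Accumulate RI over all trees.

    trees: list of dicts with keys 'wacc', 'root_size' and 'splits', where
    'splits' is a list of (feature, gain_ratio, node_size) triples.
    """
    ri = {}
    for tree in trees:
        weight = tree["wacc"] ** u
        for feature, gain, node_size in tree["splits"]:
            share = (node_size / tree["root_size"]) ** v
            ri[feature] = ri.get(feature, 0.0) + weight * gain * share
    return ri
```

In this sketch a feature that never splits a node in any tree has no entry in the result, which corresponds to an importance of 0.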
S6. Find the differential methylation sites using the random forest classification model
Determine whether the importance of each feature is greater than 0; if it is, the site corresponding to that feature is a differential methylation site. This finally yields 4136 differential methylation sites, as shown in Fig. 2.
S7. To verify that the method of the invention really improves the identification of differential methylation sites, it must be compared with a traditional method. The comparison method chosen here is fastDMA, and the comparison metric is the mean accuracy macc, which assesses the classification performance of the two methods. Specifically, the set f1 of differential methylation sites found by the method of the invention on ESCA is first obtained from S6 and combined with the ESCA data set data to obtain the data subset d1; likewise, the set f2 of differential methylation sites found by the fastDMA algorithm on ESCA is combined with the ESCA data set data to obtain the data subset d2. Next, to verify the class discrimination performance of f1 and f2, classifiers must be chosen; the invention selects three common classifiers: svm (support vector machine), nb (naive Bayes classifier), and logistic (logistic regression). Finally, 10-fold cross-validation is performed with each of the three classifiers on data sets d1 and d2 to obtain the respective mean accuracy macc, as shown in Fig. 3.
The 10-fold cross-validation procedure is briefly described using data set d1:
Data set d1 is divided along the sample dimension into 10 subsets of equal size, and the model is trained 10 times. In the u-th training run (u = 1, 2, ..., 10), the model is trained on all subsets except the u-th, and the resulting model is then used to compute the accuracy wacc on the u-th subset. The mean macc of the 10 wacc values is used as a numerical approximation of the model's generalization ability.
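The cross-validation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation code used for Fig. 3: the per-fold score is the plain fraction of correct predictions (the weighted accuracy could be substituted), and a simple nearest-centroid classifier stands in for the svm, nb and logistic classifiers named in the text. The names `k_fold_macc`, `fit_centroid` and `predict_centroid` are hypothetical.

```python
import random


def k_fold_macc(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over k folds for a classifier given as fit/predict functions."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Split along the sample dimension into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for u in range(k):
        test = folds[u]
        train = [i for j, fold in enumerate(folds) if j != u for i in fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / k


def fit_centroid(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, X):
    """Assign each row to the class whose centroid is closest."""
    def nearest(row):
        return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))
    return [nearest(row) for row in X]
```

Passing the classifier as a pair of functions keeps the fold logic independent of the model, so the same routine could be run once per classifier on d1 and on d2.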
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To a person of ordinary skill in the art, variations are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims (2)

1. A method for identifying differential methylation sites, characterized by comprising the following steps:
(1) Obtain N groups of 450K methylation chip data samples for any one type of cancer from a public cancer genome database;
(2) Preprocess the N groups of 450K methylation chip data samples
(2.1) Normalize all 450K methylation chip data samples using the SWAN algorithm to eliminate the within-group error of each group of 450K methylation chip data samples;
(2.2) Normalize each group of 450K methylation chip data samples using the ComBat algorithm to eliminate the between-group error of the individual groups of 450K methylation chip data samples;
(2.3) Compute the variance of each group of 450K methylation chip data samples along the site dimension, then remove the low-variance sites accounting for 1/3 of all sites, obtaining N groups of standard 450K methylation chip data samples;
(3) Store the N groups of standard 450K methylation chip data samples in a data set data, which has f features;
(4) Build a random forest classification model and find the differential methylation sites
(4.1) Build the random forest classification model
Partition the data set data along the site dimension into s subsets, each of which randomly selects m features from the f features;
Then partition each of the s subsets along the sample dimension into t subsets, finally obtaining s*t subsets;
Finally, build a decision tree on each of the s*t subsets, forming s*t trees and thereby obtaining the random forest classification model;
(4.2) Compute the importance of each of the m features using the random forest classification model
The importance $RI_{g_\theta}$ of feature $g_\theta$ is computed as:
$$RI_{g_\theta} = \sum_{\tau=1}^{st} (wAcc)^u \sum_{n_{g_\theta}(\tau)} IG\big(n_{g_\theta}(\tau)\big) \left( \frac{\mathrm{num.in.}\,n_{g_\theta}(\tau)}{\mathrm{num.in.}\,\tau} \right)^v$$
where $n_{g_\theta}(\tau)$ denotes the nodes in the τ-th tree that are split by feature $g_\theta$; $IG(n_{g_\theta}(\tau))$ denotes the information gain ratio at one such node; $\mathrm{num.in.}\,n_{g_\theta}(\tau)$ denotes the number of samples at node $n_{g_\theta}(\tau)$ in the τ-th tree; $\mathrm{num.in.}\,\tau$ denotes the number of samples at the root node of the τ-th tree; and u and v are two balance factors;
wAcc denotes the weighted accuracy obtained when a given tree, after training, is tested on the samples that were not used to train that tree. It is computed as:
$$wAcc = \frac{1}{c} \sum_{x=1}^{c} \frac{n_{xx}}{n_{x1} + n_{x2} + \dots + n_{xc}}$$
where c is the number of sample classes and $n_{xy}$ is the number of samples belonging to class x that are assigned to class y in prediction;
(4.3) Determine whether the importance of each feature is greater than 0; if it is, the site corresponding to that feature is a differential methylation site.
2. The method for identifying differential methylation sites according to claim 1, characterized in that each of the s × t subsets contains M% of the standard 450K methylation chip data samples.
CN201710419211.2A 2017-03-29 2017-06-06 Differential methylation site recognition method Active CN107247873B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017101994145 2017-03-29
CN201710199414 2017-03-29

Publications (2)

Publication Number Publication Date
CN107247873A true CN107247873A (en) 2017-10-13
CN107247873B CN107247873B (en) 2020-04-14

Family

ID=60018442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710419211.2A Active CN107247873B (en) 2017-03-29 2017-06-06 Differential methylation site recognition method

Country Status (1)

Country Link
CN (1) CN107247873B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100003689A1 (en) * 2008-06-19 2010-01-07 John Wayne Cancer Institute Use of Methylation Status of MINT Loci as a Marker for Rectal Cancer
CN104781422A (en) * 2012-09-20 2015-07-15 香港中文大学 Non-invasive determination of methylome of fetus or tumor from plasma
CN103559423A (en) * 2013-10-31 2014-02-05 深圳先进技术研究院 Method and device for predicting methylation
CN104462868A (en) * 2014-12-11 2015-03-25 西安电子科技大学 Genome-wide SNP (single nucleotide polymorphism) site analysis method based on combination of random forest and Relief-F
CN104915679A (en) * 2015-05-26 2015-09-16 浪潮电子信息产业股份有限公司 Large-scale high-dimensional data classification method based on random forest weighted distance
CN106503458A (en) * 2016-10-26 2017-03-15 南京信息工程大学 A kind of surface air temperature data quality control method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
EVERSON, TM 等: "DNA methylation loci associated with atopy and high serum IgE: a genome-wide application of recursive Random Forest feature selection", 《GENOME MEDICINE》 *
FAN, SC 等: "Predicting CpG methylation levels by integrating Infinium HumanMethylation450 BeadChip array data", 《GENOMICS》 *
MICHAŁ DRAMIŃSKI 等: "Monte Carlo Feature Selection and Interdependency Discovery in Supervised Classification", 《ADVANCES IN MACHINE LEARNING II》 *
张秋伊 等: "高维 DNA 甲基化数据的随机森林降维分析", 《中华疾病控制杂志》 *
李承哲: "DNA甲基化状态在线预测平台的设计与实现", 《万方数据知识服务平台》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109887543A (en) * 2019-02-27 2019-06-14 中南大学 A kind of differential methylation site recognition methods of hypomethylation level
CN109887543B (en) * 2019-02-27 2022-12-09 中南大学 Differential methylation site recognition method for low methylation level
CN110349628A (en) * 2019-06-27 2019-10-18 广东药科大学 A kind of protein phosphorylation site recognition methods, system, device and storage medium
CN110349628B (en) * 2019-06-27 2021-06-15 广东药科大学 Protein phosphorylation site recognition method, system, device and storage medium
WO2021227950A1 (en) * 2020-05-09 2021-11-18 广州燃石医学检验所有限公司 Cancer prognostic method
CN112382342A (en) * 2020-11-24 2021-02-19 山西三友和智慧信息技术股份有限公司 Cancer methylation data classification method based on integrated feature selection
CN113326652A (en) * 2021-05-11 2021-08-31 广汽本田汽车有限公司 Data batch effect processing method, device and medium based on empirical Bayes
CN115274123A (en) * 2022-07-15 2022-11-01 中国人民解放军总医院 Physical ability level prediction method, system, device, medium, and program product
CN115274123B (en) * 2022-07-15 2023-03-24 中国人民解放军总医院 Physical ability level prediction method, system, device, medium, and program product

Also Published As

Publication number Publication date
CN107247873B (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant