CN103366100A - Method for filtering SNP (Single Nucleotide Polymorphism) unrelated to complex diseases from whole-genome - Google Patents

Info

Publication number
CN103366100A
Authority
CN
China
Prior art keywords
snp
data
disease
individuality
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102796270A
Other languages
Chinese (zh)
Inventor
张军英
刘丹
赵晓雪
谭芳慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN2013102796270A priority Critical patent/CN103366100A/en
Publication of CN103366100A publication Critical patent/CN103366100A/en
Pending legal-status Critical Current

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for filtering out SNPs (single nucleotide polymorphisms) unrelated to complex diseases from whole-genome data, for use in research on the pathogenic mechanisms of complex diseases, early diagnosis, and biopharmaceutical development. The method comprises the following steps: (1) preprocessing and initializing the SNP data, encoding it as data containing only the values 0, 1, 2, 3, on the principle that a variation in either allele of a homologous chromosome pair is treated as having the same effect on disease; (2) defining the relevance measure, namely defining the relevance I(Y;X) between an SNP subset X and the disease Y as the mutual information MI(Y;X) between X and Y; (3) searching the SNP set for candidate suspected-pathogenic SNP groups using the FGSA (factor-based genetic search algorithm) method; (4) selecting, from the set of candidate suspected-pathogenic SNP groups, the SNPs whose frequency of occurrence exceeds a threshold, according to the frequency-relevance priority criterion; (5) outputting the top-ranked SNPs whose frequency exceeds the threshold. The method retains the SNPs belonging to pathogenic mechanisms that are masked by other pathogenic mechanisms, laying a foundation for the subsequent discovery of those mechanisms.

Description

Method for filtering SNPs unrelated to complex diseases from the whole genome
Technical field
The invention belongs to the technical field of data processing. Specifically, it proposes a method for filtering out SNPs unrelated to a complex disease from whole-genome single nucleotide polymorphism (SNP) data. The method can be used in research on the pathogenesis of complex diseases, in early diagnosis, and in biopharmaceutical development.
Background
Complex diseases arise from the joint action of multiple genetic and environmental factors, and their onset and progression are influenced by many genes organized in complex networks. Unlike Mendelian disorders, in most cases there is no single key gene sufficient to cause the disease: an individual gene may have a negligible pathogenic effect, or none at all, while a combination of such genes may jointly constitute a pathogenic mechanism of the complex disease. This makes it very difficult to find the causal genes of a complex disease, or related markers, for use in pathogenesis research, early diagnosis, and biopharmaceutical development. The main open problem is how to identify, across the whole genome, the multiple causes of a disease and which genes combine to form each cause.
Existing solutions fall into two classes: direct methods and two-step methods. Direct methods search the original SNP set directly and can only handle small and medium-scale data; the scale they can handle varies with the algorithm (for example, MDR and BEAM differ considerably in the scale they can process). Two-step methods first filter out the SNPs unrelated to the disease from the original SNP set and then search within the remaining SNPs. The present invention concerns the first step of the two-step approach: SNP filtering.
The first step of most two-step methods is exhaustive: all pairwise SNP combinations are scored and the high-scoring ones are kept for the second step (for example BOOST, AntEpiSeeker). The ability of such methods to handle large-scale data is extremely limited. Ant colony optimization has therefore been introduced to look for SNP combinations of higher order (AntEpiSeeker), and random forests have been introduced to find SNP subsets with strong classification ability, yielding the SNP set to be examined further. These filtering methods all have the following shortcomings:
1. The SNP order they can handle is very limited. Exhaustive search, for example, is so computationally expensive that only pairwise SNP combinations are scored, so only second-order SNP interactions (interactions between two SNPs) are retained and higher-order SNP interactions are lost.
2. The number of SNPs they can handle is very limited, because the filtering process requires complex computation. The random forest method, for example, handles only about 100 SNPs; AntEpiSeeker can handle comparatively more (for example 5,000 SNPs).
3. They cannot handle the multi-mechanism case in which one pathogenic mechanism is masked by other pathogenic mechanisms.
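The cost of exhaustive scoring mentioned in shortcoming 1 can be made concrete by counting candidate SNP combinations. The following is a small illustrative sketch, not part of the disclosed method; the helper name `n_combinations` is hypothetical, and the SNP counts are the data scales mentioned in this document (100, 5,000, and the 103,611 SNPs of the AMD data).

```python
import math

def n_combinations(d: int, order: int) -> int:
    """Number of distinct SNP groups of the given order among d SNPs."""
    return math.comb(d, order)

# Growth of the search space with the number of SNPs and the interaction order.
for d in (100, 5000, 103611):
    counts = {order: n_combinations(d, order) for order in (2, 3)}
    print(d, counts)
```

Even at order 2 the whole-genome case involves billions of pairs, and each additional order multiplies the count by roughly the number of SNPs, which is why exhaustive filters stop at pairwise interactions.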
Summary of the invention
The object of the invention is to overcome the shortcomings of existing two-step methods in identifying, across the whole genome, the multiple causes of a disease and which genes combine to form each cause. It provides a method for filtering out SNPs unrelated to a complex disease from the whole genome. From whole-genome SNP data, the method filters out the SNPs that are unrelated to the disease phenotype both individually and in combination, and thereby retains all high-order SNP pathogenic factors that may be associated with the disease individually or in combination, laying a foundation for the further detection and identification of the pathogenic SNPs.
The technical solution of the invention comprises the following steps:
(1) Preprocess and initialize the whole-genome SNP data
On the principle that a variation in either allele of a homologous chromosome pair is treated as having the same effect on disease, the SNP data are preprocessed into:
Figure BSA00000921376200021
where x_i ∈ {0,1,2,3}^d holds the values at the SNP sites of sample i: a site takes the value 1 when its two alleles form the homozygote AA, 2 for the homozygote aa, 3 for the heterozygote Aa or aA, and 0 when the data are missing; y_i ∈ {1,2} is the class label of sample x_i, with 1 denoting the disease group and 2 the control group; N is the number of samples in the SNP data, d is the number of SNPs in the data, and the set of SNPs involved is denoted Ω;
(2) Define the relevance measure
Each SNP subset may become a pathogenic factor through interaction; a factor containing l SNPs is called an l-order factor. The relevance I(Y;X) between an SNP factor X and the disease Y is defined as the mutual information MI(Y;X) between X and Y:
I(Y;X) = H(Y) - H(Y|X) (1)
where $H(Y) = -\sum_{y \in \{0,1\}} p(y) \log p(y)$ and $H(Y|X) = -\sum_{y \in \{0,1\}} \sum_{x \in \{1,2,3\}^{|X|}} p(y|x) \log p(y|x)$ are the entropy and the conditional entropy, respectively;
(3) Search the SNP set Ω for candidate suspected-pathogenic SNP groups using the factor-based genetic search algorithm (FGSA);
(4) Rank the SNPs according to the frequency-relevance priority criterion;
(5) Output the top-ranked SNPs whose frequency exceeds the threshold.
Compared with the prior art, the invention has the following notable effects:
The invention discloses a method for filtering out SNPs unrelated to a complex disease from whole-genome SNP data: the factor-based genetic search algorithm (FGSA). Because the search is factor-based, it can find SNP factors that are weakly associated with the disease individually but strongly associated in combination. Its layer-by-layer peeling criterion ensures that, when several pathogenic factors exist, the presence of one strongly associated factor does not mask the search for other strongly associated factors. The frequency-relevance priority criterion ensures the relative stability of the solutions found. The method has the following notable effects:
(1) The method effectively filters disease-unrelated SNPs out of the whole genome: it removes the SNPs that are unrelated to the disease phenotype both individually and in combination, and keeps the number of SNPs remaining after filtering as small as possible.
(2) The method retains the SNPs belonging to pathogenic mechanisms that are masked by other pathogenic mechanisms, laying a foundation for the subsequent discovery of those mechanisms.
(3) The method can handle whole-genome SNP scale, that is, data with more than 10,000 SNPs. For example, it can process the AMD data (AMD, age-related macular degeneration, causes loss of central vision due to retinal damage), which contain 103,611 SNPs, 96 case samples, and 50 control samples. The AMD-related SNPs found by several other methods are all in the SNP set retained by this method (see the experimental comparison for details).
Description of the drawings
Fig. 1 is a flowchart of the FGSA algorithm of the invention;
Fig. 2 is a flowchart of the genetic algorithm in Fig. 1.
Detailed description
Referring to Fig. 1 and Fig. 2, the method of the invention is called the FGSA method. Its implementation steps are as follows:
Step 1: preprocess and initialize the SNP data.
(1.1) On the principle that a variation in either allele of a homologous chromosome pair is treated as having the same effect on disease, the SNP data are processed into
Figure BSA00000921376200041
where x_i ∈ {0,1,2,3}^d holds the values at the SNP sites of sample i: a site takes the value 1 when its two alleles form the homozygote AA, 2 for the homozygote aa, 3 for the heterozygote Aa or aA, and 0 when the allele data at the site are missing; y_i ∈ {1,2} is the class label of sample x_i, with 1 denoting the disease group and 2 the control group; N is the number of samples in the SNP data and d is the number of SNPs in the data. The processed data contain only the values 0, 1, 2, 3, where 0 denotes missing data. The set of SNPs involved is denoted Ω.
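The encoding of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are assumed to arrive as two-character strings over the allele symbols "A" and "a", and the function names `encode_genotype` and `encode_sample` are hypothetical, not taken from the patent. Anything that cannot be read as a two-allele call is treated as missing (0), which is also an assumption.

```python
from typing import List, Optional

def encode_genotype(genotype: Optional[str], major: str = "A", minor: str = "a") -> int:
    """Map a two-allele genotype to the codes 1 (AA), 2 (aa), 3 (Aa or aA), 0 (missing)."""
    if not genotype or len(genotype) != 2:
        return 0  # absent or malformed call: treated as missing
    if any(allele not in (major, minor) for allele in genotype):
        return 0  # unreadable allele symbol: treated as missing
    if genotype == major * 2:
        return 1
    if genotype == minor * 2:
        return 2
    return 3  # heterozygote, in either allele order

def encode_sample(genotypes: List[Optional[str]]) -> List[int]:
    """Encode one sample, i.e. one row x_i over d SNP sites."""
    return [encode_genotype(g) for g in genotypes]

row = encode_sample(["AA", "aa", "Aa", "aA", None, "A?"])
print(row)  # [1, 2, 3, 3, 0, 0]
```

Mapping both Aa and aA to the same code reflects the stated principle that a variation in either allele is treated identically.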
Step 2: define the relevance measure.
(2.1) Each SNP subset may become a pathogenic factor; a factor containing l SNPs is called an l-order factor. The relevance I(Y;X) between a factor X and the disease Y is defined as the mutual information MI(Y;X) between X and Y, expressed as formula (1):
I(Y;X) = H(Y) - H(Y|X) (1)
where $H(Y) = -\sum_{y \in \{0,1\}} p(y) \log p(y)$ and $H(Y|X) = -\sum_{y \in \{0,1\}} \sum_{x \in \{1,2,3\}^{|X|}} p(y|x) \log p(y|x)$ are the entropy and the conditional entropy, respectively.
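The relevance measure of formula (1) can be estimated from genotype counts, as in the following sketch. Several points are assumptions rather than statements of the patent: probabilities are estimated by empirical frequencies; the conditional entropy is computed in its standard form, weighted by the joint probability p(x, y); samples with a missing call (code 0) at any selected SNP are skipped, in line with x ranging over {1,2,3}; and the natural logarithm is used because the text does not give a base. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def mutual_information(X: List[Sequence[int]], y: Sequence[int], snp_idx: Sequence[int]) -> float:
    """Estimate I(Y; X_factor) = H(Y) - H(Y|X_factor) for the SNPs listed in snp_idx."""
    pairs = [
        (tuple(row[j] for j in snp_idx), label)
        for row, label in zip(X, y)
        if all(row[j] != 0 for j in snp_idx)  # skip samples with missing calls
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(label for _, label in pairs)
    # Equivalent form: sum over (x, y) of p(x, y) * log( p(x, y) / (p(x) p(y)) )
    return sum(c / n * math.log(c * n / (px[x] * py[label]))
               for (x, label), c in joint.items())

# Toy data: the label depends on whether SNP 0 and SNP 1 agree, not on either alone.
X = [[1, 1], [1, 2], [2, 1], [2, 2]]
y = [1, 2, 2, 1]
print(mutual_information(X, y, [0]))     # single SNP: no information
print(mutual_information(X, y, [0, 1]))  # the pair: one full unit, log 2 in nats
```

The toy data illustrate why the measure is applied to factors rather than single SNPs: each SNP alone carries no information about the label, while the pair determines it completely.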
Step 3: use the FGSA method to search the SNP set Ω for candidate suspected-pathogenic SNP groups.
(3.1) Set the genetic algorithm parameters and the FGSA parameters
Set the number l of SNPs in each SNP group to be found. Set the genetic algorithm parameters: population size N_l, crossover probability P_c, mutation probability P_m, and number of iterations Iter_l. Set the FGSA parameters: number of repeated searches Num_l and number M_l of l-order interactions to find in each search.
(3.2) Initialize the index of the l-order interaction to be found, k = 1; set the suspected disease-related SNP set S_1 = Φ; set the SNP set under examination Ω* = Ω;
(3.3) Randomly initialize the population: randomly generate l different valid SNP indices from the SNPs in Ω* to form an l-order factor, which serves as one individual. Generate N_l individuals in total to form the population;
(3.4) Fitness calculation: for each individual in the population, compute the mutual information according to formula (1) as the fitness of the individual;
(3.5) Selection: according to the fitness of each individual in the population, select N_l individuals using roulette-wheel selection with an elitism strategy;
(3.6) Crossover: take any two of the N_l individuals, choose a crossover point at random, and with crossover probability P_c exchange the parts of the two individuals after the crossover point, forming two new individuals;
(3.7) Mutation: for each individual, randomly generate a valid SNP index and, with mutation probability P_m, use it to replace any one SNP index in the individual;
(3.8) Produce the next generation: all individuals obtained after operation (3.7) form the next-generation population;
(3.9) If the number of iterations is less than Iter_l, jump to (3.4);
(3.10) Take the individual with the greatest fitness in the population, denote it s_k, add it to the suspected disease-related SNP set S_k, and remove this individual from the SNP set Ω*, namely
Figure BSA00000921376200051
(3.11) Repeat (3.3) to (3.10) M_l times, each time adding one individual to S_k and removing that individual from the data; after M_l repetitions, S_k is obtained;
(3.12) Reset Ω* = Ω and repeat (3.2) to (3.11) Num_l times in total, obtaining the SNP sets $S_1, S_2, \ldots, S_{Num_l}$;
(3.13) Output the SNP set containing all the l-order interactions
Figure BSA00000921376200053
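Steps (3.1) to (3.13) can be sketched as the following toy implementation. It is an illustration under stated assumptions, not the patented implementation: the function names (`fitness`, `run_ga`, `fgsa`) and argument names are hypothetical; the fitness is the empirical mutual information with samples containing a missing call skipped; elitism is realized by copying the best individual unchanged into the next generation; crossover pairs consecutive selected individuals; an individual that ends up with a repeated SNP is replaced by a fresh random one; and "removing the individual" is read as removing its SNPs from Ω*.

```python
import math
import random
from collections import Counter

def fitness(X, y, factor):
    """Empirical mutual information I(Y; X_factor); rows with a missing (0) call are skipped."""
    pairs = [(tuple(r[j] for j in factor), c) for r, c in zip(X, y)
             if all(r[j] != 0 for j in factor)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(c for _, c in pairs)
    return sum(k / n * math.log(k * n / (px[x] * py[c])) for (x, c), k in joint.items())

def run_ga(X, y, pool, l, pop_size, p_c, p_m, iters, rng):
    """Steps (3.3)-(3.9): evolve l-order factors over the SNPs in `pool`; return the fittest."""
    pop = [rng.sample(pool, l) for _ in range(pop_size)]
    for _ in range(iters):
        fit = [fitness(X, y, ind) for ind in pop]
        elite = list(pop[max(range(pop_size), key=fit.__getitem__)])
        # Roulette-wheel selection; the small constant keeps every weight positive.
        picked = rng.choices(pop, weights=[max(f, 0.0) + 1e-12 for f in fit], k=pop_size - 1)
        nxt = [list(ind) for ind in picked]
        for a, b in zip(nxt[0::2], nxt[1::2]):      # single-point crossover
            if l > 1 and rng.random() < p_c:
                cut = rng.randrange(1, l)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        for ind in nxt:                             # mutation: replace one SNP index
            if rng.random() < p_m:
                ind[rng.randrange(l)] = rng.choice(pool)
        # Repair (assumption): a factor must hold l different SNPs.
        nxt = [ind if len(set(ind)) == l else rng.sample(pool, l) for ind in nxt]
        pop = nxt + [elite]
    return max(pop, key=lambda ind: fitness(X, y, ind))

def fgsa(X, y, l, pop_size, p_c, p_m, iters, m_l, num_l, seed=0):
    """Steps (3.2)-(3.13): num_l independent searches, each peeling off up to m_l factors."""
    rng = random.Random(seed)
    d = len(X[0])
    runs = []
    for _ in range(num_l):                 # (3.12): reset the pool and search again
        pool = list(range(d))              # Omega* = Omega
        found = []                         # S_k
        for _ in range(m_l):               # (3.11): peel layer by layer
            if len(pool) < l:
                break
            best = run_ga(X, y, pool, l, pop_size, p_c, p_m, iters, rng)
            found.append(tuple(sorted(best)))
            pool = [j for j in pool if j not in best]   # (3.10): remove s_k from Omega*
        runs.append(found)
    return runs

# Toy data (hypothetical): 8 SNPs, label driven by the interaction of SNPs 0 and 1.
data_rng = random.Random(1)
X = [[data_rng.choice([1, 2, 3]) for _ in range(8)] for _ in range(200)]
y = [1 if row[0] == row[1] else 2 for row in X]
runs = fgsa(X, y, l=2, pop_size=10, p_c=0.9, p_m=0.25, iters=30, m_l=2, num_l=3)
print(runs)
```

Because each found factor is removed from the pool before the next search, the factors within one run never share an SNP, which is what lets a weaker factor surface once a stronger one has been peeled away.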
Step 4: compute the frequency of each SNP in v.
(4.1) Frequency calculation: the number of times a suspected-pathogenic SNP found in Step 3 occurs is taken as the frequency of that SNP;
(4.2) Rank the SNPs according to the frequency-relevance priority criterion: SNPs with higher frequency come first and, among SNPs with equal frequency, those with greater mutual information between the single SNP and the disease come first.
Step 5: output the top-ranked SNPs whose frequency exceeds the threshold.
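Steps 4 and 5 can be sketched as below. The input layout is an assumption: `runs` is taken to be a list of searches, each a list of found factors given as tuples of SNP indices, and `single_mi` is a hypothetical mapping from SNP index to the mutual information between that single SNP and the disease. The function name `rank_snps` and the example values are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def rank_snps(runs: List[List[Sequence[int]]],
              single_mi: Dict[int, float],
              threshold: int) -> List[Tuple[int, int]]:
    """Return (SNP, frequency) pairs with frequency > threshold, ranked by
    frequency first and single-SNP mutual information second."""
    # (4.1) frequency = number of times an SNP occurs among all found factors
    freq = Counter(snp for run in runs for factor in run for snp in factor)
    kept = [snp for snp, f in freq.items() if f > threshold]
    # (4.2) higher frequency first; ties broken by larger single-SNP mutual information
    kept.sort(key=lambda snp: (-freq[snp], -single_mi.get(snp, 0.0)))
    return [(snp, freq[snp]) for snp in kept]

example_runs = [[(0, 1), (2, 5)], [(0, 1), (3, 4)], [(1, 2), (0, 6)]]
example_mi = {0: 0.01, 1: 0.05, 2: 0.02}
print(rank_snps(example_runs, example_mi, threshold=1))
# [(1, 3), (0, 3), (2, 2)]
```

SNPs 0 and 1 both occur three times, so the tie is broken by the larger single-SNP mutual information of SNP 1; SNPs occurring only once do not exceed the threshold of 1 and are filtered out.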
In Step 3, operations (3.3) to (3.8) implement the genetic algorithm: l different valid SNP indices are generated at random from the SNPs in Ω* to form an l-order factor that serves as one individual, and better individuals are obtained through crossover and mutation. This genetic evolutionary search embodies the factor-based character of the filtering method, and thereby makes it possible to find SNP factors that are weakly associated with the disease individually but strongly associated in combination. Operation (3.10) in Step 3 adds the individual with the greatest fitness in the population to the suspected disease-related SNP set and removes this individual from the data. This peels off the pathogenic factors layer by layer, ensuring that, when several pathogenic factors exist, the presence of one strongly associated factor does not mask the search for other strongly associated factors. The frequency-relevance priority criterion in Step 4 ensures the relative stability of the solutions found.
The effect of the method is described in more detail by the following experimental examples. These examples are for illustration only and are not intended to limit the scope of the invention.
In the following experiments, the parameters of the method are: N_l = 10, P_c = 0.9, P_m = 0.25, Iter_l = 5000, Num_l = 20, M_l = 8.
Experiment 1: SNP filtering on simulated data.
The simulated data were built on real SNP data from a New York population, to which biologists added 7 known SNP groups associated with a complex disease; the 7 SNP groups differ in their models of association with the disease. There are two data sets: the first contains 2000 samples and 100 SNPs and is denoted SNP100; the second contains 2000 samples and 2000 SNPs and is denoted SNP2000. The data are summarized in Table 1.
Table 1. Experimental data
Figure BSA00000921376200061
Both data sets were tested. The FGSA parameters (Iter_2, Iter_3) were set separately for the two data sets, to (Iter_2, Iter_3) = (600, 1100) and (1200, 1700) respectively; (M_2, M_3) was the same for both data sets, (M_2, M_3) = (8, 5). The two groups of experiments differ only in the frequency threshold: in one, the thresholds for the two data sets were Th = 3 and Th = 2 respectively; in the other, the threshold was Th = 1.
The experimental results of the FGSA algorithm and of several typical feature selection methods (minimum redundancy maximum relevance, mRMR; maximum entropy, ME; and others) are shown in Table 2 and Table 3, where "-" means that the computation was too expensive to produce a result. The compression rate is the ratio of the number of SNPs retained after filtering to the total number of SNPs in the data, and the factor rate is the ratio of the number of true pathogenic factors contained in the retained SNPs to the number of true pathogenic factors in the data.
In Table 2, a denotes true positives. The factor rate is defined as the number of pathogenic factors contained in the SNPs retained after filtering, as a percentage of the total number of pathogenic factors contained in Ω before filtering; the compression rate is the number of SNPs after filtering as a percentage of the total number of SNPs in Ω before filtering.
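The two evaluation measures can be computed as in the following sketch. It follows the definitions as worded in the text (compression rate as retained SNPs divided by total SNPs) and assumes that a pathogenic factor counts as contained only when all of its SNPs are retained. The function names and the example values are hypothetical and are not results from the tables.

```python
from typing import Iterable, Sequence

def compression_rate(retained: Iterable[int], total_snps: int) -> float:
    """Number of SNPs retained after filtering divided by the total number of SNPs."""
    return len(set(retained)) / total_snps

def factor_rate(retained: Iterable[int], true_factors: Sequence[Sequence[int]]) -> float:
    """Fraction of true pathogenic factors whose SNPs are all retained."""
    kept = set(retained)
    contained = sum(1 for factor in true_factors if set(factor) <= kept)
    return contained / len(true_factors)

# Hypothetical example: 5 of 100 SNPs retained, 3 known factors.
retained_snps = [0, 1, 2, 3, 7]
known_factors = [(0, 1), (2, 3), (4, 5, 6)]
print(compression_rate(retained_snps, 100))       # 0.05
print(factor_rate(retained_snps, known_factors))  # 2 of 3 factors fully retained
```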
Table 2. Performance of the FGSA-frequency algorithm compared with other algorithms
Table 3. Performance of the FGSA-frequency algorithm (results with frequency threshold Th = 1)
Figure BSA00000921376200072
Tables 2 and 3 show that:
(a) at roughly comparable compression rates, the factor rate of the FGSA method is higher than that of every other method, which reflects the algorithm complexity;
(b) the factor rate of mRMR is only 3/7, meaning that the selected SNP set fully contains only 3 of the 7 pathogenic factors, whereas the factor rate of the FGSA method is 5/7 to 6/7, meaning that it fully contains 5 or 6 of them, which clearly shows the effectiveness of the FGSA method;
(c) the compression rate increases with the SNP scale N: the larger N is, the higher the compression rate, reaching 97% in the 2000-SNP case, which shows that the FGSA method is more effective at whole-genome SNP scale;
(d) when a factor contains too many SNPs, the curse of dimensionality arises; this is one of the main reasons why these methods consistently failed to fully select the pathogenic factor of length 5.
Experiment 2: SNP filtering on real AMD data.
AMD is a medical condition that affects the elderly; it causes loss of central vision due to retinal damage. The AMD data (see Table 4) contain 103,611 SNPs, 96 case samples, and 50 control samples, with 0.811% of the data missing. Sample values are 0, 1, 2, 3, where 0 denotes missing data. The AMD data are widely used in SNP association analysis, and many existing methods have been applied to them and have identified some associated genes. Table 5 lists the SNPs found by the FGSA method that were also found by other methods (such as BOOST, AntEpiSeeker, epiMODE, BEAM, HapForest, Single-Marker, and DASSO-MB).
Table 4. AMD data
Data set    Number of SNPs    Case samples    Control samples    Missing-data ratio
AMD data    103611            96              50                 0.822%
Table 5. SNPs found by the FGSA method that were also found by other methods
Figure BSA00000921376200091
Table 5 shows that:
(a) the SNPs found by FGSA overlap heavily with those found by other methods, which shows the effectiveness of the FGSA method.
(b) FGSA also yields SNPs that the other methods did not find but whose frequency is nevertheless high, including the SNPs numbered 19405, 6693, 56674, 80178, 76784, 92627, 46516, 88957, 42568, 51958, 41808, 47428, whose frequencies are 35, 26, 26, 25, 24, 24, 22, 21, 20, 15, 11, 9 respectively. The possibility that they are associated with the disease cannot be ruled out. In particular, it cannot be ruled out that the real pathogenic factors lie in the set formed by the SNPs in Table 5 together with the SNPs above, or in a subset formed by some of those SNPs.

Claims (1)

1. the method for the irrelevant SNP of filtration and complex disease from full genome comprises the steps:
Step 1 is carried out pre-service and initialization to full genome SNP data
According to the variation of arbitrary gene in the homologue allele impact of disease is equal to the principle for the treatment of, the pre-service of SNP data is become:
Figure FSA00000921376100011
X wherein i∈ { 0,1,2,3} dValue for the corresponding site of SNP i: two allele on the corresponding site get 1 when for homozygote AA, get 2 during homozygote aa, get 3 when heterozygote Aa or aA, get 0 when the allele data on the corresponding site lack; y i{ 1,2} is sample x to ∈ iThe class mark, 1 the expression disease group, 2 the expression control groups, N is the number of sample in the SNP data, d is the number of SNP in the data, and remembers that the set of related SNP is Ω;
Step 2, the definition relevance measure
With the relevance I (Y between a SNP factor X and the disease Y; X) be defined as mutual information MI (Y between X and the Y; X), be expressed as formula (1):
I(Y;X)=H(Y)-H(Y|X) (1)
In the formula H ( Y ) = - Σ y ∈ { 0,1 } p ( y ) log p ( y ) Be entropy, H ( Y | X ) = - Σ y ∈ { 0,1 } Σ x ∈ { 1,2,3 , } | X | p ( y | x ) log p ( y | x ) Be conditional entropy;
Step 3, utilization is based on the doubtful nosogenetic SNP group of candidate in the set omega of genetic searching method " FGSA " the search SNP of factor, thus the SNP that filtration and complex disease have nothing to do;
(3.1) genetic algorithm parameter and FGSA correlation parameter are set
SNP number l in the SNP group that setting will be looked for.Genetic algorithm parameter is set, comprises population scale Nl, crossover probability P c, the variation Probability p m, iterations Iter lThe FGSA correlation parameter is set, comprises repeat search times N um l, every less important interactive number M in l rank that looks for l
(3.2) the initialization interactive number k=1 in l rank that will look for; If doubtful pathogenesis related SNP S set 1=Φ; If the SNP that investigates set is Ω *=Ω;
(3.3) random initializtion population: from Ω *In SNP in random generate l different effective SNP numbering, consists of a l rank factor as body one by one, amount to generation N lIndividual formation population;
(3.4) fitness calculates: to each individuality in the population, calculate mutual information as this individual fitness according to formula (1);
(3.5) select operation: according to each individual fitness numerical value in the population, adopt roulette mode and elitism strategy to select operation, select N lIndividuality;
(3.6) interlace operation: from N lAppoint in the individuality and get two individualities, select at random the point of crossing, to these two individualities with crossover probability p cThe part of back, point of crossing is exchanged, form two new individualities;
(3.7) mutation operation: to each individuality, generate at random an effective SNP numbering, according to the variation Probability p mReplace the arbitrary SNP numbering in this individuality;
(3.8) produce population of future generation: all individualities that obtain after being operated by step (3.7) are as population of future generation;
(3.9) if iterations less than Iter l, then jump to step (3.4);
(3.10) get the individuality that has the maximum adaptation degree in the population and be designated as s kJoin doubtful and SNP S set disease association kIn, and from the SNP set omega *In remove this individuality, namely
Figure FSA00000921376100021
(3.11) repeating step (3.3)~(3.10) M lInferior, at every turn to S kMiddle adding is body one by one, and removes this individuality from data, through M lThe inferior S that repeats to get k
(3.12) reset Ω *=Ω repeats (3.2)~(3.11) Num lInferior, obtain the SNP set S 1 , S 2 , . . . , S Num l ;
(3.13) output comprises the interactive SNP set in each l rank
Figure FSA00000921376100023
Step 4, the frequency of each SNP among the calculating v
(4.1) frequency of each SNP calculates: the doubtful nosogenetic SNP occurrence number that finds by step (3) is as the frequency of this SNP;
(4.2) according to frequency-related prioritization criteria SNP is sorted, that is: press large preferential of frequency, the large preferential principle of single SNP and disease association mutual information sorts SNP during with frequency;
Step 5, output comes top frequency greater than the SNP of thresholding.
CN2013102796270A 2013-06-25 2013-06-25 Method for filtering SNP (Single Nucleotide Polymorphism) unrelated to complex diseases from whole-genome Pending CN103366100A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102796270A CN103366100A (en) 2013-06-25 2013-06-25 Method for filtering SNP (Single Nucleotide Polymorphism) unrelated to complex diseases from whole-genome

Publications (1)

Publication Number Publication Date
CN103366100A true CN103366100A (en) 2013-10-23

Family

ID=49367427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102796270A Pending CN103366100A (en) 2013-06-25 2013-06-25 Method for filtering SNP (Single Nucleotide Polymorphism) unrelated to complex diseases from whole-genome

Country Status (1)

Country Link
CN (1) CN103366100A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894216A (en) * 2010-07-16 2010-11-24 西安电子科技大学 Method of discovering SNP group related to complex disease from SNP information
CN102629305A (en) * 2012-03-06 2012-08-08 上海大学 Feature selection method facing to SNP (Single Nucleotide Polymorphism) data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUNYING ZHANG等: "A Genetic Algorithm to Filter SNPs for SNP Association Study", 《WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), 2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCES ON》, vol. 1, 7 December 2012 (2012-12-07), pages 684 - 687, XP032391329, DOI: doi:10.1109/WI-IAT.2012.146 *
蒋胜利: "高维数据的特征选择与特征提取研究"", 《中国博士学位论文全文数据库 信息科技辑》, vol. 2011, no. 12, 15 December 2011 (2011-12-15), pages 138 - 49 *
蒋胜利等: "基于多重遗传算法的单核苷酸多态性特征选择", 《四川大学学报(工程科学版)》, vol. 42, no. 2, 20 March 2010 (2010-03-20), pages 132 - 138 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462868A (en) * 2014-12-11 2015-03-25 西安电子科技大学 Genome-wide SNP (single nucleotide polymorphism) site analysis method based on combination of random forest and Relief-F
CN104462868B (en) * 2014-12-11 2017-04-05 西安电子科技大学 A kind of full-length genome SNP site analysis method of combination random forest and Relief F
CN108256293A (en) * 2018-02-09 2018-07-06 哈尔滨工业大学深圳研究生院 A kind of statistical method and system of the disease association assortment of genes
CN110135057A (en) * 2019-05-14 2019-08-16 北京工业大学 Solid waste burning process dioxin concentration flexible measurement method based on multilayer feature selection
CN110135057B (en) * 2019-05-14 2021-03-02 北京工业大学 Soft measurement method for dioxin emission concentration in solid waste incineration process based on multilayer characteristic selection
US11976817B2 (en) 2019-05-14 2024-05-07 Beijing University Of Technology Method for detecting a dioxin emission concentration of a municipal solid waste incineration process based on multi-level feature selection
CN110428897A (en) * 2019-06-19 2019-11-08 西安电子科技大学 Medical diagnosis on disease information processing method based on SNP pathogenic factor Yu disease association relationship
CN110428897B (en) * 2019-06-19 2022-03-18 西安电子科技大学 Disease diagnosis information processing method based on relation between SNP (Single nucleotide polymorphism) pathogenic factor and disease
CN112270957A (en) * 2020-10-19 2021-01-26 西安邮电大学 High-order SNP (Single nucleotide polymorphism) pathogenic combination data detection method, system and computer equipment
CN112270957B (en) * 2020-10-19 2023-11-07 西安邮电大学 High-order SNP pathogenic combination data detection method, system and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131023