CN109658980A - A kind of screening and application of excrement gene marker - Google Patents


Info

Publication number
CN109658980A
Authority
CN
China
Prior art keywords
gene
marker
excrement
dna
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810227886.1A
Other languages
Chinese (zh)
Other versions
CN109658980B (en
Inventor
肖勤
钱逸维
陈生弟
杨晓东
徐绍卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Original Assignee
Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd filed Critical Ruinjin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority to CN201810227886.1A priority Critical patent/CN109658980B/en
Publication of CN109658980A publication Critical patent/CN109658980A/en
Application granted granted Critical
Publication of CN109658980B publication Critical patent/CN109658980B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to the field of biomedicine, and more particularly to the screening and application of fecal gene markers based on high-throughput metagenomic sequencing analysis. Fecal DNA is extracted and subjected to shotgun metagenomic sequencing and ROC curve analysis to screen out gene markers, and the gene markers are then verified by real-time fluorescence quantitative PCR to confirm their consistency. The invention applies high-throughput shotgun metagenomic sequencing to the feces of Chinese PD patients for the first time, establishes a screening method for fecal gene markers, and identifies numerous differential genes in the feces of PD patients, from which 25 fecal gene markers are screened out. These markers can serve as PD diagnostic markers for the auxiliary diagnosis of PD patients, and have academic significance and application prospects for enriching and deepening the understanding of the pathological mechanism of PD.

Description

Screening and application of fecal gene markers
Technical field
The present invention relates to the field of biomedicine, and more particularly to the screening and application of fecal gene markers based on high-throughput metagenomic sequencing analysis.
Background art
Parkinson's disease (PD) is a slowly progressive neurodegenerative disease common in middle-aged and elderly people that seriously affects patients' quality of life. Its main clinical features are motor symptoms, including resting tremor, muscular rigidity, bradykinesia, and postural and gait abnormalities. With the accelerating aging of China's population, the incidence of PD is rising and the number of patients increases year by year; there are currently more than 2 million PD patients in China. The main pathological changes of PD are the abnormal aggregation of alpha-synuclein (α-synuclein, α-syn), which leads to the selective loss of dopaminergic neurons in the substantia nigra pars compacta of the midbrain, and the formation of Lewy bodies in the cytoplasm of the surviving neurons. By the time typical motor symptoms appear in a PD patient, the dopaminergic neurons of the substantia nigra have already been severely lost, so early diagnosis and early intervention are particularly important. At present the diagnosis of PD is based mainly on the patient's clinical manifestations and the physician's clinical experience, which leads to many missed diagnoses and misdiagnoses. There is therefore a need, on the one hand, for sensitive and reliable biological markers and, on the other hand, for simple and economical auxiliary examination methods to support PD diagnosis.
Existing research on PD biological markers focuses mainly on three aspects: clinical symptoms, neuroimaging and biochemistry:
1. PD clinical symptoms: motor symptoms are the main basis of current PD diagnosis. However, the typical motor symptoms of bradykinesia, muscular rigidity and resting tremor appear only after a large loss of dopaminergic neurons and therefore cannot serve as markers for early diagnosis. Non-motor symptoms such as hyposmia, REM sleep behavior disorder and constipation can appear much earlier than motor symptoms, but their sensitivity and specificity for PD diagnosis are low.
2. Neuroimaging: the imaging methods currently known for PD diagnosis include positron emission tomography (PET), single-photon emission computed tomography (SPECT) and magnetic resonance imaging (MRI). They are difficult to popularize because they are expensive, of limited practicality, or poorly specific.
3. Biochemistry: research on biochemical markers for the early diagnosis of PD involves immunity, inflammation, oxidative stress, apoptosis and other fields, among which α-syn has the greatest research prospects. After Borghi et al. first detected α-syn in cerebrospinal fluid, research on this biological marker attracted wide attention. However, the changes in α-syn expression level in the cerebrospinal fluid of PD patients remain highly controversial. Because of the limitations of cerebrospinal fluid testing in clinical application, some researchers have turned to changes in α-syn expression in peripheral blood and saliva. So far, however, the results obtained from blood and saliva samples are not fully consistent.
Therefore, no biological marker with high sensitivity and specificity that can be widely applied in the clinical diagnosis of PD has yet been found.
Summary of the invention
The object of the present invention is to provide the screening and application of fecal gene markers, seeking a biological marker with high sensitivity and specificity for PD diagnosis.
The technical solution adopted by the present invention to solve its technical problem is:
A method for screening fecal gene markers, comprising the following steps:
(1) collecting feces from Parkinson's disease patients and their healthy spouses, and storing them at -20 to -80 °C;
(2) extracting fecal genomic DNA from the samples of step (1);
(3) performing bioinformatic analysis after shotgun sequencing, establishing a reference gene set, screening target gene sequences from the differential genes between Parkinson's disease patients and healthy spouses as gene markers, and performing receiver operating characteristic (ROC) curve analysis to determine their ability to discriminate the disease; the specific method is as follows:
S1: quality inspection of fecal genomic DNA: total DNA quality is checked with a Thermo NanoDrop 2000 UV micro-spectrophotometer and 1~3% agarose gel electrophoresis;
S2: genomic DNA fragmentation:
1) in a 1.5 mL LoBind tube, dilute 30 ng~1000 ng of high-quality genomic DNA to 120~150 μL with 1X Low TE Buffer;
2) transfer the diluted genomic DNA to a microtube;
3) place the microtube in the Covaris Tube Holder and shear the DNA by sonication with the Covaris S2 system, with the following parameters: duty cycle 10~15%, intensity 4~5, cycles per burst 200~250, time 50~60 seconds, mode frequency sweeping, temperature 6~8 °C;
4) transfer the sonicated sample and concentrate it under vacuum to 50 μL to obtain the DNA fragment concentrate;
S3: library construction and quality inspection: the genomic DNA undergoes fragmentation, end repair, 3' A-tailing, adapter ligation and enrichment to complete construction of the sequencing library; the concentration of the constructed library is measured with a 2.0 Fluorometer and the library size is checked with an Agilent 2100;
S4: sequencing of DNA fragments: following the corresponding workflow in the cBot User Guide, cluster generation and first-read sequencing primer hybridization are completed on the cBot matched to the Illumina HiSeq sequencer; the sequencing platform is Illumina HiSeq Ten; sequencing reagents are prepared according to the Illumina User Guide, the flow cell carrying the clusters is loaded onto the instrument, and the paired-end program is selected for paired-end sequencing; the sequencing process is controlled by the data collection software provided by Illumina, with real-time data analysis; reads with unqualified adapter sequences are removed, reads containing ≥3 N bases are removed, the 3' ends of the sequences are trimmed to remove bases with quality value < 20, and reads whose length after trimming is < 60% of the original length are filtered out; the reads are aligned to the host genome with SOAPaligner and host-contaminated reads are rejected, giving the clean reads;
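The read filtering rules in S4 can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the invention: the function names are hypothetical, qualities are assumed to be already decoded to integer Phred scores, and the trimming rule (drop the trailing run of bases below the quality cutoff) is one plausible reading of "trim the 3' end".

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end while their quality is below min_q (assumed rule)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(seq, quals, max_n=3, min_q=20, min_frac=0.6):
    """Return the trimmed (seq, quals), or None if the read is discarded.

    A read is discarded if it contains max_n or more N bases, or if its
    length after 3' trimming is below min_frac of the original length.
    """
    if seq.upper().count("N") >= max_n:
        return None
    trimmed_seq, trimmed_quals = trim_3prime(seq, quals, min_q)
    if len(trimmed_seq) < min_frac * len(seq):
        return None
    return trimmed_seq, trimmed_quals
```

Adapter removal and host-genome filtering are handled by dedicated tools in the text and are deliberately left out of this sketch.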
S5: sequence assembly and gene prediction: the clean reads are assembled with metaSPAdes software, using k-mers of different sizes (21, 33, 55) on the filtered data; scaffolds are broken at their internal gaps into new scaftigs, sequences shorter than 500 bp are removed, and the assembly with the largest N50 is selected from the results for the different k-mers; open reading frame prediction is performed on the assembly with MetaGeneMark software, the resulting .gff annotation file is converted into a .fna gene sequence file, and gene sequences longer than 100 bp are selected from it and translated into the corresponding amino acid sequences;
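Choosing the assembly with the largest N50 can be illustrated as follows. The sketch assumes each assembly is represented simply as a list of sequence lengths keyed by its k-mer size; the function names and the data layout are hypothetical and not taken from metaSPAdes.

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_assembly(assemblies, min_len=500):
    """Pick the k-mer size whose assembly has the largest N50.

    assemblies maps k-mer size -> list of sequence lengths; sequences
    shorter than min_len are dropped before N50 is computed.
    """
    scores = {
        k: n50([length for length in lengths if length >= min_len])
        for k, lengths in assemblies.items()
    }
    return max(scores, key=scores.get)
```

For example, `best_assembly({21: [600, 700, 400], 33: [1500, 500], 55: [900, 900]})` selects the k = 33 assembly.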
S6: gene set construction: all predicted genes are clustered with CD-HIT at identity > 95% and coverage > 90%; the longest gene sequence in each cluster is selected and the remaining redundant genes are removed; gut metagenomic genes from the type 2 diabetes and liver cirrhosis projects in eastern and southern China are downloaded from the NCBI database and used to build a new gene set; genes supported by fewer than 2 reads are removed to obtain the non-redundant gene set;
S7: preliminary gene screening: the relationship between the number of genes and the number of non-zero samples is plotted, a non-zero sample threshold is determined from this plot, and preliminary gene screening is performed to obtain a gene set;
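The prescreening in S7 amounts to counting, for each gene, the samples in which it is detected and then applying a cutoff. A minimal sketch, assuming abundances are held in a dictionary from gene name to a list of per-sample values and that a gene is kept when its non-zero sample count is at least the threshold (both are assumptions made for illustration):

```python
def nonzero_counts(abundance):
    """Number of samples with non-zero abundance, per gene."""
    return {gene: sum(1 for v in values if v > 0) for gene, values in abundance.items()}


def genes_remaining(abundance, thresholds):
    """Number of genes retained at each candidate threshold (the curve that is plotted)."""
    counts = nonzero_counts(abundance)
    return {t: sum(1 for c in counts.values() if c >= t) for t in thresholds}


def prescreen(abundance, min_nonzero):
    """Keep genes detected in at least min_nonzero samples."""
    counts = nonzero_counts(abundance)
    return {gene: values for gene, values in abundance.items() if counts[gene] >= min_nonzero}
```

Plotting the output of `genes_remaining` against the thresholds gives the kind of curve from which a cutoff can be read off where the decline levels out.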
S8: gene abundance and statistical testing of differential genes: the clean reads are aligned to the non-redundant gene set with SOAPaligner, and the relative abundance of each gene in each sample is calculated from the number of reads aligned to the gene and the gene length; based on gene abundance, the difference of each gene between the Parkinson's disease group and the healthy spouse control group is then tested with the Wilcoxon rank sum test, giving the set of differential genes (P < 0.05);
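The text states only that relative abundance is derived from the aligned read count and the gene length. One common normalization, shown here purely as an assumption, divides each gene's read count by its length and then rescales so that the values within a sample sum to 1. The function name and inputs are hypothetical.

```python
def relative_abundance(read_counts, gene_lengths):
    """Length-normalized relative abundance within one sample.

    read_counts:  gene -> number of reads aligned to that gene
    gene_lengths: gene -> gene length in bp
    Assumed formula: (reads / length) divided by the sum of (reads / length) over all genes.
    """
    density = {gene: read_counts[gene] / gene_lengths[gene] for gene in read_counts}
    total = sum(density.values())
    if total == 0:
        return {gene: 0.0 for gene in density}
    return {gene: value / total for gene, value in density.items()}
```

With equal read counts, a gene half as long receives twice the relative abundance, which is the point of correcting for length.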
S9: clustering of differential genes: the differential gene set obtained by the Wilcoxon rank sum test between Parkinson's disease patients and healthy spouses is clustered into metagenomic species (MGS); the Pearson correlation coefficient of the abundance values of each pair of genes across all samples is calculated and a single-linkage clustering algorithm is applied, requiring a within-cluster correlation coefficient of no less than 0.9 and a between-cluster correlation coefficient of no more than 0.1, to obtain MGS clusters; the MGS clusters are then screened by the following criteria to obtain the MGS groups: a) the cluster contains no fewer than 50 genes; b) its genes are annotated to the same genus against the genome library; c) the genus annotation rate is greater than 90%;
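Single-linkage clustering at a fixed correlation cutoff is equivalent to finding connected components of the graph whose edges join gene pairs with correlation at or above the cutoff. The sketch below illustrates that idea with a union-find structure. It covers only the within-cluster criterion (r ≥ 0.9); the between-cluster criterion and the size and annotation filters are not implemented, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def single_linkage_clusters(profiles, min_r=0.9):
    """Group genes whose profiles are linked by a chain of pairs with r >= min_r."""
    genes = list(profiles)
    parent = {gene: gene for gene in genes}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    clusters = {}
    for gene in genes:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())
```

The pairwise loop is quadratic in the number of genes, so a sketch like this suits small examples only; a gene set of the size described here would need a more scalable implementation.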
S10: screening of gene markers: significant genes are screened from the MGS groups with the minimum redundancy maximum relevance (mRMR) algorithm; a preferential selection algorithm is then applied, each time choosing from the remaining significant genes a gene that improves the classification performance, until the classification performance can no longer be improved, giving the gene markers;
The minimum redundancy maximum relevance algorithm comprises the following:
i) reducing the redundancy of the marker genes: the sum of the pairwise correlation metrics of the candidate marker genes is minimized;
ii) strengthening the predictive ability of the marker genes: the sum of the difference metrics of the marker genes between the Parkinson's disease group and the healthy spouse group is maximized;
The preferential selection algorithm comprises the following:
a. first, the gene with the best predictive ability is selected from the significant gene set as a selected feature;
b. a gene is then selected from the remaining significant genes and added to the selected feature set such that the predictive ability of the selected feature set is the best;
c. step b is repeated until the predictive ability no longer improves, and the selected feature set is output;
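Steps a to c describe a greedy forward selection. A minimal sketch follows, in which `score` is a hypothetical callable standing in for whatever estimate of predictive ability is used; it takes a list of genes and returns a number where higher is better.

```python
def forward_select(candidates, score):
    """Greedy forward selection.

    Repeatedly add the candidate that gives the highest score for the
    enlarged feature set; stop when no candidate improves the score.
    Returns (selected genes in order of addition, final score).
    """
    selected = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(score(selected + [gene]), gene) for gene in remaining]
        top_score, top_gene = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining gene improves the predictive ability
        selected.append(top_gene)
        remaining.remove(top_gene)
        best = top_score
    return selected, best
```

Because each round rescores every remaining candidate, the cost grows with the square of the candidate count times the cost of one scoring call, which is one reason to narrow the candidates first.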
The predictive ability is assessed as follows:
Linear discriminant analysis is used as the classification algorithm, and the Matthews correlation coefficient (MCC) computed under leave-one-out cross-validation is used as the criterion for the predictive ability of the selected features. The MCC is given by:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives;
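The evaluation can be sketched as a leave-one-out loop that accumulates a confusion matrix and then computes the MCC. In this sketch the classifier is passed in as a hypothetical callable `fit_predict(train_x, train_y, test_x)`; the text uses linear discriminant analysis, which is not implemented here. Labels are assumed to be 1 for the PD group and 0 for controls, and an MCC of 0 is returned when the denominator is zero (a common convention, stated here as an assumption).

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def loo_mcc(features, labels, fit_predict):
    """MCC under leave-one-out cross-validation.

    features: list of feature vectors; labels: list of 0/1 class labels.
    fit_predict trains on all samples but one and predicts the held-out one.
    """
    tp = tn = fp = fn = 0
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = fit_predict(train_x, train_y, features[i])
        if predicted == 1 and labels[i] == 1:
            tp += 1
        elif predicted == 0 and labels[i] == 0:
            tn += 1
        elif predicted == 1:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)
```

A function of this kind could serve as the scoring step of a greedy feature selection by restricting the feature vectors to the candidate gene set before calling it.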
S11: verification of marker discrimination ability: a support vector machine model is built and trained on the gene markers and the ROC curve is drawn; at the same time, the detection results of the gene markers are integrated into a Parkinson's disease index (PDI); the PDI of each sample j, denoted I_j, is calculated as follows:
where A_ij is the relative abundance of the i-th marker in sample j, N and M are the subsets of gene markers enriched in Parkinson's disease patients and in the healthy spouse control group, respectively, and |N| and |M| are the sizes of these two subsets;
The difference in the transformed PDI between Parkinson's disease patients and the healthy spouse control group is compared by the Wilcoxon rank sum test.
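The PDI formula itself does not appear in this text, so the following is only a sketch of one form that is consistent with the stated definitions of A_ij, N, M, |N| and |M|. It is an assumption, modeled on disease indices used in other gut metagenome studies: the mean log10 abundance of the patient-enriched markers minus the mean log10 abundance of the control-enriched markers, with a small pseudo-count to avoid log(0). The pseudo-count value and all names are hypothetical.

```python
import math


def pd_index(abundance, pd_enriched, control_enriched, pseudo=1e-20):
    """Assumed index: mean log10 abundance over N minus mean log10 abundance over M.

    abundance:        marker -> relative abundance in one sample (A_ij for fixed j)
    pd_enriched:      markers enriched in patients (subset N)
    control_enriched: markers enriched in healthy controls (subset M)
    """
    def mean_log(markers):
        total = sum(math.log10(abundance.get(m, 0.0) + pseudo) for m in markers)
        return total / len(markers)

    return mean_log(pd_enriched) - mean_log(control_enriched)
```

Under this assumed form, a sample with relatively more patient-enriched markers gets a higher index, so the two groups can be compared on a single number.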
S12: real-time fluorescence polymerase chain reaction (Real-time PCR) is used to evaluate the clinical value of the screened gene marker sequences in the original samples and in an expanded sample set.
The method for extracting fecal genomic DNA in step (2) is as follows:
Add 1~2 ml of inhibitor EX buffer to 200 mg of feces and mix thoroughly; centrifuge at 12000~15000 rpm for 1~2 min; add 500~600 μl of the supernatant to 25~30 μl of proteinase K, then add 500~600 μl of buffer AL and mix thoroughly; incubate at 65~80 °C for 10~15 min; add 500~600 μl of alcohol with a volume concentration greater than 95% and mix thoroughly to obtain the lysate; transfer the lysate to the QIAamp adsorption column of the QIAamp Fast DNA Stool Mini Kit, centrifuge at 12000~15000 rpm for 1~2 min and discard the waste liquid; add 500~600 μl of buffer AW1 to the column, centrifuge at 12000~15000 rpm for 1~2 min and discard the waste liquid; add 500~600 μl of buffer AW2 to the column and centrifuge at 12000~15000 rpm for 3~4 min; centrifuge the column again at 12000~15000 rpm for 3~4 min; add 100~200 μl of ddH2O preheated at 65~80 °C to the column, let it stand at room temperature for 1~2 min, centrifuge at 12000~15000 rpm for 1~2 min, and collect the precipitate to obtain the fecal genomic DNA.
An application of the gene markers: the gene markers can be used to prepare reagents, kits or biochips for the diagnosis or auxiliary diagnosis of Parkinson's disease, or to prepare reagents, kits or biochips for screening anti-Parkinson's drugs; a kit or biochip containing detection reagents for the gene markers can be used to detect the expression level of the gene markers in fecal samples, for the diagnosis or auxiliary diagnosis of Parkinson's disease, or for screening and preparing anti-Parkinson's drugs.
Beneficial effects of the present invention:
(1) The invention applies high-throughput shotgun metagenomic sequencing to the feces of Chinese PD patients for the first time, establishes a screening method for fecal gene markers, and identifies numerous differential genes in the feces of PD patients. The 25 fecal gene markers screened out can serve as PD diagnostic markers for the auxiliary diagnosis of PD patients, and have academic significance and application prospects for enriching and deepening the understanding of the pathological mechanism of PD.
(2) The invention converts the shotgun sequencing results into a Real-time PCR method for detecting the 25 fecal gene markers. While maintaining comparable discrimination, sensitivity and specificity, the calculated PDI can help judge whether a subject has PD, thereby guiding clinicians in providing prevention or treatment plans for the subject.
(3) The invention uses feces as the sample, which is easier to obtain than biological samples such as blood or cerebrospinal fluid; collection is non-invasive and simple and yields a large quantity. Compared with shotgun metagenomic sequencing, detecting the fecal gene markers is inexpensive and has a short detection cycle, and the markers can serve as disease biomarkers for diagnosis or auxiliary diagnosis. These gene markers have broad application prospects and potential, are a tool suited to large-scale clinical diagnosis of Parkinson's disease in the Chinese population, and provide new ideas and directions for the diagnosis of Parkinson's disease.
Brief description of the drawings
Fig. 1 shows the relationship between the number of genes and the number of non-zero samples.
Fig. 2 shows the proportion of non-differential genes.
Fig. 3 shows the discrimination of PD patients from healthy controls by the 25 gene markers in metagenomic sequencing.
Fig. 4 is a schematic diagram of target gene construction.
Fig. 5 shows the discrimination of 40 PD patients from 40 healthy controls by the 25 gene markers in Real-time PCR.
Fig. 6 shows the discrimination of 70 PD patients from 64 healthy controls by the 25 gene markers in Real-time PCR.
Detailed description of the embodiments
To make the technical means, creative features, objectives and effects of the present invention easy to understand, the invention is further described below with reference to the drawings and specific embodiments.
Embodiment 1
From 120 sporadic PD patients, 65 eligible PD patients and their healthy spouses were screened. The inclusion criteria for PD patients were: (1) clinically confirmed Parkinson's disease, diagnosed according to the UK Parkinson's Disease Society Brain Bank diagnostic criteria; (2) exclusion of atypical and secondary parkinsonism; (3) no history of cerebral seizure, dementia or any serious nervous system disease; (4) no malignant disease, such as tumors or heart failure; (5) no other chronic disease affecting the intestinal flora, including diabetes, cirrhosis and cardiovascular disease; (6) no autoimmune or hemorrhagic disease; (7) no serious intestinal disease, including irritable bowel syndrome. Healthy spouses were required to have no history of disease. Subjects who had taken antibiotics within 3 months of fecal sample collection were not included in this study. Finally, 40 PD patients (the PD group, including 21 women (52.5%), mean age 66.6 ± 7.1 years) and their healthy spouses (the healthy control group, including 19 women (47.5%), mean age 66.3 ± 8.1 years) were included in the final analysis.
All enrolled subjects collected a fecal sample in the morning with a uniform fecal sampler. All fecal samples were transported on ice packs to the Ruijin Hospital laboratory, where they were immediately aliquoted on ice into labeled 2 ml centrifuge tubes, 10 tubes per sample (200 mg per tube), and stored in a -80 °C freezer until use.
Fecal genomic DNA extraction (extraction kit: QIAamp Fast DNA Stool Mini Kit, provided by Qiagen, Germany)
1) add 1 ml of inhibitor EX buffer to each tube and shake thoroughly for 1 min until the sample is completely mixed;
2) place in the centrifuge and centrifuge at 14,000 rpm for 1 min;
3) take a new 2 ml centrifuge tube and add 25 μl of proteinase K;
4) transfer 600 μl of the supernatant from 2) into the centrifuge tube containing proteinase K from 3);
5) add 600 μl of buffer AL and shake to mix for at least 15 s;
6) incubate at 70 °C for 10 min;
7) add 600 μl of alcohol (95%-100%) and shake to mix;
8) carefully transfer 600 μl of the lysate from 7) to the QIAamp adsorption column from the kit and centrifuge at 14,000 rpm for 1 min. Discard the waste liquid and place the adsorption column in a 2 ml collection tube. Repeat this step until all of the solution from 7) has been processed;
9) discard the waste liquid in the collection tube, add 500 μl of buffer AW1 and centrifuge at 14,000 rpm for 1 min;
10) discard the waste liquid, add 500 μl of buffer AW2, centrifuge at 14,000 rpm for 3 min and discard the collection tube;
11) move the QIAamp adsorption column into a new 2 ml collection tube, centrifuge at 14,000 rpm for 3 min and discard the collection tube;
12) move the adsorption column into a new labeled 1.5 ml centrifuge tube, add 100 μl of ddH2O (preheated to 65 °C), let stand at room temperature for 1 min, centrifuge at 14,000 rpm for 1 min, and collect the precipitate.
Bioinformatic analysis was carried out after shotgun sequencing: a reference gene set was established, target gene sequences were screened from the differential genes between Parkinson's disease patients and healthy spouses as gene markers, and ROC curve analysis was performed to determine their ability to discriminate the disease. The specific method was as follows:
S1: quality inspection of fecal genomic DNA: total DNA quality was checked with a Thermo NanoDrop 2000 UV micro-spectrophotometer and 1% agarose gel electrophoresis;
S2: genomic DNA fragmentation:
1) in a 1.5 mL LoBind tube, 100 ng of high-quality genomic DNA was diluted to 120 μL with 1X Low TE Buffer;
2) the diluted genomic DNA was transferred to a microtube;
3) the microtube was placed in the Covaris Tube Holder and the DNA was sheared by sonication with the Covaris S2 system, with the following parameters: duty cycle 10%, intensity 4, cycles per burst 200, time 55 seconds, mode frequency sweeping, temperature 6 °C;
4) the sonicated sample was transferred and concentrated under vacuum to 50 μL to obtain the DNA fragment concentrate, with a DNA fragment length of 500 bp;
S3: library construction and quality inspection: the genomic DNA underwent fragmentation, end repair, 3' A-tailing, adapter ligation and enrichment to complete construction of the sequencing library; the concentration of the constructed library was measured with a 2.0 Fluorometer and the library size was checked with an Agilent 2100;
S4: sequencing of DNA fragments: following the corresponding workflow in the cBot User Guide, cluster generation and first-read sequencing primer hybridization were completed on the cBot matched to the Illumina HiSeq sequencer; the sequencing platform was Illumina HiSeq Ten; sequencing reagents were prepared according to the Illumina User Guide, the flow cell carrying the clusters was loaded onto the instrument, and the paired-end program was selected for paired-end sequencing; the sequencing process was controlled by the data collection software provided by Illumina, with real-time data analysis; reads with unqualified adapter sequences were removed, reads containing ≥3 N bases were removed, the 3' ends of the sequences were trimmed to remove bases with quality value < 20, and reads whose length after trimming was < 60% of the original length were filtered out; the reads were aligned to the host genome with SOAPaligner and host-contaminated reads were rejected, giving the clean reads;
S5: sequence assembly and gene prediction: the clean reads were assembled with metaSPAdes software, using k-mers of different sizes (21, 33, 55) on the filtered data; scaffolds were broken at their internal gaps into new scaftigs, sequences shorter than 500 bp were removed, and the assembly with the largest N50 was selected from the results for the different k-mers; open reading frame prediction was performed on the assembly with MetaGeneMark software, the resulting .gff annotation file was converted into a .fna gene sequence file, and gene sequences longer than 100 bp were selected from it and translated into the corresponding amino acid sequences;
S6: gene set construction: all predicted genes were clustered with CD-HIT at identity > 95% and coverage > 90%; the longest gene sequence in each cluster was selected and the remaining redundant genes were removed; gut metagenomic genes from the type 2 diabetes and liver cirrhosis projects in eastern and southern China were downloaded from the NCBI database and used to build a new gene set; genes supported by fewer than 2 reads were removed to obtain the non-redundant gene set, containing 3,367,833 genes;
S7: preliminary gene screening: to preliminarily filter out genes that occur at extremely low frequency in the samples, the relationship between the number of genes and the number of non-zero samples was first plotted, as shown in Fig. 1. When the threshold on the number of non-zero samples is set to 10, the rate of decline in the number of non-redundant genes essentially levels off; therefore, for the current 80 samples, "number of non-zero samples = 10" was chosen for preliminary gene screening, giving 1,118,355 genes. Subsequent analysis is based on this gene set.
S8: gene abundance and statistical testing of differential genes: the clean reads were aligned to the non-redundant gene set with SOAPaligner, and the relative abundance of each gene in each sample was calculated from the number of reads aligned to the gene and the gene length; based on gene abundance, the difference of each gene between the Parkinson's disease group and the healthy spouse control group was then tested with the Wilcoxon rank sum test, giving 174,964 differential genes (P < 0.05). After the P values were corrected by the false discovery rate (FDR) method, no differential genes were found at the FDR-P < 0.05 level; the proportion of non-differential genes (null hypothesis) was therefore plotted, as shown in Fig. 2, where the bar area above the dashed line can be taken as an estimate of the number of true positive results. This means that although FDR correction yields no significant results, a large number of true positive differential genes are still present. This function is implemented with the q-value package in R.
S9: clustering of differential genes: the 174,964 differential genes obtained by the Wilcoxon rank sum test between Parkinson's disease patients and healthy spouses were clustered into metagenomic species (MGS); the Pearson correlation coefficient of the abundance values of each pair of genes across all samples was calculated and a single-linkage clustering algorithm was applied, requiring a within-cluster correlation coefficient of no less than 0.9 and a between-cluster correlation coefficient of no more than 0.1, to obtain MGS clusters; statistical analysis gave 2003 MGS (FDR-P < 0.05) containing 72,723 genes, as shown in Table 1. The MGS clusters were screened by the following criteria: a) the cluster contains no fewer than 50 genes; b) its genes are annotated to the same genus against the genome library; c) the genus annotation rate is greater than 90%; this screening gave 15 MGS;
Table 1: Summary of the number of genes contained in the MGS
S10: screening of gene markers: the 51,816 genes contained in the 15 MGS were screened with the minimum redundancy maximum relevance algorithm to obtain 180 significant genes; the preferential selection algorithm was then applied, each time choosing from the remaining significant genes a gene that improves the classification performance, until the classification performance could no longer be improved, giving 25 gene markers. The sequences of the 25 gene markers are as follows:
>marker 1
ATGGAAGCCTTTCTCGCCAGTGTTGAGAAAGGCTTTTTACTTAATGAAAAGGAGATAAATGTAATGAAA CATCAGTTACGCAGTTCGATGAGTACCGAAGGTCGCCGTATGGCAGGAGCTAGAGCCTTATGGGTAGCCAACGGTAT GAAGAAGGAAATGTTTGGGAAGCCCATTATAGCTATCGTAAATTCGTTTACGCAGTTTGTGCCGGGCCATACACATT TACACGAAATAGGCCAGCAGGTAAAAGCGGAAATCGAGAAATTGGGATGCTTTGCTGCTGAATTTAATACCATTGCC ATTGACGATGGCATTGCAATGGGGCACGACGGAATGCTGTATTCTTTGCCTTCCCGTGACATCATAGCCGACAGTGT GGAATATATGGTGAATGCCCATAAGGCAGATGCGATGGTCTGCATTTCGAATTGCGACAAGATTACTCCGGGAATGC TTATGGCCGCAATGCGACTGAATATCCCTACTGTATTTGTTTCCGGCGGTCCGATGGAAGCTGGAGAGTGGGGAGGC ATGCATCTTGATTTGATAGATGCCATGATTAAATCGGCCGATTCAACAGTGAGTGATGAAGATGTAGCAGAAATAGA ACGTCACGCTTGTCCTGGATGTGGTTGTTGTTCGGGAATGTTTACGGCAAATTCCATGAACTGCCTGAACGAAGCCA TCGGATTAGCCTTACCGGGAAACGGAACAATTGTAGCTACGCATGAAAACCGTAAACGTTTGTTCCGGGATGCTGCC CAGCTTATTGTGAAAAATGCCTATAAGTATTACGAAGAAGGTGATGACAGCGTGTTGCCGCGCAGTATCGCTACCCG TCAGGCCTTCCTGAATGCCATGACACTGGATATTGCGATGGGAGGTTCTACCAACACCGTATTACATTTGTTGGCTG TTGCTCATGAAGCAGAAGTAGATTTCAAGATGGATGACATCGATATGTTGTCGCGTCGGGTACCTTGTTTGTGCAAG GTCGCTCCCAATACACAGAAATATCATATTCAGGATGTCAATCGTGCTGGTGGTATCCTTAATATATTAGGAGAACT TGCAAAGGGAGGACTGCTCGATACTACCGTACACCGAGTCGATGGTTCTACATTAGGAGAGGCAATTGCCAAATACA ATATCTGTAAACCGGATGTAGATGCTGAGGCTATGCGTATTTATACAAGTGCGCCGGGCGGTAAATTCAATATCCAG TTGGGCTCTCAAAACAACACGTATAAAGAACTTGATACAGATCGTGCTACTGGTTGCATCCGCGATTTACAGCATGC TTACAGCAAAGACGGAGGATTGGCGGTACTGAAAGGGAATATTGCGCAAGATGGTTGTGTGGTAAAGACTGCCGGAG TAGACGAAAGCATCTGGAAGTTCTCTGGTCCAGCCAAAGTGTTCGATTCACAGGAATCGGCATGCGAAGGTATCCTT GGTGGCAAAGTTGTGAGCGGAGATGTAGTCGTCATTACTCACGAAGGTCCGAAAGGTGGACCGGGCATGCAGGAAAT GCTTTATCCTACTTCTTATATCAAATCGAAACATTTGGGAAAAGAATGTGCCTTGATTACAGATGGCCGGTTTAGTG GAGGTACTTCCGGTCTGAGTATCGGACATATTTCTCCCGAAGCAGCAGCCGGCGGAAATATCGGTAAGATAGTAGAC GGAGATATTATTGAGATTGATATACCAAACCGTACGATAAATGTGAAACTCACCGACGAGGAACTGGCAGCACGCCC CATGACACCTGTTACTCGTAACCGTCAGGTGTCGAAGGCATTGCGAGCTTATGCCAGTATGGTAAGCTCAGCCGATA AGGGTGGAGTAAGAATTGTATAA
>marker 2
ACTATTGGTTTTGGCAAAGATACATTTTTCTTCAAAATAAAAATCAACTTCCTTGAGCGTAAAGCAAGC AGCAGTAATATTTTTCACGAAAGCAAGATATTACATAATAAAAATATCAAAGTTGACTTTCTTTACAACAAAATGTA TTTAAACCTTATCAAACAGTTGACTAGCTAA
>marker 3
ACGGCGATCCATGCGTTGCGCGTTGCTATCGATGATGATGGCAAACAACAGAACATTATTAAAACAATT CCGAACAAAGGTTACTTATGCAATAAAGAGTATGTGTCTTTGCCGGACTCATCGCCTGCAAAAAAACTGATTATTAC CGACCAGATACAGGAAACAGTACCAGAAGAGATATCTTCTACAACACCGCCTGTGCCTGTTAGAGAAAAACATAAAG TAATAATGGGGCTAGCGTTAGCCGCGGCTGTTATATTTATTGGCTCGACCGTGGGGTATTCACACTTGAAAAGTACA CCTGATGCCCCGCAACTGGTAAAAGAGTCAATTAACAGTCCAAGAATAAAAATATTTCACCTTAGTTCTGGAAAAGA AAATAATTCCGTACCGTTACTGTCACAAACCCTCGCACCAGGAAAAGACAAACTCGATAATTTATTGTCAGCGCATA ATATGACAATGACAACCTATTATAAATATGTTCGTAACCGACTGGAAAGTGATATTGTTCTGCGCAACCAGTGCAAT GGTAGCTGGCAATTAACTTTTAAT
>marker 4
ATGGAAATGCACAACAGAAATCAGTCCGATTTGTTGCCATATGAGAAACTGGCAATACTGGGTATAGAC CGTGAAAAGGCGGACAACTTACCCATGGAAGTGAAGGAGAAGCTGATGGCAGGTGAAGTGACTCCCATCATGCAGGT ATCCATCAACGCAAGGAACGGCAGTGTCATCACCATGCCGATGAAACTCCAGTTGACTACAGACAGAAACGGTGCTC CAGCATTGATTGCCTATCCGGTACGTGCGGAGCTGGACAGGGAACGCAACAAGATTCTCAATCTGACTCACCAGGAG GCTGAACGTTTGGCCAAAGGTGAGGTAATACAGAAGGCGGTCAACGTAAACGGTGAAAAGACTCAGCAGTATCTCCA GCTTGATCCGGAGACGAAATCCGTCATCCACAGACGTGTAACAGACATAAAGCTGGAGCAGCGTCTGAAGGATATGG AGAAGGTGAACGATATCGAACTGGGTATGCAGCAGAAGCAGCAGATACGTGAGGGAAAACCTGTTGAGCTGAATGTG GGCGGTGAGAAGGTTTCGGTCGGCATTGACCTGAAGGAGCAACAGGGATTCAAGCTTATCAAGGGCGACATGAAGGA ATGGGAAAGGCAGCAGAAAATCCGCTATGACGAACTTCATCCTGAATATCTCGGCCTTGTCATGACTGACAGGAACC GATGGGAATACCAGAAGATGGTGGACAAGCAATCCGTTGAGCGTGCCATTTCACTCTCTCCATCCCGGAAGGAAACA AAAGCAAACAGCCTTAAACTCTAA
>marker 5
ATGGATATAGAAAAGATATTGACCGAAGGAATTGTCTTTAAAGGCAGCAAGCCTTCTACCGCTAAAAAA GAAGATAAGGTAAAGACAAAGGCAAAGAAGAAAACGTATATCACCGGATTGCATGGCTCAGGCTCTGCAAAGATGAA AGCTGAATACCGTCGTCGACGCGCTAACCGTCATAAGAATGGATAA
>marker 6
TTGTTACGCCTGGGTAATGAAGAAAAGAAGAGTAGGATTCTGGACTTGTATGCTGATGAACTAAAAATA GTTGTATACATGTTCAGAAAGTTTGTACAGAAAATAAAAAAAATTCTCGATAAATTTATGAAAATATGCCCAAGTTT TGTGCATAACTTGGGCATGTTTTTTAAAGGCTGA
>marker 7
ATGAGTACCGCTAAATTAGTTAAATCAAAAGCGACCAATCTGCTTTATACCCGCAACGATGTCTCTGAC AGCGAGAAAAAAGCAACAGTAGAGTTGCTGAATCGCCAGGTTATCCAGTTTATTGATCTTTCTTTGATTACCAAACA AGCGCACTGGAACATGCGCGGCGCTAACTTCATTGCCGTACATGAAATGCTGGATGGCTTCCGCACTGCACTGATCG ATCATCTGGATACCATGGCAGAACGTGCAGTGCAGCTGGGCGGTGTAGCTCTGGGGACCACTCAAGTTATCAACAGC AAAACTCCGCTGAAAAGTTACCCGCTGGACATCCACAACGTTCAGGATCACCTGAAAGAGCTGGCTGACCGTTACGC AATCGTCGCTAATGACGTACGCAAAGCGATTGACGAAGCGAAAGATGACGACACCGCAGATATCCTGACCGCCGCGT CTCGCGACCTGGATAAATTCCTGTGGTTTATCGAGTCTAACATCGAATAA
>marker 8
ATGATTTTCCTTTCTCAGGCACAAATCGATGCGTTACTTCTGGAAGATATCCAGGGCGGTGACCTGACC ACGCGAGCATTAAATATTGGACACCAGCATGGCTATATAGAGTTTTTTCTCCGTCAGGGCGGCTGCGTCAGCGGAAT TTCTGTCGCGTGTAAGATGTTAACTACGCTGGGATTAACTATTGATGACGCGGTCAGCGACGGTTCACAAGCGAACG CGGGTCAGCGGCTAATCCGTGCGCAAGGTAATGCCGCAGCACTTCATCAGGGATGGAAGGCAGTCCAGAATGTGCTG GAGTGGAGTTGCGGTGTTTCTGATTATCTCGCTCAAATGCTGGCGTTACTTCGTGAACGTTACCCTGATGGCAATAT CGCCTGCACCCGAAAAGCAATTCCGGGCACCCGCTTACTGGCCTCGCAGGCAATTCTGGCTGCCGGAGGACTGATTC ATCGTGCCGGATGTGCGGAAACCATATTATTGTTTGCCAACCACCGCCATTTTCTTCATGACAATCAGGACTGGTCA GGCGCAATCAATCAGTTACGTCGCCACGCACCCGAGAAGAAAATAGTTGTTGAAGCCGACACGCCGAAAGAGGCAAT CGCCGCGTTACGCGCGCAACCAGACGTACTTCAGCTCGACAAATTTAGTCCGCAACAGGCAACTGAAATTGCGCAAA TAGCACCGTCGCTGGCTCCCCACTGCACGCTGGCTCTGACCGGCGGAATTAATCTGACAACACTCAAAAATTACCTC GACTGCGGCATTCGCCTTTTTATTACCTCCGCGCCTTATTACGCGGCACCTGCTGACATTAAAGTCAGTCTGCAACC CGCAGCCTCTATTTAA
>marker 9
ATGTCTACAACACATAACGTCCCTCAGGGCGATCTTGTTTTACGTACTTTAGCCATGCCCGCCGATACC AATGCCAATGGTGACATCTTTGGTGGTTGGTTAATGTCACAAATGGATATTGGCGGCGCTATTCTGGCGAAAGAAAT TGCCCATGGTCGTGTAGTGACCGTGCGGGTTGAAGGAATGACTTTCTTACGGCCGGTTGCGGTCGGCGATGTGGTGT GCTGCTATGCACGCTGTGTCCAGAAAGGGACGACATCAGTCAGCATTAATATTGAAGTGTGGGTGAAAAAAGTGGCG TCTGAACCAATCGGGCAACGCTATAAAGCAACAGAAGCATTATTTAAGTATGTCGCGGTTGATCCCGAAGGAAAACC TCGCGCCTTACCCGTTGAGTAA
>marker 10
ATGACTACATTACGTCAGCCTTACTACGAACTTAGCCCGGCAGTGTATAACGCACTAGTGCAGGCCAAA ACGGCACTGGAAAATAGCACGCTGGATACCACGTTGATGGAGTTAGTTTATTTGCGCGTCTCGCAAATCAACGGCTG CGCATTTTGTCTGGAGATGCACAGCAAAGCATTGCGCAAATCCGGCGTGCCACAGCACAAACTGGACGCCCTGGCAG GTTGGCGCGTAAGCCATCATTTTGATGAACGCGAGCGGGCGGCGCTGGCATGGGCGGAATCGGTAACCGAAATTGCC AGAACCCATGCGGAAGACGAGGTTTATCAGCCTTTGCTTGAGCATTTCAGCGCAGCAGAGATCAGCGACTTAACGTT TGCCATCGGGCTGATGAATTGTTTTAACCGTCTGGCCGTTTCCATGCGGATGTAA
>marker 11
ATGTCATCTTTACTGATCCCCGCAGACTGGAAAGTTAAACGCTCCACCCCATTCTTTACCAAAGAGAAT GTCCCTGCCGCCCTGCTGAGCCACCACAACACCGCGGCTGGCGTCTTCGGCCAGCTGTGCGTCATGGAGGGCACGGT CACCTATTACGGTTTTGCCAATGAACAAGCGACGGAGCCGGAGAAAAAAGTCGTGATTCATGCCGGGCAGTTTGCTA CCAGTCCGCCGCAGTACTGGCACCGCGTCGAACTCAGTGACGACGCCCGCTTCAATATTCACTTCTGGGTGGCGGAA GAAACCGACGGTGAAAACGGGCTGTTCCACGCGAAGAAAGCGTGA
>marker 12
ATGACGGGAAAACTGATTTGGTTAATGGGGGCCTCTGGCTCCGGAAAAGACAGTCTGTTGACGGAACTC CGCCAGCGGGAACAAACTCAGCTACTGGTAGCGCATCGCTACATCACGCGCGCCGCCAGCGCCGGAAGTGAAAACCA TATCGCCCTGAGCGAGCAGGAGTTTTTTACCCGCGCGGGGCAGAACCTTCTGGCCTTAAGCTGGCACGCCAACGGCC TGTATTATGGCATCGGCGTCGAGATTGACCTCTGGCTGCACGCTGGATTTGACGTGGTGGTCAACGGCTCACGCGCC CATCTGCCGCAGGCGCGGGCGCGCTATCAATCGGCGCTGCTGCCCGTTTGTTTACAGGTTTCGCCGGAGATCCTGCG CCAGCGCCTCGAAAACCGTGGTCGTGAAAATGCCAGTGAAATTAACGCCCGACTGGCACGCGCTGCCCGCTATACTC CTCAGGATTGTCTTACGCTCAATAATGACGGCAGCCTGCGCCAGTCGGTCGACAAGCTGCTGACGCTGATTCATCAG AAGGAGAAACACCATGCCTGCTTGTGA
>marker 13
ATGATCCCCGGTGAATATCACTTTAAGCCCGGTCAGATAGCCCTGAATACCGGCCGGGCAACCTGTCGC GTGGTCGTTGAGAACCACGGCGATCGGCCGATTCAGGTCGGTTCGCACTACCATTTCGCCGAGGTTAACCCGGCGCT GAAGTTCGACCGTCAGCAGGCCGCCGGCTATCGCCTGAATATCCCGGCGGGCACGGCGGTACGCTTTGAACCCGGCC AGAAACGCGAGGTCGAGCTGGTGGCCTTCGCCGGTCACCGCGCCGTCTTCGGCTTCCGCGGCGAGGTCATGGGCCCT CTGGAGGTAAACGATGAGTAA
>marker 14
ATGAATATTATGAATGCAATAAAATATATTAATAAAGTTTTTTTCATCCTTCTCTCTTTGGCTTTTGTT GCAGGATGCGATGATGATAATACGTCAGATCTTCAGTTGAACGGGCAAACTTGGCTGAATGCATTGCAACTCGATGA ATATCAGGGAGTCATTGATAATTCGACTAAAACTGTCGTAGTAGGAGTACCTGTCGATTATAATACAGATGCCATGA AGGTAACGGCAATTGAAGTTTCCGATGCTGCTGAAGCTTCAATGAAGGTTGGAGATATTGCCAATTTCTCTTTTCCT CAAACGATAAAAGTAACAAATGGTGATGCTTATTTAGATTATACTGTTACGGTAAAGCATGATGAAGCTAGAATTAC ATCATTTAAATTGAATAATGAGTATACTGGTATTATTGATGAAGAAAATCATTCTATTTTGGTACGTGTACCGACGA GTATCGATATTACTAGTCTTATACCTACGGTTGAAACAACCCAAGGAGCAATGGTTTCTCCAGCTTCAGGACAAGCT GTTGATTTTACAAGTCCAGTAGAGTTTACAGTTACTTATCAGAGTGCAGTTGCTGTTTATGTTGTAACTGTTGTACA GTCTGATTCTCCTAGTGCTGTATATGTAGGTTTGGCTTCTTCTATAGATGAACTAAATGCAGAAGAAAAAGAAGCTG CTAGCTGGATGTTGAAGAATATTCCTAATGCTCAGTACCTCTCATTTGAAGATGTTCGAACCGGAAGAGTTGATTTA AATGATTGCAAGGTTATGTGGTGGCATCTGCATATCGATGGAGGTATCGATTCAATGGATAAATTTGAAGCTGCTGC CCCGAGTGCTGTACAAGCAGTTTCGAAGGTAAAAGAATTTTATGAAAATGGAGGTAGCTTGTTATTGACTCGTTATG CATCGTTCTATGCAGCTAAGCTGGGTGTTACCAAGGATGGAAATGCTCCTAATAACTGTTGGGGACAGTCGGAAGAG ACAGGCGAAATAACTACAGGTCCTTGGAGCTTCTTTGTAACCAATCATGAAACTCATCCTCTTTATGAAGGTGTGGA TATATCAACGATTGATGGTAAAAAGGGAATTTATATGTGTGATGCAGGCTACCGGATAACTAATAGTACTGCTCAGT GGCACATCGGTTCCGATTGGGGAGGATATGAAGATCTGAATAGCTGGGAAACAAGTCATGGTGGTGTAAGTCTAGGA TATGGCGGTGATGGTGCCGTAGTTGTTTGGGAATACCTTTCAAGTGAAACGACTGGTGGAGTACTTTGTATCGGTTC TGGATGTTATGATTGGTATTCTTATGGAGTAGATACTTCAGCAGATTCATATCATGGTAATGTTGCAAAGATTACTC AGAATGCAATTAATTATTTAACAAGTGAATCAAAATAA
>marker 15
CGCATTTTGTGTTACAGCCACTACAGCCAGCGGGCGCTGGCGCATTTCGGCAATGCAAAAGCAGTCGGC AACCCGCGCTTCGACGCCTGGCATAACGGCACGTTTGACCGCATTCTGCCAGAAAATATTCAACCCGATTCCCGTAA ACCTACGGTGCTCTACGCCCCCACGTTTGGCGCATTAAGCTCCCTGCCCCACTGGGCAGAAAAATTGGGGCGCTTAA GTGGCAATGTGAATCTGATTTGCAAACTGCACCACGGCACCTGCTCGCGCCCGGAAGAAGCTGCATCGCTGGCCCTG GCGCGCAGGCACTTAAAACAGCGCACCGACTCCGTCCACCACACGCTGGCGCTACTGGCAAAGGCCGATTACGTGCT GACCGATAACAGCGGCTTTATTTTCGATGCTATCCACGTTGATAAGCGGGTAATCCTGCTCGACTTTCCGGAAATGG CAGCGTTGCTCGACGGTGAGAAAAGTTACTCCACGCCCGAAAGCGCCGACCAGCAAATCCGCGAAATATTGCCGGTG GCTCATGATGTTGCGGAACTGCGTTATTTGTTGTCAGAGGCGTTTGACTGGGGTGCGTTGCTGGCGCGGCTTAAGGA GATTCGTCATCACTATTGCGATGCGTTTATGGATGGCAAGGCGGGAGAACGGGCGGCGATGGTGATTGTGGAGGCGC TGGCGGGCAAGGAATCGTCAGGGATATGTCGATAA
>marker 16
GTGTGGAGAGAAAAAACGAGAGAAATATTTGGAAGATTAGAAAGATGTTCATATCTTTGCACCGCAATT AAGGAAAAACATATTCCTGATGATAGTTGCCCAGATGGCGGAATCGGTAGACGCGCTGGTCTCAAACACCAGTGGAG TAACATCCATCCCGGTTCGACCCCGGGTCTGGGTACAGAAAGAAAGGCTAAGTAA
>marker 17
ATGTTGTTTGATGGGAAAAGTGAAGGATTATTACTGCTTGAAGTAAAATCAATAGCTTTTGAAAAGCAT TCGTATTATTTGATGAGTGTGCAGACGAAAAGTTATGAGCTCTTTTCATATCTCATATTTGGGCATTTATGTAGCCA TGAAAAACTCTTTTTTATCTTTTATTTAAAAACAATCTAA
>marker 18
ATGGGAAAAAGACTTACAGAAAACCTTTCTTCACTCTATATTGGAGCAGCCAACAGTCTGAAGCCAAAA CAAGCAAAAAGAAAAATCGTCGCATATGTAGAAAGTTACGATGACATATCTTTTTGGCGTTCCCTTCTGGCCGAATA TGAGAATGACAAACGTTACTTTGAAGTCATGCTGCCATCCCGAAGTTCACTTGCCAAAGGGAAAAAATCAGTTTTGA TGAACGAACTGGGAAGTCGTCTGGGAGAAAATATGATTGCTTGTGTTGACAGTGATTACGATTATCTTCTTCAAGGA AGAACTCAAACCTCACGGTACATCATCAACAGCCCCTACGTATTACAGACGTATGCCTATGCCATAGAAAATTACCA CTGCTACGCCGAAGGTCTACACGAGGCATGTGTTACAGCTACGTTGAACGATCATAAACTGATCGATTTTCCAGCCT TCATGAAACTCTATTCAGAAATAGCCTATCCGCTTTTCATCTGGTCTGTATGGTTTTACCGTCACCATAATTTATCA GAATTTTCCTTACTGGATTTCTGCTCTTTCGTAAAGTTAGACCAAGTCAGTACACGCCATCCTGAAAAAAGTCTGGA AATCATGTCCAGAAAAGTCAACCGCAAACTTCATGAGTTAGAGAGACGACATGTAGAAGCCCTGGAAGAAATAGAGG ATATGAAAAAAGAGTTCCGTACATTGGGAGTGTACCCAGACAATACTTACATGTTCATTCAAGGGCATCACATTATG GACAATGTCATACTCCGTCTGCTGATTCCGGTATGCACAGTACTCCGTCGCGAACGTGAACAACAAATACACGATTT AGCTTTACACGACATTCAGTTACATAATGAGCTCACAGCCTATCAGCGTAGTCAGGTTGATATAGAAGTGGTTATCC GTAAAAATCCTCATTACCAGTCATCACCTCTTTATCAGATGATCCGGCGAGATATAGAAGCATTTCTGAAAATAGCC AAATAA
>marker 19
ATGAATTTCACTATTTTTGTAAACAAGAAAATGGAATCAATGAAAACAATAGTTATAACTGGCGGAGCT AAGGGCATTGGGCGCTGCCTGGTAGAGTATTTTGCATCGCAAGGTAATGCAGTCTATTTTATAGATATGGATGCAGA TGCCGTTGCAACTGTAACCGGAAAATTACGTGACAAACAGATGGATGTTCATGGCTTTACAGGAGATATTGCCGACG AATTAGTTTTGCAGAGGTTTGCGGCAAGAGTGATAGAAGAAACTCCGCAAGGCATACACTGTCTGATAAATAATGCT TGTCTGATGAAAGGAGGAGTCCTCAGTGGATGTAGTTACGATGATTTCCTGTATGTGCAGCGTGTAGGTGTTGCAGC TCCTTATCTGCTTAGCAAGCTCTTCATGAACCATTTTGCAGGTTTTGGTTCTATCGTTAATCTTTCATCAACTCGTG CGTTTCAGTCGCAGCCCGATACAGAAGCGTATACAGCAGCCAAAGGAGGCATCACCGCACTTACTCATGCACTTGCT GTGAGCCTCGCCGGCATAGCCCGTGTCAATGCTATTGCACCGGGATGGATTGATACAGGCAAGTTCCACGACGAAAG TTATCTTCCTGATTATAGCGAAGGGGATACCATGCAACATCCGTCGCAACGGGTCGGAGAGCCTGATGATATCGCCC GTGCCGTTGAGTTTCTGTGCGACGAACGCAATTCTTTCATCAACGGACAATGCCTAACCATCGACGGCGGAATGAGT AAGCTGATGGTTTATCACAACGATTGCGGCTGGAGAATAGAATAA
>marker 20
ATGAAAACGATGCTCAAACCCGACAGCCTGCGCAGGGCGCTGACTGATGCCGTCACGGTGCTGAAAACC AGTCCCGAGATGCTGCGGATATTCGTGGATAACGGGAGTATTGCCTCCACGCTGGCGACGTCGCTGTCATTCGAAAA ACGTTACACGCTCAATGTCATTGTGACCGACTTTACCGGTGATTTTGACCTGCTCATCGTGCCGGTGCTGGCGTGGC TGCGGGAAAATCAGCCCGACATCATGACCACCGACGAAGGCCAGAAAAAGGGCTTCACGTTTTATGCAGACATCAAC AATGACAGCAGCTTTGATATCAGTATCAGCCTGATGCTGACCGAGCGCACGCTGGTCAGTGAGGTGGACGGCGCACT GCATGTGAAGAATATCCCGGAACCCCCGCCGCCGGAGCCGGTCAACCGCCCGATGGAGCTTTATATCAATGGCGAAC TGGTGAGCAAGTGGGATGAATGA
>marker 21
ATGGTCTTTTCATTCCAGTCTGCGGAAGCCCCGGAAACGGAAGTTCCTGCGGAAAGCATGGAAAGGACC GGCAATAGAAAGTATTTCTTCCATCCATTTTTTGAAAAGGCAAAGAAACGCATGTTATATTATACGGGAAAAGAAGG CTGGAATTTTCTTGTAACAAGAATTCTTCTTTTCCCGCGCTCCTCCAGACCGGGAAAGGGACGCAAGGGCATATATG GTGCCATGACGGAAAAGAAAGCTGTGTTATCAATCATTTTTCTACCGTTATATACCATATATAGCATATGA
>marker 22
ATGGAAAGCATAAAAAGATCATTCTTTTTAACACTCCTTCTCATGGTGTGTTTGGTAGTGCAAGCACAA AGTTTGCAAGTGTCGGGAACTATCGTTTCCAAATCTGATGGACAGCCTATTATTGGAGCAACTATTCTTGAACAAGG TACGACCAATGGTACGATTACCGATTTTGATGGTAAGTTTTCTTTAACTGTAAAGCAGGGTGCGGAAATTTCTATTT CCTATATAGGTTTTAAAACTCAGGTCGTAAAAGCTCAAAATGTATTGGACATTGTGCTGGAAGAAGACACAGAAGTA CTGGATGAAGTTGTTGTAACAGGTTACACTACTCAGCGTAAGGCAGATTTGACCGGTGCCGTATCTGTTGTTAGTAT GGATGATTTATCTAAGCAGAATGAAAATAATCCGATGAAAGCTTTGCAAGGTCGTGTTCCTGGTATGAATATTACGG CCGATGGTAATCCTAGTGGTTCAACAACAATTCGTATTCGTGGTATTGGTACATTAAATAACAATGACCCTCTTTAT ATTATTGACGGGGTACCTACTAAAGGTGGAATGCATGAATTGAATGGTAATGATATAGAATCAATTCAAGTTTTAAA GGATGCGGCTTCAGCTTCTATTTATGGTTCTCGTGCTGCCAATGGTGTGATTATCATTACAACTAAAAAAGGAAAAG AAGGTCAGTTGAAAATAAATTTTGACGCGTCGGTATCTGCTTCTATGTATAATAATAAAATGGAGGTATTGAATGCA GAAGAATATGGTCAGGCAATGTGGCAGGCTTATGTAAATGGTGGTCAGGACCCAAATACTAACCCCTTGGGATATAA ATATGATTGGGGATATGATAGTAATGGCTATCCAAAATTGAACAGTATTAGTATGTCTCGTTTTCTTGACTCTAATA ACACAGTTCCGTCAGCAGACACAGATTGGTTTGATGAAACGACTCGTACAGGTGTTATTCAGCAATATAATGTATCA GTAAGTAATGGGTCGGAAAAAGGATCTTCTTTCTTTTCTCTTGGATATTATAAGAATCTGGGTGTTATAAAAGATAC TGATTTTGAGAGATTCTCTGCTCGTATGAATTCAGACTATAACCTGATTGGAAAAGTATTGACTGTTGGAGAGCATT TCACTTTGAACCGTACTTCTGAAGTACAGGCTCCAGAAGGTTTCTTGCAAAATGTATTACAGTTTAATCCTTCTTTA CCGGTTTATGATATTAACGGTAATTATGCGGGACCTGTAGGTGGTTATCCTGACCGTGAAAATCCTGTAGCGCGTTT GGATAGAAATTCGGATAATCGTTATACTTATTGGCGTATGTTCGGAGATGCATATATTAATTTAAATCCGTTTAAAG GTTTTAATATCCGTTCTACATTTGGTTTGGATTATTCTCAAAAGCAACAACGTATTTTTACTTATCCTATTACAGAA GGAAATGTTGCAAATGATAAAAATGCAGTAGAAGCAAAACAAGAACATTGGACAAAATGGATGTGGAACGCTGTTGC TACTTATAATTTAGAAGTAGGTAAACATCGTGGCGACGTGATGGTTGGTATGGAATTGAATCGTGAAGATGATAGTT GGTTCTCTGGTTATAAAGAAGATTATAGTATATTAAATCCTGATTATATGTGGCCTAATGCAGGTACAGGAACAGCT CAGGCTTATGGATCGGGTGAAGGTTATTCATTGGTTTCTTTCTTTGGTAAATTAAACTATACCTATGATGATAAATA TTTATTTTCTTTGACTGTTCGTCGTGATGGTTCCTCTCGTTTTGGTAAAAATAATCGTTATGCTACATTTCCGTCTG TTTCATTGGGATGGCGTATTAGCAATGAAAAATTCATGAAGGAGCTTACTTGGCTGAATAATTTAAAAATTCGTGCT 
TCATGGGGACAGACAGGTAATCAAGAAATTTCAAATATAGCTCGTTATACTATTTATGTTCCTAATTATGGTGTAAC TGAGTCTGGAGGACAAAGCTACGGAACTTCTTATGATATTGCTGGCACAAATGGAGGTAGTATTCTTCAGTCTGGAT TTAAACGTAACCAGATTGGAAACGATGATATTAAATGGGAAACTACCACTCAGACAAACTTGGGTTTTGATTTCTCT TTGTTTGATCAGACTTTATATGGATCATTCGACTGGTTCTATAAGAAAACAACAGATATTCTTGTTCAAATGGCAGG TATTGCAGCTATGGGTGAAGGCAGCACTCAGTGGATTAATGCCGGAGAAATGGAAAACAAAGGTTTCGAGTTGAACT TAGGTTATCGTAATACGACAGCATTTGGCTTGAAATACGATTTGAATGGTAATATCTCTGCTTATCGTAATAAGATA ACTGCTCTTCCTGCAACAGTTGCTGCAAACGGTACATTTGGAGGTAATGGGGTAGAAAGTGTAATTGGGCATCCCAA TGGTGCACAGGTAGGATATGTGGCAGATGGTATCTTCAAATCACAGGCAGAAATTGATAATCACGCAACTCAAGAGG GTGCTGGCTTGGGACGTATTCGCTGGCGTGATTTGGATGGTAATGGTGTTATCAACGAAAAGGATCAGCAGTGGATT TATAATCCGACACCAGCATTCAGTTATGGATTGAATATCTATTTAGAGTATAAAAATTTCGACTTGACAATGTTTTG GCAAGGCGTGCAGGGAGTTGATGTAATTAGTGATTTAAAAAAGGAAACAGACTTATGGAGTGGGCTGAATATTGGTT TCTTAAATAAAGGAAAACGAGTGCTTGATGCATGGACGCCGACTAATCCTGATTCGGATATTCCAGCATTGTCACGT GATGATGTCAATAATGAAAAACGTGTATCTACCTATTTTGTGGAAAATGGTTCATTCTTGAAATTACGTAACCTTCA GATAGGTTACAATGTTCCTCAGAATTTTGCGAAGAAGATGAAAATGGAACGTTTACGCCTTTATTTAAGTGCACAGA ACTTGCTTACTATTAAGAGTAAAGAATTTACAGGAGTTGATCCGGAAAATCCAAATTATGGTTATCCGATTCCTTTG AACTTAACTTTTGGTATTAATGTTAGCTTTTAA
>marker 23
ATGAACCGAATCATTTTAATCTCAATATTCAGCATTCTGACATTCAATGTTATGGCACAGGAAAAGATA GTACAGACAGCAGGGCGCGACCAATTAGGCGAGTTCGCTCCCAAGTTTGCGGAACTCAACGACGACGTCCTTTTCGG CGAGGTATGGAGCCGCACCGACAAGCTCGGTCTTCGCGACCGCAGCTTGGTAACGATTACCTCTCTTATCAGTCAAG GTATCACAGACAACTCGCTAATATATCATTTGCAGTCGGCGAAGAACAACGGCATCACCCGCACCGAAATCGCCGAA ATCATCACGCATATCGGTTTTTACGCGGGTTGGCCGAAGGCATGGGCGGCATTCCGTCTGGCCAAAGACGTATGGGC GGAAGAGACGACCGGCGAGGATGCAAAAGCTGCATTCCAACGTGAAATGATTTTCCCCATCGGCGAACCCAACACGG CGTATGCCAAGTATTTCACAGGCAATAGCTACCTGGCTCCGGTTTCGCGCGAGCAGGTGAATATTTCCAATGTCACT TTCGAACCGCGCTGTCGGAATAACTGGCATATCCATCATGCTACCGAGGGCGGCGGACAAATGCTTATCGGTGTGGC AGGACGCGGCTGGTATCAGGAAGAAGGCAAGCCCGCCGTGGAGATTCTCCCCGGCACGGTCATCCACATTCCTGCCG GCGTAAAGCACTGGCACGGTGCCGCAGCCGACGGCTGGTTCGCACACCTCGCATTCGAGATTGCAGGCGAGAACGCT TCCAACGAGTGGCTGGAGCCGGTTACGGATGAGGAATACGATCGGCTTCAAAAATGA
>marker 24
ATGGCTAAGAAAGCTAAGGGTAATAGAGTACAGGTAATCCTTGAATGCACAGAAATGAAGGATAGCGGT ATGCCGGGAACTTCTCGTTATATTACTACAAAAAACAGAAAAAATACCGCTGAAAGATTGGAATTGAAAAAGTACAA TCCTATTTTGAAAAGAGTAACAGTACATAAAGAAATTAAATAA
>marker 25
ATGGCACTAAATCAAAGAAGACTGAAAATATTGAATCTTATCCGAGAAGACGGACATGCAAAAGTACA GGACTTGAGCAAGATATTTAAAGTAACTGATGTCACTATCAGACAGGATTTAGAAGAATTAGAGAAAATGGGATAC ATAGAAAGAGAGCATGGAGGTGCTCTCTTAAAAGACGTAAGCTCATTTGCCAGAACTGGTCAGTTATTAAACCAGA AAAATATCGAAGAAAAAAGAGAAATTGCTAAGAAAGCAATTAATTTTATAAATGAAGGCGATTGTATCATCCTTGA TTCAGGTTCAACAACAACTGAGATAGCCAAACTGCTGGTTTCGTTTAAAAACCTGACAATAATAACTAATGCACTT AATATTGCTCTGATATTAGGTGAGAACCCAAATATCAGTCTGATAGTTACAGGTGGAGAATTTAAAGCTCCCACAT TGTCATTAACAGGGGAAATGGCTGCACAACCATTCAATAATTTACATGCCAACAAATTATTCTTAGCTACAGCTGG AATCTCTGCTAAAATGCAACTTACATACCCAAGCTTAAGCGACTTAGTTGTAAAATCGGCTATGATAAAATCATCT GATGAAGTGTTCCTTGTAGCAGATTCTTCGAAAATAGGAAATACATCTTTTGCAAGTTTAGGTAGTATTTCTTTAA TAAAGGCATTAATTACAGATAACAAAATATCATACGAAGATATTAAAAAGATAGAAGAACAAAACGTAAAAATTAT CTTCTAA。
The minimum-redundancy maximum-relevance (mRMR) algorithm comprises the following:
i) Reduce the redundancy among marker genes: minimize the sum of the pairwise correlation metrics between candidate marker genes;
ii) Strengthen the predictive ability of the marker genes: maximize the sum of the difference indices of the marker genes between the Parkinson's disease group and the healthy spouse group;
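The two criteria above can be sketched as a greedy ranking. This is only an illustrative sketch, not the implementation used in the embodiment: the `relevance` scores (one difference score per gene), the use of the absolute Pearson correlation as the redundancy metric, and the "relevance minus mean redundancy" scoring are all assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def mrmr_rank(abundance, relevance, k):
    """Greedily pick k genes with high relevance and low redundancy.

    abundance: gene -> list of abundances across samples (hypothetical input)
    relevance: gene -> difference score between the two groups (hypothetical)
    """
    selected = []
    remaining = set(abundance)
    while remaining and len(selected) < k:
        def score(gene):
            if not selected:
                return relevance[gene]
            # Redundancy: mean absolute correlation with already chosen genes.
            redundancy = mean(
                abs(pearson(abundance[gene], abundance[s])) for s in selected
            )
            return relevance[gene] - redundancy

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch a gene that is almost perfectly correlated with an already selected gene is penalized, even if its own difference score is high.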
The preferential selection algorithm comprises the following:
a. First, select from the significant gene set the gene with the best predictive ability as the first selected feature;
b. Then select one more gene from the remaining significant gene set and add it to the selected feature set, such that the predictive ability of the selected feature set is the best;
c. Repeat step b until the predictive ability no longer improves; output the selected feature set;
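Steps a to c describe a forward selection loop, which can be sketched as follows. The `evaluate` callable is a hypothetical placeholder for whatever predictive ability score is used; the function name and signature are assumptions for illustration.

```python
def forward_select(candidates, evaluate):
    """Greedy forward selection.

    candidates: iterable of gene identifiers (the significant gene set)
    evaluate:   hypothetical callable taking a list of genes and returning
                a predictive ability score (higher is better)
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Step a/b: try each remaining gene together with the current set.
        scored = [(evaluate(selected + [gene]), gene) for gene in remaining]
        score, gene = max(scored, key=lambda pair: pair[0])
        # Step c: stop as soon as the best addition does not improve the score.
        if score <= best_score:
            break
        selected.append(gene)
        remaining.remove(gene)
        best_score = score
    return selected, best_score
```

The loop stops at the first round in which no remaining gene improves the score, so the returned set is not guaranteed to be globally optimal.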
The method for assessing predictive ability comprises the following:
Linear discriminant analysis is used as the classification algorithm, and the Matthews correlation coefficient calculated under leave-one-out cross-validation is used as the criterion for the predictive ability of the selected features. The Matthews correlation coefficient (MCC) is given by:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives;
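A minimal sketch of the Matthews correlation coefficient under leave-one-out cross-validation is given below. The classifier is passed in as a hypothetical `fit_predict` callable rather than a linear discriminant analysis implementation, so the sketch shows only the evaluation logic; returning 0 when the denominator is zero is a common convention assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def loocv_mcc(samples, labels, fit_predict):
    """Leave each sample out once, predict it, and score all predictions.

    fit_predict(train_x, train_y, test_x) is a hypothetical callable that
    trains on the remaining samples and returns 0 or 1 for the held-out one.
    """
    tp = tn = fp = fn = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = fit_predict(train_x, train_y, samples[i])
        if predicted == 1 and labels[i] == 1:
            tp += 1
        elif predicted == 0 and labels[i] == 0:
            tn += 1
        elif predicted == 1:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)
```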
S11: Verification of the discriminative ability of the markers: a support vector machine model was trained on the 25 significantly different genes obtained by the above screening, and the receiver operating characteristic (ROC) curve was plotted; the results are shown in Fig. 3A. Under cross-validation, the area under the ROC curve (AUC) reached 0.895 (Fig. 3A, 95% confidence interval: 83.1-96.1%), with a sensitivity of 0.90 and a specificity of 0.75.
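For orientation, the quantities reported above (AUC, sensitivity, specificity) can be computed from classifier scores as in the sketch below. It assumes each sample has a numeric decision score and a label (1 for patient, 0 for control); the threshold argument is a hypothetical parameter, and the support vector machine itself is not reproduced.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```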
To facilitate clinical application, the 25 gene markers were integrated into a more direct index, the Parkinson's disease index (PDI). The PDI of each sample j, denoted I_j, is calculated as follows:
where A_ij is the relative abundance of the i-th marker in sample j; N and M are the subsets of gene markers enriched in Parkinson's disease patients and in the healthy spouse control group, respectively; and |N| and |M| are the sizes of these two subsets;
The difference in the converted PDI between Parkinson's disease patients and the healthy spouse control group was compared by the Wilcoxon rank-sum test (P = 2.46e-11); the results are shown in Fig. 3B.
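The PDI formula itself is not reproduced in this text. Based only on the symbol definitions (A_ij, N, M, |N|, |M|), one plausible reading is the mean abundance of the patient-enriched markers minus the mean abundance of the control-enriched markers. The sketch below implements that assumed form; the `scale` factor is a hypothetical parameter, since any scaling constant used in the original formula is unknown.

```python
def pd_index(abundance, pd_enriched, control_enriched, scale=1.0):
    """Assumed PDI form: mean over N minus mean over M, times a scale factor.

    abundance:        marker -> relative abundance in one sample (A_ij)
    pd_enriched:      markers enriched in patients (subset N)
    control_enriched: markers enriched in controls (subset M)
    scale:            hypothetical scaling constant, not taken from the source
    """
    mean_pd = sum(abundance[m] for m in pd_enriched) / len(pd_enriched)
    mean_ctrl = sum(abundance[m] for m in control_enriched) / len(control_enriched)
    return (mean_pd - mean_ctrl) * scale
```

Under this reading, a sample dominated by patient-enriched markers gets a positive index and a sample dominated by control-enriched markers gets a negative one.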
Embodiment 2
Verification of the gene markers by real-time fluorescent quantitative PCR
Real-time PCR primers were designed for the 25 gene sequences screened after shotgun metagenomic sequencing. Primers were optimized and screened with Primer 5 software according to the product fragment, the Tm value and the population distribution of the target gene; the specific primers are shown in Table 2.
Table 2. Primers designed for the target genes
2. Preparation of standards
Construction of plasmids containing the target genes: the target gene sequences were synthesized. After synthesis, suitable restriction sites were selected according to the features of each gene sequence and the gene was ligated into the prokaryotic pET-28a(+) vector; first-generation sequencing was used to confirm that the plasmid sequence was correct. A schematic of the constructed plasmid is shown in Fig. 4.
3. Detection of the target genes by real-time fluorescent quantitative PCR (Real-time PCR)
3.1 Real-time PCR reaction system
The 80 fecal DNA samples that had undergone metagenomic sequencing were subjected to target gene detection by Real-time PCR, testing the reaction conditions and the amplification and melting curves of each gene. Premix Ex Taq™ (Tli RNaseH Plus) (Takara) was used to detect and quantify the standards and samples for each gene marker, with reactions run in 384-well plates. The specific reaction system is shown in Table 3.
Table 3. Real-time PCR detection reaction system
3.2 Real-time PCR reaction conditions
The details are shown in Table 4.
Table 4. Real-time PCR reaction conditions
Cycling consisted of denaturation at 95 °C for 30 seconds, followed by 40 cycles of 95 °C for 5 seconds and 60 °C for 30 seconds. The melting curve conditions were 95 °C for 15 seconds and 60 °C for 1 minute. DNA amplification was carried out on an Applied Biosystems Q7 fluorescent quantitative PCR instrument. After the program ended, the amplification curves were examined to determine the threshold cycle (CT) of each well, i.e., the number of cycles elapsed when the fluorescence signal in a reaction tube rises from background to the set threshold. A 10-fold dilution series of the standard plasmid for the corresponding target gene fragment was run together with the respective samples, with three replicate wells per sample. Fluorescence was measured once per cycle after the extension step, using the SYBR Green filter (excitation 492 nm, emission 530 nm). The fluorescence data were uniformly converted to a logarithmic scale, and the threshold was set in order to calculate the threshold cycle value.
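The threshold cycle described above can be illustrated with a short sketch. It assumes a list of background-subtracted fluorescence readings, one per cycle, and interpolates linearly on a logarithmic scale between the two cycles that bracket the threshold; instrument software may use a different procedure, so this is an illustration only.

```python
import math


def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the signal first reaches the threshold.

    fluorescence[k] is the assumed background-subtracted reading after cycle k + 1.
    Returns None if the threshold is never reached.
    """
    for k in range(1, len(fluorescence)):
        low, high = fluorescence[k - 1], fluorescence[k]
        if 0 < low < threshold <= high:
            # Interpolate on log10 scale between cycle k and cycle k + 1.
            frac = (math.log10(threshold) - math.log10(low)) / (
                math.log10(high) - math.log10(low)
            )
            return k + frac
    return None
```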
3.3 Calculating standard curves from the standards and establishing the relationship between CT value and copy number
Plasmid concentration was determined with a NanoDrop spectrophotometer. The copy number of each standard plasmid was calculated using the formula: copies/μl = plasmid concentration (mg/μl) × 6.022 × 10^23 / (recombinant plasmid length (bp) × 650), where 650 corresponds to 1 bp and 6.022 × 10^23 is Avogadro's constant. DNA diluted to copy numbers from 10^10 to 10^3 was used as template for Real-time PCR to prepare the standard curves. The linear detection range was established and sensitivity was tested, and the intra-assay and inter-assay coefficients of variation were calculated from repeated experiments. The R² of the standard curves obtained was between 0.97 and 1, consistent with a linear relationship. The results are shown in Table 5.
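The copy number formula and the standard curve can be sketched as follows. The sketch assumes that the concentration has first been converted to grams per μl, since 650 is the approximate molar mass in g/mol of one base pair; the function names, the least-squares fit of CT against log10(copy number), and the example values used to exercise it are illustrative assumptions, not data from Table 5.

```python
AVOGADRO = 6.022e23      # copies per mole
BP_MOLAR_MASS = 650.0    # approximate g/mol per base pair


def copies_per_ul(conc_g_per_ul, plasmid_length_bp):
    """Plasmid copies per microlitre from a mass concentration in g/ul (assumed unit)."""
    return conc_g_per_ul * AVOGADRO / (plasmid_length_bp * BP_MOLAR_MASS)


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line CT = slope * log10(copies) + intercept, with R^2."""
    n = len(log10_copies)
    mx = sum(log10_copies) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(log10_copies, ct_values)
    )
    ss_tot = sum((y - my) ** 2 for y in ct_values)
    return slope, intercept, 1.0 - ss_res / ss_tot


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)
```

Once the curve is fitted on the dilution series, `copies_from_ct` converts a measured CT value of a sample into an estimated copy number.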
Table 5. Standard curves of the 25 gene markers
3.4 Detection of the original metagenomic samples
The fecal DNA of the 40 PD patients and 40 healthy controls that had undergone metagenomic sequencing was tested by Real-time PCR for the 25 target genes. Because the Real-time PCR results (copy numbers) differ in form from the gene abundances obtained by metagenomic sequencing (relative abundances between 0 and 1), a common data conversion method was used, as follows:
where N_ij is the converted value of the i-th marker in sample j, C_ij is the copy number measured for the i-th marker in sample j, C_jmin is the minimum copy number measured among all 25 markers in sample j, and C_jmax is the maximum copy number measured among all markers in sample j. For the 40 PD patients and 40 controls, a support vector machine model was constructed, giving an AUC of 0.922 (95% confidence interval: 86.4-98.0%, Fig. 5A), a sensitivity of 0.88 and a specificity of 0.83. These values are comparable to the discrimination, specificity and sensitivity between the PD and healthy groups obtained with the 25 markers in metagenomic sequencing. The PDI obtained after conversion differed significantly between the PD and healthy groups (Fig. 5B, P = 5.01e-13, Wilcoxon rank-sum test).
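The conversion formula is not reproduced in this text. The definitions of N_ij, C_ij, C_jmin and C_jmax, together with the stated goal of matching relative abundances between 0 and 1, suggest a per-sample min-max scaling; the sketch below implements that assumed form. Returning zeros when all copy numbers in a sample are equal is also an assumption.

```python
def minmax_convert(copies):
    """Assumed conversion: N_ij = (C_ij - C_jmin) / (C_jmax - C_jmin) within one sample.

    copies: marker -> copy number measured in a single sample j
    """
    low, high = min(copies.values()), max(copies.values())
    if high == low:
        # Degenerate sample: every marker has the same copy number.
        return {marker: 0.0 for marker in copies}
    return {marker: (c - low) / (high - low) for marker, c in copies.items()}
```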
In summary, the ROC curve analysis and PDI obtained by detecting the 25 gene markers in the original samples with the Real-time PCR method are comparable to the shotgun results.
3.5 Detection in an expanded clinical sample
A further 70 Parkinson's disease patients and 64 healthy controls were enrolled, and fecal DNA was extracted and tested by Real-time PCR for the 25 target genes. Using the above algorithm, the AUC obtained was 0.869 (95% confidence interval: 81.0-92.9%, Fig. 6A), with a sensitivity of 0.83 and a specificity of 0.75; the PDI of the PD group was significantly higher than that of the healthy control group (P = 3E-06, Wilcoxon rank-sum test, Fig. 6B).
In summary, the fecal gene markers are further confirmed to be the 25 gene markers screened out.
The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made to the invention without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.
Sequence listing
<110>Ruijin Hospital, Shanghai Jiao Tong University School of Medicine
<120>a kind of screening and application of excrement gene marker
<130> 2018
<141> 2018-03-20
<160> 26
<170> SIPOSequenceListing 1.0
<210> 1
<211> 1863
<212> DNA
<213> Artificial sequence
<220>

Claims (3)

1. A method for screening fecal gene markers, comprising the following steps:
(1) collecting feces from Parkinson's disease patients and their healthy spouses, and storing them at -20 to -80 °C;
(2) extracting fecal genomic DNA from the feces of (1);
(3) performing bioinformatic analysis after shotgun sequencing, establishing a reference gene set, screening target gene sequences from the genes that differ between Parkinson's disease patients and healthy spouses as gene markers, and performing receiver operating characteristic (ROC) curve analysis to determine their ability to discriminate the disease, the specific method comprising:
S1: Quality control of fecal genomic DNA: total DNA quality is checked using a Thermo NanoDrop 2000 UV micro-spectrophotometer and 1~3% agarose gel electrophoresis;
S2: Fragmentation of genomic DNA:
1) in a 1.5 mL LoBind tube, dilute 30 ng~1000 ng of high-quality genomic DNA to 120~150 μL with 1X Low TE Buffer;
2) transfer the diluted genomic DNA to a microtube;
3) place the microtube in the Covaris Tube Holder and shear the DNA by sonication with the Covaris S2 system, with parameters set as follows: duty cycle 10~15%, intensity 4~5, cycles per burst 200~250, time 50~60 seconds, mode frequency sweeping, temperature 6~8 °C;
4) transfer the sonicated sample and concentrate it under vacuum to 50 μL to obtain a DNA fragment concentrate, the DNA fragments being 500~600 bp in length;
S3: Library construction and quality control: the genomic DNA undergoes fragmentation, end repair, 3' A-tailing, adapter ligation and enrichment to complete construction of the sequencing library; the concentration of the constructed library is measured with the 2.0 Fluorometer, and the library size is checked with the Agilent 2100;
S4: DNA fragment sequencing: following the procedure given in the cBot User Guide, cluster generation and first-read sequencing primer hybridization are completed on the cBot supplied with the Illumina HiSeq sequencer; the sequencing platform is Illumina HiSeq Ten; sequencing reagents are prepared according to the Illumina User Guide, the flow cell carrying the clusters is loaded onto the instrument, and the paired-end program is selected for paired-end sequencing; the sequencing process is controlled by the data collection software provided by Illumina, which also performs real-time data analysis; unqualified reads containing adaptor sequences are removed, reads containing ≥ 3 N bases are removed, the 3' ends of sequences are trimmed to remove bases with a quality value < 20, and reads whose length after trimming is < 60% of the original length are filtered out; the reads are aligned to the host genome with SOAPaligner and host-contaminated reads are rejected, giving clean reads;
S5: Sequence assembly and gene prediction: the clean reads are assembled using metaSPAdes software, with k-mers of different sizes (21, 33, 55) used to assemble the filtered data; scaffolds are broken at their internal gaps into new scaftigs, and sequences shorter than 500 bp are removed; the assembly with the largest N50 is selected from the assembly results of the different k-mers; open reading frames are predicted on the assembly using MetaGeneMark software, the resulting .gff annotation file is converted back into a .fna gene sequence file, and gene sequences longer than 100 bp are selected from it and translated into the corresponding amino acid sequences;
S6: Gene set construction: all predicted genes are clustered using CD-HIT with identity > 95% and coverage > 90%; the longest gene sequence in each cluster is kept and the remaining redundant genes are removed; gut metagenomic genes from the type 2 diabetes and liver cirrhosis projects in eastern and southern China are downloaded from the NCBI database and a new gene set is constructed; genes supported by fewer than 2 reads are removed, giving a non-redundant gene set;
S7: Preliminary gene screening: the relationship between the number of genes and the number of non-zero samples is plotted; after the non-zero sample number is determined from this plot, preliminary gene screening is performed to obtain a gene set;
S8: Gene abundance and statistical testing of differential genes: the clean reads are aligned to the non-redundant gene set with SOAPaligner; from the number of reads aligned to each gene and the length of the gene, the relative abundance of each gene in each sample is calculated; then, based on gene abundance, the difference of each gene between the Parkinson's disease group and the healthy spouse control group is calculated by the Wilcoxon rank-sum test, giving the set of differential genes with P < 0.05;
S9: Clustering analysis of differential genes: on the set of genes differing between Parkinson's disease patients and healthy spouses obtained by the Wilcoxon rank-sum test, metagenomic species (MGS) clustering is performed: the Pearson correlation coefficient of the abundance values across all samples is calculated for each pair of genes, and a single-linkage clustering algorithm is applied, requiring a correlation coefficient of not less than 0.9 within a cluster and not more than 0.1 between clusters, to obtain MGS clusters; the MGS clusters are screened according to: a) containing not fewer than 50 genes; b) being annotated to the same genus according to the genome library; c) a genus annotation rate greater than 90%, to obtain MGS groups;
S10: Screening of gene markers: the MGS groups are screened with the minimum-redundancy maximum-relevance algorithm to obtain significant genes; the preferential selection algorithm is then used to select, each time, a gene from the remaining significant gene set that improves the classification performance, until the classification performance can no longer be improved, giving the gene markers;
the minimum-redundancy maximum-relevance algorithm comprising the following:
i) reducing the redundancy among marker genes: minimizing the sum of the pairwise correlation metrics between candidate marker genes;
ii) strengthening the predictive ability of the marker genes: maximizing the sum of the difference indices of the marker genes between the Parkinson's disease group and the healthy spouse group;
the preferential selection algorithm comprising the following:
a. first, selecting from the significant gene set the gene with the best predictive ability as the first selected feature;
b. then selecting one more gene from the remaining significant gene set and adding it to the selected feature set, such that the predictive ability of the selected feature set is the best;
c. repeating step b until the predictive ability no longer improves; outputting the selected feature set;
the method for assessing predictive ability comprising the following:
linear discriminant analysis is used as the classification algorithm, and the Matthews correlation coefficient calculated under leave-one-out cross-validation is used as the criterion for the predictive ability of the selected features, the Matthews correlation coefficient (MCC) being given by:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives;
S11: the verifying of marker separating capacity: gene marker is constructed into Training Support Vector Machines model, draws ROC curve, together When gene marker testing result is integrated into a Parkinson's disease index (Parkinson ' sdisease index, PDI), often The PDI of a sample j, i.e. IjCalculation formula is as follows:
Wherein, AijIt is the relative abundance of i-th of marker of sample j, N and M are gene markers in corresponding Parkinson's sufferer The enrichment subset of person and healthy spouse's control group, in addition, | N | and | M | it is the size of the two subsets respectively;
Compare the PDI index difference after converting between Parkinsonian and healthy spouse's control group by Wilcoxon rank sum test It is different.
S12: fluorescence real time aggregation enzyme chain reaction (Real-time polymerase chain reaction, Real- are used Time PCR) technology, the clinic of original sample and enlarged sample is carried out to the gene marker sequence that filters out in claim 1 Value assessment.
2. The screening method according to claim 1, characterized in that the method for extracting fecal genomic DNA in (2) comprises the following:
1~2 ml of InhibitEX buffer is added to 200 mg of feces and mixed thoroughly, and the mixture is centrifuged at 12000~15000 rpm for 1~2 min; 500~600 μl of the supernatant is added to 25~30 μl of proteinase K, then 500~600 μl of buffer AL is added and mixed thoroughly, and the mixture is incubated at 65~80 °C for 10~15 min; 500~600 μl of ethanol with a volume concentration greater than 95% is added and mixed thoroughly to obtain a lysate; the lysate is transferred to the QIAamp adsorption column of the QIAamp Fast DNA Stool Mini Kit and centrifuged at 12000~15000 rpm for 1~2 min, and the waste liquid is discarded; 500~600 μl of buffer AW1 is added to the treated QIAamp adsorption column, which is centrifuged at 12000~15000 rpm for 1~2 min, and the waste liquid is discarded; 500~600 μl of buffer AW2 is added to the treated QIAamp adsorption column, which is centrifuged at 12000~15000 rpm for 3~4 min; the treated QIAamp adsorption column is centrifuged again at 12000~15000 rpm for 3~4 min; 100~200 μl of ddH2O preheated to 65~80 °C is added to the treated QIAamp adsorption column, which is left to stand at room temperature for 1~2 min and then centrifuged at 12000~15000 rpm for 1~2 min, and the precipitate is collected to obtain the fecal genomic DNA.
3. Use of the gene marker according to claim 1, characterized in that: the gene marker can be used to prepare a reagent, kit or biochip for the diagnosis or auxiliary diagnosis of Parkinson's disease, or to prepare a reagent, kit or biochip for screening anti-Parkinson's drugs; a kit or biochip containing a detection reagent for the gene marker can be used to detect the expression quantity of the gene marker in fecal samples, for the diagnosis or auxiliary diagnosis of Parkinson's disease, or for screening or preparing anti-Parkinson's drugs.
CN201810227886.1A 2018-03-20 2018-03-20 Screening and application of fecal gene markers Active CN109658980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810227886.1A CN109658980B (en) 2018-03-20 2018-03-20 Screening and application of fecal gene markers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810227886.1A CN109658980B (en) 2018-03-20 2018-03-20 Screening and application of fecal gene markers

Publications (2)

Publication Number Publication Date
CN109658980A true CN109658980A (en) 2019-04-19
CN109658980B CN109658980B (en) 2023-05-09

Family

ID=66110180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810227886.1A Active CN109658980B (en) 2018-03-20 2018-03-20 Screening and application of fecal gene markers

Country Status (1)

Country Link
CN (1) CN109658980B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6146828A (en) * 1996-08-14 2000-11-14 Exact Laboratories, Inc. Methods for detecting differences in RNA expression levels and uses therefor
WO2008058399A1 (en) * 2006-11-17 2008-05-22 Emerillon Therapeutics Inc. Methods for diagnosis, prognosis or treatment of migraine and related disorders
US8405379B1 (en) * 2008-09-18 2013-03-26 Luc Montagnier System and method for the analysis of DNA sequences in biological fluids
CN103119179A (en) * 2010-07-23 2013-05-22 哈佛大学校长及研究员协会 Methods for detecting signatures of disease or conditions in bodily fluids
CN105368944A (en) * 2015-11-23 2016-03-02 广州基迪奥生物科技有限公司 Biomarker capable of detecting diseases and application of biomarker
CN105543369A (en) * 2016-01-13 2016-05-04 金锋 Mental disorder biomarker and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG XIAODONG, QIAN YIWEI, XU SHAOQING, SONG YANYAN, XIAO QIN: "Longitudinal Analysis of Fecal Microbiome and Pathologic Processes in a Rotenone Induced Mice Model of Parkinson's Disease", 《FRONTIERS IN AGING NEUROSCIENCE》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110299185B (en) * 2019-05-08 2023-07-04 西安电子科技大学 Insertion variation detection method and system based on new generation sequencing data
CN110299185A (en) * 2019-05-08 2019-10-01 西安电子科技大学 A kind of insertion mutation detection method and system based on new-generation sequencing data
CN110246544B (en) * 2019-05-17 2021-03-19 暨南大学 Biomarker selection method and system based on integration analysis
CN110246544A (en) * 2019-05-17 2019-09-17 暨南大学 A kind of biomarker selection method and system based on confluence analysis
CN111471778A (en) * 2020-01-22 2020-07-31 广州康泽医疗科技有限公司 Method for detecting abundance change and genotype distribution of Akk in intestinal tracts of patients with diseases such as Parkinson's disease
CN112852916A (en) * 2021-02-19 2021-05-28 王普清 Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination
CN113981078A (en) * 2021-09-16 2022-01-28 北京肿瘤医院(北京大学肿瘤医院) Biomarker for predicting curative effect of anti-EGFR (epidermal growth factor receptor) targeted therapy of patient with advanced esophageal cancer and curative effect prediction test kit
CN113981078B (en) * 2021-09-16 2023-11-24 北京肿瘤医院(北京大学肿瘤医院) Biomarker for predicting curative effect of EGFR (epidermal growth factor receptor) -resistant targeted therapy of patients with advanced esophageal cancer and curative effect prediction kit
CN114015761A (en) * 2021-09-24 2022-02-08 上海交通大学医学院附属瑞金医院 Application of excrement tyrDC gene abundance as biomarker for predicting levodopa curative effect
CN114480446A (en) * 2022-02-23 2022-05-13 复旦大学 Method for constructing lignin degrading enzyme gene set
CN116064913A (en) * 2022-11-16 2023-05-05 成都市第三人民医院 Gene library based on metagenome cholecystitis fungi, microbial markers and application
CN117995283A (en) * 2024-04-03 2024-05-07 吉林大学 Single-sample metagenome clustering method, system, terminal and storage medium
CN117995283B (en) * 2024-04-03 2024-07-23 吉林大学 Single-sample metagenome clustering method, system, terminal and storage medium

Also Published As

Publication number Publication date
CN109658980B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN109658980A (en) A kind of screening and application of excrement gene marker
CN106650312B (en) Device for detecting copy number variation of circulating tumor DNA
CN105506115B (en) DNA library for detecting and diagnosing genetic cardiomyopathy pathogenic genes and application thereof
CN114150066B (en) Application of exosomes CDA, HMGN1 and the like in lung cancer diagnosis
WO2017156739A1 (en) Isolated nucleic acid application thereof
CN115927614A (en) Early intestinal cancer screening detection primer, detection method and kit based on Alu repeat element
CN108660213A (en) The application of three kinds of non-coding RNA reagents of detection and kit
CN109072278A (en) Isolated nucleic acid and application
CN110396537B (en) Asthma biomarker and application thereof
CN118538416A (en) Method for predicting colorectal cancer distant metastasis state
CN118421792A (en) Application of lncRNA as biomarker in diagnosing bladder cancer
CN113846157A (en) Use of human SERPINA3 gene for wine dependence screening
CN113801936A (en) Kit, device and method for lung cancer diagnosis
WO2016049917A1 (en) Biomarkers for obesity related diseases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant