Method for constructing a standard genotype database of drug-response-related genes, and its application
Technical Field
The present invention relates to the field of gene detection, and more particularly to a genotype database of drug-response-related genes, a method for genotyping drug-response-related genes, and a method for detecting drug response effects.
Background
Individual variation in drug response is a common clinical problem. Many drugs in clinical use are effective in only some patients: it is estimated that asthma, cardiovascular, and psychiatric drugs are effective in about 60% of patients, while in up to 40% of patients the effect is unsatisfactory or absent. In addition, some patients are prone to adverse reactions to conventional therapeutic drugs. A U.S. epidemiological study showed that 6.7% of patients have experienced serious adverse reactions, with 0.32% fatal, making adverse reactions the fourth to sixth leading cause of death among inpatients. Many factors contribute to individual variation in drug response, including sex, age, and body weight, but the most important are genetic factors, including polymorphisms in the genes for drug metabolism, drug transport, and drug targets.
Drugs entering the body are metabolized mainly in the liver and small intestine through phase I (oxidation-reduction, hydrolysis) and phase II (conjugation) reactions and are then excreted. The enzymes involved in phase I metabolism belong mainly to the CYP450 family, of which the CYP1, CYP2, and CYP3 subfamilies are the most studied. Polymorphisms in the genes encoding these enzymes significantly affect enzyme activity and therefore drug metabolism in vivo. For example, propranolol is an important substrate of CYP2D6, and its blood concentration can differ by up to 20-fold between individuals; the frequency of the CYP2D6*10 allele in the Chinese population is as high as 51.6%, and it is the main cause of reduced CYP2D6 metabolic activity in that population. The enzymes involved in phase II reactions include thiopurine methyltransferase (TPMT), N-acetyltransferases (NATs), UDP-glucuronosyltransferase (UGT1A1), and others. Leukemia patients who are homozygous for TPMT mutations suffer severe toxicity from standard doses of 6-mercaptopurine, including severe myelosuppression and liver injury. Mutations in drug transport proteins can cause excessive drug accumulation in the body or reduce intracellular drug concentrations. Studies have shown that mutations in the multidrug resistance gene ABCB1 (MDR1) are closely associated with resistance to a variety of anticancer drugs.
Using genetic testing to guide clinical dosing or targeted drug selection can effectively reduce or avoid adverse reactions and achieve the best therapeutic effect. Current methods for detecting drug-related gene polymorphisms, such as PCR/RFLP and probe hybridization, suffer from few detection sites and low throughput. The newest polymorphism detection technologies are mostly chip hybridization techniques based on multiplex PCR, but these are also limited in the number of detection sites, and the sites must be known in advance.
Summary of the Invention
An object of the invention is to provide a standard genotype database of drug-response-related genes, together with a method based on this database for rapidly genotyping drug-response-related genes and a method for detecting drug response effects.
To achieve the above object, the invention adopts the following technical solutions:
The invention discloses a method for constructing a standard genotype database of drug-response-related genes, comprising the following steps: aligning the specific sequences to which the genotype mutation information of the drug-response-related genes refers against the human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequences and the human whole-genome reference sequence; and, according to the base correspondence obtained from the alignment, converting the genotypes of the drug-response-related genes into genotypes relative to the human whole-genome reference sequence, thereby obtaining standardized genotypes of the drug-response-related genes.
Preferably, the drug-response-related genes include at least one gene selected from ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1, and XRCC1.
Further, the method also includes locating each drug-response-related gene on the human whole-genome reference sequence and determining the start and end positions of its coding sequence, so as to obtain the specific sequence to which the genotype mutation information of the gene refers.
Preferably, the specific sequence to which the genotype mutation information of a drug-response-related gene refers comprises the region from 5000 bp upstream of the start position of the gene's coding sequence to 500 bp downstream of the end position of the coding sequence.
Preferably, the human whole-genome reference sequence is hg19.
In another aspect, the invention also discloses a standard genotype database of drug-response-related genes constructed by the database construction method provided by the invention.
Further, in the standard genotype database of drug-response-related genes, each standardized genotype of a drug-response-related gene is associated with information on its drug response effect; preferably, the drug-response-related genes include at least one of the following 48 human drug-response-related genes.
Table 1. Human drug-response-related genes
In a further aspect, the invention discloses a method for genotyping drug-response-related genes, the method comprising: obtaining the exon sequences of the drug-response-related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, and comparing the analysis results with the standard genotype database of drug-response-related genes provided by the invention, thereby obtaining the genotype of the sample to be tested.
In a further aspect, the invention discloses a method for detecting drug response effects, the method comprising: obtaining the exon sequences of the drug-response-related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, comparing the analysis results with the standard genotype database provided by the invention that contains drug response effect information to obtain the genotype of the sample, and obtaining the drug response result for the sample from the drug response effect information associated with that genotype.
In embodiments of the invention, the process of obtaining the exon sequences of the drug-response-related genes of the sample to be tested includes:
A. preparing a chip capable of capturing the exon sequences of the drug-response-related genes, the chip carrying oligonucleotide probes that are the reverse complement of those exon sequences;
B. preparing a sequence capture library from the genomic DNA of the sample to be tested, including fragmenting the genomic DNA into fragments of 200~500 bp, performing end treatment, and amplifying to obtain the sequence capture library;
C. hybridizing the sequence capture library prepared in step B with the chip of step A, thereby obtaining the exon library of the drug-response-related genes of the sample to be tested.
The chip in step A carries oligonucleotide probes that are the reverse complement of all exon sequences of the 48 drug-response-related genes, and the probes are 55-105 bp long; in step B, the genomic DNA of the sample to be tested is fragmented into fragments of 200~300 bp.
Further, in step B, the end treatment includes end repair to form blunt-ended, terminally phosphorylated DNA fragments, addition of an "A" base to the 3' ends of the blunt-ended DNA, and subsequent ligation of a tag.
Further, in step C, the sequence capture libraries from multiple different samples are mixed before hybridization and then hybridized together with the chip of step A; each library carries a different Index base sequence so that the libraries can be distinguished from one another, and the Index base sequence is preferably 6~8 bp long.
In embodiments of the invention, the data analysis after sequencing includes:
i. filtering to remove low-quality sequencing reads that would affect the information analysis;
ii. aligning the reads obtained in step i with alignment software, using the human whole-genome reference sequence as the reference; the alignment software is preferably SOAP or BWA;
iii. selecting the reads that align to the target region for subsequent analysis, the target region being the exon sequence regions of the drug-response-related genes;
iv. performing variation analysis once the data pass quality control, the variation analysis including detection of at least one of the following: single nucleotide polymorphisms (SNP), insertions and deletions (INDEL), structural variation (SV), and copy number variation (CNV).
By adopting the above technical solutions, the invention has the following beneficial effects:
The method of the invention provides a database with a unified standard for the genotypes of 48 drug-response-related genes. For drug-response-related genes of known type, it can provide the corresponding genotype information quickly and accurately, giving a more accurate auxiliary basis for clinical medication and good clinical guidance. At the same time, it can detect all unknown polymorphic sites in the 48 drug-response-related genes; as accumulated data, these lay a foundation for research to discover new polymorphic sites that affect drug response and serve the basic research of drug-response-related genes.
The chip used in the method of the invention captures the exons of the 48 drug-related genes simultaneously, and a single experiment can test up to about a hundred samples, which both increases the number of samples tested and greatly reduces the testing cost per sample.
The method of the invention builds the drug response database with the hg19 genome as the reference sequence and combines sequencing with bioinformatic analysis; on the basis of accurately reporting the mutated base at each polymorphic site, it can identify the corresponding genotype and provide the corresponding drug response information.
Brief Description of the Drawings
Fig. 1 is a flow chart of exon capture library construction in one embodiment of the invention;
Fig. 2 is a flow chart of the information analysis in one embodiment of the invention.
Detailed Description
In one specific embodiment of the invention, the method is based on high-throughput sequencing of target regions after sequence capture, and comprises the following steps:
I. Construction of the standardized genotype database of drug-response-related genes
In a specific embodiment of the invention, all 48 known functional genes related to drug response are collected (see Table 1 above). Using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19 as the reference, all genotype sequences of the 48 drug-response-related genes are aligned against hg19, the mutation site information relative to hg19 is obtained from the alignment results, and all genotypes of the drug-response-related genes are converted into a unified format and standard. Using the annotation of each gene on the whole genome, the genotypes are converted into types defined relative to hg19. The specific steps are as follows:
1. Collect the genotype-related mutation and enzyme activity information of the drug-response-related genes
Collect the mutation information for all genotypes of the 48 known drug-response-related genes, together with the relationship between each type and enzyme activity. This information mainly includes the genotype name, the amino acid changes corresponding to the genotype, the mutation information of the genotype relative to its specific sequence, the drug response information corresponding to the genotype, and the literature references. Note that a "specific sequence" in this application means a sequenced DNA fragment or a cDNA sequence that was used as the reference in a given study. Analysis of the collected data shows that the mutation information of each type is given relative to one particular specific sequence; that is, in different studies the genotypes of the 48 drug-response-related genes are defined against different reference objects, and the different genotypes of the same gene may also differ depending on the reference object. Where the data are in inconsistent formats, they need to be converted into a unified format for subsequent curation.
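The unified record format described above could be modeled roughly as follows. This is a minimal sketch, not the format actually used in the invention: the class name `GenotypeRecord`, its field names, and the helper `group_by_gene` are all hypothetical, chosen only to mirror the kinds of information listed in this step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GenotypeRecord:
    """One collected genotype, before conversion to hg19 (hypothetical layout)."""
    gene: str                      # gene symbol, e.g. "CYP2D6"
    genotype_name: str             # genotype name as reported in the study
    specific_sequence: str         # identifier of the study's reference sequence
    mutations: List[str]           # mutations relative to the specific sequence
    amino_acid_changes: List[str] = field(default_factory=list)
    drug_response: str = ""        # enzyme activity / drug response note
    references: List[str] = field(default_factory=list)


def group_by_gene(records: List[GenotypeRecord]) -> Dict[str, List[GenotypeRecord]]:
    """Group records by gene so each gene's genotypes can be curated together."""
    by_gene: Dict[str, List[GenotypeRecord]] = {}
    for record in records:
        by_gene.setdefault(record.gene, []).append(record)
    return by_gene
```

Keeping the identifier of the specific sequence on every record matters because the same gene may have been described against different reference objects in different studies.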
2. Collect the CDS (coding sequence) region of each gene on its specific sequence, and the position of the gene on hg19
In the collected data, much of the genotype mutation information is given relative to a particular specific sequence. Moreover, the mutation site information follows the gene mutation nomenclature published in 1998 (Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group), in which mutation positions are given with the CDS start position of the gene as +1. For the subsequent analysis it is therefore necessary to find the CDS start position of every gene on its specific sequence. In addition, because some specific sequences are very long and contain several genes, it is necessary to determine which segment corresponds to the drug-response-related gene of interest. The position of the drug-response-related gene on hg19 is found first, and the region from 5000 bp upstream of the CDS start position to 500 bp downstream of the CDS end position is then taken as the region of the drug-response-related gene. For some genotypes, however, the mutation sites lie far from the CDS and fall outside this range; for those genes the region is defined more widely, on the principle that it must include those mutation sites.
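The region rule in this step can be sketched in a few lines. This is only an illustration under stated assumptions: coordinates are 1-based inclusive positions on the plus strand of hg19, `cds_start` <= `cds_end` are the lower and upper genomic bounds of the CDS, and the function name `gene_region` and the strand handling (for a minus-strand gene, "upstream" is taken to mean higher coordinates) are assumptions not spelled out in the text.

```python
def gene_region(cds_start, cds_end, strand, outlying_sites=(),
                upstream=5000, downstream=500):
    """Return (start, end) of the gene region on the genome.

    cds_start/cds_end: lower and upper genomic bounds of the CDS.
    strand: "+" or "-"; upstream lies toward lower coordinates on "+"
            and toward higher coordinates on "-".
    outlying_sites: known mutation positions that must be included even
            if they fall outside the default window.
    """
    if strand == "+":
        start, end = cds_start - upstream, cds_end + downstream
    elif strand == "-":
        start, end = cds_start - downstream, cds_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Widen the region so that every distant mutation site is covered.
    for pos in outlying_sites:
        start = min(start, pos)
        end = max(end, pos)
    return max(start, 1), end
```

For example, `gene_region(10000, 20000, "+")` gives `(5000, 20500)`, and passing `outlying_sites=[3000]` widens the start to 3000.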
3. BLAST alignment
Each specific sequence is aligned against hg19 with BLAST. If the specific sequence is a cDNA, BLAT is used for the alignment instead.
4. Determine the correspondence between the specific sequence and hg19
In the alignment results, a specific sequence may align to several positions on hg19. The alignment at the best-matching position is selected and analyzed position by position, giving the base correspondence between the specific sequence and hg19 at every position. Note that if the sequence aligns to the minus strand of the chromosome, each base must be converted to the corresponding base on the plus strand.
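The two decisions in this step, picking the best hit and expressing bases on the plus strand, might look like the sketch below. The hit representation (a dict with hypothetical keys `"score"`, `"strand"`, and `"start"`) is an assumption for illustration and is not the output format of BLAST or BLAT.

```python
# Watson-Crick complements; "N" is kept as "N".
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def best_hit(hits):
    """Pick the hit with the highest alignment score."""
    if not hits:
        raise ValueError("no alignment hits")
    return max(hits, key=lambda hit: hit["score"])


def base_on_plus_strand(base, hit_strand):
    """Express a base from the specific sequence on the plus strand of hg19."""
    base = base.upper()
    if hit_strand == "+":
        return base
    if hit_strand == "-":
        return COMPLEMENT[base]
    raise ValueError("strand must be '+' or '-'")
```

Only single bases are complemented here; a multi-base allele on the minus strand would additionally need to be reversed.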
5. Convert the genotypes of all drug-response-related genes
Based on the alignment of each specific sequence with hg19, the genotypes of all drug-response-related genes are converted into mutation site information with hg19 as the standard. The coordinate conversion uses the CDS start position of the gene and the gene region defined above. For some genotypes the mutation site information is given entirely on the minus strand, and it must be converted to the plus strand during the conversion.
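The coordinate conversion can be illustrated as follows. This is a simplified sketch that assumes the specific sequence is genomic and aligns to hg19 without gaps, so that a single offset from the CDS start is enough; a real conversion would use the per-base correspondence from the alignment. The function names are hypothetical. The numbering follows the convention cited above, in which the CDS start is +1 and the base immediately 5' of it is -1, with no position 0.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cds_to_genomic(pos, cds_start_genomic, strand):
    """Map a CDS-relative position (+1 = CDS start) to a genomic coordinate."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    # +1 maps onto the CDS start itself; -1 is the base just 5' of it.
    offset = pos - 1 if pos > 0 else pos
    if strand == "+":
        return cds_start_genomic + offset
    if strand == "-":
        return cds_start_genomic - offset
    raise ValueError("strand must be '+' or '-'")


def convert_snv(pos, ref, alt, cds_start_genomic, strand):
    """Convert a single-base substitution to plus-strand genomic terms."""
    genomic_pos = cds_to_genomic(pos, cds_start_genomic, strand)
    if strand == "-":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return genomic_pos, ref, alt
```

With a hypothetical CDS start at genomic position 5000 on the minus strand, `convert_snv(100, "C", "T", 5000, "-")` yields `(4901, "G", "A")`.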
6. Organize the file format and check
The file format is organized and the enzyme activity information for each genotype is added; specific examples are listed in Table 2. The correctness of the results is then checked again.
Table 2. Standardized genotype database information for drug-related genes
II. Genotype detection of drug-response-related genes
1. Exon capture probes and capture chip
In a specific embodiment of the invention, for the 48 drug-response-related genes and with the human genome hg19 as the reference sequence, all exon regions of the 48 genes are chosen as target sequences, with a total target length of about 160 kb. For each exon sequence, oligonucleotide capture probes of about 55-105 bp that are the reverse complement of the exon sequence are designed. The designed capture probes are synthesized at high density and immobilized on a chip, forming a capture chip that carries the capture probes for all exons of the 48 drug-response-related genes. The designed probes were produced, synthesized, and immobilized on the capture chip by Roche-Nimblegen.
The probe sequences in this embodiment are designed against hg19. Because genome sequences differ between species, the probes are best suited to capturing human genomic DNA. Genomes of other species with high homology to the human genome can also be used, but the capture efficiency may not be as good as for the human genome. For other species, probes similar to those of the invention can be designed from the corresponding reference sequence and applied to capture the target regions of that species.
2. Preparation of the sequence capture library
Step 1: Fragment preparation
Human genomic DNA that is free of RNA and protein contamination and shows no degradation is used as the starting material. The DNA is broken into fragments of 200~300 bp by a physical or chemical method, and the DNA fragments are recovered with an appropriate recovery kit.
Step 2: End modification of the fragmented DNA
The recovered and purified fragmented DNA is end-repaired by the action of enzymes such as T4 DNA Polymerase, Klenow Fragment, and T4 Polynucleotide Kinase, with dNTPs as the substrate, forming blunt-ended, terminally phosphorylated DNA fragments. After purification, an "A" base is added to the 3' ends of the blunt-ended DNA using Klenow Fragment (3'-5' exo-) polymerase and dATP.
Step 3: Ligation of the Index Adapter to the DNA fragments
After purification, the A-tailed DNA fragments are ligated to the Index Adapter by T4 DNA Ligase, and the ligation product is purified with a kit.
Step 4: Pre-hybridization PCR and product purification
The adapter-ligated DNA library is amplified with primers matching the Index Adapter. After purification, the amplification product is quantified with the Agilent 2100 and NanoDrop, and libraries that pass quality control are used for pooling in the next step.
Step 5: Pooling of multiple sample libraries
The libraries of multiple samples built according to steps 1 to 4 are pooled. To distinguish the libraries from different samples during sequencing, the DNA ends in each library carry a different Index base sequence of 6 bp or 8 bp. The libraries can be pooled in equal amounts or in a chosen ratio as needed. Equal-amount pooling is used when the same amount of sequencing data is required for every sample, in which case the same amount of DNA is taken from each library. In some studies, different samples require different amounts of sequencing data, so the amounts of library DNA also differ; the mixing ratio is determined by the person skilled in the art according to the specific research purpose or design requirements.
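The pooling rule above, equal amounts or a chosen ratio, reduces to a proportional split. The following is a minimal sketch; the function name `pooling_amounts`, the sample names, and the total amount in the usage note are illustrative assumptions.

```python
def pooling_amounts(total_ng, ratios):
    """Split a total DNA amount (ng) across libraries in proportion to ratios.

    ratios: dict mapping library name -> relative share of sequencing data.
            Equal values give equal-amount pooling.
    """
    if not ratios:
        raise ValueError("no libraries to pool")
    if any(r <= 0 for r in ratios.values()):
        raise ValueError("ratios must be positive")
    total_ratio = sum(ratios.values())
    return {name: total_ng * r / total_ratio for name, r in ratios.items()}
```

For instance, splitting a hypothetical 3000 ng pool equally across 50 libraries gives 60 ng per library, while ratios of 2:1:1 would give the first library half of the pool.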
3. Chip hybridization
The pooled DNA from step 5 that has passed quality control is hybridized to the chip according to the Nimblegen solid-phase chip hybridization standard operating instructions. After hybridization the DNA is eluted and purified, and then amplified with primers matching the adapter sequence. The amplification product is checked with the Agilent 2100 and Q-PCR and, once it passes quality control, is loaded for sequencing.
4. Sequencing and data analysis
Libraries that pass quality control are sequenced on the HiSeq2000 platform using sequencing by synthesis. The data analysis uses the human genome hg19 (UCSC) as the reference sequence and covers several aspects. The information analysis flow chart is shown in Fig. 2.
Step 1: Sequence filtering
Low-quality sequencing reads that would affect the information analysis are removed first. Each base in a read has its own sequencing quality value; for each read in the sequencing results, the average quality value of the read is calculated, and if it is below a conventional empirical threshold the read is filtered out. In addition, reads may be contaminated by Adapter sequences during sequencing on the instrument, and reads that contain Adapter sequence are also filtered out.
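The two filters described in this step can be written compactly. This is a minimal sketch assuming quality values have already been decoded to integers and that adapter contamination is detected by an exact substring match; the function names are hypothetical, and the default threshold of 10 follows the value used later in the worked embodiment.

```python
def mean_quality(quals):
    """Average of the per-base quality values of one read."""
    return sum(quals) / len(quals)


def keep_read(seq, quals, adapter, min_mean_quality=10):
    """Return True if the read passes both filters.

    A read is dropped when its mean quality is below the threshold or
    when it contains the adapter sequence (exact match only).
    """
    if not quals or mean_quality(quals) < min_mean_quality:
        return False
    return adapter not in seq


def filter_reads(reads, adapter, min_mean_quality=10):
    """Keep the (seq, quals) pairs that pass the filters."""
    return [(s, q) for s, q in reads if keep_read(s, q, adapter, min_mean_quality)]
```

Exact matching misses partial or mismatched adapter sequence, so a production pipeline would typically use a dedicated trimming tool for that part.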
Step 2: Sequence alignment
With hg19 (UCSC) as the reference sequence, the reads that pass the filtering of step 1 are aligned with alignment software (such as SOAP or BWA). For each read, these programs select one optimal alignment position. For reads that align equally well to multiple repetitive positions, the software selects one position for output and adds a flag.
Step 3: Select the reads that align to the target region
Chip hybridization also captures some sequences from non-target regions. Because step 2 uses the whole hg19 genome as the reference, reads from non-target regions align to their own positions on the genome by the best-match principle rather than to the target region. The reads that align to the target region are selected for the subsequent analysis, which ensures that all selected reads are target-region sequences.
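Selecting on-target reads amounts to an interval overlap test. The sketch below assumes 1-based inclusive coordinates and treats any overlap with a target interval as "aligned to the target region"; both the function names and that overlap criterion are assumptions, since the text does not state how partial overlaps are handled.

```python
def overlaps_target(chrom, start, end, targets):
    """True if the aligned read [start, end] overlaps any target interval.

    targets: dict mapping chromosome name -> list of (start, end) intervals,
             e.g. the exon regions of the drug-response-related genes.
    """
    return any(t_start <= end and start <= t_end
               for t_start, t_end in targets.get(chrom, []))


def select_on_target(alignments, targets):
    """Keep the (chrom, start, end) alignments that overlap the targets."""
    return [a for a in alignments if overlaps_target(a[0], a[1], a[2], targets)]
```

A linear scan is adequate for a target of this size (about 160 kb in the embodiment); a larger design would normally use sorted intervals or an interval tree.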
Step 4: Data quality control
Data quality control covers several aspects, such as the percentage of aligned reads, the percentage of unique reads (reads with only one optimal alignment position on the reference sequence), the proportion of duplication (identical reads), the sequencing depth, and the coverage of the target region. These quality metrics must meet conventional empirical thresholds before the next step of the analysis can proceed; for example, the sequencing depth should be consistent with expectation, and the single-base depth distribution should follow a Poisson distribution.
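The listed quality metrics can be summarized from a handful of counts plus the per-base depth over the target region. This is a rough sketch: the function name, the choice of total reads as the denominator for every percentage, and the definition of coverage as the fraction of target bases with depth above zero are all assumptions.

```python
def qc_summary(total_reads, mapped_reads, unique_reads, duplicate_reads, depths):
    """Summarize basic QC metrics.

    depths: per-base sequencing depth over the target region.
    All percentages are relative to total_reads (an assumption).
    """
    if total_reads <= 0 or not depths:
        raise ValueError("need at least one read and one target base")
    covered = sum(1 for d in depths if d > 0)
    return {
        "mapped_pct": 100 * mapped_reads / total_reads,
        "unique_pct": 100 * unique_reads / total_reads,
        "duplicate_pct": 100 * duplicate_reads / total_reads,
        "mean_depth": sum(depths) / len(depths),
        "coverage_pct": 100 * covered / len(depths),
    }
```

The resulting values would then be compared with the laboratory's own empirical thresholds, which the text does not specify.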
Step 5: Variation detection
Variation analysis is carried out only after the data pass quality control, and includes detection of SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), SVs (structural variations), CNVs (copy number variations), and so on. Each type of variation can be detected by different methods as needed.
Step 6: Typing of drug-response-related genes
After variation detection and analysis, the mutation site information for each gene is collated and compared with the corresponding genotypes in the previously curated standard database of drug-response-related genes, giving the genotype of each sample. Because humans are diploid, each gene has at most two types, so the final genotyping result for a drug-response-related gene is either a homozygous type or a heterozygous type. Some genotypes have associated enzyme activity information, so once the genotype of the sample is known, the corresponding enzyme activity status of the sample is obtained as well.
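The comparison against the standard database can be sketched as a set-matching problem. This is a deliberately simplified illustration, not the matching logic of the invention: it ignores phasing, assumes each database genotype is defined by a set of variants that must all be present, and uses hypothetical names (`matching_alleles`, `diplotype`, and a `"reference"` label for the unmutated type).

```python
def matching_alleles(sample_variants, database):
    """Find database genotypes whose defining variants are all present.

    sample_variants: dict mapping variant -> copy number (1 = heterozygous,
                     2 = homozygous), where a variant is a tuple such as
                     (chrom, pos, ref, alt) in hg19 terms.
    database: dict mapping genotype name -> set of defining variants.
    Returns a dict mapping genotype name -> number of copies supported.
    """
    matches = {}
    for name, defining in database.items():
        if defining and all(v in sample_variants for v in defining):
            matches[name] = min(sample_variants[v] for v in defining)
    return matches


def diplotype(matches, reference_name="reference"):
    """Turn matched genotypes into a two-allele result for a diploid sample."""
    alleles = [name for name, copies in sorted(matches.items())
               for _ in range(copies)]
    if len(alleles) > 2:
        raise ValueError("more than two alleles matched; result is ambiguous")
    alleles += [reference_name] * (2 - len(alleles))
    return tuple(alleles)
```

A result with two identical entries corresponds to a homozygous type and two different entries to a heterozygous type; the ambiguity error marks cases, such as nested genotype definitions, that this sketch does not try to resolve.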
Current techniques for detecting mutations in drug-response-related genes focus mainly on known mutations in one or a few genes; for unknown mutations or for large numbers of samples they are limited by long turnaround times and high cost. Compared with the prior art, the invention has the following advantages:
1. The invention can detect the exon regions of 48 drug-response-related genes in a single assay, including all known and unknown polymorphic sites in those regions. The known polymorphic sites detected serve as a basis for assisting clinical medication, while the unknown polymorphic sites detected can be accumulated as data for discovering new polymorphic sites that affect drug response. The method therefore not only provides clinical guidance but also has research value.
2. The chip capture used here is high-throughput: a single experiment can test up to about a hundred samples, which both increases the number of samples tested and greatly reduces the testing cost per sample.
3. The invention builds the drug response database with the hg19 genome as the reference sequence and combines sequencing with bioinformatic analysis; on the basis of accurately reporting the mutated base at each polymorphic site, it can identify the corresponding genotype and provide the corresponding drug response information.
The invention is described in further detail below through a specific embodiment. The following embodiment serves to explain the invention, not to limit it.
Embodiment
The experimental procedure in this embodiment is described for 50 samples, including the YanHuang (YH) sample, whose libraries are built and hybridized on a single chip. The number of samples in this embodiment serves to explain the invention and does not limit the number of samples that can be hybridized on each chip.
1. Experimental materials
The reagents used in this embodiment are listed in Table 3. Other reagents, consumables, and instruments not listed in Table 3 are all general-purpose products that can be purchased commercially.
Table 3. Reagents used in this embodiment
2. Preparation of the sequence capture library
(1) Genomic DNA fragmentation
Using 3 μg of YanHuang (YH) genomic DNA that was free of protein and RNA contamination and showed no degradation as the starting material, the DNA was fragmented with a Covaris-S2 ultrasonic shearing instrument (Covaris, US). The shearing parameters were set as follows:
Treatment | Parameter | Value
Treatment1 | Duty/cycle (%) | 10
Treatment1 | Intensity | 10
Treatment1 | Cycles/burst | 1000
Treatment1 | Time (min) (time (second)) | 60
Treatment2 | Time (s) | 0
Treatment3 | Time (s) | 0
Treatment4 | Time (s) | 0
Cycles | | 4
After the fragments passed electrophoresis inspection (main band concentrated between 200 bp and 300 bp), they were recovered and purified with the QIAquick PCR Purification Kit, and the sample was dissolved in 75 μL Elution Buffer.
(2) DNA fragment end repair
The DNA fragments recovered and purified after fragmentation were used to set up the end-repair reaction in a 1.5 mL centrifuge tube according to the table below, forming blunt-ended, terminally phosphorylated DNA fragments.
Component | Volume
Sample DNA | 75 μL
10× Polynucleotide Kinase Buffer | 10 μL
dNTP Solution Set (10 mM each) | 4 μL
T4 DNA Polymerase | 5 μL
Klenow Fragment | 1 μL
T4 Polynucleotide Kinase | 5 μL
Total volume | 100 μL
The above 100 μL reaction mixture was mixed gently and incubated at 20 °C for 30 min in a Thermomixer (Eppendorf), then purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved in 32 μL ddH2O.
(3) Addition of an "A" base to the 3' ends
An "A" base was added to the 3' ends of the end-repaired DNA fragments to allow ligation of the Index Adapter in the next step. The A-tailing reaction was set up as in the table below.
Component | Volume
DNA | 32 μL
10× blue buffer | 5 μL
dATP (1 mM) | 10 μL
Klenow (3'-5' exo-) | 3 μL
Total volume | 50 μL
The above 50 μL reaction mixture was mixed gently and incubated at 37 °C for 30 min in a Thermomixer (Eppendorf), then purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved in 15 μL ddH2O.
(4) Index Adapter ligation
After purification, the A-tailed DNA fragments were ligated to the Index Adapter by T4 DNA Ligase. The Index Adapter ligation reaction was set up in a 15 ml centrifuge tube:
The above 50 μL reaction mixture was mixed by gentle shaking, centrifuged briefly, and incubated at 20 °C for 15 min in a Thermomixer (Eppendorf). After the reaction, the product was purified with the MinElute PCR Purification Kit, and the sample was finally dissolved in 25 μL Elution Buffer.
(5) Pre-hybridization PCR and product purification
The adapter-ligated DNA library was amplified with primers matching the Index Adapter. The amplification system and conditions were as follows:
The PCR program was 94 °C for 2 min; 4 cycles of 94 °C for 15 s, 62 °C for 30 s, and 72 °C for 30 s; then 72 °C for 5 min. The PCR product was purified with the QIAquick PCR Purification Kit, with an elution volume of 30 μL.
(6) Pooling of sample libraries
Following the steps above (DNA fragmentation, end repair, Index Adapter ligation, pre-hybridization PCR, and so on), another 49 sample libraries were built, giving 50 libraries in total together with the YanHuang genomic DNA sample library (4 HapMap samples, 1 YanHuang sample, and 45 normal human samples, the 45 normal samples being used to test the number of samples that one chip can hybridize). Equal amounts of DNA were taken from the 50 libraries and mixed evenly. To distinguish the libraries from different samples during sequencing, the DNA ends of each library were given a different Index base sequence of 6 bp or 8 bp when the Index Adapter was ligated. Note that the Index Adapter consists of two parts: the Index base sequence used to distinguish the libraries, and the Index Adapter primer sequence.
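Distinguishing pooled libraries by their Index base sequence can be illustrated with a small lookup. This is a sketch only: the index sequences and sample names in the usage note are made up, the function name is hypothetical, and the optional mismatch tolerance is an assumption that the text does not mention.

```python
def assign_sample(index_read, index_to_sample, max_mismatches=0):
    """Return the sample whose Index matches the read, or None.

    index_to_sample: dict mapping Index base sequence -> sample name.
    A read is left unassigned if no Index is within max_mismatches,
    or if more than one Index is (ambiguous).
    """
    assigned = None
    for index, sample in index_to_sample.items():
        if len(index) != len(index_read):
            continue
        mismatches = sum(a != b for a, b in zip(index, index_read))
        if mismatches <= max_mismatches:
            if assigned is not None:
                return None  # ambiguous: two Indexes match equally well
            assigned = sample
    return assigned
```

With a made-up table `{"ACGTAC": "sample1", "TGCATG": "sample2"}`, the read `"ACGTAC"` is assigned to `sample1`, while `"ACGTAA"` is assigned only when one mismatch is allowed.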
4. Exon library construction
Construction of the exon library involves hybridizing the prepared sequence capture library with the capture chip, so that all exons of the 48 drug-response-related genes are enriched on the capture chip; the hybridized capture chip is then eluted, the eluted product being the exon sequences, and the exon sequences are amplified to obtain the exon library. The details are as follows:
(1) Chip hybridization
a) To a 1.5 mL centrifuge tube, add 450 μg of COT-1 DNA, 3 μg of DNA from the pooled library, and 1 nmol each of Index-adpater1-block and Index-adpater2-block (Multiplexing Sample Preparation Oligonucleotide Kit, Illumina). Dry the mixture in a SpeedVac (Thermo) with the temperature set to 60 °C.
b) Add 11.2 μL of pure water to the dried tube. After the DNA has fully dissolved, add 18.5 μL of 2× SC Hybridization Buffer and 7.3 μL of SC Hybridization, mix thoroughly, and transfer the mixture to the 95 °C dry bath of the hybridization instrument (Nimblegen) for 10 minutes to denature the DNA.
c) Take out the sample, vortex it, centrifuge at full speed for 30 seconds, place it at the 42 °C position of the hybridization instrument (Nimblegen), and hybridize it with the exon capture chip.
d) The hybridization follows the NimbleGen chip hybridization method (NimbleGen Arrays User's Guide, Version 3.1, 7 Jul 2009, Roche NimbleGen, Inc.). The sample loading volume is 35 μl, and hybridization is carried out at 42 °C for 64-72 hr. After hybridization and post-hybridization processing of the chip, the sequences enriched on the chip are eluted with 900 μl of 160 mM NaOH; the eluted product is purified with the MinElute PCR Purification Kit and finally eluted in 80 μl Elution Buffer.
(2) Post-capture PCR amplification
PCR amplification is performed using the sequences eluted from the capture chip as template. The reaction
system is: Phusion Mix, 150 μL; upstream and downstream primers (Multiplexing Sequencing Primers and PhiX
Control Kit), 4.2 μL each; the 80 μL eluted sample described above; and 85 μL of ddH2O. After mixing, the
reaction is divided into 6 tubes for PCR. PCR conditions: 94 °C for 1 min; 16 cycles of 94 °C for 30 s,
58 °C for 30 s, and 72 °C for 30 s; 72 °C for 5 min. After PCR, the 6 tubes are pooled and purified with
QIAquick PCR Purification Kit magnetic beads to recover fragments of 300-450 bp, with an elution volume
of 50 μL.
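The reaction volumes and cycling conditions above can be tallied with a short script. This is only an
arithmetic sketch of the stated numbers: the variable names are illustrative, and the hold time excludes
temperature ramping, which the text does not specify.

```python
# Volumes in microliters, taken from the reaction system described above.
components_ul = {
    "Phusion Mix": 150.0,
    "upstream primer": 4.2,
    "downstream primer": 4.2,
    "eluted capture product": 80.0,
    "ddH2O": 85.0,
}
N_TUBES = 6

total_ul = sum(components_ul.values())
per_tube_ul = total_ul / N_TUBES

# Thermal profile as (temperature in deg C, hold time in seconds).
INITIAL_DENATURATION = (94, 60)
CYCLE_STEPS = [(94, 30), (58, 30), (72, 30)]
N_CYCLES = 16
FINAL_EXTENSION = (72, 300)

# Sum of programmed hold times only; ramping between temperatures is ignored.
hold_seconds = (
    INITIAL_DENATURATION[1]
    + N_CYCLES * sum(seconds for _, seconds in CYCLE_STEPS)
    + FINAL_EXTENSION[1]
)

print(f"total volume: {total_ul:.1f} uL, per tube: {per_tube_ul:.1f} uL")
print(f"programmed hold time: {hold_seconds / 60:.0f} min")
```

With the stated volumes, the mix totals about 323.4 μL, or about 53.9 μL per tube, and the programmed
hold time is 30 minutes.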
(3) Library detection:
The fragment size and content of the library are measured with a Bioanalyzer analysis system (Agilent,
Santa Clara, USA), and the library concentration is accurately quantified by Q-PCR.
5. Sequencing
The PCR amplification products that pass purification and quality testing are sequenced. The sequencing
method follows the Illumina HiSeq 2000 operating procedure (HiSeq 2000 User Guide. Catalog # SY-940-1001,
Part # 15011190 Rev B, Illumina).
6. Data analysis
(1) Sequencing data filtering
The sequencing data are filtered in two respects. The first is sequencing quality: the base quality values
are computed over the whole sequence, and a sequence is filtered out when its average quality value is
below 10. The second is detection of Adapter contamination: a sequence that contains Adapter sequence is
also filtered out.
The filtering results show that 7% of the sequences were filtered out; the remaining 93% were used for the
next analysis step.
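The two filtering criteria can be sketched as follows. This is a minimal illustration, not the pipeline
used in the embodiment: it assumes Phred+33 quality encoding and exact substring matching for the Adapter,
neither of which is stated in the text, and the adapter string and reads are placeholders.

```python
def mean_quality(quality_string, phred_offset=33):
    """Average base quality of one read (assumes Phred+33 encoding by default)."""
    return sum(ord(ch) - phred_offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string, adapter, min_mean_quality=10, phred_offset=33):
    """Return True if the read passes both filters described above."""
    if not quality_string:
        return False
    # Filter 1: average quality over the whole read must not be below the cutoff.
    if mean_quality(quality_string, phred_offset) < min_mean_quality:
        return False
    # Filter 2: reads containing adapter sequence are discarded
    # (exact substring match is an assumption of this sketch).
    if adapter in sequence:
        return False
    return True


# Placeholder adapter and reads, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),                 # high quality, no adapter
    ("ACGTACGTAC", "##########"),                 # mean quality 2
    ("ACGTAGATCGGAAGAGCAC", "IIIIIIIIIIIIIIIIIII"),  # contains the adapter
]
passed = [seq for seq, qual in reads if keep_read(seq, qual, ADAPTER)]
print(len(passed), "of", len(reads), "reads kept")
```

A production filter would normally tolerate mismatches and partial adapter overlap at read ends; the exact
match here only illustrates the rule.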
(2) Sequence alignment
With hg19 as the reference sequence, the filtered sequences are aligned using the BWA (Burrows-Wheeler
Aligner) software. During alignment, each sequence is allowed at most 5 mismatches, and gapped alignment is
enabled (insertions and deletions are allowed). When a sequence has multiple optimal alignment positions,
one position is selected at random for output and is flagged. In the test of this embodiment, the aligned
sequences accounted for about 97% of all sequences submitted for alignment.
(3) Selecting sequences aligned to the target region
After alignment, non-unique reads are first removed according to the alignment results, so that only
sequences aligned uniquely to the whole genome are retained. Duplicates are then removed: for paired reads
aligned to the same position on the reference sequence, one pair is retained arbitrarily, because paired
sequences aligned to the same position were most likely produced by the PCR process.
After the above processing, the sequences aligned to the target regions of the reference sequence are
retained, according to the target regions of the drug reaction related gene chip design, and passed to the
next analysis step.
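The three selection steps above can be sketched on simplified records. The `AlignedPair` type, the use of
the outer fragment coordinates as the duplicate key, and the target interval are all assumptions made for
illustration; the embodiment does not specify its data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """Hypothetical record for one aligned read pair (0-based, end exclusive)."""
    name: str
    chrom: str
    start: int
    end: int
    unique: bool


def select_target_pairs(pairs, targets):
    """Keep unique, non-duplicate pairs that overlap a target interval."""
    seen_positions = set()
    kept = []
    for pair in pairs:
        # Step 1: drop pairs that do not align uniquely to the genome.
        if not pair.unique:
            continue
        # Step 2: of pairs at the same position, keep only the first one seen.
        key = (pair.chrom, pair.start, pair.end)
        if key in seen_positions:
            continue
        seen_positions.add(key)
        # Step 3: keep pairs overlapping any target region.
        if any(chrom == pair.chrom and pair.start < t_end and t_start < pair.end
               for chrom, t_start, t_end in targets):
            kept.append(pair)
    return kept


# Illustrative coordinates only.
targets = [("chr22", 100, 200)]
pairs = [
    AlignedPair("p1", "chr22", 120, 300, True),
    AlignedPair("p2", "chr22", 120, 300, True),   # duplicate of p1
    AlignedPair("p3", "chr22", 150, 330, False),  # not uniquely aligned
    AlignedPair("p4", "chr22", 500, 680, True),   # outside the target
    AlignedPair("p5", "chr1", 120, 300, True),    # different chromosome
]
selected = select_target_pairs(pairs, targets)
print([pair.name for pair in selected])
```

Keeping the first pair seen at each position is one way to realize "retain one pair arbitrarily"; any
other deterministic choice would satisfy the description equally well.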
(4) Data quality control
Data quality control covers the data volume of each sample, the data volume after filtering, the proportion
of sequences aligned during sequence alignment, whether the mean depth of the sample meets expectations,
whether the single-base depth distribution plot follows a Poisson distribution, the target region coverage
of the sample, and so on.
Statistical analysis shows that all 50 samples of this embodiment meet the quality control requirements;
partial results are shown in Table 4.
Specifically, data quality control covers two aspects. The first is consistency between samples: if the
data of all samples are similar, the data meet the requirements; if the data of an individual sample differ
greatly from those of most other samples, that sample likely has a problem. The second is the individual
quality control metrics of each sample. A person skilled in the art can determine an approximate range for
these criteria from experience, and the values may vary somewhat between sequencing regions. Specifically,
the following are all acceptable: data remaining after filtering generally above 85%; proportion of aligned
sequences above 90%; data remaining after deduplication above 60%; proportion of unique reads, which
depends on the specific sequencing target region, above 90%; mean depth meeting the expected experimental
design requirement; and coverage above 95%.
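The per-sample thresholds listed above can be checked mechanically, as in the sketch below. The metric
names, the treatment of a value exactly at the threshold as passing, the expected mean depth, and the two
example samples are assumptions for illustration; only the percentage cutoffs come from the text.

```python
# Minimum acceptable percentages, taken from the description above.
QC_MINIMUMS = {
    "retained_after_filtering_pct": 85.0,
    "aligned_pct": 90.0,
    "retained_after_dedup_pct": 60.0,
    "unique_reads_pct": 90.0,
    "target_coverage_pct": 95.0,
}


def failed_metrics(sample_metrics, expected_mean_depth):
    """Return the names of the QC metrics that a sample fails."""
    failures = [
        name
        for name, minimum in QC_MINIMUMS.items()
        if sample_metrics[name] < minimum
    ]
    # The expected depth is set by the experimental design, not by the text.
    if sample_metrics["mean_depth"] < expected_mean_depth:
        failures.append("mean_depth")
    return failures


# Hypothetical metric values for two samples.
good_sample = {
    "retained_after_filtering_pct": 93.0,
    "aligned_pct": 97.0,
    "retained_after_dedup_pct": 75.0,
    "unique_reads_pct": 92.0,
    "target_coverage_pct": 98.5,
    "mean_depth": 120.0,
}
poor_sample = dict(good_sample, aligned_pct=88.0, mean_depth=60.0)

print(failed_metrics(good_sample, expected_mean_depth=100.0))
print(failed_metrics(poor_sample, expected_mean_depth=100.0))
```

The between-sample consistency check described first is not covered here; it would compare each sample's
metrics against the distribution over all samples.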
Table 4. Data quality control results
(5) SNP analysis
In this embodiment, SNPs are obtained with samtools. After the sequences aligned to the target region are
selected, they are format-converted and sorted with samtools, and SNP calling is performed with its
mpileup command. The raw SNPs may also be filtered further, for example by site depth and quality value.
In general, a depth of 4-400 meets the requirement; for the quality value, a statistical method is used to
calculate its significance, and sites are filtered on that significance.
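The depth criterion can be sketched as a filter over variant records. The sketch assumes the calls are
available as VCF text lines with the depth stored in a `DP` entry of the INFO column, which the text does
not state; the example records are placeholders. The significance-based quality filter is not shown
because the text does not specify the statistical method.

```python
def info_depth(info_field):
    """Extract the integer DP value from a VCF INFO column, or None if absent."""
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            return int(value)
    return None


def depth_ok(vcf_line, min_depth=4, max_depth=400):
    """Return True if the site depth lies within the 4-400 range given above."""
    fields = vcf_line.rstrip("\n").split("\t")
    depth = info_depth(fields[7])  # INFO is the eighth VCF column
    return depth is not None and min_depth <= depth <= max_depth


# Placeholder records: CHROM POS ID REF ALT QUAL FILTER INFO
records = [
    "chr1\t1000\t.\tG\tC\t50\t.\tDP=35;AF1=0.5",
    "chr1\t2000\t.\tA\tT\t20\t.\tDP=3;AF1=0.5",
    "chr1\t3000\t.\tC\tT\t99\t.\tDP=401;AF1=1",
]
kept = [line for line in records if depth_ok(line)]
print(len(kept), "of", len(records), "sites kept")
```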
The samples of this embodiment include 4 HapMap samples (a, b, c, d) and 1 YanHuang (YH) sample (these 5
samples have published genome and genotyping data), of which the YanHuang sample was sequenced twice. The
SNPs of these five samples were evaluated. The 4 HapMap samples were compared with the existing HapMap
data, and the SNPs of the YanHuang sample were compared with the existing genotyping sites of the YanHuang
sample; see Table 5 and Table 6.
Table 5. SNP analysis results of the HapMap samples
Table 6. SNP analysis results of the YanHuang sample
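The comparison of called SNPs against published genotypes described above amounts to a concordance
calculation over the sites present in both data sets. The following sketch shows one way to compute it;
the dictionary layout and the example genotypes are hypothetical and are not the data of Tables 5 and 6.

```python
def genotype_concordance(called, reference):
    """Compare genotypes at sites present in both data sets.

    Both arguments map (chrom, pos) to a two-letter genotype such as "AG".
    Returns (shared_sites, matching_sites, concordance or None).
    """
    shared = called.keys() & reference.keys()
    if not shared:
        return 0, 0, None
    # Allele order is ignored, so "AG" and "GA" count as the same genotype.
    matching = sum(
        1 for site in shared if sorted(called[site]) == sorted(reference[site])
    )
    return len(shared), matching, matching / len(shared)


# Hypothetical genotypes, for illustration only.
called = {("chr1", 100): "AG", ("chr1", 200): "CC", ("chr2", 50): "TT"}
reference = {("chr1", 100): "GA", ("chr1", 200): "CT", ("chr3", 5): "AA"}

shared_sites, matching_sites, concordance = genotype_concordance(called, reference)
print(shared_sites, matching_sites, concordance)
```

For the sample sequenced twice, the same function could be applied to the two runs to measure
reproducibility.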
(6) Genotyping of drug reaction related genes
After variant detection is complete, the mutation site information of each gene is extracted according to
the region of that gene on the whole genome. This mutation site information is compared with the
previously constructed genotype database of drug reaction related genes to determine the genotyping
information of the sample and the corresponding drug reaction information. The detection results for some
of the samples are shown in Table 7.
Table 7. Genotypes and drug reaction information of drug reaction related genes
The genotyping results show that the genotyping information and drug reaction effect information obtained
by the method of this embodiment are consistent with existing known records.
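The lookup step, extracting each gene's mutation sites by genomic region and comparing them with the
genotype database, can be sketched as below. Everything in the example is hypothetical: the gene name,
region, database layout, and annotation strings are placeholders and do not reproduce the database built
in this specification.

```python
def annotate_sample(variants, gene_regions, database):
    """Match a sample's variants against a genotype database.

    variants:     list of (chrom, pos, genotype) tuples
    gene_regions: maps gene name to (chrom, start, end), both ends inclusive
    database:     maps (gene, pos, sorted genotype) to an annotation dict
    """
    report = []
    for gene, (chrom, start, end) in gene_regions.items():
        for v_chrom, pos, genotype in variants:
            # Keep only the mutation sites that fall inside this gene's region.
            if v_chrom != chrom or not (start <= pos <= end):
                continue
            # Alleles are sorted so that "TC" and "CT" hit the same entry.
            entry = database.get((gene, pos, "".join(sorted(genotype))))
            if entry is not None:
                report.append((gene, pos, genotype, entry))
    return report


# Placeholder region, database entry, and variants.
gene_regions = {"GENE_A": ("chr1", 1000, 2000)}
database = {
    ("GENE_A", 1500, "CT"): {
        "genotype": "GENE_A example type",
        "drug_reaction": "example annotation",
    },
}
variants = [("chr1", 1500, "TC"), ("chr1", 1800, "AA"), ("chr2", 1500, "CT")]

report = annotate_sample(variants, gene_regions, database)
for gene, pos, genotype, entry in report:
    print(gene, pos, genotype, entry["drug_reaction"])
```

A real database would typically define each type by a combination of sites rather than a single site; the
single-site key here only illustrates the comparison.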
The above is a further detailed description of the present invention with reference to specific
embodiments, and the specific implementation of the present invention shall not be regarded as limited to
these descriptions. A person of ordinary skill in the technical field of the invention may make several
simple deductions or substitutions without departing from the concept of the invention, and all of these
shall be regarded as falling within the protection scope of the present invention.