CN103198238B

CN103198238B - Method for constructing a standard genotype database of drug response related genes, and applications thereof

Info

Publication number: CN103198238B
Application number: CN201210002898.7A
Authority: CN
Inventors: 刘晓; 张伟; 徐怀前; 苏政; 王冠; 杨焕明
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2012-01-06
Filing date: 2012-01-06
Publication date: 2017-04-05
Anticipated expiration: 2032-01-06
Also published as: CN103198238A; WO2013102442A1

Abstract

The invention discloses a standard genotype database of drug response related genes and a method for constructing it. First, the specific sequence to which each piece of mutation information refers is aligned to the human whole-genome reference sequence to obtain the correspondence between the specific sequence and the reference sequence. Then, according to that correspondence, each genotype is converted into a standard genotype expressed relative to the human whole-genome reference sequence. On this basis, the invention also discloses methods for genotyping drug response related genes and for detecting drug response effects. The database provides a unified standard for drug response related genotypes and a more accurate basis for clinical medication. The genotyping and drug response detection methods cover 48 human drug response related genes and can detect all of their known mutation sites, with the corresponding drug information, in multiple samples simultaneously. The genotyping method can also detect unknown polymorphic sites, laying a foundation for research into new polymorphic sites that affect drug response.

Description

Method for constructing a standard genotype database of drug response related genes, and applications thereof

Technical field

The present invention relates to the field of gene detection, and in particular to a genotype database of drug response related genes, a method for genotyping drug response related genes, and a method for detecting drug response effects.

Background

Individual variation in drug response is a common clinical problem. Many drugs are effective in only some patients: it is estimated that drugs for asthma, cardiovascular disease and psychiatric disorders are effective in about 60% of patients, while for up to 40% of patients the efficacy is unsatisfactory or absent. Meanwhile, some patients are prone to adverse reactions to conventional therapeutic drugs. A US epidemiological study showed that 6.7% of patients had experienced serious adverse reactions, with fatal reactions in 0.32%, making adverse reactions the fourth to sixth leading cause of death among inpatients. Many factors contribute to individual variation in drug response, including sex, age and body weight, but the most important are genetic factors, including polymorphisms in genes for drug metabolism, drug transport and drug targets.

Drugs entering the body are metabolized mainly in the liver and small intestine through phase I (oxidation, reduction, hydrolysis) and phase II (conjugation) reactions and are then excreted. The enzymes involved in phase I metabolism belong mainly to the CYP450 family, of which the CYP1, CYP2 and CYP3 subfamilies are the most studied. Polymorphisms in the genes encoding these enzymes significantly affect enzyme activity and therefore drug metabolism in the body. For example, propranolol is an important substrate of CYP2D6, and its blood concentration can differ by up to 20-fold between individuals. The frequency of the CYP2D6*10 allele in the Chinese population is as high as 51.6%, which is the main cause of reduced CYP2D6 metabolic activity in that population. The enzymes involved in phase II reactions include thiopurine methyltransferase (TPMT), N-acetyltransferases (NATs) and UDP-glucuronosyltransferase (UGT1A1). Leukemia patients who are homozygous for TPMT mutations develop severe toxicity at conventional doses of 6-mercaptopurine, with severe bone marrow suppression and liver injury. Mutations in drug transport proteins can cause excessive drug accumulation in the body or reduce the intracellular drug concentration. Studies show that mutations in the multidrug resistance gene ABCB1 (MDR1) are closely related to resistance to various anticancer drugs.

Using genetic testing to guide clinical dosing or targeted medication can effectively reduce or avoid adverse reactions and achieve the best therapeutic effect. Current methods for detecting drug-related gene polymorphisms, such as PCR/RFLP and probe hybridization, cover few sites and have low throughput. The newest polymorphism detection technologies are mostly chip hybridization methods based on multiplex PCR, but they are also limited in the number of sites they can detect and can detect only known sites.

Summary of the invention

An object of the invention is to provide a standard genotype database of drug response related genes, together with a method based on this database for rapidly genotyping drug response related genes and a method for detecting drug response effects.

To achieve the above object, the invention adopts the following technical solutions:

The invention discloses a method for constructing a standard genotype database of drug response related genes, comprising the following steps: aligning the specific sequence to which the genotype mutation information of a drug response related gene refers against the human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequence and the human whole-genome reference sequence; and, according to the base correspondence obtained from the alignment, converting the genotypes of the drug response related gene into genotypes relative to the human whole-genome reference sequence, thereby obtaining the standard genotypes of the drug response related gene.

Preferably, the drug response related genes include at least one gene selected from ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.

Further, the method also comprises locating the drug response related gene on the human whole-genome reference sequence and determining the start and end positions of its coding sequence, so as to obtain the specific sequence corresponding to the genotype mutation information of the gene.

Preferably, the specific sequence corresponding to the genotype mutation information of a drug response related gene comprises the region from 5000 bp upstream of the start position of the coding sequence to 500 bp downstream of the end position of the coding sequence.

Preferably, the human whole-genome reference sequence is hg19.

In another aspect, the invention discloses a standard genotype database of drug response related genes constructed by the database construction method provided by the invention.

Further, in the standard genotype database, each standard genotype of a drug response related gene is associated with information on its drug response effect. Preferably, the drug response related genes include at least one of the following 48 human drug response related genes.

Table 1. Human drug response related genes

In another aspect, the invention discloses a method for genotyping drug response related genes, comprising: obtaining the exon sequences of the drug response related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, and comparing the analysis result with the standard genotype database of drug response related genes provided by the invention, thereby obtaining the genotype of the sample.

In another aspect, the invention discloses a method for detecting drug response effects, comprising: obtaining the exon sequences of the drug response related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, comparing the analysis result with the standard genotype database provided by the invention, which contains drug response effect information, to obtain the genotype of the sample, and obtaining the drug response result for the sample from the drug response effect information associated with that genotype.

In embodiments of the invention, obtaining the exon sequences of the drug response related genes of the sample to be tested comprises:

A. preparing a chip capable of capturing the exon sequences of the drug response related genes, the chip carrying oligonucleotide probes that are reverse-complementary to those exon sequences;

B. preparing a sequence capture library from the genomic DNA of the sample to be tested, including shearing the genomic DNA into fragments of 200-500 bp, performing end processing, and amplifying to obtain the sequence capture library;

C. hybridizing the sequence capture library prepared in step B with the chip of step A, thereby obtaining the exon library of the drug response related genes of the sample.

The chip in step A carries oligonucleotide probes that are reverse-complementary to all exon sequences of the 48 drug response related genes, and the probes are 55-105 bp long. In step B, the genomic DNA of the sample is sheared into fragments of 200-300 bp.

Further, in step B the end processing comprises end repair to form blunt-ended DNA fragments with phosphorylated ends, addition of an "A" base to the 3' ends of the blunt-ended DNA, and subsequent ligation of an index tag.

Further, in step C the sequence capture libraries from multiple different samples are pooled before hybridization and then hybridized together with the chip of step A. Each library carries a different Index base sequence so that the libraries can be distinguished; the Index base sequence is preferably 6-8 bp long.

In embodiments of the invention, the data analysis after sequencing comprises:

i. filtering out low-quality sequencing reads that would affect the information analysis;

ii. aligning the reads obtained in step i with alignment software, using the human whole-genome reference sequence as the reference; the alignment software is preferably SOAP or BWA;

iii. selecting the reads aligned to the target region for subsequent analysis, the target region being the exon regions of the drug response related genes;

iv. once the data pass quality control, performing variant analysis, which includes at least one of the following detections: single nucleotide polymorphism (SNP), insertion and deletion (INDEL), structural variation (SV), and copy number variation (CNV).

By adopting the above technical solutions, the invention has the following beneficial effects:

The method of the invention provides a database with a unified standard for genotyping the 48 drug response related genes. For known genotypes of these genes it quickly and accurately provides the corresponding genotype information, giving clinical medication a more accurate supporting basis and good clinical guidance. It can also detect all unknown polymorphic sites in the 48 drug response related genes; as accumulated data, these lay a foundation for discovering new polymorphic sites that affect drug response and support basic research on drug response related genes.

The chip used in the method captures the exons of the 48 drug-related genes, and a single experiment can test up to about a hundred samples simultaneously, which both increases the number of samples tested and greatly reduces the cost per sample.

The method builds the drug response database with the hg19 genome as the reference sequence. Combined with sequencing and bioinformatic analysis, it accurately reports the mutated base at each polymorphic site and, on that basis, identifies the corresponding genotype and provides the corresponding drug response information.

Description of the drawings

Fig. 1 is a flow chart of exon capture library construction in one embodiment of the invention;

Fig. 2 is a flow chart of the information analysis in one embodiment of the invention.

Detailed description

In a specific embodiment of the invention, the method is based on high-throughput sequencing of target regions after sequence capture, and comprises the following steps:

I. Construction of the standardized genotype database of drug response related genes

In a specific embodiment of the invention, all 48 existing functional genes related to drug response are collected (see Table 1 above). Using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19 as the reference, all genotype sequences of the 48 drug response related genes are aligned to hg19, mutation site information relative to hg19 is obtained from the alignment results, and all genotypes of the drug response related genes are converted to a unified format and standard. According to the annotation of each gene on the whole genome, the genotypes are converted to types that use hg19 as the standard. The specific steps are as follows:

1. Collect genotype-related mutation and enzyme activity information for the drug response related genes

Collect the mutation information of all genotypes of the 48 existing drug response related genes, together with information relating genotype to enzyme activity. This information mainly includes the genotype name, the corresponding amino acid change, the mutation information of the genotype relative to its specific sequence, the drug response information associated with the genotype, and references. Note that "specific sequence" in this application means the DNA fragment or cDNA sequence used as the reference in a given study. Analysis of the collected data shows that the mutation information of each genotype is given relative to one such specific sequence. In other words, different studies use different reference objects for the genotypes of the 48 drug response related genes, and the genotypes of the same gene differ depending on the reference object. Data in inconsistent formats therefore need to be converted into a unified format for subsequent processing.

2. Collect the CDS (coding sequence) region of each gene on its specific sequence, and the position of the gene on hg19

In the collected data, many genotype mutations are given relative to a particular specific sequence, and the mutation site information follows the gene mutation nomenclature published in 1998 (Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group), in which the mutated position is given with the CDS start position of the gene as +1. Subsequent analysis therefore requires the CDS start position of every gene on its specific sequence. In addition, because specific sequences can be very long and some contain multiple genes, it is necessary to determine which segment corresponds to the drug response related gene of interest. The position of the drug response related gene on hg19 is found first, and the region from 5000 bp upstream of the CDS start position to 500 bp downstream of the CDS end position is taken as the region of the gene. For some genotypes, however, the mutation sites lie far from the CDS and fall outside this range; for those genes the region is extended so that it includes those mutation sites.
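The region rule described above (5000 bp upstream of the CDS start, 500 bp downstream of the CDS end, extended when known mutation sites fall outside) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `gene_region`, the `Region` type, the 1-based inclusive coordinate convention and the strand handling (upstream means higher coordinates for a minus-strand gene) are all assumptions made for the example.

```python
from typing import Iterable, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gene_region(chrom: str, cds_lo: int, cds_hi: int, strand: str,
                upstream: int = 5000, downstream: int = 500,
                extra_sites: Iterable[int] = ()) -> Region:
    """Return the analysis region for a gene on the reference genome.

    cds_lo and cds_hi are the lowest and highest genomic coordinates of the
    CDS. Assumption: for a '-' strand gene, "upstream of the CDS start"
    lies at higher genomic coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if cds_lo > cds_hi:
        raise ValueError("cds_lo must not exceed cds_hi")
    if strand == "+":
        start, end = cds_lo - upstream, cds_hi + downstream
    else:
        start, end = cds_lo - downstream, cds_hi + upstream
    # Extend the region so that distant known mutation sites are included.
    for site in extra_sites:
        start, end = min(start, site), max(end, site)
    return Region(chrom, max(1, start), end)
```

For a plus-strand gene with its CDS at positions 10000 to 12000, the default settings give the region 5000 to 12500; passing `extra_sites=[3000]` widens the start to 3000.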

3. BLAST alignment

The specific sequence is aligned to hg19 with BLAST. If the specific sequence is a cDNA, BLAT is used for the alignment instead.

4. Determine the mutation information between the specific sequence and hg19

In the alignment results, a specific sequence may align to multiple positions on hg19. The best-aligning position is selected and its alignment is analyzed position by position, giving the base correspondence between the specific sequence and hg19 at each position. Note that if the sequence aligns to the minus strand of the chromosome, the bases must be converted to the corresponding bases on the plus strand.
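The two operations in this step, choosing the best alignment position and expressing minus-strand bases on the plus strand, might look like the sketch below. The `Hit` record and the use of a bit score as the ranking criterion are assumptions for illustration; the source only says the "best" position is chosen and does not name the criterion.

```python
from typing import NamedTuple, Sequence


class Hit(NamedTuple):
    """One alignment of a specific sequence to the reference (hypothetical record)."""
    chrom: str
    start: int
    strand: str      # '+' or '-'
    bitscore: float  # assumed ranking criterion


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def best_hit(hits: Sequence[Hit]) -> Hit:
    """Pick the highest-scoring alignment position."""
    if not hits:
        raise ValueError("no alignments to choose from")
    return max(hits, key=lambda h: h.bitscore)


def to_plus_strand(allele: str, strand: str) -> str:
    """Express an allele on the plus strand.

    A minus-strand allele is reverse-complemented; for a single base this
    reduces to taking the complementary base.
    """
    if strand == "-":
        return allele.translate(_COMPLEMENT)[::-1]
    return allele
```

With this helper, a minus-strand "G" becomes "C" on the plus strand, and a minus-strand "AG" becomes "CT".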

5. Convert the genotypes of all drug response related genes

Based on the alignment between the specific sequence and hg19, the genotypes of all drug response related genes are converted into mutation site information that uses hg19 as the standard. The coordinate conversion uses the CDS start position and the gene region defined above. For some genotypes the mutation site information is given on the minus strand and must be converted to the plus strand during the conversion.
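The coordinate conversion can be illustrated in two stages: from a CDS-relative position (CDS start numbered +1, with no position 0) to a position on the specific sequence, and from there to hg19 through the alignment. The sketch below assumes the alignment has been reduced to ungapped blocks; the `Block` type and all function names are hypothetical, and positions are assumed to be counted along the specific sequence itself.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """An ungapped aligned segment (hypothetical representation)."""
    seq_start: int   # first position of the block on the specific sequence (1-based)
    hg19_start: int  # hg19 position aligned to seq_start
    length: int
    strand: str      # '+' or '-'


def cds_to_seq_pos(cds_pos: int, cds_start_on_seq: int) -> int:
    """Convert a CDS-relative position (+1 = first CDS base, no 0) to a
    1-based position on the specific sequence."""
    if cds_pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    offset = cds_pos - 1 if cds_pos > 0 else cds_pos
    return cds_start_on_seq + offset


def seq_to_hg19(seq_pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a specific-sequence position to hg19, or None if unaligned."""
    for b in blocks:
        delta = seq_pos - b.seq_start
        if 0 <= delta < b.length:
            # On the minus strand, hg19 coordinates decrease along the sequence.
            return b.hg19_start + delta if b.strand == "+" else b.hg19_start - delta
    return None


def cds_to_hg19(cds_pos: int, cds_start_on_seq: int,
                blocks: List[Block]) -> Optional[int]:
    return seq_to_hg19(cds_to_seq_pos(cds_pos, cds_start_on_seq), blocks)
```

A position that falls in an alignment gap returns `None`, which flags it for manual review instead of silently assigning a coordinate.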

6. Organize the file format and check the results

The file format is organized and the enzyme activity information of each genotype is added; a specific example is listed in Table 2. The correctness of the results is then checked again.

Table 2. Standard genotype database information for drug-related genes

II. Genotype detection of drug response related genes

1. Exon capture probes and capture chip

In a specific embodiment of the invention, for the 48 drug response related genes and with human genome hg19 as the reference sequence, all exon regions of the 48 genes are selected as target sequences, with a total target length of about 160 kb. For each exon sequence, oligonucleotide capture probes about 55-105 bp long and reverse-complementary to the exon sequence are designed. The designed capture probes are synthesized and immobilized at high density on a chip, forming a capture chip that carries exon capture probes for all 48 drug response related genes. The designed probes were produced, synthesized and immobilized on the capture chip by Roche-Nimblegen.

The probe sequences in this embodiment were designed against hg19. Because genome sequences differ between species, the probes are best suited to capturing human genomic DNA. They may also be applied to the genomes of species with high homology to the human genome, although capture may be less effective than with human genomic DNA. For other species, probes similar to those of the invention can be designed from the species' own reference sequence and used to capture the corresponding target regions.

2. Sequence capture library preparation

Step 1: Fragment preparation

Human genomic DNA that is free of RNA and protein contamination and is not degraded is used as the starting material. The DNA is broken into fragments of 200-300 bp by a physical or chemical method, and the DNA fragments are recovered with a suitable recovery kit.

Step 2: End modification of the sheared DNA fragments

The recovered and purified fragmented DNA is end-repaired by enzymes such as T4 DNA Polymerase, Klenow Fragment and T4 Polynucleotide Kinase, with dNTPs as substrate, yielding blunt-ended DNA fragments with phosphorylated ends. After purification, Klenow Fragment (3'-5' exo-) polymerase and dATP are used to add an "A" base to the 3' ends of the blunt-ended fragments.

Step 3: Ligation of Index Adapters to the DNA fragments

After purification, the A-tailed DNA fragments are ligated to Index Adapters by T4 DNA Ligase, and the ligation product is purified with a kit.

Step 4: Pre-hybridization PCR and product purification

The adapter-ligated DNA library is amplified with primers matching the Index Adapter. The amplified product is purified, quantified on an Agilent 2100 and a NanoDrop, and, once it passes quality control, used for library pooling in the next step.

Step 5: Pooling of multiple sample libraries

The libraries of multiple samples built according to steps 1 to 4 are pooled. To distinguish libraries from different samples during sequencing, the DNA ends in each library carry a different Index base sequence of 6 bp or 8 bp. The libraries can be pooled in equal amounts or in chosen proportions as needed. Equal amounts are used when every sample requires the same amount of sequencing data. In some studies different samples require different amounts of sequencing data, so the library amounts differ as well; the mixing ratio is determined by the person skilled in the art according to the specific research purpose or design requirements.
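The pooling arithmetic implied by this step (equal DNA mass per library, or mass in chosen proportions) can be sketched as below. The function name, the units and the idea of expressing proportions as relative weights are assumptions for illustration; the concentrations in the usage note are made-up values, not data from the source.

```python
from typing import Dict, Optional


def pool_volumes(conc_ng_per_ul: Dict[str, float], total_ng: float,
                 weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Volume (in microlitres) to take from each library for a pool of total_ng.

    With no weights, every library contributes the same DNA mass.
    With weights, each library's mass share is weight / sum(weights).
    """
    if weights is None:
        weights = {name: 1.0 for name in conc_ng_per_ul}
    weight_sum = sum(weights[name] for name in conc_ng_per_ul)
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"library {name} has non-positive concentration")
        mass_ng = total_ng * weights[name] / weight_sum
        volumes[name] = mass_ng / conc
    return volumes
```

For two hypothetical libraries at 10 ng/μL and 20 ng/μL and a 3000 ng pool, equal-mass pooling takes 150 μL and 75 μL respectively, so the more concentrated library contributes the smaller volume.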

3. Chip hybridization

The pooled DNA from step 5 that has passed quality control is hybridized to the chip following the standard operating instructions for Nimblegen solid-phase chip hybridization. After hybridization the DNA is eluted and purified, then amplified using the adapter sequences as primers. The amplified product is sequenced once it passes quality control by Agilent 2100 and Q-PCR.

4. Sequencing and data analysis

Libraries that pass quality control are sequenced on the Hiseq2000 platform using sequencing by synthesis. Data analysis uses human genome hg19 (UCSC) as the reference sequence and comprises the steps below. The information analysis workflow is shown in Fig. 2.

Step 1: Sequence filtering

Low-quality sequencing reads that would affect the information analysis are removed first. Each base in a read has a sequencing quality value; for each read the mean quality value is calculated, and the read is filtered out if its mean quality is below the conventional empirical threshold. In addition, reads may be contaminated by adapter sequence during sequencing, and reads that contain adapter sequence are also filtered out.
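A minimal sketch of the two filters described here, mean quality and adapter contamination, is given below. The threshold of 20, the Phred+33 quality encoding and the exact-substring adapter check are assumptions for the example; the source refers only to a "conventional empirical threshold" and does not state the encoding or the adapter matching rule.

```python
def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, given its ASCII-encoded quality string.

    Assumption: Phred+33 encoding; older Illumina pipelines used other offsets.
    """
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq: str, qual: str, adapter: str,
              min_mean_q: float = 20.0) -> bool:
    """Return True if the read passes both filters.

    A read is dropped when it is empty, contains the adapter sequence, or
    has a mean quality below min_mean_q (an illustrative threshold).
    """
    if not seq or not qual:
        return False
    if adapter and adapter in seq:
        return False
    return mean_quality(qual) >= min_mean_q
```

Real pipelines usually allow mismatches and partial adapter matches at read ends; the exact-substring test is used here only to keep the example short.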

Step 2: Sequence alignment

With hg19 (UCSC) as the reference sequence, the reads that pass the step 1 filter are aligned with alignment software (such as SOAP or BWA). For each read, the software selects one best alignment position. For reads that match multiple repeated positions, the software outputs one position and adds a flag.

Step 3: Select reads aligned to the target region

Some off-target sequences are also captured during chip hybridization. Because step 2 uses the whole hg19 genome as the reference, off-target reads align to their own best-matching positions and not to the target region. Only reads aligned to the target region are selected for subsequent analysis, which ensures that all selected reads are target-region sequences.
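Selecting on-target reads amounts to an interval-overlap test between each alignment and the exon target list. The sketch below assumes alignments are available as `(read_id, chrom, start, end)` tuples and targets as per-chromosome lists of 1-based inclusive intervals; both layouts are hypothetical, and a production pipeline would typically work on BAM and BED files with an indexed lookup.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]            # (start, end), 1-based inclusive
Alignment = Tuple[str, str, int, int]  # (read_id, chrom, start, end)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def on_target(chrom: str, start: int, end: int,
              targets: Dict[str, List[Interval]]) -> bool:
    """True if the aligned span overlaps any target interval on its chromosome."""
    return any(overlaps(start, end, t_start, t_end)
               for t_start, t_end in targets.get(chrom, []))


def select_on_target(alignments: List[Alignment],
                     targets: Dict[str, List[Interval]]) -> List[Alignment]:
    """Keep only the alignments that overlap a target region."""
    return [a for a in alignments if on_target(a[1], a[2], a[3], targets)]
```

Here any overlap counts as on-target; requiring a minimum overlap length would be an equally reasonable design choice that the source does not specify.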

Step 4: Data quality control

Data quality control covers several metrics, such as the percentage of aligned reads, the percentage of unique reads (reads with only one best alignment position on the reference), the duplication rate (identical reads), sequencing depth, and coverage of the target region. These metrics must meet conventional empirical thresholds before the next analysis step can proceed; for example, the sequencing depth should be consistent with expectation, and the distribution of per-base depth should follow a Poisson distribution.
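Two of the metrics named above, sequencing depth and target coverage, can be computed from per-base depth as in the following sketch. The single-interval target, the `(start, end)` read representation and the function names are simplifying assumptions; the source does not give formulas or thresholds.

```python
from typing import List, Tuple


def per_base_depth(target_start: int, target_end: int,
                   reads: List[Tuple[int, int]]) -> List[int]:
    """Depth at each position of a target (1-based inclusive coordinates).

    Each read is given as (start, end); portions outside the target are ignored.
    """
    depth = [0] * (target_end - target_start + 1)
    for read_start, read_end in reads:
        lo = max(read_start, target_start)
        hi = min(read_end, target_end)
        for pos in range(lo, hi + 1):
            depth[pos - target_start] += 1
    return depth


def mean_depth(depth: List[int]) -> float:
    """Average depth over the target."""
    return sum(depth) / len(depth)


def coverage(depth: List[int], min_depth: int = 1) -> float:
    """Fraction of target positions covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depth) / len(depth)
```

For a 10 bp target covered by reads spanning 1 to 5 and 3 to 12, the mean depth is 1.3, every position is covered at least once, and 30% of positions reach depth 2.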

Step 5: Variant detection

Variant analysis is performed only after the data pass quality control, and includes detection of SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), SVs (structural variations), CNVs (copy number variations) and so on. Each type of variant detection can be implemented in different ways as needed.

Step 6: Genotyping of drug response related genes

After variant detection and analysis, the mutation site information for each gene is collated and compared with the corresponding genotypes in the previously compiled standard database of drug response related genes, giving the genotype of each sample. Because humans are diploid, each gene has at most two types, so the final genotyping result for a drug response related gene is either a homozygous type or a heterozygous type. Some genotypes have associated enzyme activity information, so genotyping a sample also yields its enzyme activity status.
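The database comparison in this step can be pictured as matching a sample's detected variants against the defining variants of each standard genotype. The sketch below is deliberately simplified and is not the patent's algorithm: it ignores phasing and overlapping definitions, and the database layout, the genotype names, the `"reference"` label and the variant coordinates in the usage note are all hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, hg19 position, ref base, alt base)


def call_genotype(sample: Dict[Variant, str],
                  database: Dict[str, FrozenSet[Variant]],
                  reference_name: str = "reference") -> Tuple[str, str]:
    """Return the pair of types carried by a diploid sample for one gene.

    sample maps each detected variant to 'hom' or 'het'.
    database maps each standard genotype name to its defining variants.
    A genotype matches only if all of its defining variants are present.
    """
    hom, het = [], []
    for name in sorted(database):
        states = [sample.get(v) for v in database[name]]
        if not states or any(s is None for s in states):
            continue
        (hom if all(s == "hom" for s in states) else het).append(name)
    if hom:
        return (hom[0], hom[0])              # homozygous type
    if len(het) >= 2:
        return (het[0], het[1])              # two different non-reference types
    if len(het) == 1:
        return (reference_name, het[0])      # heterozygous with the reference type
    return (reference_name, reference_name)  # no database genotype matched
```

With a database containing two hypothetical genotypes defined by one variant each, a sample that is heterozygous for the first variant only is called as `("reference", first_name)`, while a sample homozygous for it is called as `(first_name, first_name)`.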

Current technologies for detecting mutations in drug response related genes focus mainly on known mutations in one or a few genes, and are time-consuming and costly when applied to unknown mutations or large numbers of samples. Compared with the prior art, the invention has the following advantages:

1. The invention can test the exon regions of 48 drug response related genes in a single assay, including all known and unknown polymorphic sites in those regions. Known polymorphic sites that are detected serve as a basis for guiding clinical medication, while unknown polymorphic sites that are detected accumulate as data for discovering new polymorphic sites that affect drug response. The method therefore offers both clinical guidance and research value.

2. The capture chip used in the invention is high-throughput: a single experiment can test up to about a hundred samples simultaneously, which both increases the number of samples tested and greatly reduces the cost per sample.

3. The invention builds the drug response database with the hg19 genome as the reference sequence. Combined with sequencing and bioinformatic analysis, it accurately reports the mutated base at each polymorphic site and, on that basis, identifies the corresponding genotype and provides the corresponding drug response information.

The invention is described in further detail below through a specific example. The following example is intended to illustrate the invention, not to limit it.

Example

The experimental workflow in this example is described for library construction from 50 samples, including the YanHuang (炎黄) sample, hybridized to a single chip. The number of samples in this example is given to illustrate the invention and does not limit the number of samples that can be hybridized to each chip.

1. Experimental materials

The reagents used in this example are listed in Table 3. Other reagents, consumables and instruments not listed in Table 3 are common commercially available products.

Table 3. Reagents used in this example

2. Sequence capture library preparation

(1) Genomic DNA fragmentation

3 μg of YanHuang genomic DNA, free of protein and RNA contamination and not degraded, was used as the starting material and sheared with a Covaris-S2 ultrasonicator (Covaris, US). The shearing parameters were set as follows:

Treatment	Parameter	Value
Treatment1 (process 1)	Duty/cycle (%) (duty factor)	10
	Intensity	10
	Cycle/burst	1000
	Time (min) (time (s))	60
Treatment2 (process 2)	Time (s)	0
Treatment3 (process 3)	Time (s)	0
Treatment4 (process 4)	Time (s)	0
Cycles		4

After shearing, fragments that passed electrophoresis (main band between 200 bp and 300 bp) were recovered and purified with the QIAquick PCR Purification Kit, and the sample was dissolved in 75 μL Elution Buffer.

(2) End repair of DNA fragments

The DNA fragments recovered and purified after shearing were used to prepare the end-repair reaction in a 1.5 mL centrifuge tube according to the table below, forming blunt-ended DNA fragments with phosphorylated ends.

Component	Volume
Sample DNA	75 μL
10× Polynucleotide Kinase Buffer	10 μL
dNTP Solution Set (10 mM each)	4 μL
T4 DNA Polymerase	5 μL
Klenow Fragment	1 μL
T4 Polynucleotide Kinase	5 μL
Total volume	100 μL

The 100 μL reaction mixture was mixed gently and incubated in a Thermomixer (Eppendorf) at 20 °C for 30 min, then purified with the QIAquick PCR Purification Kit. The DNA was finally dissolved in 32 μL ddH2O.

(3) Addition of an "A" base to the 3' ends

An "A" base was added to the 3' ends of the end-repaired DNA fragments to allow ligation of the Index Adapter in the next step. The A-tailing reaction was set up according to the table below.

Component	Volume
DNA	32 μL
10× blue buffer	5 μL
dATP (1 mM)	10 μL
Klenow (3'-5' exo-)	3 μL
Total volume	50 μL

The 50 μL reaction mixture was mixed gently and incubated in a Thermomixer (Eppendorf) at 37 °C for 30 min, then purified with the QIAquick PCR Purification Kit. The DNA was finally dissolved in 15 μL ddH2O.

(4) Index Adapter ligation

After purification, the A-tailed DNA fragments were ligated to the Index Adapter by T4 DNA Ligase. The Index Adapter ligation reaction was prepared in a 15 mL centrifuge tube:

The 50 μL reaction mixture was mixed by gentle shaking, briefly centrifuged, and incubated in a Thermomixer (Eppendorf) at 20 °C for 15 min. After the reaction, the product was purified with the MinElute PCR Purification Kit, and the sample was finally dissolved in 25 μL Elution Buffer.

(5) Pre-hybridization PCR and product purification

The adapter-ligated DNA library was amplified with primers matching the Index Adapter. The amplification system and conditions were as follows:

The PCR program was 94 °C for 2 min; 4 cycles of 94 °C for 15 s, 62 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min. The PCR product was purified with the QIAquick PCR Purification Kit, with an elution volume of 30 μL.

(6) Sample library pooling

Following the same steps (DNA shearing, end repair, Index Adapter ligation and pre-hybridization PCR), libraries were built for 49 further samples. Together with the YanHuang genomic DNA library this gave 50 libraries in total (4 HapMap samples, 1 YanHuang sample and 45 normal human samples, the 45 normal samples being used to test how many samples one chip can hybridize). Equal amounts of DNA from the 50 libraries were mixed uniformly. To distinguish libraries from different samples during sequencing, the Index Adapter added to each library gives the DNA ends of that library a different Index base sequence of 6 bp or 8 bp. Note that the Index Adapter consists of two parts: the Index base sequence, which distinguishes the libraries, and the Index Adapter primer sequence.

4. Exon library construction

Construction of the exon library comprises hybridizing the prepared sequence capture library with the capture chip, so that all exons of the 48 drug reaction related genes are enriched on the capture chip, and then eluting the hybridized capture chip. The eluted product is the exon sequences, which are amplified to obtain the exon library. The details are as follows:

(1) Chip hybridization

A) To a 1.5 mL centrifuge tube are added 450 μg COT-1 DNA, 3 μg DNA from the pooled library, and 1 nmol Index-adapter1-block and Index-adapter2-block (Multiplexing Sample Preparation Oligonucleotide Kit, Illumina). The mixture is evaporated to dryness in a SpeedVac (Thermo) with the temperature set to 60 °C.

B) 11.2 μL of pure water is added to the dried centrifuge tube. After the DNA is fully dissolved, 18.5 μL of 2× SC Hybridization Buffer and 7.3 μL of the SC Hybridization component are added and mixed thoroughly, and the mixture is transferred to the 95 °C dry bath of the hybridization instrument (NimbleGen) for 10 minutes to denature the DNA.

C) The sample is taken out, vortexed, centrifuged at full speed for 30 seconds, and placed at the 42 °C position of the hybridization instrument (NimbleGen) for hybridization with the exon capture chip.

D) Hybridization follows the NimbleGen chip hybridization method (NimbleGen Arrays User's Guide, Version 3.1, 7 Jul 2009, Roche NimbleGen, Inc.). The sample loading volume is 35 μL, and hybridization is carried out at 42 °C for 64-72 hr. After hybridization and post-hybridization processing of the chip, the sequences enriched on the chip are eluted with 900 μL of 160 mM NaOH; the eluted product is purified with the MinElute PCR Purification Kit and finally eluted in 80 μL Elution Buffer.

(2) Post-capture PCR amplification

PCR amplification is performed with the sequences eluted from the capture chip as template. The system is 150 μL Phusion Mix, 4.2 μL each of the upstream and downstream primers (Multiplexing Sequencing Primers and PhiX Control Kit), and the above 80 μL of eluted sample plus 85 μL ddH2O; after mixing, the reaction is divided into 6 tubes for PCR. The PCR conditions are 94 °C for 1 min; 16 cycles of 94 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min. After the PCR, the 6 tubes are combined and purified with QIAquick PCR Purification Kit magnetic beads to recover fragments of 300-450 bp, with an elution volume of 50 μL.

(3) Library detection:

The size and content of the library fragments are measured with the Bioanalyzer analysis system (Agilent, Santa Clara, USA); the library concentration is accurately quantified by Q-PCR.

5. Sequencing

The PCR amplification products that pass purification and quality inspection are sequenced, following the Illumina HiSeq 2000 operating method (HiSeq 2000 User Guide, Catalog # SY-940-1001, Part # 15011190 Rev B, Illumina).

6. Data analysis

(1) Sequencing data filtering

The data obtained by sequencing are filtered in two respects. The first is sequencing quality: the base quality values are computed over each whole sequence, and a sequence whose average quality value is below 10 is filtered out. The second is Adapter contamination: a sequence that contains Adapter sequence is also filtered out.

The filtering results show that the filtered-out sequences account for 7%; the remaining 93% are used in the next analysis step.
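The two filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used in the embodiment: the quality string is assumed to be ASCII-encoded with a configurable offset (33 here; the encoding actually used is not stated in the text), the adapter sequence is supplied by the caller, and contamination is detected by exact substring matching only.

```python
def mean_quality(quality_string, offset=33):
    """Average quality value of a read, decoded from an ASCII quality string."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string, adapter, min_mean_quality=10, offset=33):
    """Return True if the read passes both filters."""
    if mean_quality(quality_string, offset) < min_mean_quality:
        return False  # filter 1: average quality value below the threshold
    if adapter in sequence:
        return False  # filter 2: Adapter sequence found in the read
    return True
```

A read whose average quality value is exactly 10 is kept, since the text only excludes averages below 10.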

(2) Sequence alignment

With hg19 as the reference sequence, the filtered sequences are aligned using the alignment software BWA (Burrows-Wheeler Aligner). During alignment, each sequence is allowed at most 5 mismatches and gapped alignment is enabled (insertions and deletions are allowed). When a sequence has multiple best alignment positions, one position is selected at random for output, but the sequence is flagged. In the test of this embodiment, the aligned sequences of a sample account for about 97% of all sequences submitted for alignment.

(3) Selecting the sequences aligned to the target region

After alignment, non-unique reads are first removed according to the alignment results, keeping only the sequences that align uniquely to the whole genome. Duplicates are then removed: for paired reads aligned to the same position on the reference sequence, one pair is retained arbitrarily, because paired sequences aligned to the same position are likely to have been produced by the PCR process.

After the above processing, the sequences aligned to the target region of the reference sequence, as defined by the drug reaction related gene chip design, are retained for the next analysis step.
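The three selection steps (unique reads only, duplicate removal, target region) can be sketched in Python as follows. The `AlignedPair` record and its field names are hypothetical, as is the rule that any overlap with a target interval counts as being on target; "arbitrarily retaining one pair" is implemented here by keeping the first pair seen at each position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    name: str
    chrom: str
    start: int    # leftmost aligned position of the pair (1-based)
    end: int      # rightmost aligned position of the pair (inclusive)
    unique: bool  # True if the pair aligned to a single best position


def select_target_pairs(pairs, target_regions):
    """Apply the three selection steps in order.

    target_regions: list of (chrom, start, end) tuples, 1-based inclusive.
    """
    seen_positions = set()
    selected = []
    for pair in pairs:
        if not pair.unique:
            continue  # step 1: drop non-unique reads
        key = (pair.chrom, pair.start, pair.end)
        if key in seen_positions:
            continue  # step 2: drop duplicates, keeping the first pair seen
        seen_positions.add(key)
        on_target = any(
            pair.chrom == chrom and pair.start <= end and pair.end >= start
            for chrom, start, end in target_regions
        )
        if on_target:
            selected.append(pair)  # step 3: keep pairs on the target region
    return selected
```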

(4) Data quality control

Data quality control covers the data volume of each sample, the data volume after filtering, the proportion of sequences aligned during sequence alignment, whether the mean depth of the sample meets expectations, whether the single-base depth distribution follows a Poisson distribution, the target region coverage of the sample, and so on.

The statistical results show that all 50 samples of this embodiment meet the quality control requirements; partial results are shown in Table 4.

Specifically, data quality control has two aspects. One is whether the samples are consistent with one another: if the data of all samples are similar, the requirements are met; if the data of an individual sample differ greatly from those of most other samples, that sample is likely to be problematic. The other is the individual quality control metrics of each sample. Those skilled in the art can determine a general range for these criteria from experience, and the ranges may vary somewhat between sequencing regions. Specifically, the following are all acceptable: remaining data after filtering generally above 85%; proportion of aligned sequences above 90%; remaining data after duplicate removal above 60%; proportion of unique reads, which depends on the specific sequencing target region, above 90%; mean depth meeting the expected experimental design requirement; and coverage above 95%.
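The per-sample thresholds listed above can be expressed as a simple check. The following Python sketch uses the percentages stated in the text; the metric names, the dictionary layout and the caller-supplied `expected_mean_depth` parameter are hypothetical, and "above" is interpreted as strictly greater than the threshold.

```python
# Lower bounds taken from the ranges stated in the text (all in percent).
QC_MINIMUMS = {
    "remaining_after_filter_pct": 85.0,
    "aligned_pct": 90.0,
    "remaining_after_dedup_pct": 60.0,
    "unique_reads_pct": 90.0,
    "coverage_pct": 95.0,
}


def failed_qc_metrics(metrics, expected_mean_depth):
    """Return the names of the metrics that do not meet their requirement."""
    failed = [
        name
        for name, minimum in QC_MINIMUMS.items()
        if metrics[name] <= minimum  # each metric must exceed its lower bound
    ]
    if metrics["mean_depth"] < expected_mean_depth:
        failed.append("mean_depth")  # depth below the designed expectation
    return failed
```

An empty list means the sample meets every per-sample criterion; the cross-sample consistency check described above is a separate comparison and is not covered by this sketch.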

Table 4 Data quality control results

(5) SNP analysis

In this embodiment, SNPs are obtained with samtools. After the sequences aligned to the target region are selected, they are format-converted and sorted with samtools, and SNP calling is performed with its mpileup command. The raw SNPs may be further filtered, for example on site depth and quality value. In general, a depth of 4-400 is acceptable; for the quality value, its significance is calculated by a statistical method, and the sites are filtered on that significance.
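The depth filter on raw SNPs can be sketched in Python as follows. The record layout (dicts with `chrom`, `pos`, `depth` and `qual` keys) is hypothetical and assumes the samtools output has already been parsed. The depth range 4-400 comes from the text; the fixed `min_quality` cutoff is only a placeholder, since the statistical significance test applied to the quality value is not specified.

```python
def filter_snps(snps, min_depth=4, max_depth=400, min_quality=20.0):
    """Keep SNP records with depth in [min_depth, max_depth] and
    quality of at least min_quality (placeholder for the significance test).
    """
    return [
        snp
        for snp in snps
        if min_depth <= snp["depth"] <= max_depth and snp["qual"] >= min_quality
    ]
```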

The samples of this embodiment include 4 HapMap samples (a, b, c, d) and 1 YanHuang sample (these 5 samples have published genome and genotyping data); the YanHuang sample was sequenced twice. The SNPs of these five samples were evaluated: those of the 4 HapMap samples were compared with the existing HapMap data, and those of the YanHuang sample were compared with the existing genotyped sites of the YanHuang sample; see Table 5 and Table 6.

Table 5 SNP analysis results of the HapMap samples

Table 6 SNP analysis results of the YanHuang sample

(6) Genotypes of the drug reaction related genes

After variant detection is complete, the mutation site information of each gene is extracted according to the region of that gene on the whole genome. This mutation site information is compared with the previously built genotype database of drug reaction related genes to determine the genotype information of the sample and the corresponding drug reaction information. The results for some of the samples are shown in Table 7.

Table 7 Genotypes and drug reaction information of the drug reaction related genes

The typing results show that the genotype information and drug reaction effect information obtained by the method of this embodiment are consistent with existing known records.
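The comparison of a sample's mutation sites with the genotype database can be sketched in Python as follows. The database layout is hypothetical, and every gene name, coordinate, variant label and drug note in it is a placeholder rather than real pharmacogenomic data; the sketch assumes all positions are already expressed on the same whole-genome reference.

```python
# Hypothetical database: all names, coordinates and notes are placeholders.
GENE_REGIONS = {"GENE_A": ("chr1", 1000, 2000)}
KNOWN_VARIANTS = {
    ("chr1", 1500, "A"): ("GENE_A", "variant_1", "placeholder drug response note"),
}


def annotate_sample(called_variants):
    """Look up each called variant (chrom, pos, alt) in the database.

    Returns (matches, unknown): database hits, and variants that fall inside
    a gene region but are absent from the database (candidate unknown sites).
    """
    matches, unknown = [], []
    for chrom, pos, alt in called_variants:
        hit = KNOWN_VARIANTS.get((chrom, pos, alt))
        if hit is not None:
            matches.append(hit)
            continue
        for gene, (g_chrom, g_start, g_end) in GENE_REGIONS.items():
            if chrom == g_chrom and g_start <= pos <= g_end:
                unknown.append((gene, chrom, pos, alt))
    return matches, unknown
```

Keying the lookup on reference coordinates is what makes a single shared reference sequence useful: a variant reported by any sample can be matched without gene-specific coordinate conversion at query time.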

The above is a further detailed description of the present invention with reference to specific embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the present invention belongs may make several simple deductions or substitutions without departing from the concept of the present invention, all of which shall be deemed to fall within the protection scope of the present invention.

Claims

1. A method for building a standard genotype database of drug reaction related genes, characterized by comprising the following steps:

using the BLAST alignment software, locating the drug reaction related genes on the human whole-genome standard sequence, determining the start position and end position of the coding sequence of each drug reaction related gene, and obtaining the specific sequence corresponding to the genotype mutation information of the drug reaction related gene;

aligning the specific sequences corresponding to the genotype mutation information of all drug reaction related genes with the same human whole-genome standard sequence, to obtain the correspondence at each base position between the specific sequence of each drug reaction related gene and the human whole-genome standard sequence, the human whole-genome standard sequence being hg19;

according to the correspondence, converting the genotypes of the drug reaction related genes into genotypes that take the human whole-genome standard sequence as the standard, to obtain the standard genotypes of the drug reaction related genes;

wherein the drug reaction related genes include at least one gene selected from ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1;

the specific sequence corresponding to the genotype mutation information of a drug reaction related gene comprises the region from 5000 bp upstream of the coding sequence start position to 500 bp downstream of the coding sequence end position.

2. A genotyping method for drug reaction related genes, the method comprising: obtaining the exon sequences of the drug reaction related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and performing data analysis, and comparing the analysis results with the standard genotype database of drug reaction related genes described in claim 1, thereby obtaining the genotype of the sample to be tested.

3. A method for detecting drug reaction effects, the method comprising: obtaining the exon sequences of the drug reaction related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and performing data analysis, comparing the analysis results with the standard genotype database of drug reaction related genes described in claim 1 to obtain the genotype of the sample to be tested, and obtaining the drug reaction effect result of the sample to be tested according to the drug reaction effect information corresponding to the genotype.

4. The method according to claim 2 or 3, characterized in that obtaining the exon sequences of the drug reaction related genes of the sample to be tested comprises:

A, preparing a chip capable of capturing the exon sequences of the drug reaction related genes, the chip containing oligonucleotide probes reverse-complementary to the exon sequences of the drug reaction related genes;

B, preparing a sequence capture library from the genomic DNA of the sample to be tested, including fragmenting the genomic DNA of the sample to be tested into fragments of 200-500 bp, performing end processing, and amplifying to obtain the sequence capture library;

C, hybridizing the sequence capture library prepared in step B with the chip of step A, thereby obtaining the drug reaction related gene exon library of the sample to be tested.

5. The method according to claim 4, characterized in that: in step A, the chip contains oligonucleotide probes reverse-complementary to all exon sequences of each of 48 drug reaction related genes, the oligonucleotide probes being 55-105 bp in length;

in step B, the genomic DNA of the sample to be tested is fragmented into fragments of 200-300 bp;

the 48 drug reaction related genes include ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.

6. The method according to claim 4, characterized in that: in step B, the end processing comprises performing end repair to form blunt-ended, phosphorylated DNA fragments, adding an "A" base to the 3' ends of the blunt-ended DNA, and further ligating a tag.

7. The method according to claim 4, characterized in that: in step C, before hybridization, the sequence capture libraries from multiple different samples to be tested are mixed and then hybridized simultaneously with the chip of step A, each library carrying a different Index base sequence so that the libraries can be distinguished from one another, the Index base sequence being 6-8 bp in length.

8. The method according to claim 2 or 3, characterized in that the data analysis after sequencing comprises:

ii, with the human whole-genome standard sequence as the reference sequence, aligning the sequences obtained in step i using alignment software, the alignment software being SOAP or BWA;

iii, selecting the sequences aligned to the target region for subsequent analysis, the target region being the exon sequence regions of the drug reaction related genes;

iv, after the data pass quality control, performing variant analysis, the variant analysis comprising detection of at least one of the following: single nucleotide polymorphism (SNP), insertion/deletion (INDEL), structural variation (SV), copy number variation (CNV).