CN103198238A

CN103198238A - Drug related gene type database, gene typing and drug action detection method

Info

Publication number: CN103198238A
Application number: CN2012100028987A
Authority: CN
Inventors: 刘晓; 张伟; 徐怀前; 苏政; 王冠; 杨焕明
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2012-01-06
Filing date: 2012-01-06
Publication date: 2013-07-10
Anticipated expiration: 2032-01-06
Also published as: CN103198238B; WO2013102442A1

Abstract

The invention discloses a standardized genotype database of drug response related genes and a method for constructing it. The construction method comprises the following steps: aligning the particular sequence to which the mutation information refers against the human whole-genome reference sequence to obtain the correspondence between the particular sequence and the reference sequence; and, according to that correspondence, converting each genotype into the standardized genotype expressed relative to the human whole-genome reference sequence. On this basis, the invention also discloses a genotyping method for drug response related genes and a drug response detection method. The database provides a unified standard for genotyping drug response related genes and a more accurate basis for clinical medication. The genotyping and drug response detection methods cover 48 human drug response related genes and can test multiple samples simultaneously for all known mutation sites and the corresponding drug information. The genotyping method can also detect unknown polymorphic sites, laying a foundation for research into new polymorphic sites that affect drug response.

Description

Drug-related genotype database, genotyping method, and drug response detection method

Technical field

The present invention relates to the field of genetic testing, and in particular to a genotype database of drug response related genes, a genotyping method for drug response related genes, and a method for detecting drug response.

Background technology

Individual differences in drug response are a common clinical problem. Many drugs are effective in only some patients: it is estimated that the response rate to asthma, cardiovascular and psychiatric drugs is about 60%, while in nearly 40% of patients the therapeutic effect is unsatisfactory or absent. At the same time, some patients are prone to adverse reactions to conventional drugs. Epidemiological studies in the United States show that serious adverse reactions occurred in 6.7% of patients, with fatal reactions in 0.32%, making adverse drug reactions the fourth to sixth leading cause of death among inpatients. Many factors contribute to individual differences in drug response, including sex, age and body weight; the most important are genetic factors, which include polymorphisms in genes for drug metabolism, drug transport and drug targets.

Drugs entering the body are excreted mainly after phase I (oxidation-reduction and hydrolysis) and phase II (conjugation) metabolism in the liver and small intestine. The enzymes involved in phase I metabolism are mainly members of the CYP450 family, of which the CYP1, CYP2 and CYP3 subfamilies are the most studied. Polymorphisms in the genes encoding these enzymes markedly affect enzyme activity and therefore drug metabolism in the body. For example, propranolol is an important substrate of CYP2D6, and its blood concentration can differ by up to 20-fold between individuals; the CYP2D6*10 allele, with a frequency of up to 51.6% in the Chinese population, is the main cause of reduced CYP2D6 metabolic activity in that population. The enzymes involved in phase II reactions include thiopurine methyltransferase (TPMT), N-acetyltransferases (NATs) and UDP-glucuronosyltransferase (UGT1A1). Leukemia patients who are homozygous for TPMT mutations can suffer severe toxicity from routine doses of 6-mercaptopurine, leading to severe bone marrow suppression and liver damage. Mutations in drug transport proteins can cause excessive drug accumulation in the body or reduce the intracellular drug concentration. Studies show that mutations in the multidrug resistance gene ABCB1 (MDR1) are closely associated with resistance to multiple anticancer drugs.

Using genetic testing to help guide clinical dosing or targeted medication can effectively reduce or avoid adverse reactions and achieve the best therapeutic effect. Current methods for detecting drug-related gene polymorphisms, such as PCR/RFLP and probe hybridization, mostly suffer from few detection sites and low throughput. The newest technique for polymorphism detection, chip hybridization based on multiplex PCR, is still limited in the number of detectable sites and requires that the sites be known in advance.

Summary of the invention

The purpose of this invention is to provide a standardized genotype database of drug response related genes and, based on this database, a method for rapid genotyping of drug response related genes and a method for detecting drug response.

To achieve the above purpose, the present invention adopts the following technical scheme:

The invention discloses a method for constructing a standardized genotype database of drug response related genes, comprising the following steps: aligning the particular sequence to which the genotype mutation information of a drug response related gene refers against the human whole-genome reference sequence, to obtain the correspondence between the particular sequence and the reference sequence at each base position; and, according to the base correspondence obtained from the alignment, converting the genotypes of the drug response related gene into genotypes expressed relative to the human whole-genome reference sequence, thereby obtaining the standardized genotypes of the drug response related gene.

Preferably, the drug response related genes comprise at least one of the following 48 human drug response related genes: ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.

Further, the method also comprises locating each drug response related gene to determine the start and end positions of its coding sequence on the human whole-genome reference sequence, and obtaining the particular sequence to which the genotype mutation information of the gene refers.

Preferably, the particular sequence to which the genotype mutation information of a drug response related gene refers comprises the region from 5000 bp upstream of the start of the gene's coding sequence to 500 bp downstream of the end of the coding sequence.

Preferably, the human whole-genome reference sequence is hg19.

Another aspect of the present invention discloses a standardized genotype database of drug response related genes constructed by the database construction method provided by the invention.

Further, in the standardized genotype database, each standardized genotype of a drug response related gene is associated with the relevant drug response information; preferably, the drug response related genes comprise at least one of the following 48 human drug response related genes.

Table 1 Human drug response related genes

ABCB1	CYP1A2	EGFR	NAT1
ABCG2	CYP1B1	EGR1	NAT2
ADRB1	CYP2B6	ERBB2	NRAS
APC	CYP2C19	F2	OTC
ARG1	CYP2C9	F5	RNR1
ASL	CYP2D6	G6PD	SLCO1B1
ASS1	CYP2E1	GSTA1	SULT1A1
BCHE	CYP3A4	HLA-B	TPMT
BRAF	CYP3A5	KIT	TYMS
CDKN2A	CYP3A7	KRAS	UGT1A1
CPS1	CYP4F2	MTHFR	VKORC1
CYP19A1	DPYD	NAGS	XRCC1

A further aspect of the present invention discloses a genotyping method for drug response related genes, the method comprising: obtaining the exon sequences of the drug response related genes of a test sample, sequencing them on a high-throughput sequencing platform and analyzing the data, and comparing the analysis results with the standardized genotype database of drug response related genes provided by the invention, thereby obtaining the genotype of the test sample.

A further aspect of the present invention discloses a method for detecting drug response, the method comprising: obtaining the exon sequences of the drug response related genes of a test sample, sequencing them on a high-throughput sequencing platform and analyzing the data, comparing the analysis results with the standardized genotype database provided by the invention that contains drug response information, obtaining the genotype of the test sample, and obtaining the drug response result for the test sample from the drug response information associated with that genotype.

In embodiments of the present invention, the exon sequences of the drug response related genes of the test sample are obtained as follows:

A. Prepare a chip capable of capturing the exon sequences of the drug response related genes; the chip carries oligonucleotide probes that are reverse-complementary to those exon sequences.

B. Prepare a sequence capture library from the genomic DNA of the test sample: fragment the genomic DNA to 200-500 bp, process the fragment ends, and amplify to obtain the sequence capture library.

C. Hybridize the sequence capture library prepared in step B to the chip of step A to obtain the exon library of the drug response related genes of the test sample.

The chip in step A carries oligonucleotide probes reverse-complementary to all exon sequences of the 48 drug response related genes, and the probes are 55-105 bp long; in step B, the genomic DNA of the test sample is fragmented to 200-300 bp.

Further, in step B, the end processing comprises end repair to form blunt-ended, terminally phosphorylated DNA fragments, addition of an "A" base to the 3' end of the blunt-ended DNA, and ligation of a tag.

Further, in step C, sequence capture libraries from several different test samples are pooled before hybridization and then hybridized to the chip of step A together; each library carries a different Index base sequence so that the libraries can be distinguished from one another, and the Index base sequence is preferably 6-8 bp long.

In embodiments of the present invention, the data analysis after sequencing comprises:

i. filtering to remove low-quality sequencing reads that would affect the analysis;

ii. aligning the reads obtained in step i to the human whole-genome reference sequence with alignment software, preferably SOAP or BWA;

iii. selecting the reads aligned to the target region for subsequent analysis, the target region being the region of the exon sequences of the drug response related genes;

iv. after the data pass quality control, performing variant analysis, which includes detecting at least one of the following: single nucleotide polymorphisms (SNP), insertions and deletions (INDEL), structural variations (SV), and copy number variations (CNV).

By adopting the above technical scheme, the present invention has the following beneficial effects:

The method of the invention provides a database with a unified standard for genotyping the 48 drug response related genes. For drug response related genes of known type, it provides the corresponding genotype information quickly and accurately, giving a more accurate auxiliary basis for clinical medication and good clinical guidance. It can also detect all unknown polymorphic sites of the 48 drug response related genes; these serve as accumulated data that lay a foundation for discovering new polymorphic sites affecting drug response, and so support basic research on drug response related genes.

The chip used in the method captures the exons of the 48 drug-related genes, and a single experiment can test up to a hundred samples simultaneously, which increases the number of samples tested and greatly reduces the cost per sample.

The method builds the drug response database with the hg19 genome as the reference sequence. Combined with sequencing and bioinformatic analysis, it accurately reports the mutated base at each polymorphic site and, on that basis, distinguishes the corresponding genotype and gives the corresponding drug response information.

Description of drawings

Fig. 1 is a flowchart of exon capture library construction in one embodiment of the present invention;

Fig. 2 is a flowchart of the information analysis in one embodiment of the present invention.

Detailed description of the embodiments

In a specific embodiment of the present invention, high-throughput sequencing of the target region after sequence capture comprises the following steps:

I. Construction of the standardized genotype database of drug response related genes

In the specific embodiment of the present invention, all 48 known functional genes related to drug response (see Table 1 above) are collected. Using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19 as the reference, all genotype sequences of the 48 drug response related genes are aligned against hg19. Mutation site information relative to hg19 is obtained from the alignment results, and all genotypes of the drug response related genes are converted to a unified format and standard. According to the annotation of each gene on the whole genome, the genotypes are converted to types defined relative to hg19. The specific steps are as follows:

1. Collect the mutation and enzyme activity information for the genotypes of the drug response related genes

Collect the mutation information and the enzyme activity information for all genotypes of the 48 known drug response related genes. This information mainly includes the genotype name, the mutation information for the genotype, the corresponding amino acid changes, the drug response information associated with the genotype, the particular sequence, references, and so on. Note that "particular sequence" in this application means the DNA fragment or cDNA sequence used as the reference in a given study. Analysis of the collected data shows that the mutation information of each type is given relative to one such particular sequence. That is, across different studies the genotypes of the 48 drug response related genes are defined against different reference objects, and against different reference objects the genotypes of the same gene also differ. Where data from different sources are in inconsistent formats, they are converted to a unified format to facilitate subsequent curation.
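The unified format mentioned above can be pictured as a small record type. The following is only a sketch under stated assumptions: the class name `CollectedGenotype`, the helper `unify`, the field names, and the whitespace and case rules are hypothetical, and the layout is not taken from Table 2 of this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectedGenotype:
    """One collected genotype entry, before conversion to hg19 (hypothetical layout)."""
    gene: str                      # gene symbol, e.g. one of the 48 genes in Table 1
    genotype_name: str             # name of the type as given in the source study
    mutations: List[str]           # mutation descriptions relative to the particular sequence
    amino_acid_changes: List[str]  # corresponding amino acid changes, if reported
    particular_sequence: str       # identifier of the reference fragment or cDNA used
    drug_response: str = ""        # drug response or enzyme activity note, if reported
    references: List[str] = field(default_factory=list)


def _clean(items):
    """Strip whitespace and drop empty strings."""
    return [s.strip() for s in items if s and s.strip()]


def unify(gene, genotype_name, mutations, amino_acid_changes,
          particular_sequence, drug_response="", references=()):
    """Bring an entry from any source into the same shape (assumed rules)."""
    return CollectedGenotype(
        gene=gene.strip().upper(),
        genotype_name=genotype_name.strip(),
        mutations=_clean(mutations),
        amino_acid_changes=_clean(amino_acid_changes),
        particular_sequence=particular_sequence.strip(),
        drug_response=drug_response.strip(),
        references=_clean(references),
    )
```

Keeping the identifier of the particular sequence on every record matters for the later steps, because each mutation position is only meaningful relative to that sequence until it has been converted to hg19.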

2. Collect the CDS region of each gene on its particular sequence, and the position of the gene on hg19

In the collected data, much of the genotype mutation information is given relative to a particular sequence, and the mutation site information follows the gene mutation nomenclature rules published in 1998 (Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group), in which mutation positions are given with the CDS (coding sequence) start position of the gene as +1. For the subsequent analysis, the CDS start position of every gene on its particular sequence must therefore be found. Because particular sequences can be very long and some contain several genes, it must also be determined which region corresponds to the drug response related gene of interest. The position of the drug response related gene on hg19 is found first, and the region from 5000 bp upstream of the CDS start to 500 bp downstream of the CDS end is then taken as the region of the gene. For some genes, however, certain genotype mutation sites lie far from the CDS and fall outside this range; for these genes the region is extended, on the principle that it must include those mutation sites.
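The region rule in this step (5000 bp upstream of the CDS start, 500 bp downstream of the CDS end, widened when a known mutation site lies outside) can be sketched as follows. The function name, the 1-based inclusive coordinates, and the strand handling are assumptions made for illustration; the text does not state how minus-strand genes are treated.

```python
def gene_region(cds_low, cds_high, strand, variant_positions=(),
                upstream=5000, downstream=500):
    """Return (start, end) of the gene region on the reference, 1-based inclusive.

    cds_low and cds_high are the lowest and highest reference coordinates of the CDS.
    Assumption: for a minus-strand gene, "upstream" lies at higher coordinates.
    """
    if strand == "+":
        start, end = cds_low - upstream, cds_high + downstream
    elif strand == "-":
        start, end = cds_low - downstream, cds_high + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Widen the window so that every known mutation site is included.
    for pos in variant_positions:
        start = min(start, pos)
        end = max(end, pos)
    return max(1, start), end


# A plus-strand gene with one known site far upstream of the default window.
print(gene_region(10000, 20000, "+", variant_positions=[3000]))
```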

3. BLAST alignment

The particular sequence is aligned to hg19 with BLAST. If the particular sequence is a cDNA, BLAT is used for the alignment instead.

4. Determine the mutation information between the particular sequence and hg19

In the alignment results, a particular sequence may align to several positions on hg19. The best-aligned position is selected, and the base at each position is analyzed to obtain the base-by-base correspondence between the particular sequence and hg19. Note that if the sequence aligns to the minus strand of the chromosome, the bases must be converted to the corresponding bases on the plus strand.
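The strand conversion mentioned at the end of this step is a reverse complement. A minimal sketch follows; the function names and the representation of a mutation as a pair of strings are assumptions for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_plus_strand(ref_base, alt_base, aligned_strand):
    """Express a reference/alternative base pair on the plus strand of the reference."""
    if aligned_strand == "-":
        return reverse_complement(ref_base), reverse_complement(alt_base)
    return ref_base, alt_base


# A C>T change reported on a sequence that aligns to the minus strand
# is a G>A change on the plus strand.
print(to_plus_strand("C", "T", "-"))
```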

5. Convert the genotypes of all drug response related genes

According to the alignment of the particular sequence with hg19, the genotypes of all drug response related genes are converted to mutation site information expressed relative to hg19. The coordinate conversion uses the CDS start position of the gene and the gene region defined above. For some genotypes, all mutation site information is on the minus strand and must be converted to plus-strand information during the conversion.
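The coordinate conversion can be thought of as two lookups: from a CDS-relative position to a position on the particular sequence, and from there to hg19 through the alignment. The sketch below is illustrative only and rests on several assumptions not stated in the text: positions count bases along the particular sequence itself, there is no position 0 (so -1 is immediately upstream of +1), and the alignment is available as gapless blocks. All function names and the block format are hypothetical.

```python
def cds_to_sequence_pos(c_pos, cds_start_on_seq):
    """Map a CDS-relative position (+1 = first CDS base) to a 1-based
    position on the particular sequence. There is no position 0."""
    if c_pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    offset = c_pos - 1 if c_pos > 0 else c_pos
    return cds_start_on_seq + offset


def sequence_to_genome_pos(seq_pos, blocks, strand):
    """Map a position on the particular sequence to the reference genome.

    blocks: list of (seq_start, genome_start, length) gapless aligned blocks,
    where genome_start is the genome coordinate paired with seq_start.
    On the minus strand, genome coordinates decrease as seq_pos increases.
    Returns None when seq_pos falls in an unaligned gap.
    """
    for seq_start, genome_start, length in blocks:
        if seq_start <= seq_pos < seq_start + length:
            delta = seq_pos - seq_start
            return genome_start + delta if strand == "+" else genome_start - delta
    return None


# Hypothetical example: the CDS starts at base 201 of the particular sequence,
# which aligns to the plus strand in two blocks.
blocks = [(1, 1000, 250), (251, 1300, 100)]
seq_pos = cds_to_sequence_pos(100, 201)
print(seq_pos, sequence_to_genome_pos(seq_pos, blocks, "+"))
```

Returning `None` for positions that fall in an alignment gap makes it easy to flag entries for the manual check described in the final step, rather than silently assigning a wrong coordinate.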

6. Organize the file format and check

Organize the file format and add the enzyme activity information for each genotype; a specific example is listed in Table 2. Then check the correctness of the results.

Table 2 Standardized genotype database information of drug-related genes

II. Genotyping of drug response related genes

1. Exon capture probes and capture chip

In the specific embodiment of the present invention, for the 48 drug response related genes and with human genome hg19 as the reference sequence, all exon regions of the 48 genes are chosen as target sequences; the total target sequence length is about 160 kb. For each exon sequence, oligonucleotide capture probes of about 55-105 bp that are reverse-complementary to the exon sequence are designed. The designed capture probes are synthesized and immobilized at high density on a chip, forming a capture chip that contains the capture probes for all exons of the 48 drug response related genes. The designed probes were produced by Roche-NimbleGen, which synthesized and immobilized them on the capture chip.

The probe sequences in this embodiment are designed with reference to hg19. Because genome sequences differ between species, these probes are best applied to capturing human genomic DNA; they may also work for genomes of other species with high homology to the human genome, but the capture efficiency may be lower than for human genomic DNA. For other species, probes similar to those of the invention can be designed from the reference sequence of that species and applied to capturing its target regions.

2. Sequence capture library preparation

Step 1: Fragmentation

Human genomic DNA that is free of RNA and protein contamination and is not degraded is used as the starting material. The DNA is broken into fragments of 200-300 bp by physical or chemical methods, and the DNA fragments are recovered with a suitable recovery kit.

Step 2: End modification of the DNA fragments

The recovered and purified fragmented DNA is end-repaired by enzymes such as T4 DNA Polymerase, Klenow Fragment and T4 Polynucleotide Kinase, with dNTPs as the substrate, forming blunt-ended, terminally phosphorylated DNA fragments. After purification of the blunt-ended DNA, Klenow Fragment (3'-5' exo-) polymerase and dATP are used to add an "A" base to the 3' end of the blunt-ended sequence.

Step 3: Ligation of the Index Adapter to the DNA fragments

After purification, the DNA fragments carrying the added "A" are ligated to the Index Adapter by T4 DNA Ligase, and the ligation product is purified with a kit.

Step 4: Pre-hybridization PCR and product purification

The adapter-ligated DNA library is amplified with primers matching the Index Adapter sequence. The amplified product is purified, quantified and quality-checked with the Agilent 2100 and NanoDrop and, if it passes, is used in the next step, library pooling.

Step 5: Pooling of libraries from multiple samples

The libraries of multiple samples built according to Steps 1 to 4 are pooled. To distinguish libraries from different samples during sequencing, the adapter added to the DNA of each library contains a different Index base sequence of 6 bp or 8 bp. The DNA of each library is pooled either in equal amounts or in a chosen ratio as required. Equal amounts are used when each sample needs the same amount of sequencing data, so that each library contributes the same amount of DNA to the pool. In some studies different samples need different amounts of sequencing data, so the amounts of library used also differ; the pooling ratio is determined by those skilled in the art according to the specific research purpose or design requirements.
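Because each library carries its own Index base sequence, pooled reads can be assigned back to samples by looking up the index. The sketch below assumes exact matching and that the index has already been read out separately from the insert; the function names and the sample and index values in the usage lines are hypothetical.

```python
def build_index_table(sample_to_index):
    """Invert {sample: index_sequence}, refusing duplicate indexes."""
    table = {}
    for sample, index_seq in sample_to_index.items():
        if index_seq in table:
            raise ValueError(
                f"index {index_seq} used by both {table[index_seq]} and {sample}"
            )
        table[index_seq] = sample
    return table


def demultiplex(reads, index_table):
    """Split (index_sequence, read_sequence) pairs by sample.

    Reads whose index is not in the table are returned separately.
    """
    by_sample = {sample: [] for sample in index_table.values()}
    unassigned = []
    for index_seq, read_seq in reads:
        sample = index_table.get(index_seq)
        if sample is None:
            unassigned.append(read_seq)
        else:
            by_sample[sample].append(read_seq)
    return by_sample, unassigned


# Hypothetical 6 bp indexes for two pooled libraries.
table = build_index_table({"sample_1": "ATCACG", "sample_2": "CGATGT"})
reads = [("ATCACG", "ACGTTT"), ("CGATGT", "GGGAAA"), ("TTTTTT", "CCCCCC")]
print(demultiplex(reads, table))
```

Exact matching is the simplest choice; tolerating one mismatch in the index would recover more reads but requires indexes that differ from each other at two or more positions.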

3. Chip hybridization

The pooled DNA that passed quality control in Step 5 is hybridized to the chip according to the NimbleGen solid-phase chip hybridization standard operating instructions. After hybridization, the DNA is eluted, purified, and amplified with primers matching the adapter sequence; the amplified product is sequenced after passing quality control with the Agilent 2100 and Q-PCR.

4. Sequencing and data analysis

Libraries that pass quality control are sequenced on the HiSeq 2000 platform using sequencing by synthesis. The data analysis uses human genome hg19 (UCSC) as the reference sequence and comprises the steps below. The flowchart of the information analysis is shown in Fig. 2.

Step 1: Sequence filtering

First, low-quality sequencing reads that would affect the analysis are removed. Each base in a read has a corresponding sequencing quality value; for each read in the sequencing result, the average quality value is calculated, and if it is lower than a conventional empirical threshold the read is filtered out. In addition, reads produced by the sequencer may be contaminated by Adapter sequence, and reads containing Adapter are also filtered out.
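The two filters described above can be sketched as a single predicate. The threshold of 20 is an assumed placeholder (the text only says "conventional empirical threshold"), the adapter sequence is supplied by the caller, and the function name is hypothetical.

```python
def passes_filter(seq, quals, adapter, min_mean_quality=20.0):
    """Return True if a read survives both filters.

    seq: read sequence; quals: per-base quality values as numbers;
    adapter: adapter sequence to screen for (caller-supplied);
    min_mean_quality: assumed threshold, not specified in the text.
    """
    if not seq or len(seq) != len(quals):
        return False
    if sum(quals) / len(quals) < min_mean_quality:
        return False  # average quality too low
    if adapter and adapter in seq:
        return False  # adapter contamination
    return True


reads = [
    ("ACGTACGT", [30] * 8),  # kept
    ("ACGTACGT", [10] * 8),  # dropped: low average quality
    ("ACGGGGGT", [30] * 8),  # dropped: contains the placeholder adapter
]
kept = [seq for seq, quals in reads if passes_filter(seq, quals, adapter="GGGG")]
print(kept)
```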

Step 2: Sequence alignment

With hg19 (UCSC) as the reference sequence, the reads filtered in Step 1 are aligned with alignment software (such as SOAP or BWA). These programs select the best alignment position for each read. For reads that align to several repetitive positions, the software outputs one position and adds a tag.

Step 3: Select the reads aligned to the target region

Chip hybridization also captures some reads from non-target regions. Because Step 2 uses the whole hg19 genome as the reference sequence, reads from non-target regions align to their own positions according to the best-match principle and do not align to the target region. The reads aligned to the target region are selected for subsequent analysis, which ensures that all selected reads are target-region reads.
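Selecting on-target reads amounts to an interval overlap test between each alignment and the exon regions. A minimal sketch follows, assuming 1-based inclusive coordinates and a plain linear scan, which is adequate for a target of this size; the function names and the example coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def select_on_target(alignments, targets):
    """Keep alignments that overlap any target interval on the same chromosome.

    alignments: iterable of (chrom, start, end)
    targets: dict mapping chrom to a list of (start, end) exon intervals
    """
    kept = []
    for chrom, start, end in alignments:
        if any(overlaps(start, end, t_start, t_end)
               for t_start, t_end in targets.get(chrom, ())):
            kept.append((chrom, start, end))
    return kept


targets = {"chr1": [(1000, 1200), (5000, 5300)]}
alignments = [("chr1", 950, 1049), ("chr1", 3000, 3099), ("chr2", 1000, 1099)]
print(select_on_target(alignments, targets))
```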

Step 4: Data quality control

Data quality control covers several aspects, such as the percentage of aligned reads, the percentage of unique reads (reads with only one best alignment position on the reference sequence), the proportion of duplicates (identical sequences), the sequencing depth, and the coverage of the target region. These metrics must meet conventional empirical thresholds before the next analysis step can proceed; for example, the sequencing depth should match expectation, and the per-base depth distribution should follow a Poisson distribution.
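Two of the metrics named above, sequencing depth and target coverage, can be computed directly from the on-target alignments. The sketch below handles a single target interval with 1-based inclusive coordinates; the function names and the `min_depth` parameter are assumptions, and no pass/fail thresholds are implied.

```python
def depth_profile(target_start, target_end, alignments):
    """Per-base depth over one target interval.

    alignments: iterable of (start, end) read alignments on the same chromosome.
    """
    depth = [0] * (target_end - target_start + 1)
    for start, end in alignments:
        lo = max(start, target_start)
        hi = min(end, target_end)
        for pos in range(lo, hi + 1):
            depth[pos - target_start] += 1
    return depth


def qc_summary(depth, min_depth=1):
    """Mean depth and the fraction of bases covered at least min_depth times."""
    covered = sum(1 for d in depth if d >= min_depth)
    return {
        "mean_depth": sum(depth) / len(depth),
        "coverage": covered / len(depth),
    }


depth = depth_profile(100, 109, [(95, 104), (102, 120)])
print(depth)
print(qc_summary(depth))
```

A histogram of the per-base depths can then be compared with a Poisson distribution whose mean equals `mean_depth`, which is the check the text describes.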

Step 5: Variant detection

Variant analysis is carried out only after the data pass quality control. It includes detecting SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), SVs (structural variations), CNVs (copy number variations), and so on. Each kind of variant detection can be implemented in different ways as required.

Step 6: Typing of drug response related genes

After variant detection, the mutation site information for each gene is collated and compared with the corresponding genotypes in the previously curated standardized database of drug response related genes, giving the genotype of each sample. Because humans are diploid, each gene carries at most two types, so the final typing result for a drug response related gene is either a homozygous type or a heterozygous type. Some genotypes have corresponding enzyme activity information, so genotyping a sample also yields information on the sample's enzyme activity status.
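The comparison against the database can be sketched as matching the sample's variants to the defining variants of each type and filling at most two slots, since the sample is diploid. This is a simplified illustration and not the procedure of the invention: it ignores phasing, tries the types with the most defining variants first, and uses `"*1"` as an assumed name for the reference type. The function name and the variant tuples are hypothetical.

```python
def call_diplotype(sample_variants, type_definitions, reference_type="*1"):
    """Assign at most two types to a diploid sample (simplified, unphased).

    sample_variants: dict mapping a variant, e.g. (chrom, pos, alt), to the
        number of copies carrying it (1 = heterozygous, 2 = homozygous).
    type_definitions: dict mapping a type name to the set of variants defining it.
    Returns (sorted pair of type names, variants not explained by any type).
    """
    remaining = dict(sample_variants)
    calls = []
    # Try the most specific types (most defining variants) first.
    ordered = sorted(type_definitions.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, variants in ordered:
        while (len(calls) < 2 and variants
               and all(remaining.get(v, 0) > 0 for v in variants)):
            for v in variants:
                remaining[v] -= 1
            calls.append(name)
    while len(calls) < 2:
        calls.append(reference_type)  # unfilled slots default to the reference type
    unexplained = sorted(v for v, copies in remaining.items() if copies > 0)
    return tuple(sorted(calls)), unexplained


definitions = {
    "*A": {("chr1", 100, "T")},
    "*B": {("chr1", 100, "T"), ("chr1", 200, "G")},
}
sample = {("chr1", 100, "T"): 2, ("chr1", 200, "G"): 1, ("chr1", 300, "C"): 1}
print(call_diplotype(sample, definitions))
```

Variants left in the second return value are those not explained by any known type; in the terms of this document they are candidates for previously unknown polymorphic sites.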

Current techniques for detecting mutations in drug response related genes mainly target known mutations in one or a few genes, and are limited by long turnaround times and high costs when detecting unknown mutations or testing large numbers of samples. Compared with the prior art, the present invention has the following advantages:

1. The invention can detect the exon regions of the 48 drug response related genes in a single test, including all known and unknown polymorphic sites in those regions. In addition to using the known polymorphic sites detected as an auxiliary basis for clinical medication, the unknown polymorphic sites detected can serve as accumulated data for finding new polymorphic sites that affect drug response; the invention therefore has research value as well as clinical guidance value.

2. The chip capture used in this work is high-throughput: a single experiment can test up to a hundred samples simultaneously, which increases the number of samples tested and greatly reduces the cost per sample.

3. The invention builds the drug response database with the hg19 genome as the reference sequence. Combined with sequencing and bioinformatic analysis, it accurately reports the mutated base at each polymorphic site and, on that basis, distinguishes the corresponding genotype and gives the corresponding drug response information.

The present invention is described in further detail below by way of an example. The example is intended to illustrate the invention, not to limit it.

Example

The experimental workflow of this example describes library construction for 50 samples, including the YanHuang (YH) sample, hybridized to a single chip. The number of samples in this example serves to illustrate the invention and does not limit the number of samples that can be hybridized per chip.

1. Experimental materials

The reagents used in this example are listed in Table 3. Other reagents, consumables and instruments not specified in Table 3 are general-purpose products available commercially.

Table 3 Reagents used in this example

2. Sequence capture library preparation

(1) Genomic DNA fragmentation

Using 3 μg of YanHuang genomic DNA free of protein and RNA contamination and without degradation as the material, the DNA was sheared with a Covaris-S2 ultrasonic shearing instrument (Covaris, US). The shearing parameters were set as follows:

After the sheared fragments passed electrophoresis inspection (main band concentrated between 200 bp and 300 bp), they were recovered and purified with the QIAquick PCR Purification Kit, and the sample was dissolved in 75 μL Elution Buffer.

(2) End repair of the DNA fragments

The DNA fragments recovered and purified after shearing were used to prepare the end-repair reaction in a 1.5 mL centrifuge tube according to the table below, forming blunt-ended, terminally phosphorylated DNA fragments.

Sample DNA	75 μL
10× Polynucleotide Kinase Buffer	10 μL
dNTP Solution Set (10 mM each)	4 μL
T4 DNA Polymerase	5 μL
Klenow Fragment	1 μL
T4 Polynucleotide Kinase	5 μL
Total volume	100 μL

After gentle mixing, the 100 μL reaction mixture was incubated at 20 °C for 30 min in a Thermomixer (Eppendorf) and then purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved in 32 μL ddH2O.

(3) Addition of an "A" base to the 3' end

An "A" base was added to the 3' end of the end-repaired, blunt-ended DNA fragments to allow ligation of the Index Adapter in the next step. The reaction system for adding the "A" base is shown in the table below.

DNA	32 μL
10× blue buffer	5 μL
dATP (1 mM)	10 μL
Klenow (3'-5' exo-)	3 μL
Total volume	50 μL

After gentle mixing, the 50 μL reaction mixture was incubated at 37 °C for 30 min in a Thermomixer (Eppendorf) and then purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved in 15 μL ddH2O.

(4) Index Adapter ligation

After purification, the DNA fragments carrying the added "A" were ligated to the Index Adapter by T4 DNA Ligase. The Index Adapter ligation reaction was prepared in a 1.5 mL centrifuge tube:

The 50 μL reaction mixture was mixed by gentle shaking, centrifuged briefly, and incubated at 20 °C for 15 min in a Thermomixer (Eppendorf). After the reaction, the product was purified with the MinElute PCR Purification Kit, and the sample was finally dissolved in 25 μL Elution Buffer.

(5) Pre-hybridization PCR and product purification

The adapter-ligated DNA library was amplified with primers matching the Index Adapter sequence. The amplification system and conditions were as follows:

The PCR program was 94 °C for 2 min; 4 cycles of 94 °C for 15 s, 62 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min. The PCR product was purified with the QIAquick PCR Purification Kit, with an elution volume of 30 μL.

(6) Pooling of the sample libraries

Following the steps above (DNA shearing, end repair, Index Adapter ligation, pre-hybridization PCR and so on), libraries were built for another 49 samples, giving 50 libraries in total including the YanHuang genomic DNA library (4 HapMap samples, 1 YanHuang sample and 45 normal human samples, the 45 normal samples being used to test the number of samples one chip can hybridize). Equal amounts of DNA from the 50 libraries were mixed evenly. To distinguish libraries from different samples during sequencing, the DNA ends of each library received a different Index base sequence of 6 bp or 8 bp when the Index Adapter was added. Note that the Index Adapter consists of two parts: the Index base sequence used to distinguish the libraries, and the Index Adapter primer sequence.

4. Exon library construction

Construction of the exon library comprises hybridizing the prepared sequence capture library with the capture chip, so that all exons of the 48 drug response related genes are enriched on the capture chip; eluting the capture chip after hybridization, the eluted product being the exon sequences; and amplifying the exon sequences to obtain the exon library. The details are as follows:

(1) Chip hybridization

A) To a 1.5 mL centrifuge tube were added 450 μg of COT-1 DNA, 3 μg of DNA from the pooled library, and 1 nmol of Index-adpater1-block and Index-adpater2-block (Multiplexing Sample Preparation Oligonucleotide Kit, Illumina). The mixture was evaporated to dryness in a SpeedVac (Thermo) with the temperature set to 60 ℃.

B) 11.2 μL of pure water was added to the dried centrifuge tube. After the DNA was fully dissolved, 18.5 μL of 2× SC Hybridization Buffer and 7.3 μL of SC Hybridization were added. After thorough mixing, the mixture was transferred to the 95 ℃ dry bath of the hybridization instrument (Nimblegen) for 10 minutes to denature the DNA.

C) The sample was taken out, vortexed, centrifuged at full speed for 30 seconds, and placed at the 42 ℃ position of the hybridization instrument (Nimblegen) for hybridization with the exon capture chip.

D) The hybridization method followed the NimbleGen chip hybridization protocol (NimbleGen Arrays User's Guide, Version 3.1, 7 Jul 2009, Roche NimbleGen, Inc.). The sample loading volume was 35 μl, and hybridization was carried out at 42 ℃ for 64-72 hr. After hybridization and post-hybridization processing of the chip, the sequences enriched on the chip were eluted with 900 μl of 160 mM NaOH; the eluted product was purified with the MinElute PCR Purification Kit and finally eluted in 80 μl Elution Buffer.

(2) Post-capture PCR amplification

The sequences eluted from the capture chip were used as the template for PCR amplification. The system was: Phusion Mix 150 μl; upstream and downstream primers 4.2 μl each (Multiplexing Sequencing Primers and Phix Control Kit); the above 80 μl eluted sample; and 85 μl ddH2O. After mixing, the reaction was divided into 6 tubes for PCR. PCR conditions: 94 ℃ for 1 min; 16 cycles of 94 ℃ for 30 s, 58 ℃ for 30 s and 72 ℃ for 30 s; 72 ℃ for 5 min. After the PCR, the 6 tubes were combined, and fragments of 300-450 bp were recovered by purification with QIAquick PCR Purification Kit magnetic beads, with an elution volume of 50 μl.
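The volumes listed for the post-capture PCR system can be checked with a short calculation. The following Python sketch only sums the stated components and divides by the 6 tubes; the total and per-tube volumes are derived here and are not stated in the text, and an even split between tubes is an assumption.

```python
# Component volumes in microlitres, as listed for the post-capture PCR system.
components_ul = {
    "Phusion Mix": 150.0,
    "upstream primer": 4.2,
    "downstream primer": 4.2,
    "eluted sample": 80.0,
    "ddH2O": 85.0,
}
N_TUBES = 6  # the mixture is divided into 6 tubes (even split assumed)

total_ul = sum(components_ul.values())
per_tube_ul = total_ul / N_TUBES

print(f"total volume: {total_ul:.1f} ul")
print(f"volume per tube: {per_tube_ul:.1f} ul")
```

On these figures the mixture comes to about 323.4 μl, or roughly 53.9 μl per tube.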

(3) Library detection:

A Bioanalyzer analysis system (Agilent, Santa Clara, USA) was used to check the library fragment size and content; the library concentration was accurately quantified by Q-PCR.

5. Sequencing

The above PCR amplification products, purified and having passed quality testing, were sequenced. The sequencing method followed the operating procedure of the Illumina HiSeq2000 (HiSeq 2000 User Guide, Catalog # SY-940-1001, Part # 15011190 Rev B, Illumina).

6. Data analysis

(1) Sequencing data filtering

The data obtained by sequencing were filtered in two respects. The first is sequencing quality: the base quality values of each whole sequence were calculated, and a sequence was filtered out when its average quality value was below 10. The second is Adapter contamination: a sequence containing the Adapter sequence was also filtered out.

The filtering results show that 7% of the sequences were filtered out; the remaining 93% were used in the next step of analysis.
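The two filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the program used in this embodiment: it assumes Phred+33 quality encoding, detects the Adapter by exact substring match, and uses a short placeholder adapter sequence; the function names and the `offset` parameter are likewise illustrative.

```python
def mean_quality(qual_string, offset=33):
    """Average base quality of a read, decoded from an ASCII quality string."""
    return sum(ord(c) - offset for c in qual_string) / len(qual_string)


def passes_filter(seq, qual, adapter, min_mean_q=10, offset=33):
    """Return True if the read survives both filtering criteria."""
    if not seq or not qual:
        return False
    # Criterion 1: discard reads whose average quality value is below 10.
    if mean_quality(qual, offset) < min_mean_q:
        return False
    # Criterion 2: discard reads that contain the adapter sequence.
    return adapter not in seq


ADAPTER = "AGATCGGAAGAGC"  # placeholder adapter sequence (assumption)
reads = [
    ("ACGTACGT", "IIIIIIII"),                 # high quality, no adapter
    ("ACGTACGT", "$$$$$$$$"),                 # average quality 3
    ("ACGT" + ADAPTER, "I" * (4 + len(ADAPTER))),  # adapter contamination
]
kept = [(s, q) for s, q in reads if passes_filter(s, q, ADAPTER)]
```

Exact matching will miss adapters that carry sequencing errors or are only partially present at the read end; tolerant matching would be needed in practice.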

(2) Sequence alignment

Using hg19 as the reference sequence, the filtered sequences were aligned with the BWA (Burrows-Wheeler Aligner) alignment software. At most 5 mismatches were allowed per sequence, and gapped alignment was enabled (insertions and deletions were allowed). When a sequence had several best alignment positions, one position was chosen at random for output and was flagged. In the test of this embodiment, the aligned sequences accounted for about 97% of all sequences submitted for alignment.

(3) Selecting sequences aligned to the target region

After alignment, non-unique reads were first removed according to the alignment results, keeping only sequences that align uniquely in the whole genome. Duplicates were then removed: for paired reads aligned to the same position on the reference sequence, any one pair was kept, because paired sequences aligned to the same position are most likely produced by the PCR process.

After the above processing, according to the target region of the drug response related gene chip design, the sequences aligned to the target region on the reference sequence were retained for the next step of analysis.
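The three selection steps (unique alignment, duplicate removal, target region) can be sketched in Python as below. The record layout, the field names and the coordinates are hypothetical; in particular, keeping the first pair seen at a position is one arbitrary way of keeping "any one pair", and half-open intervals are assumed for the overlap test.

```python
def select_target_pairs(pairs, targets):
    """Keep uniquely aligned, de-duplicated read pairs that overlap a target.

    pairs: list of dicts with keys name, chrom, start, end, mate_start, unique.
    targets: list of (chrom, start, end) half-open target intervals.
    """
    seen = set()
    kept = []
    for p in pairs:
        # Step 1: drop reads that do not align uniquely in the whole genome.
        if not p["unique"]:
            continue
        # Step 2: among pairs aligned to the same position, keep the first one.
        key = (p["chrom"], p["start"], p["mate_start"])
        if key in seen:
            continue
        seen.add(key)
        # Step 3: keep only pairs overlapping a designed target region.
        if any(p["chrom"] == c and p["start"] < e and p["end"] > s
               for c, s, e in targets):
            kept.append(p)
    return kept


# Hypothetical coordinates, for illustration only.
targets = [("chr1", 50, 500)]
pairs = [
    {"name": "a", "chrom": "chr1", "start": 100, "end": 200, "mate_start": 300, "unique": True},
    {"name": "b", "chrom": "chr1", "start": 100, "end": 200, "mate_start": 300, "unique": True},
    {"name": "c", "chrom": "chr1", "start": 150, "end": 250, "mate_start": 350, "unique": False},
    {"name": "d", "chrom": "chr2", "start": 1000, "end": 1100, "mate_start": 1200, "unique": True},
]
kept_names = [p["name"] for p in select_target_pairs(pairs, targets)]
```

In this example only pair "a" is kept: "b" is a duplicate of "a", "c" is not uniquely aligned, and "d" lies outside the target region.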

(4) Data quality control

Data quality control covers the data volume of each sample, the data volume after filtering, the proportion of sequences aligned during sequence alignment, whether the mean depth of the sample meets expectations, whether the single-base depth distribution plot follows a Poisson distribution, the target region coverage of the sample, and so on.

Statistical analysis shows that all 50 samples of this embodiment meet the quality control requirements; partial results are shown in Table 4.

Specifically, data quality control covers two aspects. The first is consistency between samples: if the data of all samples are similar, the requirement is met; if the data of an individual sample differ greatly from those of most other samples, that sample probably has a problem. The second is the individual quality control metrics of each sample, for which a person skilled in the art can determine an approximate range from experience; different sequencing regions may vary somewhat. Specifically, the following are all acceptable: data remaining after filtering generally above 85%; proportion of aligned sequences above 90%; data remaining after duplicate removal above 60%; proportion of unique reads above 90% (this depends on the specific target region sequenced); mean depth meeting the expected experimental design requirement; and coverage above 95%.
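The per-sample thresholds listed above can be expressed as a simple check. In the following Python sketch the metric names are hypothetical, the thresholds are read as inclusive lower bounds, and the expected mean depth is passed in by the caller because it depends on the experimental design; the between-sample consistency check is not covered.

```python
# Minimum acceptable values in percent, taken from the ranges given in the text.
THRESHOLDS = {
    "clean_data_pct": 85.0,    # data remaining after filtering
    "aligned_pct": 90.0,       # proportion of aligned sequences
    "after_dedup_pct": 60.0,   # data remaining after duplicate removal
    "unique_reads_pct": 90.0,  # proportion of unique reads
    "coverage_pct": 95.0,      # target region coverage
}


def failed_metrics(metrics, expected_mean_depth):
    """Return the names of the QC metrics that a sample fails."""
    failed = [name for name, minimum in THRESHOLDS.items()
              if metrics[name] < minimum]
    if metrics["mean_depth"] < expected_mean_depth:
        failed.append("mean_depth")
    return failed


# Hypothetical metric values, for illustration only.
good = {"clean_data_pct": 93.0, "aligned_pct": 97.0, "after_dedup_pct": 80.0,
        "unique_reads_pct": 95.0, "coverage_pct": 99.0, "mean_depth": 120.0}
bad = dict(good, coverage_pct=90.0)

good_failures = failed_metrics(good, expected_mean_depth=100.0)
bad_failures = failed_metrics(bad, expected_mean_depth=100.0)
```

An empty list means the sample passes every per-sample criterion.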

Table 4 Data quality control results

(5) SNP analysis

In this embodiment, SNPs were obtained with samtools. After the sequences aligned to the target region were selected, they were format-converted and sorted with samtools, and SNP calling was performed with its mpileup command. The raw SNPs can additionally be filtered, for example by site depth and quality value. Usually a depth of 4-400 is acceptable, while for the quality value a significance is calculated with a statistical method and the filtering is based on that significance.
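The depth filter described above can be sketched in Python on VCF-formatted SNP records. The sketch assumes that the site depth is available as a `DP=` entry in the INFO column and that the quality value is the QUAL column. Note that it substitutes a plain QUAL cutoff (`min_qual`, an arbitrary illustrative value) for the significance-based quality filtering described in the text, which is not specified in enough detail to reproduce.

```python
def parse_depth(info):
    """Extract the DP (depth) value from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("DP="):
            return int(item[3:])
    return None


def filter_snps(vcf_lines, min_depth=4, max_depth=400, min_qual=20.0):
    """Keep SNP records whose depth is within 4-400 and whose QUAL is high enough."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        fields = line.rstrip("\n").split("\t")
        depth = parse_depth(fields[7])
        if depth is None or fields[5] == ".":
            continue  # depth or quality missing: cannot be evaluated
        if min_depth <= depth <= max_depth and float(fields[5]) >= min_qual:
            # (chromosome, position, reference base, alternative base)
            kept.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return kept


# Hypothetical records, for illustration only.
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;AF1=0.5",
    "chr1\t200\t.\tC\tT\t60\t.\tDP=2",
    "chr1\t300\t.\tG\tA\t5\t.\tDP=40",
    "chr1\t400\t.\tT\tC\t99\t.\tDP=900",
]
snps = filter_snps(vcf_lines)
```

Only the record at position 100 survives here: the others fail on low depth, low quality and excessive depth respectively.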

The samples of this embodiment include 4 HapMap samples (a, b, c, d) and 1 YanHuang sample (for these 5 samples, genome and genotyping data have been published); the YanHuang sample was sequenced twice. The SNPs of these five samples were evaluated. The 4 HapMap samples were compared with the existing HapMap data, and the SNPs of the YanHuang sample were compared with the existing genotyping sites of the YanHuang sample; see Table 5 and Table 6.

Table 5 SNP analysis results of the HapMap samples

Table 6 SNP analysis results of the YanHuang sample

(6) Genotypes of the drug response related genes

After variant detection was completed, the mutation site information of each gene was extracted according to the region of the gene in the whole genome. This mutation site information was compared with the previously constructed genotyping database of drug response related genes to determine the genotype information of the sample and the corresponding drug response information. The detection results for some of the samples are shown in Table 7.
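The lookup described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the gene name, region coordinates, allele names, defining variants and drug response notes are placeholders, and the database is modeled simply as a list of alleles, each defined by a set of variants on the standard reference coordinates. Zygosity and phasing are ignored.

```python
def variants_in_gene(variants, region):
    """Select the sample variants that fall inside a gene region (inclusive)."""
    chrom, start, end = region
    return {v for v in variants if v[0] == chrom and start <= v[1] <= end}


def type_gene(variants, region, alleles):
    """Return (allele name, drug response note) for every database allele
    whose defining variants are all present in the sample."""
    observed = variants_in_gene(variants, region)
    return [(name, note) for name, defining, note in alleles
            if defining and defining <= observed]


# Placeholder database content, for illustration only.
gene_region = ("chr1", 1000, 2000)
alleles = [
    ("GENE_A*2", frozenset({("chr1", 1200, "A", "G")}), "placeholder note 1"),
    ("GENE_A*3", frozenset({("chr1", 1500, "C", "T"),
                            ("chr1", 1600, "G", "A")}), "placeholder note 2"),
]
sample_variants = [
    ("chr1", 1200, "A", "G"),
    ("chr1", 1500, "C", "T"),
    ("chr2", 300, "T", "C"),
]
calls = type_gene(sample_variants, gene_region, alleles)
```

Here the sample matches only the first placeholder allele, because one of the two variants defining the second allele is absent. An empty result would be read as no known variant allele being detected for that gene.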

Table 7 Genotypes of drug response related genes and drug response information

The typing results show that the genotype information and drug response information obtained by the method of this embodiment are consistent with existing known records.

The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the technical field of the invention may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims

1. A method for constructing a standard genotype database of drug response related genes, characterized by comprising the following steps:

aligning the specific sequence corresponding to the genotype mutation information of a drug response related gene with the human whole-genome standard sequence, to obtain the correspondence, at each base position, between the specific sequence of the drug response related gene and the human whole-genome standard sequence;

according to said correspondence, converting the genotype of the drug response related gene into a genotype based on the human whole-genome standard sequence, to obtain the standardized genotype of the drug response related gene;

wherein, optionally, said drug response related genes comprise at least one selected from the 48 human drug response related genes ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1;

optionally, the method further comprises the step of locating the drug response related gene: determining the start position and end position of the coding sequence of the drug response related gene on the human whole-genome standard sequence, to obtain the specific sequence corresponding to the genotype mutation information of the drug response related gene;

optionally, the specific sequence corresponding to the genotype mutation information of said drug response related gene comprises the region from 5000 bp upstream of the start position of the coding sequence of the drug response related gene to 500 bp downstream of the end position of the coding sequence;

optionally, said human whole-genome standard sequence is hg19.

2. A standard genotype database of drug response related genes, characterized in that said database is constructed by the method of claim 1.

3. The standard genotype database of drug response related genes according to claim 1, characterized in that: in said standard genotype database of drug response related genes, each standardized genotype of a drug response related gene is associated with information on the drug response effect;

optionally, said drug response related genes comprise at least one selected from the genes ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EGR1, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.

4. A genotyping method for drug response related genes, said method comprising: obtaining the exon sequences of the drug response related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, and comparing the analysis results with the standard genotype database of drug response related genes of claim 2, thereby obtaining the genotype of the sample to be tested.

5. A method for detecting a drug response effect, said method comprising: obtaining the exon sequences of the drug response related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, comparing the analysis results with the standard genotype database of drug response related genes of claim 3 to obtain the genotype of the sample to be tested, and obtaining the drug response result of the sample to be tested from the drug response information corresponding to the genotype.

6. The method according to claim 4 or 5, characterized in that: the process of obtaining the exon sequences of the drug response related genes of the sample to be tested comprises,

7. The method according to claim 6, characterized in that: in said step A, said chip contains oligonucleotide probes that are respectively reverse complementary to all exon sequences of the 48 human drug response related genes, and the length of the oligonucleotide probes is 55-105 bp;

in said step B, the genomic DNA of the sample to be tested is fragmented into fragments of 200-300 bp.

8. The method according to claim 6, characterized in that: in said step B, the end treatment comprises performing end repair to form blunt-ended, phosphorylated DNA fragments, adding an "A" base to the 3' end of the blunt-ended DNA, and further ligating a tag.

9. The method according to claim 6, characterized in that: in said step C, sequence capture libraries from a plurality of different samples to be tested are mixed before hybridization and then hybridized simultaneously with the chip of step A; each library carries a different Index base sequence so that the libraries can be distinguished from one another, and said Index base sequence is preferably 6-8 bp long.

10. The method according to claim 4 or 5, characterized in that: the data analysis after said sequencing comprises,