WO2013102442A1 - Medicament-related genotype database, method for genotyping and for detecting medicament reaction - Google Patents


Info

Publication number
WO2013102442A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
drug
gene
related gene
drug reaction
Prior art date
Application number
PCT/CN2013/070081
Other languages
French (fr)
Chinese (zh)
Inventor
刘晓
张伟
徐怀前
苏政
王冠
Original Assignee
深圳华大基因科技有限公司 (BGI Shenzhen Co., Limited)
深圳华大基因研究院 (BGI-Shenzhen)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司 and 深圳华大基因研究院
Publication of WO2013102442A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • The invention relates to the field of gene detection, in particular to a standardized genotype database of drug-response-related genes and a method for constructing it, a method for genotyping drug-response-related genes, and a method for detecting drug response.
Background Art
  • Drugs that enter the body are metabolized mainly in the liver and small intestine through phase I (oxidation-reduction, hydrolysis) and phase II (conjugation) reactions.
  • The enzymes involved in phase I metabolism are mainly members of the CYP450 family, of which the CYP1, CYP2 and CYP3 subfamilies are the most studied. Polymorphisms in the genes encoding these enzymes significantly affect enzyme activity and hence the metabolism of drugs in the body. For example, propranolol is an important substrate of CYP2D6.
  • Its blood concentration can differ by up to 20-fold between individuals.
  • The frequency of the CYP2D6*10 allele in the Chinese population is as high as 51.6%, which lowers CYP2D6 metabolic activity in this population.
  • Enzymes involved in phase II reactions include thiopurine S-methyltransferase (TPMT), N-acetyltransferases (NATs) and UDP-glucuronosyltransferase 1A1 (UGT1A1).
  • TPMT: thiopurine S-methyltransferase
  • NATs: N-acetyltransferases
  • UGT1A1: UDP-glucuronosyltransferase 1A1
  • An object of the present invention is to provide a standardized genotype database of drug-response-related genes and a method for constructing it, a method for rapidly genotyping drug-response-related genes based on the database, and a method for detecting drug response.
  • The present invention provides a method for constructing a standardized genotype database of drug-response-related genes.
  • The method comprises the steps of: aligning the specific sequence to which the mutation information of each genotype of a drug-response-related gene refers against the human whole-genome reference sequence, to obtain the base-by-base correspondence between the specific sequence and the reference sequence; and, according to this correspondence, converting the genotypes of the drug-response-related gene into genotypes expressed relative to the human whole-genome reference sequence, i.e. the standardized genotypes of the drug-response-related gene.
  • With the construction method of the present invention, a standardized genotype database of drug-response-related genes can be built efficiently, and the database provides a uniform standard for genotyping drug-response-related genes.
  • Based on this database, in which the genotypes of the drug-response-related genes are known, the genotype of a sample to be tested can be determined quickly and accurately, which can in turn help guide the clinical dosage or targeted medication for the patient from whom the sample was taken.
  • This can effectively reduce or avoid adverse drug reactions in the patient and achieve the best therapeutic effect.
  • The present invention also provides a standardized genotype database of drug-response-related genes, constructed by the method of the present invention described above.
  • The inventors found that this database provides a unified standard for genotyping drug-response-related genes. Based on the database, in which the genotypes of the drug-response-related genes are known, the genotype of a sample to be tested can be determined quickly and accurately, which can help guide the clinical dosage or targeted medication for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the optimal therapeutic effect.
  • The present invention also provides a method for genotyping drug-response-related genes.
  • The method comprises: obtaining the exon sequences of the drug-response-related genes of a sample to be tested; sequencing them on a high-throughput sequencing platform and analyzing the data; and comparing the analysis results with the standardized genotype database of the present invention described above to obtain the genotype of the sample to be tested.
  • Based on the drug-response-related genes of known genotype, the genotype of the sample to be tested can be determined quickly and accurately, which can help guide the clinical dosage or targeted medication for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
  • The present invention also provides a method for detecting drug response.
  • The method comprises: obtaining the exon sequences of the drug-response-related genes of a sample to be tested; sequencing them on a high-throughput sequencing platform and analyzing the data; comparing the analysis results with the standardized genotype database of the present invention, which contains drug-response information, to obtain the genotype of the sample to be tested; and obtaining the drug-response result of the sample from the drug-response information corresponding to its genotype.
  • Based on the drug-response-related genes of known genotype, the genotype of the sample to be tested and the corresponding drug-response result can be determined quickly and accurately, which can help guide the clinical dosage or targeted medication for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
  • FIG. 1 is a schematic flow chart of the steps for obtaining the exon sequences of the drug-response-related genes of a sample to be tested, in the methods for genotyping drug-response-related genes and detecting drug response according to an embodiment of the present invention.
  • Fig. 2 is a flow chart of the data-analysis steps in the methods for genotyping drug-response-related genes and detecting drug response according to an embodiment of the present invention.
Detailed Description of the Invention
  • The drug-response-related genes include at least one of 48 human drug-response-related genes, such as ABCB1, ABCG2, ADRB1, APC, AG ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EG ERBB2, F2, F5, G6PD, GSTA HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RN SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1 and VKO CX CC1.
  • The method further comprises: locating each drug-response-related gene on the human whole-genome reference sequence, determining the start and end positions of its coding sequence (CDS), and obtaining the specific sequence corresponding to the mutation information of each genotype of the gene.
  • The specific sequence corresponding to the genotype mutation information of a drug-response-related gene covers the region from 5000 bp upstream of the CDS start position to 500 bp downstream of the CDS end position.
  • The human whole-genome reference sequence is hg19.
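The region rule above (5000 bp upstream of the CDS start to 500 bp downstream of the CDS end on hg19, lengthened where a known mutation site falls outside it) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function name `gene_region`, 1-based inclusive coordinates, and the strand handling (reading "upstream" as 5' of the CDS, so that for a minus-strand gene the 5000 bp flank lies at higher coordinates) are all assumptions.

```python
from typing import Iterable, Tuple

# Default flanks given in the description (hg19 coordinates).
UPSTREAM_BP = 5000
DOWNSTREAM_BP = 500


def gene_region(cds_low: int, cds_high: int, strand: str,
                mutation_sites: Iterable[int] = ()) -> Tuple[int, int]:
    """Return the (start, end) region of one gene, 1-based and inclusive.

    cds_low/cds_high are the lowest and highest genomic coordinates of the
    CDS. Assumption: "upstream" means 5' of the CDS start, i.e. lower
    coordinates on the + strand and higher coordinates on the - strand.
    """
    if cds_low > cds_high:
        raise ValueError("cds_low must not exceed cds_high")
    if strand == "+":
        start, end = cds_low - UPSTREAM_BP, cds_high + DOWNSTREAM_BP
    elif strand == "-":
        start, end = cds_low - DOWNSTREAM_BP, cds_high + UPSTREAM_BP
    else:
        raise ValueError("strand must be '+' or '-'")
    # Lengthen the region so that every known mutation site is included.
    sites = list(mutation_sites)
    if sites:
        start = min(start, min(sites))
        end = max(end, max(sites))
    # Chromosome coordinates start at 1.
    return max(1, start), end


# Hypothetical coordinates, for illustration only.
print(gene_region(10_000, 20_000, "+"))           # (5000, 20500)
print(gene_region(10_000, 20_000, "-"))           # (9500, 25000)
print(gene_region(10_000, 20_000, "+", [3_000]))  # (3000, 20500)
```

Taking the minimum and maximum over the known sites keeps the region a single contiguous interval, which is all the description asks for when it says the region is "set longer" for such genes.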
  • Each standardized genotype of a drug-response-related gene is associated with the corresponding drug-response information.
  • The drug-response-related genes include at least one of the 48 human drug-response-related genes listed in Table 1.
  • Obtaining the exon sequences of the drug-response-related genes of the sample to be tested comprises the following steps:
  • The sequence capture library prepared in step B is hybridized to the chip of step A to obtain an exon library of the drug-response-related genes of the sample to be tested.
  • The chip in step A contains oligonucleotide probes that are reverse-complementary to all exon sequences of the 48 drug-response-related genes; the oligonucleotide probes are 55-105 bp long.
  • The genomic DNA of the sample to be tested is fragmented into pieces of 200-300 bp.
  • The end treatment comprises end repair to form blunt-ended, 5'-phosphorylated DNA fragments, addition of a base "A" to the 3' ends of these fragments, and ligation of the tag adapters.
  • The sequence capture libraries of multiple different samples are pooled and then hybridized to the chip of step A; the tag of each library has a different base sequence, preferably 6-8 bp long.
  • The data analysis may further include:
  • ii. aligning the sequences obtained in step i with alignment software, preferably SOAP or BWA;
  • iii. performing subsequent analysis on the sequences aligned to the target region, where the target region refers to the exon regions of the drug-response-related genes;
  • iv. after the data pass quality control, performing variation analysis, which includes detecting at least one of the following: single-nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), structural variations (SVs) and copy-number variations (CNVs).
  • The method of the present invention for constructing a standardized genotype database of drug-response-related genes provides a unified standard database for genotyping 48 drug-response-related genes. Based on this database of known genotypes, the corresponding genotype information of a sample to be tested can be given quickly and accurately, providing a more accurate auxiliary basis for clinical medication and good clinical guidance.
  • In addition, with the standardized genotype database of the present invention, all unknown polymorphic sites of the 48 drug-response-related genes can be detected. These can be accumulated as data and provide a basis for discovering new polymorphic sites that affect drug response and for basic research on drug-response-related genes.
  • The genotyping method of the present invention uses a chip to capture the exons of 48 drug-response-related genes, and a single experiment can detect up to hundreds of samples simultaneously, which both increases the number of samples tested and greatly reduces the testing cost per sample.
  • The drug-response detection method of the present invention uses the hg19 genome as the reference sequence to establish a drug-response database and, combined with sequencing and bioinformatics analysis, accurately gives the mutated base at each polymorphic site; on this basis, the corresponding genotypes can be distinguished and the corresponding drug-response information given.
  • The present invention also provides an overall technical scheme for constructing the standardized genotype database of drug-response-related genes, genotyping drug-response-related genes and detecting drug response, based on high-throughput sequencing after sequence capture of the target region. Specifically, it may include the following steps:
  • hg19 is used as the reference sequence.
  • The collected information mainly includes the genotype name, the amino-acid change corresponding to the genotype, the mutation information of the genotype together with its specific sequence, the drug-response information corresponding to the genotype, and the literature references.
  • The "specific sequence" as used in this application refers to the DNA sequence fragment or cDNA sequence used as the reference in a given study. Analysis of the collected data showed that the mutation information of each genotype is given relative to one such specific sequence; that is, for the 48 drug-response-related genes, different studies use different reference objects for their genotypes, and even different genotypes of the same gene may refer to different reference objects. Because the formats of the different data sources are inconsistent, the information must be converted into a uniform format for subsequent collation.
  • The inventors first located each drug-response-related gene on hg19 and took the region from 5000 bp upstream of the CDS start position to 500 bp downstream of the CDS stop position as the gene region. However, the mutation sites of some genotypes lie far from the CDS, beyond this range; for these genes, the inventors lengthened the gene region so that it includes those mutation sites.
  • Each specific sequence was aligned with hg19 by BLAST; if the specific sequence is a cDNA, the inventors used BLAT for the alignment.
  • A specific sequence may align to multiple positions on hg19; the alignment at the best position is selected and the bases at each position are analyzed to obtain the base-by-base correspondence between the specific sequence and hg19. Note that if the alignment is on the negative strand of the chromosome, the bases must be converted to their complementary bases on the positive strand.
  • Based on the alignment of the specific sequences with hg19, the genotypes of all drug-response-related genes are converted into hg19-based mutation-site information.
  • This conversion requires the CDS start position and the gene region defined above for each gene.
  • For some genotypes the mutation-site information is given on the negative strand; it must be converted to the positive strand during the conversion.
  • (Example database entries, partially unrecoverable in the source: TPMT activity; increased risk of side effects; NAT2 enzyme activity; isoniazid.)
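The conversion of genotypes from their specific sequences to hg19 can be sketched as follows, assuming the best alignment hit has already been chosen and is available as two gapped rows (query and hg19 subject) in the style of BLAST pairwise output. The function names `base_correspondence` and `convert_snv` are hypothetical, and only single-base substitutions are handled; insertions and deletions would additionally need allele normalization, which this sketch omits. For a minus-strand hit, hg19 coordinates decrease along the alignment and the alleles are complemented, matching the negative-to-positive-strand conversion described above.

```python
from typing import Dict, Tuple

# Complement table used to move alleles from the minus to the plus strand.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def base_correspondence(q_aln: str, s_aln: str, q_start: int,
                        s_start: int, s_strand: str) -> Dict[int, int]:
    """Map each aligned position of the specific sequence to an hg19 position.

    q_aln/s_aln are the gapped query and subject rows of one alignment hit
    ('-' marks a gap). q_start is the 1-based query position of the first
    aligned base; s_start is the hg19 coordinate of the first aligned subject
    base. On a minus-strand hit, hg19 coordinates decrease along the alignment.
    """
    if len(q_aln) != len(s_aln):
        raise ValueError("alignment rows must have equal length")
    if s_strand not in ("+", "-"):
        raise ValueError("s_strand must be '+' or '-'")
    step = 1 if s_strand == "+" else -1
    mapping: Dict[int, int] = {}
    q, s = q_start, s_start
    for qb, sb in zip(q_aln, s_aln):
        if qb != "-" and sb != "-":
            mapping[q] = s  # both sequences have a base in this column
        if qb != "-":
            q += 1
        if sb != "-":
            s += step
    return mapping


def convert_snv(pos: int, ref: str, alt: str, mapping: Dict[int, int],
                s_strand: str) -> Tuple[int, str, str]:
    """Express a single-base change on the specific sequence relative to hg19 (+ strand)."""
    if pos not in mapping:
        raise KeyError(f"position {pos} is not aligned to hg19")
    if s_strand == "-":
        # Hit on the minus strand: report the complementary bases on the plus strand.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return mapping[pos], ref, alt


m = base_correspondence("ACGT", "ACGT", 1, 500, "-")
print(convert_snv(2, "C", "T", m, "-"))  # (499, 'G', 'A')
```

Positions that fall in an alignment gap are deliberately absent from the mapping, so a variant defined at such a position raises an error instead of being silently placed at a neighbouring coordinate.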
  • The exon sequences of the drug-response-related genes of the sample to be tested are obtained as follows:
  • Using the human genome hg19 as the reference sequence, the inventors selected all exon regions of the 48 drug-response-related genes as target sequences; the total length of the target sequences is about 160 kb.
  • Oligonucleotide capture probes about 55-105 bp long, reverse-complementary to the exon sequences, were designed.
  • The capture probes were immobilized at high density on a chip, forming a capture chip containing exon capture probes for all 48 drug-response-related genes.
  • The designed probes were synthesized by Roche NimbleGen and immobilized on the capture chip.
  • The probe sequences were designed with reference to hg19. Because genomic sequences differ between species, the probes are best suited to capturing human genomic DNA; they can also be applied to the genomes of other species with high homology to the human genome, although the capture efficiency may be lower than for human DNA. For other species, probes similar to those of the present invention can be designed from their own reference sequences to capture their target regions.
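The probe design idea, oligonucleotides reverse-complementary to exon sequences with lengths in the 55-105 bp range, can be illustrated with a simple tiling sketch. This is not the Roche NimbleGen design procedure, which the source does not describe; the fixed probe length, the `step` parameter and the function names are assumptions, and a practical design would typically also weigh factors such as GC content, repeats and probe uniqueness.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probes(exon_seq: str, probe_len: int = 75, step: int = 0) -> List[str]:
    """Tile an exon (given on the + strand) with probes reverse-complementary to it.

    probe_len must lie in the 55-105 bp range stated in the description.
    step defaults to probe_len (adjacent, non-overlapping tiles); a final
    tile is anchored at the exon's 3' end so the whole exon is covered.
    """
    if not 55 <= probe_len <= 105:
        raise ValueError("probe length must be 55-105 bp")
    step = step or probe_len
    if len(exon_seq) <= probe_len:
        # Exon shorter than one probe: a single probe spanning the whole exon.
        return [reverse_complement(exon_seq)]
    starts = list(range(0, len(exon_seq) - probe_len + 1, step))
    if starts[-1] + probe_len < len(exon_seq):
        starts.append(len(exon_seq) - probe_len)
    return [reverse_complement(exon_seq[s:s + probe_len]) for s in starts]


print([len(p) for p in tile_probes("A" * 60 + "C" * 40, probe_len=60)])  # [60, 60]
```

Anchoring the last tile at the 3' end makes the final two probes overlap rather than leaving an uncovered tail, at the cost of slightly uneven probe density at exon ends.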
  • The fragmented DNA is purified and then end-repaired with enzymes such as T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase, using dNTPs as substrates, to form blunt-ended, 5'-phosphorylated DNA fragments, which are then purified.
  • Klenow fragment (3'-5' exo-) polymerase and dATP were used to add a base "A" to the 3' ends of the purified, phosphorylated DNA fragments, followed by purification.
  • The purified DNA fragments carrying a 3' "A" were ligated to the tag adapters using T4 DNA ligase, and the ligation products were purified using a kit.
  • The PCR products obtained above were mixed with the adapter-blocking sequences and Cot-1 DNA to form the exon capture library.
  • The libraries of multiple samples are pooled. To distinguish libraries from different samples during sequencing, each adapter contains a different 6 bp or 8 bp index sequence, and the libraries may be pooled in equal DNA amounts or in a defined ratio as needed.
  • The pooling ratio is determined by the specific research purpose or design requirements of the person skilled in the art.
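Because every library carries its own 6 bp or 8 bp index, reads from the pooled sequencing run can be assigned back to samples by comparing each read's index with the sample indexes. The sketch below is a hypothetical illustration: the tolerance of one mismatch, the rejection of ties, the function names and the example indexes are assumptions, not parameters given in the source.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_index: str, sample_indexes: Dict[str, str],
                  max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose index uniquely best matches read_index.

    Only indexes of the same length as the read's index are compared.
    Returns None if no index is within max_mismatch or if two samples
    are equally close (an ambiguous read is better discarded).
    """
    best: Optional[str] = None
    best_dist = max_mismatch + 1
    tie = False
    for sample, index in sample_indexes.items():
        if len(index) != len(read_index):
            continue
        dist = hamming(read_index, index)
        if dist < best_dist:
            best, best_dist, tie = sample, dist, False
        elif dist == best_dist:
            tie = True
    return None if tie else best


print(assign_sample("ACGTAA", {"S1": "ACGTAC", "S2": "TTGCAA"}))  # S1
```

Allowing a mismatch only helps if the indexes in one pool differ from each other at several positions; otherwise a single sequencing error could move a read to the wrong sample, which is why ties are rejected here.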
  • Hybridization of the probes to the target regions:
  • The pooled library of the samples above was hybridized to the chip according to the NimbleGen standard operating instructions for solid-phase chip hybridization.
  • The hybridized DNA is eluted and purified, then amplified using the adapter sequences as primers to obtain amplification products.
  • The amplification products were checked with an Agilent 2100 Bioanalyzer and by qPCR; products that pass quality control are ready for sequencing.
  • Each amplification product constitutes a sequencing library.
  • the data analysis step can include the following steps:
  • Step 1: Filtering. Remove low-quality reads and reads contaminated with sequencing adapter (primer) sequences.
  • Each base in a read has a corresponding sequencing quality value.
  • The average quality value of each read is calculated; if it is below a conventional empirical threshold, the read is filtered out. In addition, reads may be contaminated with adapter sequences during sequencing, and reads containing adapter sequence are also filtered out.
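Step 1 can be sketched as a per-read filter. The source gives no numeric threshold, so the Phred+33 quality encoding, the mean-quality cut-off of 20, the 10 bp adapter seed and the example adapter sequence below are all assumptions chosen for illustration, as are the function names.

```python
def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred quality of one read (ASCII-encoded; Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq: str, qual: str, adapter: str,
              min_mean_q: float = 20.0, seed_len: int = 10) -> bool:
    """Return False for reads that should be filtered out."""
    if not seq or len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be non-empty and of equal length")
    if mean_phred(qual) < min_mean_q:
        return False  # average quality below the (assumed) threshold
    if adapter[:seed_len] in seq:
        return False  # read contains the start of the adapter sequence
    return True


ADAPTER = "AGATCGGAAGAGC"  # example adapter, assumed for illustration
print(keep_read("ACGT" * 10, "I" * 40, ADAPTER))  # True
```

Matching only an exact adapter seed is a simplification: a read with a short partial adapter at its 3' end, or with a sequencing error inside the seed, would pass this check.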
  • Step 2: Alignment. The reads retained after step 1 are aligned with alignment software (such as SOAP or BWA), which selects an optimal alignment position for each read. For reads that align equally well to multiple repeated positions, the software outputs one position and flags the read.
  • Step 3: Select the reads aligned to the target region.
  • Because step 2 uses the whole hg19 genome as the reference, reads from non-target regions are aligned to their best-matching positions elsewhere in the genome instead of being forced onto the target region.
  • Only reads aligned to the target region are selected for subsequent analysis, ensuring that all selected reads are target-region sequences.
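Aligning genome-wide and only then restricting to the target exons can be sketched with sorted, merged target intervals and binary search. This is a minimal sketch assuming each aligned read has already been reduced to a hypothetical `(chrom, start, end)` tuple in 1-based inclusive hg19 coordinates; the class and function names are illustrative. The fraction of reads kept also gives an on-target rate that can feed the subsequent data quality control.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


class TargetRegions:
    """Exon target intervals with fast overlap queries."""

    def __init__(self, intervals: Iterable[Interval]):
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, ivs in by_chrom.items():
            merged: List[List[int]] = []
            for start, end in sorted(ivs):
                # Merge overlapping or adjacent intervals so they are disjoint.
                if merged and start <= merged[-1][1] + 1:
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Last target interval beginning at or before the read's end; since
        # intervals are disjoint, only this one can reach the read's start.
        i = bisect.bisect_right(starts, end) - 1
        return i >= 0 and self._ends[chrom][i] >= start


def on_target(alignments: List[Interval],
              targets: TargetRegions) -> Tuple[List[Interval], float]:
    """Keep reads overlapping a target and report the on-target fraction."""
    kept = [aln for aln in alignments if targets.overlaps(*aln)]
    rate = len(kept) / len(alignments) if alignments else 0.0
    return kept, rate


targets = TargetRegions([("chr22", 100, 200), ("chr22", 150, 300)])
print(targets.overlaps("chr22", 250, 350))  # True
```

Merging the intervals first is what makes the single binary-search lookup correct; with overlapping intervals an earlier, longer interval could be missed.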
  • Step 4: Data quality control. Quality control covers several aspects, such as the percentage of aligned reads, the percentage of unique reads (reads with only one optimal alignment position against the reference sequence), the duplication rate (identical reads), the sequencing depth and the coverage of the target region. These metrics must meet conventional empirical thresholds before further analysis; for example, the sequencing depth should match expectations and the per-base depth should follow a Poisson distribution.
  • Step 5: Mutation detection.
  • Variation analysis is then performed, including detection of SNPs (single-nucleotide polymorphisms), INDELs (insertions and deletions), SVs (structural variations) and CNVs (copy-number variations). Each type of variation can be detected by different methods as needed.
  • The mutation sites in each gene are collated and compared with the corresponding genotypes in the previously prepared standardized genotype database of drug-response-related genes to obtain the genotype of each sample. Because humans are diploid, each gene carries at most two genotypes, so the final typing of a drug-response-related gene is either homozygous or heterozygous. Furthermore, the drug-response result of the sample can be obtained from the drug-response information stored for the corresponding genotypes in the database.
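The comparison with the standardized genotype database can be sketched for a single gene as follows. Because short reads usually do not reveal which chromosome copy carries which variant, the sketch returns every pair of database genotypes whose combined variants reproduce the observed copy numbers, and it reports variants absent from the database separately, in line with the accumulation of unknown polymorphic sites mentioned in the description. The data layout, function name, genotype names and variants are hypothetical and are not real allele definitions.

```python
from itertools import combinations_with_replacement
from typing import Dict, FrozenSet, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, hg19 position, ref, alt)


def call_diplotype(alleles: Dict[str, FrozenSet[Variant]],
                   observed: Dict[Variant, int]
                   ) -> Tuple[List[Tuple[str, str]], Set[Variant]]:
    """Match a sample's variant calls against the standardized genotypes of one gene.

    alleles maps each genotype name to its hg19-standardized variants (the
    reference genotype has an empty set). observed maps each called variant
    to its copy number in the sample: 1 = heterozygous, 2 = homozygous.
    Returns every genotype pair consistent with the calls (a pair of two
    identical names is homozygous) and the variants absent from the database.
    """
    known: Set[Variant] = set().union(*alleles.values())
    novel = {v for v in observed if v not in known}
    obs = {v: n for v, n in observed.items() if v in known}
    matches: List[Tuple[str, str]] = []
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        # Expected copy number of each variant if the sample carried a and b.
        expected = {v: (v in alleles[a]) + (v in alleles[b])
                    for v in alleles[a] | alleles[b]}
        if expected == obs:
            matches.append((a, b))
    return matches, novel


v1 = ("chr22", 100, "C", "T")
v2 = ("chr22", 200, "G", "A")
db = {"REF": frozenset(), "hapA": frozenset({v1}), "hapB": frozenset({v1, v2})}
print(call_diplotype(db, {v1: 1}))  # ([('REF', 'hapA')], set())
```

The returned genotype names can then be used to look up the drug-response information stored with each standardized genotype. When more than one pair is returned, the calls alone cannot distinguish those combinations.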
  • The exon regions of all 48 drug-response-related genes can be detected in a single run, including all known and unknown polymorphic sites in these regions.
  • The detected unknown polymorphic sites can be accumulated as data for discovering new polymorphic sites that affect drug response, so the method has research value as well as clinical guidance value.
  • The present invention establishes a drug-response database using the hg19 genome as the reference sequence and combines sequencing with bioinformatics analysis to accurately distinguish the corresponding genotypes from the mutated base at each polymorphic site, and to give the corresponding drug-response information for each.
  • The solution of the present invention is explained below with reference to examples. Those skilled in the art will appreciate that the following examples merely illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in the field (for example, refer to J.
  • A standardized genotype database of drug-response-related genes was constructed as follows. The inventors collected all 48 known drug-response-related genes (see Table 1 above) and, using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19 as the reference, aligned all genotype sequences of the 48 genes against hg19. From the alignment results, the mutation-site information relative to hg19 was obtained, and all genotypes of the drug-response-related genes were converted into a uniform format and standard. Based on the annotation information of genes on the whole genome, the genotypes were converted into hg19-based types. Specifically, the procedure included the following steps:
  • The collected information mainly includes the genotype name, the amino acid changes corresponding to the genotype, the genotype's mutation information and its corresponding specific sequence, the drug reaction information for the genotype, and literature references.
  • The inventors first located each drug reaction-related gene on hg19 and defined the gene region as extending from 5000 bp upstream of the CDS start position to 500 bp downstream of the CDS stop position. However, the mutation sites of some genotypes lie far from the CDS and fall outside this range; for these genes, the inventors defined a longer region so that those mutation sites are included.
  • The alignment at the best position is selected, and the bases at each position are analyzed to obtain the base-by-base correspondence between the specific sequence and hg19. Note that if the alignment falls on the negative strand of the chromosome, each base must be converted to its complementary base on the positive strand.
  • Based on the alignment of each specific sequence with hg19, the genotypes of all drug reaction-related genes were converted into hg19-based mutation site information.
  • This conversion requires the CDS start position and the defined region of each gene described above.
  • For some genotypes, the mutation site information is given on the negative strand; this information must be converted to the positive strand during conversion.
  • The file format is standardized, the drug reaction information corresponding to each genotype is added, and the results are then checked for correctness.
  • The experimental procedure in this example is described for the hybridization of 50 samples, including the Yanhuang (YH) sample.
  • The number of samples in this example is for illustration only and does not limit the number of samples that can be hybridized to each chip.
  • The reagents used in this example are listed in Table 3. Reagents, consumables, and equipment not listed in Table 3 are general-purpose, commercially available products. Table 3. Reagents used in this example
  • 3 μg of protein-free, RNA-free, non-degraded Yanhuang genomic DNA was used as starting material and sheared with a Covaris S2 ultrasonicator (Covaris, USA).
  • The shearing parameters were set as follows: duty cycle (%): 10
  • Treatment 3 time (s): 0
  • Treatment 4 time (s): 0
  • The sheared DNA fragments were recovered and purified, and an end-repair reaction system was prepared in a 1.5 mL centrifuge tube to produce blunt-ended, phosphorylated DNA fragments.
  • The 100 μL reaction mixture above was mixed gently and incubated in a Thermomixer (Eppendorf) at 20 °C for 30 min, then purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved in 32 μL of ddH2O.
  • The reaction mixture was mixed gently and centrifuged briefly, then incubated in a Thermomixer (Eppendorf) at 20 °C for 15 min. After the reaction, it was purified with the MinElute PCR Purification Kit, and the sample was finally dissolved in 25 μL of elution buffer.
  • The adapter-ligated DNA library was amplified with the primers; the amplification system and conditions were as follows:
  • The PCR program was: 94 °C for 2 min; 4 cycles of 94 °C for 15 s, 62 °C for 30 s, and 72 °C for 30 s; then 72 °C for 5 min.
  • The PCR product was purified with the QIAquick PCR Purification Kit in an elution volume of 30 μL.
  • The tag adapter consists of two parts: an index sequence, used to distinguish each library, and an adapter sequence.
  • Construction of the exon library comprises hybridizing the prepared sequence capture library to the capture chip so that all exons of the 48 drug reaction-related genes are enriched on the chip, eluting the hybridized capture chip, and amplifying the eluted exon sequences to obtain the exon library, as follows:
  • The hybridization method is described in the NimbleGen Arrays User's Guide, Version 3.1, 7 Jul 2009 (Roche NimbleGen, Inc.), which is incorporated herein by reference in its entirety.
  • A 35 μL sample was loaded and hybridized at 42 °C for 64-72 h.
  • The sequences enriched on the chip were eluted with 900 μL of 160 mM NaOH, and the eluate was purified with the MinElute PCR Purification Kit and finally eluted in 80 μL of elution buffer.
  • PCR amplification was performed using the sequences eluted from the capture chip as template. The reaction system was 150 μL of Phusion Mix, 4.2 μL each of the upstream and downstream primers (Multiplexing Sequencing Primers and PhiX Control Kit), and the 80 μL eluted sample above, made up with 85 μL of ddH2O. After mixing, PCR was carried out in 6 tubes under the following conditions: 94 °C for 1 min; 16 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s; then 72 °C for 5 min. After the PCR, the 6 tubes were pooled and purified with magnetic beads and the QIAquick PCR Purification Kit, and fragments of 300-450 bp were recovered in an elution volume of 50 μL.
  • The sequencing data are filtered in two ways. First, by quality: the average base quality over the whole read is calculated, and reads with an average quality below 10 are removed. Second, by adapter contamination: reads that contain adapter sequence are also removed.
  • The filtered reads were aligned with BWA (Burrows-Wheeler Aligner). Each read was allowed up to 5 mismatches, and gapped alignment (allowing insertions and deletions) was enabled. When a read had several equally optimal alignment positions, one of them was output at random and the read was flagged. In the tests of this example, aligned reads accounted for approximately 97% of all reads.
  • Reads aligned to the target regions of the reference sequence are retained for the next analysis step.
  • Data quality control covers the amount of data per sample, the amount of data remaining after filtering, the proportion of reads aligned to the reference, whether the average depth of the sample meets expectations, whether the per-base depth distribution follows a Poisson distribution, and the coverage of the target region, among others.
  • Data quality control has two aspects. The first is consistency between samples: if the metrics are similar across samples, the requirement is met; if an individual sample differs markedly from the others, that sample probably has a problem. The second is the individual QC metrics of each sample: those skilled in the art can set approximate ranges from experience, and these may vary with the sequencing target region. Typically, results are acceptable when the data remaining after filtering is above 85%, the proportion of aligned reads is above 90%, the data remaining after deduplication is above 60%, the proportion of uniquely aligned reads (which depends on the target region) is above 90%, the average depth meets the experimental design, and target coverage is above 95%.
  • SNPs are called with samtools: reads aligned to the target region are selected, converted and sorted with samtools, and SNP calling is performed with the mpileup command.
  • The raw SNP calls are further filtered on criteria including site depth and quality value. Typically, a depth of 4-400 is acceptable, and the quality value is assessed by statistically calculating its significance.
  • The mutation sites of each gene are extracted according to that gene's region on the whole genome. Based on these mutation sites, the genotype of each sample and the corresponding drug reaction information were determined by comparison with the standard genotype database for drug reaction-related genes constructed in Example 1. The test results for some samples are shown in Table 7.
  • The method of the present invention for constructing a standard genotype database for drug reaction-related genes, together with the genotyping and drug reaction detection methods based on it, can be used effectively for genotyping drug reaction-related genes and detecting drug reactions, saving time and labor at low cost and with accurate results.
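The numeric filters and QC ranges above can be written as simple predicates. The sketch below is only an illustration under stated assumptions and is not the inventors' pipeline. Quality strings are assumed to use Phred+33 encoding. A read counts as adapter-contaminated only when it contains an exact full copy of the adapter (real trimmers also detect partial overlaps). The QC metric names are hypothetical labels for the percentages quoted in the text. The significance test on SNP quality is not specified in the source and is left out.

```python
from statistics import mean

# Thresholds follow the text: reads with mean base quality < 10 are removed,
# and raw SNP calls are kept only when site depth lies within 4-400.
MIN_MEAN_QUALITY = 10
MIN_DEPTH, MAX_DEPTH = 4, 400

# Approximate acceptable minima (%) from the QC description; key names are hypothetical.
QC_MINIMUM_PERCENT = {
    "remaining_after_filtering": 85.0,
    "aligned_reads": 90.0,
    "remaining_after_dedup": 60.0,
    "unique_reads": 90.0,
    "target_coverage": 95.0,
}


def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return mean(ord(ch) - offset for ch in quality_string)


def keep_read(sequence: str, quality_string: str, adapter: str) -> bool:
    """Apply the two read filters: mean base quality, then adapter contamination."""
    if mean_phred(quality_string) < MIN_MEAN_QUALITY:
        return False
    # Simplification: only an exact, full-length adapter match counts as contamination.
    return adapter not in sequence


def keep_site(depth: int) -> bool:
    """Keep a raw SNP call only if its depth is within the accepted range."""
    return MIN_DEPTH <= depth <= MAX_DEPTH


def failed_qc(metrics: dict) -> list:
    """Return the names of QC metrics (in %) that fall below their minimum."""
    return [name for name, minimum in QC_MINIMUM_PERCENT.items()
            if metrics.get(name, 0.0) < minimum]
```

For example, `keep_read("ACGTACGTAC", "I" * 10, "AGATCGGAAG")` passes, because `I` encodes Phred 40. The same read with quality string `"#" * 10` (Phred 2) is rejected.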


Abstract

Provided are a standard genotype database for medicament reaction-related genes and a method for constructing it, a method for genotyping medicament reaction-related genes, and a method for detecting medicament reaction effects. The method for constructing the standard genotype database comprises the following steps: aligning the specific sequence corresponding to the genotype mutation information of a medicament reaction-related gene with the human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequence and the reference sequence; and, based on the base correspondence obtained, converting each genotype of the medicament reaction-related gene into a genotype relative to the human whole-genome reference sequence, thereby obtaining standardized genotypes of the medicament reaction-related gene.

Description

Drug-related genotype database, genotyping method and drug reaction detection method

Priority information
This application claims priority to and the benefit of Chinese Patent Application No. 201210002898.7, filed with the State Intellectual Property Office of China on January 6, 2012, the entire content of which is incorporated herein by reference.

Technical field
The present invention relates to the field of gene detection, and in particular to a standard genotype database for drug reaction-related genes and a method for constructing it, a method for genotyping drug reaction-related genes, and a method for detecting drug reactions.

Background
Individual differences in drug response are a common clinical problem. Many drugs are effective in only some patients: it is estimated that drugs for asthma, cardiovascular disease, and psychiatric disorders are effective in about 60% of patients, while in up to 40% of patients the effect is unsatisfactory or absent. At the same time, some patients are prone to adverse reactions to standard treatments. Epidemiological studies in the United States have shown that 6.7% of patients had experienced serious adverse reactions and 0.32% had fatal ones, making adverse drug reactions the fourth to sixth leading cause of death among hospitalized patients. Many factors contribute to individual differences in drug response, including sex, age, and body weight. The most important are genetic factors, namely polymorphisms in the genes involved in drug metabolism, drug transport, and drug targets.
Drugs entering the body are eliminated mainly after phase I (oxidation-reduction and hydrolysis) and phase II (conjugation) metabolism in the liver and small intestine. The enzymes involved in phase I metabolism belong mainly to the cytochrome P450 (CYP450) family, of which the CYP1, CYP2, and CYP3 subfamilies have been studied most extensively. Polymorphisms in the genes encoding these enzymes markedly affect enzyme activity and thereby drug metabolism in the body. For example, propranolol is an important substrate of CYP2D6, and its plasma concentration can differ by up to 20-fold between individuals. The frequency of the CYP2D6*10 allele in the Chinese population is as high as 51.6%, which is the main cause of reduced CYP2D6 metabolic activity in this population. Enzymes involved in phase II reactions include thiopurine methyltransferase (TPMT), N-acetyltransferases (NATs), and UDP-glucuronosyltransferase 1A1 (UGT1A1). Leukemia patients homozygous for TPMT mutations experience severe toxicity from standard doses of 6-mercaptopurine, leading to severe myelosuppression and liver damage. Mutations in drug transporter proteins can cause excessive accumulation of a drug in the body or reduce its intracellular concentration. Studies have shown that mutations in the multidrug resistance gene ABCB1 (MDR1) are closely associated with resistance to various anticancer drugs.
Guiding clinical dosing or targeted drug selection according to an individual's genotype can effectively reduce or avoid adverse reactions and achieve the best therapeutic effect. Current methods for detecting drug-related gene polymorphisms, such as PCR-RFLP and probe hybridization, mostly detect few sites and have low throughput. The newest technology for polymorphism detection, multiplex PCR-based chip hybridization, is still limited in the number of sites it can detect, and the sites must be known in advance.
Therefore, current methods for mutation detection and genotyping of drug reaction-related genes still need improvement.

Summary of the invention

The present invention aims to solve at least one of the technical problems in the prior art. To this end, one object of the present invention is to provide a standard genotype database for drug reaction-related genes and a method for constructing it, together with a method, based on this database, for rapidly genotyping drug reaction-related genes and a method for detecting drug reactions.
Accordingly, in one aspect, the present invention provides a method for constructing a standard genotype database for drug reaction-related genes. According to embodiments of the present invention, the method comprises the following steps: aligning the specific sequence corresponding to the mutation information of each genotype of a drug reaction-related gene with the human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequence and the reference sequence; and, based on the base correspondence obtained from the alignment, converting each genotype of the drug reaction-related gene into a genotype relative to the human whole-genome reference sequence, to obtain standardized genotypes of the drug reaction-related gene. The inventors surprisingly found that this method can efficiently construct a standard genotype database for drug reaction-related genes, and that the database provides a uniform standard for genotyping these genes. Based on the database, the genotype of a test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes. This in turn can help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
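The alignment and conversion steps can be illustrated with a short sketch that turns one gapped pairwise alignment into plus-strand hg19 coordinates. It rests on several assumptions that the text does not state. The aligner output is taken to be two equal-length gapped strings plus the hg19 coordinate of the first aligned reference base. On a minus-strand hit, reference coordinates decrease along the alignment, as in BLAST's `sstart > send` convention. Only substitutions are converted, and indel columns are skipped. The function name `align_to_genome` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(base: str) -> str:
    """Complementary base (A<->T, C<->G)."""
    return base.translate(COMPLEMENT)


def align_to_genome(query_aln: str, subject_aln: str, subject_start: int, strand: str):
    """Map one gapped alignment of a genotype sequence onto plus-strand hg19 coordinates.

    query_aln / subject_aln: aligned genotype sequence and hg19 segment ('-' = gap).
    subject_start: 1-based hg19 coordinate of the first aligned reference base.
    strand: '+' or '-'; on '-' the reference coordinates decrease along the alignment.

    Returns (position_map, mismatches):
      position_map: {1-based query position: hg19 position} for gap-free columns
      mismatches:   sorted [(hg19 position, reference base, genotype base)], plus strand
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    step = 1 if strand == "+" else -1
    q_pos, g_pos = 0, subject_start - step  # both counters advance before first use
    position_map, mismatches = {}, []
    for q_base, s_base in zip(query_aln.upper(), subject_aln.upper()):
        if q_base != "-":
            q_pos += 1
        if s_base != "-":
            g_pos += step
        if q_base == "-" or s_base == "-":
            continue  # indels are not converted in this sketch
        position_map[q_pos] = g_pos
        if q_base != s_base:
            if strand == "-":  # report both bases as they read on the plus strand
                q_base, s_base = complement(q_base), complement(s_base)
            mismatches.append((g_pos, s_base, q_base))
    return position_map, sorted(mismatches)
```

For example, `align_to_genome("ACGTA", "ACCTA", 200, "-")` reports `[(198, "G", "C")]`. The C-to-G difference seen on the minus strand becomes a G-to-C change on the plus strand.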
According to another aspect, the present invention also provides a standard genotype database for drug reaction-related genes, constructed by the method of the present invention described above. The inventors found that this database provides a uniform standard for genotyping drug reaction-related genes. Based on it, the genotype of a test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes, which can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
According to yet another aspect, the present invention also provides a method for genotyping drug reaction-related genes. According to embodiments of the present invention, the method comprises: obtaining the exon sequences of the drug reaction-related genes of a test sample; sequencing them on a high-throughput sequencing platform and analyzing the data; and comparing the analysis results with the standard genotype database for drug reaction-related genes described above, thereby obtaining the genotype of the test sample. With this method, the genotype of the test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes, which can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
According to a further aspect, the present invention also provides a method for detecting drug reactions. According to embodiments of the present invention, the method comprises: obtaining the exon sequences of the drug reaction-related genes of a test sample; sequencing them on a high-throughput sequencing platform and analyzing the data; comparing the analysis results with the standard genotype database described above, which contains drug reaction information, to obtain the genotype of the test sample; and obtaining the drug reaction result of the test sample from the drug reaction information corresponding to that genotype. With this method, the genotype of the test sample and the corresponding drug reaction result can be determined quickly and accurately for drug reaction-related genes with known genotypes. This can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect. Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic flow chart of the step of obtaining the exon sequences of the drug reaction-related genes of a test sample, in the genotyping and drug reaction detection methods according to an embodiment of the present invention;
Figure 2 is a schematic flow chart of the data analysis step in the genotyping and drug reaction detection methods according to an embodiment of the present invention.

Detailed description of the invention
Embodiments of the present invention are described in detail below. The embodiments described with reference to the drawings are illustrative; they are intended only to explain the present invention and should not be construed as limiting it. In the description of the present invention, unless otherwise stated, "a plurality of" means two or more.
According to one aspect, the present invention provides a method for constructing a standard genotype database for drug reaction-related genes. According to embodiments of the present invention, the method comprises the following steps: aligning the specific sequence corresponding to the mutation information of each genotype of a drug reaction-related gene with the human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequence and the reference sequence; and, based on the base correspondence obtained from the alignment, converting each genotype of the drug reaction-related gene into a genotype relative to the human whole-genome reference sequence, to obtain standardized genotypes of the drug reaction-related gene. The inventors surprisingly found that this method can efficiently construct a standard genotype database for drug reaction-related genes, and that the database provides a uniform standard for genotyping these genes. Based on the database, the genotype of a test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes. This in turn can help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
According to an embodiment of the present invention, the drug reaction-related genes comprise at least one selected from the following 48 human drug reaction-related genes: ABCB1, ABCG2, ADRB1, APC, A G, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EG, ERBB2, F2, F5, G6PD, GSTA, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RN, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1, and XRCC1. According to another embodiment, the method further comprises mapping the drug reaction-related genes onto the human whole-genome reference sequence, determining the start and end positions of their coding sequences, and obtaining the specific sequences corresponding to the genotype mutation information of these genes.
According to an embodiment of the present invention, the specific sequence corresponding to the genotype mutation information of a drug reaction-related gene covers the region from 5000 bp upstream of the start of the gene's coding sequence to 500 bp downstream of the end of the coding sequence.
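The region definition amounts to simple coordinate arithmetic. The sketch below adds two assumptions of its own. First, "upstream" and "downstream" are read relative to the gene's transcription direction, so for a minus-strand gene the 5000 bp flank lies at the higher coordinate. Second, the optional widening step reflects the note in the example that some genes need a longer region to cover distant genotype sites. CDS bounds are given as 1-based plus-strand coordinates with `cds_low <= cds_high`. The function name `gene_region` is hypothetical.

```python
def gene_region(cds_low: int, cds_high: int, strand: str,
                known_sites=(), upstream: int = 5000, downstream: int = 500):
    """Return (start, end), 1-based inclusive plus-strand coordinates of a gene region.

    cds_low / cds_high: lowest and highest hg19 coordinates of the CDS.
    strand: '+' or '-'; on '-' the CDS start (start codon) is at cds_high.
    known_sites: positions of genotype-defining sites that must be included.
    """
    if cds_low > cds_high:
        raise ValueError("cds_low must not exceed cds_high")
    if strand == "+":
        start, end = cds_low - upstream, cds_high + downstream
    elif strand == "-":
        # Upstream of the start codon lies at higher coordinates on the minus strand.
        start, end = cds_low - downstream, cds_high + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Widen the region so that distant genotype sites are still covered.
    for site in known_sites:
        start, end = min(start, site), max(end, site)
    return max(1, start), end  # clamp at the chromosome start
```

For a plus-strand CDS spanning 10000-12000, this gives (5000, 12500). For the same CDS on the minus strand, it gives (9500, 17000).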
According to a preferred embodiment of the present invention, the human whole-genome reference sequence is hg19.
According to another aspect, the present invention also provides a standard genotype database for drug reaction-related genes, constructed by the method of the present invention described above. The inventors found that this database provides a uniform standard for genotyping drug reaction-related genes. Based on it, the genotype of a test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes, which can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
According to an embodiment of the present invention, in the standard genotype database for drug reaction-related genes, each standardized genotype of a drug reaction-related gene is associated with information on the corresponding drug reaction.
According to a preferred embodiment of the present invention, the drug reaction-related genes comprise at least one of the 48 human drug reaction-related genes listed in Table 1 below.
Table 1. Human drug reaction-related genes
[Table image not reproduced]
According to yet another aspect, the present invention also provides a method for genotyping drug reaction-related genes. According to embodiments of the present invention, the method comprises: obtaining the exon sequences of the drug reaction-related genes of a test sample; sequencing them on a high-throughput sequencing platform and analyzing the data; and comparing the analysis results with the standard genotype database for drug reaction-related genes described above, thereby obtaining the genotype of the test sample. With this method, the genotype of the test sample can be determined quickly and accurately for drug reaction-related genes with known genotypes, which can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
According to a further aspect, the present invention also provides a method for detecting drug reactions. According to embodiments of the present invention, the method comprises: obtaining the exon sequences of the drug reaction-related genes of a test sample; sequencing them on a high-throughput sequencing platform and analyzing the data; comparing the analysis results with the standard genotype database described above, which contains drug reaction information, to obtain the genotype of the test sample; and obtaining the drug reaction result of the test sample from the drug reaction information corresponding to that genotype. With this method, the genotype of the test sample and the corresponding drug reaction result can be determined quickly and accurately for drug reaction-related genes with known genotypes. This can further help guide clinical dosing or targeted drug selection for the patient from whom the sample was taken, effectively reducing or avoiding adverse drug reactions and achieving the best therapeutic effect.
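The database comparison shared by the genotyping and drug reaction detection methods can be pictured as a set-containment lookup. The sketch below simplifies under assumptions that the source does not state. Each standardized genotype is represented by the set of hg19 variants that define it. A genotype is called when all of its defining variants appear among the sample's variant calls, and the most specific match wins. Phasing, zygosity, and diplotype calling are ignored. The record fields loosely follow the information listed for each genotype (name, mutation information, drug reaction information). All gene names, coordinates, and annotations below are hypothetical placeholders, not entries from the patent's database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeRecord:
    gene: str
    name: str
    variants: frozenset  # {(chromosome, hg19 position, ref base, alt base)}
    drug_response: str


def call_genotypes(sample_variants: set, database: list) -> dict:
    """Per gene, return the record whose defining variants are all present in the sample,
    preferring the record with the most defining variants (the most specific match)."""
    calls = {}
    for record in database:
        if record.variants <= sample_variants:
            best = calls.get(record.gene)
            if best is None or len(record.variants) > len(best.variants):
                calls[record.gene] = record
    return calls


# Hypothetical example entries (not real genotypes or annotations).
DB = [
    GenotypeRecord("GENE_A", "allele_1",
                   frozenset({("chr1", 1000, "C", "T")}),
                   "reduced metabolism (hypothetical)"),
    GenotypeRecord("GENE_A", "allele_2",
                   frozenset({("chr1", 1000, "C", "T"), ("chr1", 1500, "G", "A")}),
                   "poor metabolism (hypothetical)"),
]

sample = {("chr1", 1000, "C", "T"), ("chr1", 1500, "G", "A")}
for gene, record in call_genotypes(sample, DB).items():
    print(gene, record.name, record.drug_response)
```

Here the sample carries both variants, so the more specific `allele_2` is reported together with its associated drug response text.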
According to one embodiment of the present invention, the exon sequences of the drug reaction-related genes of the sample to be tested are obtained through the following steps:
A. preparing a chip capable of capturing the exon sequences of the drug reaction-related genes, the chip carrying oligonucleotide probes that are reverse-complementary to those exon sequences;
B. preparing a sequence capture library from the genomic DNA of the sample, which comprises fragmenting the genomic DNA into 200-500 bp fragments, performing end treatment, and amplifying to obtain the sequence capture library;
C. hybridizing the sequence capture library prepared in step B with the chip of step A, thereby obtaining an exon library of the drug reaction-related genes of the sample.
According to one embodiment of the present invention, in step A the chip carries oligonucleotide probes that are reverse-complementary to all exon sequences of each of the 48 drug reaction-related genes, and the oligonucleotide probes are 55-105 bp long.
According to another embodiment of the present invention, in step B the genomic DNA of the sample is fragmented into 200-300 bp fragments.
According to another embodiment of the present invention, in step B the end treatment comprises end repair to form blunt-ended, phosphorylated DNA fragments, addition of a base "A" to the 3' ends of these fragments, and subsequent ligation of tagged adapters.
Further, according to one embodiment of the present invention, in step C, before hybridization, the sequence capture libraries from several different samples are pooled and hybridized together with the chip of step A. Each library carries a different tag (index) base sequence that distinguishes it from the others; the index sequence is preferably 6-8 bp long.
According to one embodiment of the present invention, in the genotyping method for drug reaction-related genes and the drug reaction detection method of the present invention, the data analysis may further comprise:
i. filtering out low-quality sequencing reads that would affect the analysis;
ii. aligning the reads obtained in step i against the human whole-genome reference sequence using alignment software, preferably SOAP or BWA;
iii. selecting the reads aligned to the target region for subsequent analysis, the target region being the region in which the exon sequences of the drug reaction-related genes lie;
iv. after the data pass quality control, performing variant analysis, which comprises detecting at least one of the following: single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), structural variations (SVs) and copy number variations (CNVs).
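Step iv makes variant analysis conditional on data quality control. As one illustrative check (an assumption about how such a check might be coded, not the inventors' implementation), the sketch below summarizes per-base depth over the target region and measures how far the depth histogram departs from a Poisson distribution with the same mean, which is the shape the per-base depth is expected to follow. The function names and returned fields are hypothetical, and no pass/fail threshold is implied.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of depth k given mean depth lam (log space avoids overflow)."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def depth_qc(depths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-base depths over the target region (hypothetical helper)."""
    if not depths:
        raise ValueError("no target bases")
    n = len(depths)
    mean = sum(depths) / n
    hist = Counter(depths)
    diff = 0.0
    pmf_seen = 0.0
    for k in range(max(depths) + 1):
        p = poisson_pmf(k, mean)
        pmf_seen += p
        diff += abs(hist.get(k, 0) / n - p)
    # Poisson mass beyond the deepest observed base has no observed counterpart.
    diff += max(0.0, 1.0 - pmf_seen)
    return {
        "mean_depth": mean,
        "coverage_1x": 1.0 - hist.get(0, 0) / n,
        # Total variation distance: 0 = identical to Poisson, 1 = no overlap.
        "tv_distance_to_poisson": 0.5 * diff,
    }
```

A small distance together with the expected mean depth and high target coverage would support moving on to variant detection; what counts as "small" is left to the empirical thresholds the text mentions.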
Owing to the above technical solutions, the present invention has at least the following beneficial effects:
1. The method of the present invention for constructing a gene standard genotype database of drug reaction-related genes provides a unified, standardized database for genotyping 48 drug reaction-related genes. Based on this database, the genotype of a sample can be reported quickly and accurately for drug reaction-related genes with known genotypes, giving clinical medication a more accurate supporting basis and thus good clinical guidance. In addition, with the gene standard genotype database of the present invention, all unknown polymorphic sites in the 48 drug reaction-related genes can also be detected. This accumulated data lays a foundation for discovering new polymorphic sites that affect drug response and supports basic research on drug reaction-related genes.
2. The genotyping method of the present invention uses a chip to capture the exons of 48 drug-related genes, so a single experiment can test up to hundreds of samples simultaneously. This not only increases the number of samples tested but also greatly reduces the testing cost per sample.
3. The drug reaction detection method of the present invention builds a drug reaction-related database with the hg19 genome as the reference sequence. Combined with sequencing and bioinformatic analysis, it accurately reports the mutant base at each polymorphic site and, on that basis, distinguishes the corresponding genotypes and provides the corresponding drug reaction information.
Further, the present invention also provides an overall technical solution covering construction of the gene standard genotype database of drug reaction-related genes, genotyping of drug reaction-related genes, and detection of drug reactions. It is based on high-throughput sequencing of target regions after sequence capture and may specifically comprise the following steps:
I. Construction of a standardized genotype database of drug reaction-related genes
All 48 currently known functional genes related to drug response are collected (see Table 1 above). Using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19 as the reference, all genotype sequences of the 48 drug reaction-related genes are aligned against hg19. From the alignment results, the mutation site information relative to hg19 is obtained, and all genotypes of the drug reaction-related genes are converted into a uniform format and standard. Based on the genome-wide annotation of each gene, the genotypes are converted into hg19-based types. The procedure specifically comprises the following steps:
1. Collecting genotype-related mutation and enzyme activity information for the drug reaction-related genes
Mutation information for all genotypes of the 48 known drug reaction-related genes is collected, together with information relating each genotype to enzyme activity. This information mainly includes the genotype name, the amino acid change corresponding to the genotype, the mutations of the genotype relative to a specific sequence, the drug reaction information corresponding to the genotype, and references. Note that a "specific sequence" in this application means the DNA sequence fragment or cDNA sequence used as the reference in a given study. Analysis of the collected data showed that the mutation information for each genotype is given relative to one particular specific sequence. In other words, different studies use different reference sequences for the genotypes of the 48 drug reaction-related genes, and the description of a gene's genotypes differs depending on the reference used. Because the formats used in different sources are inconsistent, they must be converted into a uniform format for subsequent curation.
2. Collecting the CDS region of each gene on its specific sequence and the position of the gene on hg19
In the collected data, much of the genotype mutation information is given relative to a particular specific sequence, and the mutation sites follow the gene mutation nomenclature published in 1998 (Recommendations for a nomenclature system for human gene mutations. Nomenclature Working Group), in which positions are numbered with the first base of the gene's CDS (coding sequence) as +1. For subsequent analysis, the CDS start position of every gene on its specific sequence must therefore be located. Moreover, because the specific sequences are very long and some contain several genes, it is necessary to determine which region corresponds to the drug reaction-related gene of interest. The inventors first located each drug reaction-related gene on hg19 and then took the span from 5000 bp upstream of the CDS start to 500 bp downstream of the CDS stop as the region of the gene. For some genotypes, however, mutation sites lie farther from the CDS than this range; for such genes, the inventors extended the gene region so that it encompasses all of those mutation sites.
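The region rule above (5000 bp upstream of the CDS start to 500 bp downstream of the CDS stop, widened when a known mutation lies outside it) can be sketched as follows. This is a minimal illustration that assumes 1-based inclusive hg19 coordinates and a known strand; on the minus strand "upstream" lies toward higher coordinates. The function and parameter names are hypothetical.

```python
from typing import Iterable, Tuple


def gene_region(cds_low: int, cds_high: int, strand: str,
                known_sites: Iterable[int] = (),
                upstream: int = 5000, downstream: int = 500) -> Tuple[int, int]:
    """Return (start, end) of the gene region on hg19, 1-based inclusive.

    cds_low/cds_high are the lowest and highest hg19 coordinates of the CDS.
    Upstream follows the direction of transcription, so on the minus strand
    it extends toward higher coordinates.
    """
    if strand == "+":
        start, end = cds_low - upstream, cds_high + downstream
    elif strand == "-":
        start, end = cds_low - downstream, cds_high + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    sites = list(known_sites)
    if sites:
        # Widen the window so every known genotype-defining site falls inside it.
        start, end = min(start, min(sites)), max(end, max(sites))
    return max(start, 1), end
```

For a plus-strand CDS spanning 10000-20000, this yields 5000-20500; passing a known mutation site at 3000 widens the start to 3000.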
3. BLAST alignment
Each specific sequence is aligned against hg19 with BLAST. If the specific sequence is a cDNA, the inventors use BLAT for the alignment instead.
4. Determining the mutations between each specific sequence and hg19
A specific sequence may align to several positions on hg19. The best alignment is selected, and the base at every position is compared to obtain the base-by-base correspondence between the specific sequence and hg19. Note that if the sequence aligns to the minus strand of a chromosome, its bases must be converted to the plus strand.
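For a sequence that aligns to the minus strand, the bases are brought onto the plus strand by reverse complementing them. A minimal sketch, assuming plain A/C/G/T/N sequences (the helper names are hypothetical):

```python
from typing import Tuple

# Complement table; N (unknown base) stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the same bases read on the opposite strand (5' to 3')."""
    return seq.translate(_COMPLEMENT)[::-1]


def alleles_to_plus(ref: str, alt: str, strand: str) -> Tuple[str, str]:
    """Express a ref>alt change on the plus strand of hg19."""
    if strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt
```

For instance, a G>A change reported on a minus-strand specific sequence becomes C>T on the plus strand.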
5. Converting the genotypes of all drug reaction-related genes
Based on the alignment between each specific sequence and hg19, the genotypes of all drug reaction-related genes are converted into hg19-based mutation site information. The coordinate conversion uses the CDS start position and the gene region defined above. For some genotypes, the mutation sites are given on the minus strand; these must be converted to the plus strand during the conversion.
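The coordinate conversion can be sketched in two steps: CDS-relative position to a position on the specific sequence, then to hg19 through the alignment. This sketch assumes (1) positions are counted continuously along the specific sequence with the first CDS base as +1 and the base before it as -1 (there is no position 0), so intronic offsets such as c.100+5 are not handled, and (2) the chosen BLAST hit is a single gapless block. Real alignments with gaps would need a base-by-base map. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GaplessHit:
    """One gapless alignment block between a specific sequence and hg19 (1-based, inclusive)."""
    q_start: int  # first aligned base on the specific sequence
    q_end: int    # last aligned base on the specific sequence
    t_start: int  # lowest aligned hg19 coordinate
    t_end: int    # highest aligned hg19 coordinate
    strand: str   # "+" if the specific sequence aligns to the plus strand, else "-"


def cds_to_query(c_pos: int, cds_start: int) -> int:
    """Convert a CDS-relative position into a 1-based position on the specific sequence."""
    if c_pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return cds_start + c_pos - 1 if c_pos > 0 else cds_start + c_pos


def query_to_hg19(q_pos: int, hit: GaplessHit) -> int:
    """Map a position on the specific sequence onto hg19 through a gapless hit."""
    if not hit.q_start <= q_pos <= hit.q_end:
        raise ValueError("position lies outside the aligned block")
    offset = q_pos - hit.q_start
    return hit.t_start + offset if hit.strand == "+" else hit.t_end - offset


def cds_to_hg19(c_pos: int, cds_start: int, hit: GaplessHit) -> int:
    """CDS-relative position -> hg19 coordinate (plus-strand numbering)."""
    return query_to_hg19(cds_to_query(c_pos, cds_start), hit)
```

On a minus-strand hit the alleles must also be reverse-complemented before they are stored, so that every record in the database is expressed on the hg19 plus strand.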
6. Curating the file format and checking
The file format is curated and the drug reaction information corresponding to each genotype is added; specific examples are listed in Table 2. The correctness of the results is then checked.
Table 2 Standardized genotype database information for drug-related genes
Gene | Genotype | Chromosome | SNP mutation | INDEL mutation | Amino acid change | Drug name | Drug reaction
TPMT | TPMT*8 | 6 | 18148715: G>A | - | R215H | 6-mercaptopurine, 6-thioguanine, azathioprine | TPMT enzyme activity may be reduced; the risk of toxic side effects from these drugs is increased.
TPMT | TPMT*3 | 6 | 18148899: G>A | - | A154T | 6-mercaptopurine, 6-thioguanine, azathioprine | TPMT enzyme activity is lost; taking these drugs carries a very high risk.
DPYD | DPYD*7 | 1 | - | 98386184-98386185: Del: TCAT | Frameshift | 5-fluorouracil, capecitabine | DPYD enzyme activity is lost; taking these drugs can cause drug poisoning.
NAT2 | NAT2*18 | 8 | 18258358: A>C | - | K282T | rifampicin, isoniazid, pyrazinamide, isosorbide, hydralazine | NAT2 enzyme activity is normal; taking these drugs does not produce toxic side effects.

II. Genotype detection of drug reaction-related genes
First, the exon sequences of the drug-related genes of the sample to be tested are obtained following the workflow shown in Fig. 1, specifically:
1. Design of exon probes for the drug reaction-related genes
Based on the 48 drug reaction-related genes and using the human genome hg19 as the reference sequence, the inventors selected all exon regions of these 48 genes as target sequences; the total target length is about 160 kb. For each exon sequence, oligonucleotide capture probes about 55-105 bp long and reverse-complementary to the exon sequence were designed. The capture probes were synthesized and immobilized at high density on a chip, forming a capture chip that carries capture probes for all exons of the 48 drug reaction-related genes. The probes were manufactured by Roche NimbleGen and synthesized in place on the capture chip.
The probe sequences above were designed against hg19. Because genome sequences differ between species, the probes are best suited to capturing human genomic DNA. They can also be used on the genomes of species with high homology to the human genome, although capture efficiency may be lower than for human DNA. For other species, probes similar to those of the present invention can be designed from that species' reference sequence and used to capture its target regions.
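The probe layout described in step 1 (probes 55-105 bp long, each reverse-complementary to an exon) can be illustrated with a simple tiling sketch. The actual probe design was done by Roche NimbleGen and is not disclosed here; the 75 bp probe length, the 40 bp step and the handling of short exons below are assumptions for illustration only.

```python
from typing import List, Tuple

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def tile_probes(exon_seq: str, probe_len: int = 75, step: int = 40,
                min_len: int = 55) -> List[Tuple[int, str]]:
    """Return (offset, probe) pairs whose windows tile exon_seq.

    Each probe is the reverse complement of an exon window, so it can
    hybridize to that window. Windows overlap when step < probe_len.
    """
    if not 55 <= probe_len <= 105:
        raise ValueError("probe length outside the 55-105 bp range")
    if not 0 < step <= probe_len:
        raise ValueError("step must be positive and no longer than the probe")
    n = len(exon_seq)
    if n < probe_len:
        # Short exon: one probe spanning it if long enough (assumed policy;
        # a real design would more likely add flanking sequence).
        return [(0, reverse_complement(exon_seq))] if n >= min_len else []
    starts = list(range(0, n - probe_len + 1, step))
    if starts[-1] + probe_len < n:
        starts.append(n - probe_len)  # last window flush with the exon end
    return [(s, reverse_complement(exon_seq[s:s + probe_len])) for s in starts]
```

With the defaults, a 200 bp exon gets probes at offsets 0, 40, 80, 120 and 125, so every exon base is covered by at least one probe.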
2. Genomic DNA fragmentation and purification
Human genomic DNA that is free of RNA and protein contamination and is not degraded is used as the starting material. The DNA is fragmented into 200-300 bp fragments by physical or chemical methods, and the fragments are recovered with a suitable recovery kit.
3. End repair and purification
The recovered and purified fragmented DNA is end-repaired by enzymes such as T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase, with dNTPs as substrates, to form blunt-ended, phosphorylated DNA fragments, which are then purified.
4. 3' A-tailing and purification
Using Klenow fragment (3'-5' exo-) polymerase and dATP, a base "A" is added to the 3' ends of the purified, end-phosphorylated DNA fragments, which are then purified.
5. Adapter ligation and purification
Using T4 DNA ligase, the purified A-tailed DNA fragments are ligated to tagged adapters, and the ligation products are purified with a kit.
6. PCR amplification and quantification of the ligation products
The adapter-ligated DNA library is amplified with primers matching the tagged adapter sequences. The amplified products are purified, quantified and quality-checked on an Agilent 2100 and a NanoDrop, and kept for later use once they pass QC.
7. Mixing the PCR products, adapter blocking sequences and Cot-1 DNA
The PCR products obtained above are mixed with the adapter blocking sequences and Cot-1 DNA to form the exon capture library.
The libraries prepared from multiple samples are then pooled. To distinguish libraries from different samples during sequencing, the tagged adapters added to each library carry a different 6 bp or 8 bp index sequence. The amount of DNA from each library in the pool can be equal or follow a chosen ratio, as required. Equal amounts are used when every sample needs the same amount of sequencing data; in studies where different samples need different amounts of sequencing data, the amount of each library differs accordingly, and the mixing ratio is set by those skilled in the art according to the specific research purpose or design requirements.
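Working out how much of each library to add to the pool is simple arithmetic. The sketch below assumes the library concentrations have been measured in ng/uL and pools by DNA mass, either in equal amounts or in user-chosen ratios; the function name and the weighting scheme are hypothetical.

```python
from typing import Dict, Optional


def pooling_volumes(conc_ng_per_ul: Dict[str, float], total_ng: float,
                    weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the volume (uL) of each library to add to the pool.

    Library i contributes total_ng * w_i / sum(w) ng; with no weights,
    every library contributes the same mass (equal-amount pooling).
    """
    if weights is None:
        weights = {name: 1.0 for name in conc_ng_per_ul}
    w_sum = sum(weights[name] for name in conc_ng_per_ul)
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        mass_ng = total_ng * weights[name] / w_sum
        volumes[name] = mass_ng / conc
    return volumes
```

For example, pooling 600 ng in total from libraries at 10 and 20 ng/uL takes 30 uL and 15 uL respectively; giving the first library twice the weight changes this to 40 uL and 10 uL.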
8. Hybridization of the probes with the target regions
The pooled library of the above samples is hybridized with the chip according to the standard NimbleGen solid-phase array hybridization protocol.
9. Elution and purification
The hybridized DNA is eluted and purified, and then amplified with the adapter sequences as primers to obtain the amplified products.
10. QC (quality control)
The amplified products obtained above are quality-checked with an Agilent 2100 and Q-PCR and kept for later use once they pass QC. These amplified products constitute the sequencing library.
11. Sequencing on the instrument
Then, using the human genome hg19 (UCSC) as the reference sequence, the sequencing data are analyzed. Referring to Fig. 2, the data analysis may comprise the following steps:
Step 1. Filtering: removing reads with low quality values and reads contaminated with sequencing adapters
First, low-quality reads that would affect the analysis are removed. Each base in a read has a sequencing quality value; the average quality value of each read is calculated, and if it falls below a conventional empirical threshold, the read is filtered out. In addition, reads may be contaminated with adapter sequence from the sequencing run; reads containing adapter sequence are also filtered out.
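The two filters in Step 1 can be sketched as follows. The specification does not give the quality threshold or the adapter sequence, so the Phred+33 quality encoding, the mean-quality cutoff of 20 and the adapter prefix AGATCGGAAGAGC (a common Illumina adapter prefix) are assumptions; exact substring matching is also a simplification of what dedicated adapter-trimming tools do.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumed adapter prefix
MIN_MEAN_Q = 20.0                 # assumed cutoff; the source only says "empirical threshold"


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred quality of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(seq: str, qual: str, adapter: str = ADAPTER_PREFIX,
              min_mean_q: float = MIN_MEAN_Q) -> bool:
    """True if the read passes both the mean-quality and the adapter filter."""
    if not seq or len(seq) != len(qual):
        raise ValueError("sequence and quality must be non-empty and equal length")
    return mean_phred(qual) >= min_mean_q and adapter not in seq.upper()
```

A read whose quality string is all `I` (Q40) passes the quality filter, while one that is all `#` (Q2) is dropped regardless of its sequence.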
Step 2. Alignment to the reference sequence (hg19)
Using hg19 (UCSC) as the reference sequence, the reads filtered in Step 1 are aligned with alignment software (e.g., SOAP or BWA). For each read, the alignment software selects the best alignment position. For repetitive reads with several equally good alignment positions, the software reports one position and adds a flag.
Step 3. Selecting reads aligned to the target region
Chip hybridization also captures some sequences from non-target regions. Because the hg19 whole-genome sequence is used as the reference in Step 2, reads from non-target regions align to their true positions according to the best-match principle instead of being forced onto the target region. Selecting only the reads aligned to the target region for subsequent analysis therefore ensures that all selected reads are target-region sequences.
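Keeping only the reads that fall on the target region reduces to an interval-overlap test. A minimal sketch, assuming 1-based inclusive coordinates and that each aligned read is summarized by its chromosome, start and end (the class and method names are hypothetical):

```python
import bisect
from typing import Dict, Iterable, List, Tuple


class TargetRegions:
    """Merged target intervals per chromosome (1-based, inclusive)."""

    def __init__(self, intervals: Iterable[Tuple[str, int, int]]):
        by_chrom: Dict[str, List[Tuple[int, int]]] = {}
        for chrom, start, end in intervals:
            by_chrom.setdefault(chrom, []).append((start, end))
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, ivs in by_chrom.items():
            merged: List[List[int]] = []
            for start, end in sorted(ivs):
                if merged and start <= merged[-1][1] + 1:
                    # Overlapping or adjacent: extend the previous interval.
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        """True if the aligned read [start, end] touches any target interval."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Last merged interval that begins at or before the read's end;
        # earlier intervals end even sooner, so only this one can overlap.
        i = bisect.bisect_right(starts, end) - 1
        return i >= 0 and self._ends[chrom][i] >= start
```

Usage: `[r for r in reads if targets.overlaps(*r)]` keeps the on-target reads, where each `r` is a `(chrom, start, end)` tuple.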
Step 4. Data quality control
Data quality control covers several aspects, such as the percentage of aligned reads, the percentage of unique reads (reads with only one best alignment position against the reference), the duplication rate (identical reads), sequencing depth, and coverage of the target region. These metrics must meet conventional empirical thresholds before the next analysis step can proceed; for example, the sequencing depth should match expectations and the per-base depth distribution should follow a Poisson distribution.
Step 5. Variant detection
Variant analysis is performed only after the data pass quality control. It includes detection of SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), SVs (structural variations) and CNVs (copy number variations). Each type of variant detection can be implemented in different ways as needed.
Step 6. Genotyping of drug reaction-related genes
After variant detection, the mutation site information in each gene is collated and compared with the corresponding genotypes in the previously curated genotype standard database of drug reaction-related genes to obtain the genotype of each sample. Because humans are diploid, each gene carries at most two genotypes, so the final genotyping result for each drug reaction-related gene is either homozygous or heterozygous. Further, from the drug reaction information linked to each genotype in the database, the corresponding drug reaction result for the sample can also be obtained.
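The comparison in Step 6 can be sketched as a search over all pairs of genotypes (star alleles) in the database, keeping the pairs whose combined variant dosage matches what was observed in the sample. This is an illustrative simplification, not the inventors' algorithm: it ignores phase, treats the reference allele as carrying no variants, and reports variants absent from the database separately as candidate novel sites. The two TPMT positions are copied from Table 2 purely as example data.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict, FrozenSet, List, Set, Tuple

# (chromosome, hg19 position, ref, alt), all on the plus strand.
Variant = Tuple[str, int, str, str]

# Example data only; *1 (reference allele) carries no variants.
TPMT_ALLELES: Dict[str, FrozenSet[Variant]] = {
    "TPMT*1": frozenset(),
    "TPMT*3": frozenset({("6", 18148899, "G", "A")}),
    "TPMT*8": frozenset({("6", 18148715, "G", "A")}),
}


def call_diplotypes(alleles: Dict[str, FrozenSet[Variant]],
                    sample: Dict[Variant, int]) -> Tuple[List[Tuple[str, str]], Set[Variant]]:
    """Return (matching allele pairs, sample variants absent from the database).

    sample maps each observed variant to its copy number
    (1 = heterozygous, 2 = homozygous).
    """
    known: Set[Variant] = set().union(*alleles.values())
    observed = {v: n for v, n in sample.items() if v in known}
    novel = {v for v in sample if v not in known}
    matches = []
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        dosage = Counter(alleles[a]) + Counter(alleles[b])
        if dict(dosage) == observed:
            matches.append((a, b))
    return matches, novel
```

A heterozygous call at the *3 site yields `[("TPMT*1", "TPMT*3")]`, and heterozygous calls at both sites yield `[("TPMT*3", "TPMT*8")]`. Because phase is not used, a database that also held a two-variant haplotype (for example TPMT*3A, which combines the *3B and *3C changes) alongside the single-variant alleles would return several equally consistent pairs; resolving that needs phase information or other evidence.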
In addition, current techniques for detecting mutations in drug reaction-related genes mainly target known mutations in one or a few genes; detecting unknown mutations or testing large numbers of samples with them is time-consuming and costly. Compared with the prior art, the overall technical solution of the present invention has the following clear advantages:
First, the exon regions of 48 drug reaction-related genes can be tested at once, including all known and unknown polymorphic sites in those regions. Besides using the detected known polymorphic sites as a basis for supporting clinical medication, the detected unknown polymorphic sites serve as accumulated data for discovering new polymorphic sites that affect drug response. The method therefore has both clinical guidance value and research value.
Second, chip capture is high-throughput, so a single experiment can test up to hundreds of samples simultaneously. This not only increases the number of samples tested but also greatly reduces the testing cost per sample.
Third, the present invention builds a drug reaction-related database with the hg19 genome as the reference sequence and, combined with sequencing and bioinformatic analysis, accurately reports the mutant base at each polymorphic site; on that basis, the corresponding genotypes can be distinguished and the corresponding drug reaction information given.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples merely illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not stated in the examples, the techniques or conditions described in the literature of the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, Chinese translation by Huang Peitang et al., Science Press) or the product instructions are followed. Reagents or instruments whose manufacturer is not stated are conventional, commercially available products, for example those available from Illumina.
Example 1:
In this example, a gene standard genotype database of drug reaction-related genes was constructed according to the method of the present invention. Specifically, the inventors collected all 48 currently known functional genes related to drug response (see Table 1 above). Using the BLAST alignment software (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the human whole-genome reference sequence hg19, all genotype sequences of the 48 drug reaction-related genes were aligned against hg19. From the alignment results, the mutation site information relative to hg19 was obtained, and all genotypes of the drug reaction-related genes were converted into a uniform format and standard. Based on the genome-wide annotation of each gene, the genotypes were converted into hg19-based types. The procedure specifically comprised the following steps:
1. Collecting genotype-related mutation and enzyme activity information for the drug reaction-related genes
Mutation information for all genotypes of the 48 known drug reaction-related genes was collected, together with information relating each genotype to drug reaction. This mainly included the genotype name, the amino acid change corresponding to each genotype, the mutations of each genotype relative to a specific sequence, the drug reaction information corresponding to each genotype, and references.
2. Collecting the CDS region of each gene on its specific sequence and the position of the gene on hg19
The CDS start position of every gene on its specific sequence was located, and the region corresponding to each drug reaction-related gene of interest was determined. Specifically, the inventors first located each drug reaction-related gene on hg19 and took the span from 5000 bp upstream of the CDS start to 500 bp downstream of the CDS stop as the gene region. For genes with genotypes whose mutation sites lie farther from the CDS than this range, the inventors extended the gene region so that it encompasses those mutation sites.
3. BLAST alignment
Each specific sequence was aligned against hg19 with BLAST.
4. Determining the mutations between each specific sequence and hg19
For specific sequences that aligned to several positions on hg19, the best alignment was selected, and the base at every position was compared to obtain the base-by-base correspondence between the specific sequence and hg19. If a sequence aligned to the minus strand of a chromosome, its bases were converted to the plus strand.
5. Converting the genotypes of all drug reaction-related genes
Based on the alignment between each specific sequence and hg19, the genotypes of all drug reaction-related genes were converted into hg19-based mutation site information. The coordinate conversion used the CDS start position and the gene region defined above. Mutation sites that some genotypes report on the minus strand were converted to the plus strand.
6. Curating the file format and checking
The file format was curated, the drug reaction information corresponding to each genotype was added, and the correctness of the results was then checked.
As a result, a gene standard genotype database of drug reaction-related genes similar to Table 2, containing the genotypes and drug reaction information of the 48 drug reaction-related genes, was obtained.
Example 2:
The experimental procedure in this example describes building libraries from 50 samples, including Yanhuang, and hybridizing them to a single chip. The number of samples serves to illustrate the invention and does not limit the number of samples that can be hybridized to each chip.
1. Experimental materials
The reagents used in this example are listed in Table 3. Reagents, consumables and instruments not listed in Table 3 are general-purpose commercially available products.
Table 3 Reagents used in this example
2. Sequence capture library preparation
(1) Genomic DNA fragmentation
3 μg of Yanhuang genomic DNA, free of protein and RNA contamination and without degradation, was sheared with a Covaris-S2 ultrasonicator (Covaris, US). The shearing parameters were as follows:
Duty cycle (%): 10
Intensity: 10
Treatment 1: cycles/burst 1000; time 60 s
Treatment 2: time 0 s
Treatment 3: time 0 s
Treatment 4: time 0 s
Cycles: 4
After shearing, fragments that passed electrophoretic inspection (main band between 200 bp and 300 bp) were recovered and purified with the QIAquick PCR Purification Kit, and each sample was dissolved in 75 μL of elution buffer.
(2) End repair of DNA fragments
The sheared and purified DNA fragments were used to set up an end-repair reaction in a 1.5 mL centrifuge tube according to the following table, producing blunt-ended, end-phosphorylated DNA fragments.
The 100 μL reaction mixture was mixed gently, incubated in a Thermomixer (Eppendorf) at 20 °C for 30 min, and purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved completely in 32 μL of ddH2O.
(3) Addition of an "A" base at the 3' end
An "A" base is added to the 3' ends of the end-repaired, blunt-ended DNA fragments to facilitate ligation of the index adapters in the next step. The A-tailing reaction system is shown in the following table.
The 50 μL reaction mixture was mixed gently, incubated in a Thermomixer (Eppendorf) at 37 °C for 30 min, and purified with the QIAquick PCR Purification Kit; the DNA was finally dissolved completely in 15 μL of ddH2O.
(4) Index adapter ligation
After purification, the A-tailed DNA fragments are ligated to the index adapters by T4 DNA ligase. The adapter ligation reaction is set up in a 1.5 mL centrifuge tube:
The 50 μL reaction mixture was mixed by gentle vortexing, briefly centrifuged, and incubated in a Thermomixer (Eppendorf) at 20 °C for 15 min. After the reaction, it was purified with the MinElute PCR Purification Kit, and the sample was finally dissolved in 25 μL of elution buffer.
(5) Pre-hybridization PCR and product purification
The adapter-ligated DNA library was amplified with primers based on the index adapter sequences. The amplification system and conditions were as follows:
The PCR program was 94 °C for 2 min; 4 cycles of 94 °C for 15 s, 62 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min. The PCR product was purified with the QIAquick PCR Purification Kit and eluted in 30 μL.
(6) Pooling of sample libraries
Following the same steps of DNA shearing, end repair, index adapter ligation and pre-hybridization PCR, another 49 sample libraries were constructed, for a total of 50 libraries including the Yanhuang genomic DNA library (4 HapMap samples, 1 Yanhuang sample and 45 normal human samples; the 45 normal human samples were used to test how many samples one chip can hybridize). Equal amounts of DNA were taken from the 50 libraries and mixed evenly. To distinguish libraries from different samples during sequencing, … the index adapter consists of two parts: an index base sequence used to distinguish the libraries, and an adapter sequence.
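Reads from the pooled libraries are later assigned back to their samples by index sequence. A minimal demultiplexing sketch follows, assuming the index bases of each read are available as a separate string and that one mismatch is tolerated. The mismatch limit, sample names and index sequences are hypothetical; claim 16 only states that the tag is 6-8 bp long.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, sample_indexes: Dict[str, str],
                  max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose index best matches index_read.

    Returns None when no index is within max_mismatch or when two samples tie,
    so that ambiguous reads are not assigned to the wrong library.
    """
    dists = {name: hamming(index_read[:len(idx)], idx)
             for name, idx in sample_indexes.items()}
    best = min(dists.values())
    winners = [name for name, d in dists.items() if d == best]
    if best > max_mismatch or len(winners) != 1:
        return None
    return winners[0]
```

Rejecting ties is a conservative design choice: with 50 pooled libraries, a read that cannot be placed unambiguously is safer discarded than misassigned.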
4. Exon library construction
Exon library construction comprises hybridizing the prepared sequence capture library to the capture chip so that all exons of the 48 drug response-related genes are enriched on the chip, eluting the hybridized chip to obtain the exon sequences, and amplifying the exon sequences to obtain the exon library, as follows:
(1) Chip hybridization
A) In a 1.5 mL centrifuge tube, combine 45(^g of Cot-1 DNA, 3 μg of DNA from the pooled library, and 1 nmol each of Index-adapter1-block and Index-adapter2-block (Multiplexing Sample Preparation Oligonucleotide Kit, Illumina). Dry the mixture in a SpeedVac (Thermo) set to 60 °C.
B) Add 11.2 μL of pure water to the dried tube and dissolve the DNA completely, then add 18.5 μL of 2× SC Hybridization Buffer and 7.3 μL of SC Hybridization reagent. Mix thoroughly and transfer the mixture to the 95 °C dry bath of the hybridization system (NimbleGen) for 10 min to denature the DNA.
C) Vortex the sample, centrifuge at full speed for 30 s, then place it at the 42 °C position of the hybridization system (NimbleGen) and hybridize it to the exon capture chip.
D) Hybridization followed the NimbleGen chip hybridization protocol (NimbleGen Arrays User's Guide, Version 3.1, 7 Jul 2009, Roche NimbleGen, Inc., incorporated herein by reference in its entirety). 35 μL of sample was loaded and hybridized at 42 °C for 64-72 h. After hybridization and post-hybridization processing of the chip, the sequences enriched on the chip were eluted with 900 μL of 160 mM NaOH. The eluate was purified with the MinElute PCR Purification Kit and finally eluted in 80 μL of elution buffer.
(2) Post-capture PCR amplification
PCR amplification was performed with the sequences eluted from the capture chip as template. The reaction contained 150 μL of Phusion Mix, 4.2 μL each of the forward and reverse primers (Multiplexing sequencing primers and PhiX Control kit), and the 80 μL eluted sample plus 85 μL of ddH2O; after mixing, the reaction was split into 6 tubes for PCR. PCR conditions: 94 °C for 1 min; 16 cycles of 94 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s; then 72 °C for 5 min. After PCR, the 6 tubes were pooled, and fragments of 300-450 bp were recovered by bead-based purification with the QIAquick PCR Purification Kit, with an elution volume of 50 μL.
(3) Library quality control
Library fragment size and content were determined with the Bioanalyzer analysis system (Agilent, Santa Clara, USA), and the library concentration was accurately quantified by Q-PCR.
5. Sequencing
The purified PCR amplification products that passed quality control were sequenced according to the Illumina HiSeq 2000 procedure (HiSeq 2000 User Guide, Catalog # SY-940-1001, Part # 15011190 Rev B, Illumina).
6. Data analysis
(1) Filtering of sequencing data
The sequencing data were filtered on two criteria. The first is sequencing quality: the base quality values of each entire read are computed, and a read is discarded if its average quality value is below 10. The second is adapter contamination: a read that contains adapter sequence is also discarded.
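The two filters can be sketched as follows. The sketch assumes Phred+33 quality encoding and detects adapter contamination by an exact match of the first 12 bases of a commonly used Illumina adapter prefix; both the probe length and the adapter string are assumptions, and dedicated trimming tools use more tolerant matching.

```python
ADAPTER = "AGATCGGAAGAGC"  # commonly used Illumina adapter prefix (assumed here)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read; offset 33 assumes Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(seq: str, qual: str, adapter: str = ADAPTER,
                  min_mean_q: float = 10.0, probe_len: int = 12) -> bool:
    """Keep a read only if its mean quality is >= 10 and it carries no adapter."""
    if mean_quality(qual) < min_mean_q:
        return False
    # Exact match of the adapter's first probe_len bases stands in for a
    # proper adapter-detection step (probe_len is an arbitrary choice).
    if adapter[:probe_len] in seq:
        return False
    return True
```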
Filtering removed about 7% of the reads; the remaining 93% were used in the next analysis step.
(2) Sequence alignment
Using hg19 as the reference sequence, the filtered reads were aligned with the BWA (Burrows-Wheeler Aligner) alignment software. Up to 5 mismatches were allowed per read, and gapped alignment (allowing insertions and deletions) was enabled. When a read had several equally good alignment positions, one position was chosen at random for output and the read was flagged. In the test of this example, the aligned reads accounted for about 97% of all reads submitted for alignment.
(3) Selection of reads aligned to the target regions
After alignment, non-unique reads are first removed according to the alignment results, and only reads that align uniquely in the whole genome are retained. Duplicates are then removed: of the read pairs aligned to the same position on the reference sequence, one pair is retained arbitrarily, because read pairs aligned to the same position are most likely produced by PCR.
After this processing, only reads aligned to the target regions of the reference sequence, as defined in the design of the drug response-related gene chip, are retained for further analysis.
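The uniqueness, deduplication and target-region steps might look like this for read pairs already summarized from the alignment. The `Pair` record and its fields are hypothetical; keeping the first pair seen is one arbitrary choice consistent with the text, and production duplicate-marking tools also take strand into account.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Pair(NamedTuple):
    name: str
    chrom: str
    pos: int        # 1-based leftmost position of read 1
    mate_pos: int   # 1-based leftmost position of read 2
    unique: bool    # True if the pair aligns to only one genomic location


def unique_dedup(pairs: Iterable[Pair]) -> List[Pair]:
    """Drop non-unique pairs, then keep one pair per (chrom, pos, mate_pos)."""
    seen = set()
    kept = []
    for p in pairs:
        if not p.unique:
            continue
        key = (p.chrom, p.pos, p.mate_pos)
        if key in seen:
            continue  # same start positions: treated as a PCR duplicate
        seen.add(key)
        kept.append(p)
    return kept


def in_targets(chrom: str, pos: int,
               targets: Dict[str, List[Tuple[int, int]]]) -> bool:
    """True if pos falls in a target interval (sorted, non-overlapping, 1-based closed)."""
    intervals = targets.get(chrom, [])
    i = bisect_right([s for s, _ in intervals], pos) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]
```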
(4) Data quality control
Data quality control covers the amount of data per sample, the amount of data removed by filtering, the proportion of reads that aligned, whether the average depth of each sample meets expectations, whether the per-base depth distribution follows a Poisson distribution, and the coverage of each sample's target regions.
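Two of these checks, the Poisson shape of the per-base depth and target coverage, can be approximated as below. The index of dispersion (variance divided by mean) equals 1 for an ideal Poisson distribution, so it offers a quick, informal check rather than a formal goodness-of-fit test; the `min_depth` used to count a base as covered is an assumption.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def depth_summary(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float, float]:
    """Return (mean depth, index of dispersion, % of target bases with depth >= min_depth).

    For Poisson-distributed depths the variance equals the mean, so a
    dispersion near 1 is consistent with a Poisson shape; values well above 1
    point to uneven capture.
    """
    mu = mean(depths)
    dispersion = pvariance(depths, mu) / mu if mu > 0 else float("nan")
    coverage = 100.0 * sum(d >= min_depth for d in depths) / len(depths)
    return float(mu), dispersion, coverage
```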
Statistical analysis showed that all 50 samples in this example met the quality control requirements; selected results are shown in Table 4.
Specifically, data quality control has two aspects. The first is consistency between samples: if the data of all samples are similar, the requirements are met; if the data of an individual sample differ substantially from those of most other samples, that sample probably has a problem. The second is each sample's own QC metrics. A person skilled in the art can set approximate acceptable ranges for these from experience, and the ranges may vary somewhat between sequencing target regions. Specifically, the following are acceptable: data remaining after filtering generally above 85%; proportion of aligned reads above 90%; data remaining after deduplication above 60%; proportion of unique reads above 90% (this depends on the specific sequencing target region); average depth meeting the expected experimental design; and coverage above 95%.
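The thresholds listed above translate directly into a per-sample check, sketched here together with a simple cross-sample consistency test. The metric names are hypothetical, the mean-depth criterion is omitted because it depends on the experimental design, and the 20% tolerance around the cohort median is an arbitrary assumption rather than a value from the text.

```python
from statistics import median
from typing import Dict, List

# Minimum acceptable values (%) taken from the ranges given in the text
MIN_PCT = {
    "filtered_remaining": 85.0,
    "aligned": 90.0,
    "dedup_remaining": 60.0,
    "unique_reads": 90.0,
    "coverage": 95.0,
}


def qc_failures(metrics: Dict[str, float]) -> List[str]:
    """Names of per-sample metrics that fall below their threshold."""
    return [k for k, t in MIN_PCT.items() if metrics[k] < t]


def inconsistent_samples(values: Dict[str, float], tolerance: float = 0.2) -> List[str]:
    """Samples whose value deviates from the cohort median by more than `tolerance` (relative)."""
    med = median(values.values())
    return sorted(s for s, v in values.items() if abs(v - med) > tolerance * med)
```

With the Sample 1 values from Table 4 (92.75, 98.15, 82.27, 96.16 and 98.95), `qc_failures` returns an empty list.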
Table 4 Data quality control results for selected samples
Sample | Raw data | Remaining after filtering (%) | Aligned reads (%) | Remaining after deduplication (%) | Unique reads (%) | Target-region data | Mean depth | Coverage (%)
Sample 1 | 45.25M | 92.75 | 98.15 | 82.27 | 96.16 | 12.72M | 79.98 | 98.95
Sample 2 | 30.35M | 88.25 | 97.12 | 82.21 | 94.66 | 7.77M | 48.88 | 98.67
Sample 3 | 35.13M | 86.70 | 96.72 | 81.83 | 94.97 | 8.49M | 53.42 | 98.56
Sample 4 | 34.6M | 85.52 | 96.87 | 82.00 | 94.89 | 8.43M | 53.04 | 98.62
Sample 5 | 32.52M | 86.66 | 97.10 | 81.89 | 94.90 | 8.07M | 50.81 | 98.60
Sample 6 | 30.13M | 95.71 | 99.12 | 82.78 | 96.50 | 7.68M | 48.30 | 98.89
Sample 7 | 36.10M | 96.01 | 99.01 | 82.71 | 96.28 | 9.30M | 58.54 | 98.89
Sample 8 | 36.27M | 95.95 | 99.06 | 82.50 | 62.74 | 9.35M | 58.80 | 98.98
Sample 9 | 36.86M | 95.86 | 99.01 | 81.97 | 96.61 | 9.55M | 60.07 | 98.78
Sample 10 | 32.16M | 95.74 | 99.08 | 82.44 | 96.50 | 8.38M | 52.75 | 98.84
(5) SNP analysis
In this example, SNPs were obtained with samtools. After the reads aligned to the target regions were selected, samtools was used to convert their format and sort them, and SNP calling was then performed with its mpileup command. The raw SNPs were further filtered on criteria including site depth and quality value. Typically, a depth between 4 and 400 is required; for the quality value, its statistical significance is calculated and sites are filtered by significance.
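The depth and quality filters can be sketched on VCF records, assuming the calls are available in VCF form with the depth in the INFO `DP` field. Reading "significance of the quality value" as the Phred-scaled error probability encoded by the VCF `QUAL` column, and the cut-off alpha = 0.01, are assumptions made for illustration.

```python
from typing import Tuple


def parse_vcf_line(line: str) -> Tuple[str, int, str, str, float, int]:
    """Extract CHROM, POS, REF, ALT, QUAL and INFO/DP from one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return chrom, int(pos), ref, alt, float(qual), int(info_map.get("DP", 0))


def keep_snp(depth: int, qual: float, min_depth: int = 4,
             max_depth: int = 400, alpha: float = 0.01) -> bool:
    """Keep sites with 4 <= depth <= 400 whose call is significant at level alpha.

    VCF QUAL is Phred-scaled, so the probability that the call is wrong is
    10 ** (-QUAL / 10); alpha = 0.01 (QUAL > 20) is an assumed cut-off.
    """
    p_error = 10 ** (-qual / 10)
    return min_depth <= depth <= max_depth and p_error < alpha
```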
The samples in this example included 4 HapMap samples (a, b, c, d) and 1 Yanhuang sample, all five of which have published genome and genotyping data; the Yanhuang sample was sequenced twice. The SNPs of these five samples were evaluated: the 4 HapMap samples were compared with the existing HapMap data, and the SNPs of the Yanhuang sample were compared with the existing genotyping sites of the Yanhuang sample. The results are shown in Tables 5 and 6.
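The comparison with published genotypes amounts to a concordance calculation over the sites genotyped in both datasets. A minimal sketch, assuming genotypes are stored as unphased two-letter strings keyed by a site identifier (both hypothetical representations):

```python
from typing import Dict, Tuple


def concordance(called: Dict[str, str], known: Dict[str, str]) -> Tuple[int, int, float]:
    """Compare genotypes at sites present in both sets.

    Genotypes are unphased two-letter strings such as "AG", so "AG" and "GA"
    count as the same call. Returns (matching sites, shared sites, rate).
    """
    shared = called.keys() & known.keys()
    matches = sum(sorted(called[s]) == sorted(known[s]) for s in shared)
    rate = matches / len(shared) if shared else float("nan")
    return matches, len(shared), rate
```

Only sites present in both datasets enter the denominator, so sites called in one set but absent from the other do not count as discordant.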
Table 6 SNP analysis results of the Yanhuang sample
(6) Genotypes of drug response-related genes
After variant detection, the mutation site information of each gene is extracted according to that gene's region in the whole genome. This mutation site information is compared with the standard genotype database of drug response-related genes constructed in Example 1 to determine the genotype information of each sample and the corresponding drug response information. The results for selected samples are shown in Table 7.
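The comparison against the standard genotype database can be sketched as a subset test: a standard genotype matches when all of its defining hg19 variants are found in the sample. The record layout and the most-specific-first ordering are assumptions, and the gene and genotype names in the usage below are placeholders; real star-allele calling also needs zygosity and phase, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt) on the hg19 plus strand


@dataclass(frozen=True)
class StandardGenotype:
    gene: str
    name: str
    defining_variants: FrozenSet[Variant]
    drug_response: str


def match_genotypes(sample: Set[Variant],
                    database: List[StandardGenotype]) -> List[StandardGenotype]:
    """Return database genotypes whose defining variants all occur in the sample.

    Matches are ordered from most to least specific (most defining variants first).
    """
    hits = [g for g in database
            if g.defining_variants and g.defining_variants <= sample]
    return sorted(hits, key=lambda g: len(g.defining_variants), reverse=True)


# Placeholder entries, not real genotype definitions
db = [
    StandardGenotype("GENE_X", "*2", frozenset({("chr1", 100, "A", "G")}), "response A"),
    StandardGenotype("GENE_X", "*3", frozenset({("chr1", 100, "A", "G"), ("chr1", 150, "C", "T")}), "response B"),
]
sample = {("chr1", 100, "A", "G"), ("chr1", 150, "C", "T")}
print([g.name for g in match_genotypes(sample, db)])  # ['*3', '*2']
```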
Table 7 Genotypes and drug response information of drug response-related genes in selected samples
The typing results show that the genotype information and drug response information obtained by the method of this example are consistent with existing published records.
Industrial applicability
The method of the present invention for constructing a standard genotype database of drug response-related genes, the genotyping method for drug response-related genes, and the method for detecting drug response can be used effectively for genotyping drug response-related genes and for detecting drug response, while saving time and labor, lowering cost, and giving accurate results.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and substitutions may be made to those details in light of all the teachings disclosed, and that such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic use of these terms does not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims

1. A method for constructing a standard genotype database of drug response-related genes, characterized by comprising the following steps:
aligning a specific sequence corresponding to the genotype mutation information of a drug response-related gene with a human whole-genome reference sequence, to obtain the correspondence at each base position between the specific sequence of the drug response-related gene and the human whole-genome reference sequence; and
converting, according to the correspondence, the genotypes of the drug response-related gene into genotypes referenced to the human whole-genome reference sequence, to obtain standardized genotypes of the drug response-related gene.
2. The method according to claim 1, characterized in that the drug response-related gene comprises at least one of the 48 human drug response-related genes ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EG, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.
3. The method according to claim 1, characterized by further comprising:
locating the drug response-related gene on the human whole-genome reference sequence, determining the start and stop positions of the coding sequence of the drug response-related gene, and obtaining the specific sequence corresponding to the genotype mutation information of the drug response-related gene.
4. The method according to claim 1, characterized in that the specific sequence corresponding to the genotype mutation information of the drug response-related gene comprises the region from 5000 bp upstream of the start position to 500 bp downstream of the stop position of the coding sequence of the drug response-related gene.
5. The method according to claim 1, characterized in that the human whole-genome reference sequence is hg19.
6. A standard genotype database of drug response-related genes, constructed by the method according to any one of claims 1-5.
7. The standard genotype database of drug response-related genes according to claim 6, characterized in that, in the database, each standardized genotype of the drug response-related genes is associated with information on the corresponding drug response.
8. The standard genotype database of drug response-related genes according to claim 6, characterized in that the drug response-related genes comprise at least one of the genes ABCB1, ABCG2, ADRB1, APC, ARG1, ASL, ASS1, BCHE, BRAF, CDKN2A, CPS1, CYP19A1, CYP1A2, CYP1B1, CYP2B6, CYP2C19, CYP2C9, CYP2D6, CYP2E1, CYP3A4, CYP3A5, CYP3A7, CYP4F2, DPYD, EGFR, EG, ERBB2, F2, F5, G6PD, GSTA1, HLA-B, KIT, KRAS, MTHFR, NAGS, NAT1, NAT2, NRAS, OTC, RNR1, SLCO1B1, SULT1A1, TPMT, TYMS, UGT1A1, VKORC1 and XRCC1.
9. A method for genotyping drug response-related genes, characterized by comprising:
obtaining the exon sequences of the drug response-related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, and comparing the analysis results with the standard genotype database of drug response-related genes according to any one of claims 6-8, thereby obtaining the genotypes of the sample to be tested.
10. A method for detecting drug response, characterized by comprising:
obtaining the exon sequences of the drug response-related genes of a sample to be tested, sequencing them on a high-throughput sequencing platform and analyzing the data, comparing the analysis results with the standard genotype database of drug response-related genes according to claim 7 to obtain the genotypes of the sample to be tested, and obtaining the drug response result of the sample to be tested from the drug response information corresponding to those genotypes.
11. The method according to claim 9 or 10, characterized in that obtaining the exon sequences of the drug response-related genes of the sample to be tested is achieved by the following steps:
A. preparing a chip capable of capturing the exon sequences of drug response-related genes, the chip carrying oligonucleotide probes that are reverse-complementary to the exon sequences of the drug response-related genes;
B. preparing a sequence capture library from the genomic DNA of the sample to be tested, comprising shearing the genomic DNA of the sample to be tested into fragments of 200-500 bp, performing end processing, and then amplifying to obtain the sequence capture library;
C. hybridizing the sequence capture library prepared in step B to the chip of step A, thereby obtaining the exon library of the drug response-related genes of the sample to be tested.
12. The method according to claim 11, characterized in that, in step A, the chip carries oligonucleotide probes that are reverse-complementary to all exon sequences of each of the 48 human drug response-related genes, the oligonucleotide probes being 55-105 bp in length.
13. The method according to claim 11, characterized in that, in step B, the genomic DNA of the sample to be tested is sheared into fragments of 200-300 bp.
14. The method according to claim 11, characterized in that, in step B, the end processing comprises performing end repair to form blunt-ended, phosphorylated DNA fragments, adding a base "A" to the 3' ends of the blunt-ended phosphorylated DNA fragments, and further ligating a tag.
15. The method according to claim 11, characterized in that, in step C, before the hybridization, sequence capture libraries from a plurality of different samples to be tested are pooled and then hybridized simultaneously to the chip of step A, each library carrying a different tag base sequence so that the libraries can be distinguished from one another.
16. The method according to claim 11, characterized in that the tag base sequence is 6-8 bp in length.
17. The method according to claim 9 or 10, characterized in that the data analysis further comprises:
i. filtering out low-quality sequencing reads that would affect the analysis;
ii. aligning the reads obtained in step i to the human whole-genome reference sequence, used as the reference, with alignment software;
iii. selecting the reads aligned to the target regions for subsequent analysis, the target regions being the regions in which the exon sequences of the drug response-related genes are located;
iv. after the data pass quality control, performing variant analysis, the variant analysis comprising detecting at least one of the following: single-nucleotide polymorphisms, insertions and deletions, structural variants, and copy number variants.
18. The method according to claim 17, characterized in that the alignment software is at least one selected from SOAP and BWA.
PCT/CN2013/070081 2012-01-06 2013-01-05 Medicament-related genotype database, method for genotyping and for detecting medicament reaction WO2013102442A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210002898.7 2012-01-06
CN201210002898.7A CN103198238B (en) 2012-01-06 2012-01-06 Build method and its application of drug reaction related gene standard type data base

Publications (1)

Publication Number Publication Date
WO2013102442A1 true WO2013102442A1 (en) 2013-07-11

Family

ID=48720791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/070081 WO2013102442A1 (en) 2012-01-06 2013-01-05 Medicament-related genotype database, method for genotyping and for detecting medicament reaction

Country Status (2)

Country Link
CN (1) CN103198238B (en)
WO (1) WO2013102442A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2593577C1 (en) * 2015-04-28 2016-08-10 Акционерное общество "Соцмедика" (АО "Соцмедика") Method of determining medicinal interactions and limitations to use of drugs with the help of structured knowledge base
RU2607194C1 (en) * 2015-08-10 2017-01-10 Акционерное общество "Соцмедика" (АО "Соцмедика") Method for automatic selection of pharmaceutical drugs
CN107641645A (en) * 2017-11-14 2018-01-30 北京阅微基因技术有限公司 Angiocardiopathy personalized medicine related gene polymorphism detection architecture and kit
CN107904302A (en) * 2017-11-29 2018-04-13 昆明理工大学 One group of primer for detecting anticoagulant related gene polymorphism at the same time and application
CN108179148A (en) * 2018-02-11 2018-06-19 北京乐普基因科技股份有限公司 A kind of probe for detecting genetic cardiomyopathies and its application
CN114317533A (en) * 2022-01-12 2022-04-12 武汉艾迪康医学检验所有限公司 Group of probes and library construction kit for detecting polymorphism of pharmacogenomic related gene CYP2D6 by utilizing hybrid capture method

Families Citing this family (31)

Publication number Priority date Publication date Assignee Title
CN105940116A (en) * 2014-01-27 2016-09-14 财团法人生技医疗科技政策研究中心 Method for screening risk of drug-induced toxicity
CN104109709A (en) * 2014-04-04 2014-10-22 北京泛生子生物科技有限公司 Important gene enrichment method used for individual cancer diagnosis and treatment
CN105512514B (en) * 2014-09-23 2018-05-01 深圳华大基因股份有限公司 A kind of MHC completions database, its construction method and application
CN106834427A (en) * 2015-12-07 2017-06-13 广州康昕瑞基因健康科技有限公司 A kind of SNP classifying methods and kit
CN106909806B (en) * 2015-12-22 2019-04-09 广州华大基因医学检验所有限公司 The method and apparatus of fixed point detection variation
CN105525004A (en) * 2016-01-22 2016-04-27 广州金域检测科技股份有限公司 Primer and method for simultaneously detecting MDR1 and CYP19A1 gene polymorphism
CN108350498B (en) * 2016-02-18 2021-10-19 深圳华大生命科学研究院 Typing method and device
CN105925666A (en) * 2016-03-30 2016-09-07 广州精科生物技术有限公司 Kit and application thereof, and method and system for detecting target region variation
CN105925562A (en) * 2016-05-10 2016-09-07 广州嘉检医学检测有限公司 Method and kit for enriching 4000 human pathogenic target genes
CN106086192A (en) * 2016-06-27 2016-11-09 上海泽因生物科技有限公司 Genotyping detection reagent for genes related to personalized tacrolimus medication
CN106222281A (en) * 2016-08-10 2016-12-14 中南大学湘雅三医院 Test kit, application and method for guiding precision medication in pediatric patients based on gene polymorphism
CN107038351B (en) * 2017-04-17 2020-06-02 为朔医学数据科技(北京)有限公司 Method for systematically predicting influence of omics variation on drug effect
CN108733974B (en) * 2017-04-21 2021-12-17 胤安国际(辽宁)基因科技股份有限公司 Mitochondrial sequence splicing and copy number determination method based on high-throughput sequencing
CN106906300A (en) * 2017-04-21 2017-06-30 为朔医学数据科技(北京)有限公司 Genetic ID card and preparation method thereof
CN107103199A (en) * 2017-04-28 2017-08-29 为朔医学数据科技(北京)有限公司 Method and device for guiding medication use
CN107292129A (en) * 2017-05-26 2017-10-24 中国科学院上海药物研究所 Susceptibility genotype detection method
CN107463764A (en) * 2017-06-16 2017-12-12 康美健康云服务有限公司 Patient medication guidance method and system based on genetic information
CN107506615A (en) * 2017-08-21 2017-12-22 为朔医学数据科技(北京)有限公司 Genomics data management method, server and system
CN107944224B (en) * 2017-12-06 2021-04-13 懿奈(上海)生物科技有限公司 Method for constructing skin-related gene standard type database and application
CN107974490B (en) * 2017-12-08 2019-05-14 东莞博奥木华基因科技有限公司 Method and device for detecting PKU disease-causing gene mutations based on semiconductor sequencing
CN108334750B (en) * 2018-04-19 2019-02-12 江苏先声医学诊断有限公司 Metagenomic data analysis method and system
CN108728522A (en) * 2018-06-11 2018-11-02 苏州艾达康医疗科技有限公司 Drug Discovery detection method
CN108753954B (en) * 2018-06-26 2022-11-18 中南大学湘雅医院 Capture probe set of dementia-related gene, kit, library construction method and application
CN108959856B (en) * 2018-06-29 2019-06-21 迈凯基因科技有限公司 Multi-database interactive system and method for interpreting disease gene variants and drugs
CN109295189A (en) * 2018-10-22 2019-02-01 北京华夏时代生物工程有限公司 Fluorescence in situ hybridization sequencing SNP analysis system and SNP detection for BChE
CN109457026A (en) * 2018-10-22 2019-03-12 江苏美因康生物科技有限公司 Kit and method for rapidly detecting polymorphisms of genes related to personalized antithrombotic medication
CN109741788A (en) * 2018-12-24 2019-05-10 广州合众生物科技有限公司 SNP site analysis method and system
CN110136780B (en) * 2019-05-14 2022-03-04 杭州链康医学检验实验室有限公司 Method for constructing probe specificity database based on comparison algorithm
CN113136413B (en) * 2020-01-20 2022-07-19 河南科技大学 In-vitro activity determination method of vitamin K epoxide reductase and application thereof
CN112725440B (en) * 2021-02-10 2023-08-18 上海百傲科技股份有限公司 Method, kit, primer pair and probe for detecting G6PD gene
CN114231606A (en) * 2021-11-29 2022-03-25 北京艾迪康医学检验实验室有限公司 Method for rapidly analyzing CYP2C9 genotype

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20070020671A1 (en) * 2005-07-12 2007-01-25 Radtkey Ray R Method for detecting large mutations and duplications using control amplification comparisons to paralogous genes
CN101054601A (en) * 2006-04-13 2007-10-17 中国人民解放军军事医学科学院放射与辐射医学研究所 Oligonucleotides and gene chip for detecting mutation sites in the cytochrome P450 enzyme family
CN101429559A (en) * 2008-12-12 2009-05-13 深圳华大基因研究院 Environmental microorganism detection method and system

Non-Patent Citations (3)

Title
ERHARDT, P.W. ET AL.: "A human drug metabolism database: potential roles in the quantitative predictions of drug metabolism and metabolism-related drug-drug interactions", CURRENT DRUG METABOLISM, vol. 4, 2003, pages 411 - 422 *
JIANG, TAO ET AL.: "High-performance single-chip exon capture allows accurate whole exome sequencing using the Illumina Genome Analyzer.", SCIENTIA SINICA VITAE, vol. 41, no. 9, 2011, pages 714 - 721, XP019969844 *
SIM, S.C. ET AL.: "The human cytochrome P450 (CYP) allele nomenclature website: a peer-reviewed database of CYP variants and their associated effects", HUMAN GENOMICS, vol. 4, no. 4, April 2010 (2010-04-01), pages 278 - 281, XP021126954 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
RU2593577C1 (en) * 2015-04-28 2016-08-10 Акционерное общество "Соцмедика" (АО "Соцмедика") Method of determining medicinal interactions and limitations to use of drugs with the help of structured knowledge base
RU2607194C1 (en) * 2015-08-10 2017-01-10 Акционерное общество "Соцмедика" (АО "Соцмедика") Method for automatic selection of pharmaceutical drugs
CN107641645A (en) * 2017-11-14 2018-01-30 北京阅微基因技术有限公司 Cardiovascular disease personalized medication related gene polymorphism detection system and kit
CN107641645B (en) * 2017-11-14 2021-02-19 北京阅微基因技术股份有限公司 Cardiovascular disease personalized medication related gene polymorphism detection system and kit
CN107904302A (en) * 2017-11-29 2018-04-13 昆明理工大学 Set of primers for simultaneously detecting anticoagulant-related gene polymorphisms and application thereof
CN108179148A (en) * 2018-02-11 2018-06-19 北京乐普基因科技股份有限公司 Probe for detecting hereditary cardiomyopathy and application thereof
CN108179148B (en) * 2018-02-11 2024-02-13 北京爱普益医学检验中心有限公司 Probe for detecting hereditary cardiomyopathy and application thereof
CN114317533A (en) * 2022-01-12 2022-04-12 武汉艾迪康医学检验所有限公司 Set of probes and library construction kit for detecting polymorphisms of the pharmacogenomics-related gene CYP2D6 by hybrid capture
CN114317533B (en) * 2022-01-12 2023-08-29 武汉艾迪康医学检验所有限公司 A set of probes and a kit for constructing a library for detecting polymorphism of CYP2D6 gene related to pharmacogenomics by utilizing hybrid capture method

Also Published As

Publication number Publication date
CN103198238B (en) 2017-04-05
CN103198238A (en) 2013-07-10

Similar Documents

Publication Publication Date Title
WO2013102442A1 (en) Medicament-related genotype database, method for genotyping and for detecting medicament reaction
JP7119014B2 (en) Systems and methods for detecting rare mutations and copy number variations
US20220325344A1 (en) Identifying a de novo fetal mutation from a maternal biological sample
Xuan et al. Next-generation sequencing in the clinic: promises and challenges
US9920370B2 (en) Haplotying of HLA loci with ultra-deep shotgun sequencing
JP6867045B2 (en) Single molecule sequencing of plasma DNA
JP2022519159A (en) Analytical method of circulating cells
WO2013102441A1 (en) Cyp450 genotype database and method for genotyping and assessment of enzyme activity
JP2014507164A (en) Method and system for haplotype determination
Profaizer et al. Human leukocyte antigen typing by next-generation sequencing
WO2015200701A2 (en) Software haplotying of hla loci
TW201326400A (en) Method for detecting DMD gene exon deletions and/or duplications
CN115679000B (en) Method, device, equipment and storage medium for detecting tiny residual focus
US20180142300A1 (en) Universal haplotype-based noninvasive prenatal testing for single gene diseases
AU2013203446B2 (en) Identifying a de novo fetal mutation from a maternal biological sample
Hui Epigenetic heterogeneity revealed through single-cell DNA methylation sequencing
Konnick et al. Existing and Emerging Molecular Technologies in Myeloid Neoplasms

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 13733807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 26/11/2014)

122 EP: PCT application non-entry into the European phase

Ref document number: 13733807

Country of ref document: EP

Kind code of ref document: A1