WO2012125848A2 - A method for comprehensive sequence analysis using deep sequencing technology - Google Patents

A method for comprehensive sequence analysis using deep sequencing technology

Info

Publication number
WO2012125848A2
Authority
WO
WIPO (PCT)
Prior art keywords
reads
dna
sample
sequence
sequencing
Prior art date
Application number
PCT/US2012/029266
Other languages
French (fr)
Other versions
WO2012125848A3 (en)
Inventor
Wei Zhang
Lee-Jin C. WONG
Hong Cui
Original Assignee
Baylor College Of Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baylor College Of Medicine
Publication of WO2012125848A2
Publication of WO2012125848A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • Clinical genetic testing is one of the most rapidly expanding fields in laboratory testing and clinical practice. Clinical laboratories must take special considerations that traditional research laboratories do not, because results affect the treatment and life decisions of patients and their families. The results of such genetic testing must be accurate and reliable, with error rates as small as possible given the technology. Additional quality assurance checks at each step provide assurance that the sample is the correct sample, that there has been no cross-contamination, and that the machine error is not prohibitive to an accurate diagnosis.
  • the deep sequencing technique demonstrates uniform coverage of each of the 16,569 bases of the mitochondrial genome at over 10,000 fold, for example.
  • the high coverage allows not only the detection of nucleotide changes, but also the degree of heteroplasmy at every single base in mitochondrial DNA.
  • the deep sequencing technique is able to simultaneously detect small indels and large deletions, map exact breakpoints, calculate deletion heteroplasmy, and monitor copy number changes.
  • the embodiments described below demonstrate the superior sensitivity and specificity of base calling with quantitative information when compared to the gold standard Sanger sequencing. For quality assurance, additional qualitative and quantitative controls may be analyzed along with each sample.
  • the controls also allow the determination of experimental errors which provide the estimation of limit of detection.
  • the "deep" sequencing approach provides a comprehensive molecular analysis for patients with suspicion of genetic diseases in a timely, accurate, and cost-effective manner.
  • Embodiments of the invention provide a multifaceted approach to deep sequencing nuclear and/or mitochondrial DNA, error checking and quality assurance, while enabling multiple subjects to be sequenced in the same run.
  • the invention relates to deep sequencing and to additional quality control checks for use in a clinical diagnostic setting.
  • Embodiments of the invention include deep sequencing methods, external and internal quality control methods and kits, and methods to determine the overall quality of a sequencing run.
  • An embodiment of the invention is a method of quality control comprising adding to an unsequenced sample of DNA at least three known sequences, wherein each sequence is at a different known concentration. At least four, five, six, seven, eight, nine or more known sequences with different known concentrations may be added to the sample.
  • the embodiment may further comprise sequencing the DNA sample and comparing the percentages of the known sequences in the sequenced sample to the starting concentration of the known sequences. The sequence may be rejected if the correlation coefficient of the expected versus observed values of the concentrations is less than about 99%, less than about 98%, less than about 95%, or less than about 90%.
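The comparison of expected and observed control percentages can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it assumes the correlation coefficient is Pearson's r, uses a 0.99 cut-off (one of the thresholds listed above), and the function names and sample values are hypothetical.

```python
from math import sqrt


def pearson_r(expected, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(expected)
    if n != len(observed) or n < 3:
        raise ValueError("need at least three paired values")
    mean_e = sum(expected) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(expected, observed))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_e == 0 or var_o == 0:
        raise ValueError("values must not all be identical")
    return cov / sqrt(var_e * var_o)


def control_passes(expected, observed, min_r=0.99):
    """Accept the run only if observed control levels track the expected ones."""
    return pearson_r(expected, observed) >= min_r


# Hypothetical spiked-in control levels (percent) and what a run reported.
expected_pct = [10.0, 20.0, 50.0]
observed_pct = [10.4, 19.5, 50.3]
print(control_passes(expected_pct, observed_pct))
```

A run whose observed percentages do not rise and fall with the expected ones yields a low or negative coefficient and is rejected.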
  • Another embodiment of the invention is a kit comprising three or more known sequences of DNA, wherein each sequence is at a different known concentration.
  • a general embodiment of the invention is a method of quality control comprising genotyping at least a first and second samples; pooling the samples; sequencing the pooled samples; demultiplexing the samples; and comparing the genotype of the first sample to the demultiplexed sequence of the first sample.
  • the method may further comprise rejecting the sequence if the sequence does not match with at least 50% of the sequence, at least 75% of the sequence, at least 80% of the sequence, at least 85% of the sequence, at least 90% of the sequence, at least 95% of the sequence, at least 98% of the sequence, at least 99% of the sequence, or at least 100 % of the sequence.
  • Another general embodiment of the invention is a method for quality control of sequencing data comprising: receiving at least three parameters corresponding to DNA sequencing, wherein in specific embodiments the parameters comprise three or more of the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determining, using a processor, a weighted summed value based on the received parameters, accepting results of the DNA sequencing if the value is over a
  • the method may comprise receiving at least three parameters, receiving at least four parameters, receiving at least five parameters, or receiving all parameters.
  • the known sequences mimic heteroplasmy.
  • the sequences each have one nucleotide that is different from another of the known sequences.
  • Embodiments also include a system for quality control of sequencing data, the system comprising a processor in communication with a memory where: the memory stores processor-executable code; the processor is configured to be operable in conjunction with the processor-executable code to: receive at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determine a weighted summed value based on the received parameters and, transmit the weighted summed value.
  • the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of
  • the system may reject the sequence if the weighted summed value is above or below a predetermined number.
  • Another embodiment is a non-transitory computer-readable medium comprising computer-usable program code executable to perform operations comprising: receiving at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determining a weighted summed value based on the received parameters; and transmitting the weighted summed value. Additionally, the sequence may be rejected if the weighted summed value is above or below a predetermined number.
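A weighted summed value of this kind might be computed as in the sketch below. The parameter names, the weights, and the assumption that each parameter has already been scaled to a 0-100 score are all hypothetical; the actual formula is the one given in the application's figures. The sum is divided by the total weight so that the result stays on the same 0-100 scale, and the default threshold of 80 follows the deep sequencing index example in this document.

```python
# Hypothetical parameter names and weights, chosen only for illustration.
HYPOTHETICAL_WEIGHTS = {
    "external_control_correlation": 4,
    "coverage_uniformity": 3,
    "mapped_read_specificity": 3,
}


def weighted_summed_value(scores, weights):
    """Weighted sum of parameter scores, normalized by the total weight.

    Each score is assumed to be pre-scaled to the range 0-100.
    """
    if len(scores) < 3:
        raise ValueError("at least three parameters are required")
    unknown = set(scores) - set(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def accept_run(scores, weights, threshold=80.0):
    """Accept the sequencing results if the value exceeds the threshold."""
    return weighted_summed_value(scores, weights) > threshold


example_scores = {
    "external_control_correlation": 100.0,
    "coverage_uniformity": 80.0,
    "mapped_read_specificity": 90.0,
}
print(weighted_summed_value(example_scores, HYPOTHETICAL_WEIGHTS))
```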
  • Another general embodiment of the invention is a method of quality control comprising genotyping at least a first and second samples; pooling the samples; sequencing the pooled samples; demultiplexing the samples; and comparing the genotype of the first sample to the demultiplexed sequence of the first sample.
  • genotypes are represented by SNPs.
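The genotype comparison before and after pooled sequencing can be illustrated with a small concordance check. This is a sketch under stated assumptions: genotypes are stored as two-letter allele strings keyed by marker, allele order is ignored, a marker missing after sequencing counts as a mismatch, and the marker identifiers are placeholders rather than real SNP names.

```python
def normalize(genotype):
    """Treat 'AG' and 'GA' as the same genotype."""
    return "".join(sorted(genotype.upper()))


def genotype_concordance(pre_genotypes, post_genotypes):
    """Fraction of pre-sequencing markers whose genotype is reproduced
    in the demultiplexed sequence of the same sample."""
    if not pre_genotypes:
        raise ValueError("no markers to compare")
    matches = 0
    for marker, genotype in pre_genotypes.items():
        observed = post_genotypes.get(marker)
        if observed is not None and normalize(observed) == normalize(genotype):
            matches += 1
    return matches / len(pre_genotypes)


def sample_identity_confirmed(pre_genotypes, post_genotypes, min_match=0.95):
    """Reject the sequence when too few markers agree."""
    return genotype_concordance(pre_genotypes, post_genotypes) >= min_match


# Placeholder marker names and genotypes.
pre = {"marker_1": "AG", "marker_2": "CC", "marker_3": "TT", "marker_4": "CT"}
post = {"marker_1": "GA", "marker_2": "CC", "marker_3": "TT", "marker_4": "CC"}
print(genotype_concordance(pre, post))
```

A low concordance suggests a sample swap or cross-contamination during pooling, which is the failure this control is meant to catch.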
  • sequencing is done with next generation sequencers such as an Illumina sequencer, Roche's 454 by pyro-sequencing, ABI SOLiD, Ion Torrent sequencer, Helicos Helioscope, or Pacific
  • Another general embodiment of the invention is a method comprising receiving a plurality of DNA samples; pooling the samples; sequencing the sample on a next generation sequencer, wherein the sequencer has been adjusted to provide deep sequencing; demultiplexing the samples; and outputting the sequences of the demultiplexed samples.
  • the samples may include only nuclear DNA, only mitochondrial DNA or both.
  • the deep sequencing may comprise sequencing with greater than 1,000 fold average reads per nucleotide, greater than 10,000 fold average reads per nucleotide, greater than 20,000 fold average reads per nucleotide, greater than 30,000 fold average reads per nucleotide, greater than 40,000 fold average reads per nucleotide, greater than 50,000 fold average reads per nucleotide, greater than 75,000 fold average reads per nucleotide, or greater than 100,000 fold average reads per nucleotide, in the context of mitochondrial deep sequencing.
  • the deep sequencing may comprise greater than 100 fold average reads per nucleotide, greater than 200 fold average reads per nucleotide, greater than 300 fold average reads per nucleotide, greater than 400 fold average reads per nucleotide, greater than 500 fold average reads per nucleotide, greater than 600 fold average reads per nucleotide, greater than 700 fold average reads per nucleotide, or greater than 800 fold average reads per nucleotide.
  • FIG. 1 is an embodiment of the detection of mtDNA large deletions by massively parallel sequencing (MPS), also known as Next Generation Sequencing (NGS) compared to detection by aCGH or Southern analysis (ND is not determined).
  • FIG. 2 is a quantification of heteroplasmic point mutations by MPS.
  • heteroplasmy are listed and compared to the results obtained by ARMS qPCR.
  • FIG. 3 illustrates the assessment of analytical error and correlation of the observed and the expected data.
  • Fig. 3A depicts the instrumental (open bars) and experimental (closed bars) errors for the 12 indexed ExQC DNAs and the corresponding samples.
  • Fig. 3B depicts the correlation of the observed and the expected percentage heteroplasmy at a specific position in the spiked-in controls (ExQC). The correlation coefficient is 1.
  • Fig. 4 illustrates the detection of mtDNA deletions by whole mtDNA amplification followed by MPS.
  • Fig. 4A illustrates a uniform coverage throughout the entire mitochondrial genome with the following subfigures - a: normal control without deletion; b-e: mtDNA large deletions with various size and percentage of deletion heteroplasmy. The deletion breakpoints are clearly shown.
  • Fig. 4B is the agarose gel analysis of example LR-PCR products. Lanes M1 and M2 are size markers; lane a: normal control without deletion; lanes b and d: DNA bands of both intact and smaller deletion molecules; lanes c and e: smaller bands of deletion molecules (>90%), with the intact mtDNA barely detected (<10%).
  • Fig. 4C illustrates the size of deletion and degree of heteroplasmy. The exact breakpoints determined by PCR/sequencing and targeted aCGH are listed in Fig. 1.
  • Fig. 5 illustrates performance evaluation of target enrichment methods by "deep sequencing index" (DSI).
  • Three target enrichment methods (in-solution capture, 24 PCR fragments, and single LR-PCR amplification of the entire mtDNA) were performed and evaluated by the DSI.
  • Fig. 5A is an exemplary formula of the "deep sequencing index" (DSI).
  • Fig. 5B is in solution enrichment of target genes using SureSelect RNA probes.
  • Fig. 5C is gene enrichment by mixing 24 PCR fragments.
  • Fig. 5D is long range PCR with a single pair of primers for the entire mitochondrial genome.
  • Fig. 6 is an exemplary flow chart for deep sequencing of the mitochondrial genome.
  • Fig. 7 is an overview of an example Deep Sequencing Index.
  • Fig. 8 is exemplary results from a patient sample run with mitochondrial whole genome deep sequencing.
  • the top table represents standards and/or controls, while the bottom table is a subject sample.
  • Fig. 9 illustrates mitochondrial genome with example primers shown as F and R.
  • Fig. 10 is a list of exemplary primers for amplification of InQC genotype markers (SEQ ID NOS:6-33).
  • Fig. 11 is a list of exemplary external quality controls and proportions of nucleotide at each specific variant position
  • Fig. 12 illustrates base calls by MPS and Sanger sequencing of an exemplary sample.
  • the 3.7% heteroplasmy of the m.1630A>G mutation was not detected by Sanger method (see Fig. 1 for sequence trace).
  • Fig. 13 illustrates the detection of the m.1630 mutation by NGS but not by Sanger sequencing and is the sequence trace of Sanger results.
  • the m.1630A>G change was not detected by Sanger sequencing (SEQ ID NO:34).
  • Fig. 14 illustrates that low heteroplasmy of m.1630A>G is detected by MPS (SEQ ID NO:35 and 36).
  • Fig. 15 is an example of detailed information generated by the NextGENe program.
  • Fig. 16 is a comparison of next generation MPS with Sanger sequencing and shows sensitivity and specificity of variant detection by MPS using Sanger method as the standard.
  • Fig. 17 illustrates the reproducibility of mtDNA heteroplasmy detection by NGS with deep sequencing.
  • the heteroplasmy of a sample with an admixture of haplogroups J and H was analyzed twice, in two independent Illumina runs (one paired-end and one single-end). Each measurement shows about 15% heteroplasmy for haplogroup H (hatched bars) and 85% for haplogroup J (open and closed bars).
  • the Pearson's correlation coefficient for the two runs is 0.998 with p value < 0.001.
  • the numbers on the top of the histogram are shared homoplasmic mtDNA variant positions.
  • the numbers at the bottom are mtDNA variant positions
  • an "individual” is an appropriate individual for the method of the present invention. Individuals may also be referred to as “patients,” or “subjects.”
  • Multiplex PCR refers to using multiple PCR primers to amplify the same pool of DNA.
  • Multiplex sequencing refers to pooling multiple subjects' DNA and sequencing the pool in one run.
  • demultiplexing or “demultiplexed” refers to a sequence that has been assigned to a subject. For example, in multiplexed sequencing each fragment of a subject's DNA is tagged with an identifying DNA fragment that corresponds to the subject. After multiple subjects' DNA fragments are mixed together and sequenced, this ID tag is then used to identify which sequence belongs to which subject.
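Tag-based assignment of reads to subjects can be sketched as below. The sketch assumes the ID tag is a fixed-length prefix of each read and must match exactly; real instruments often read the index separately and tolerate mismatches, so this is an illustration of the idea only. The tags, subject names, and reads are made up.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_subject, tag_length):
    """Group reads by subject using the ID tag at the start of each read.

    Returns (reads_by_subject, unassigned_reads). The tag is stripped
    from each assigned read.
    """
    by_subject = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        subject = tag_to_subject.get(tag)
        if subject is None:
            unassigned.append(read)  # unknown tag: keep for inspection
        else:
            by_subject[subject].append(insert)
    return dict(by_subject), unassigned


# Hypothetical 4-base tags and pooled reads.
tags = {"ACGT": "subject_1", "TGCA": "subject_2"}
pooled = ["ACGTGATTACA", "TGCACCCGGG", "ACGTTTTT", "GGGGAAAA"]
assigned, leftover = demultiplex(pooled, tags, tag_length=4)
print(assigned, leftover)
```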
  • Deep sequencing or “deep coverage” refers to having a high amount of coverage for every nucleotide being sequenced.
  • deep sequencing has greater than 1,000 fold average reads per nucleotide, greater than 10,000 fold average reads per nucleotide, greater than 20,000 fold average reads per nucleotide, greater than 30,000 fold average reads per nucleotide, greater than 40,000 fold average reads per nucleotide, greater than 50,000 fold average reads per nucleotide, greater than 75,000 fold average reads per nucleotide, or greater than 100,000 fold average reads per nucleotide, in the context of mitochondrial deep sequencing.
  • the least read nucleotide in the run has at least 200 reads, at least 300 reads, at least 400 reads, at least 500 reads, at least 600 reads, at least 700 reads, at least 800 reads, at least 900 reads, at least 1000 reads, at least 1500 reads, at least 2000 reads or at least 3000 reads.
  • the deep sequencing has greater than 100 fold average reads per nucleotide, greater than 200 fold average reads per nucleotide, greater than 300 fold average reads per nucleotide, greater than 400 fold average reads per nucleotide, greater than 500 fold average reads per nucleotide, greater than 600 fold average reads per nucleotide, greater than 700 fold average reads per nucleotide, or greater than 800 fold average reads per nucleotide.
  • the least read nucleotide has at least 40 reads, at least 50 reads, at least 60 reads, at least 70 reads, at least 80 reads, at least 90 reads, at least 100 reads, at least 200 reads, at least 300 reads, at least 400 reads, or at least 500 reads.
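The two criteria in this definition, average reads per nucleotide and reads at the least read nucleotide, can be checked from a per-position depth list. The sketch below is illustrative; the function names and the depth values are assumptions, and the thresholds passed in are examples from the ranges listed above.

```python
def coverage_summary(depths):
    """Return (average depth, depth of the least read nucleotide)."""
    if not depths:
        raise ValueError("no positions supplied")
    return sum(depths) / len(depths), min(depths)


def is_deep(depths, min_average, min_least_read):
    """True when average depth exceeds min_average and every position
    has at least min_least_read reads."""
    average, least = coverage_summary(depths)
    return average > min_average and least >= min_least_read


# Hypothetical per-position read depths for a short region.
depths = [12000, 9000, 15000, 400]
print(coverage_summary(depths))
print(is_deep(depths, min_average=1000, min_least_read=200))
```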
  • predetermined number refers to a number that is calculated prior to the experiment. For instance, values from experimental runs may be compared to a predetermined number to determine if the experimental value is acceptable. In a specific embodiment, the deep sequencing index is calculated and a run is considered acceptable if the value of the DSI is greater than about 80, a predetermined number.
  • a processor or processors can be used to implement the invention in some embodiments.
  • a processor or processors can be used in performance of the operations driven by tangible computer-readable media disclosed herein.
  • the processor or processors can perform those operations under hardware control, or under a combination of hardware and software control.
  • the processor may be a processor specifically configured to carry out one or more of those operations, such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the use of a processor or processors allows for the processing of information (e.g., data) that is not possible without the aid of a processor or processors, or at least not at the speed achievable with a processor or processors.
  • Some embodiments of the performance of such operations may be achieved within a certain amount of time, such as an amount of time less than what it would take to perform the operations without the use of a computer system, processor, or processors, including no more than one hour, no more than 30 minutes, no more than 15 minutes, no more than 10 minutes, no more than one minute, no more than one second, and no more than every time interval in seconds between one second and one hour.
  • Some embodiments of the tangible computer-readable media may be, for example, a CD-ROM, a DVD-ROM, a flash drive, a hard drive, or any other physical storage device.
  • Some embodiments of the present methods may include recording a tangible computer-readable medium with computer-readable code that, when executed by a computer, causes the computer to perform any of the operations discussed herein, including those associated with the present tangible computer-readable media. Recording the tangible computer-readable medium may include, for example, burning data onto a CD-ROM or a DVD-ROM, or otherwise populating a physical storage device with the data.
  • Mitochondrial disorders are clinically and genetically heterogeneous, with variable penetrance, expressivity, and differing age of onset (2, 3).
  • Deleterious mtDNA mutations usually exist in a mixed state (4).
  • the proportion of mutant mtDNA, known as mutant heteroplasmy, varies in different tissues.
  • the type of mtDNA mutation, the tissue distribution of the mtDNA mutation and its degree of heteroplasmy contribute to the clinical phenotype and the severity of the disease.
  • a female carrying a pathogenic mtDNA mutation is at risk of passing the mutation to her offspring, even if she is asymptomatic with low level heteroplasmy in somatic cells, due to the possible involvement of the germline.
  • Detection and quantification of the degree of heteroplasmy among different tissues is necessary for an accurate diagnosis, proper genetic counseling, and management of the affected individual.
  • somatic mtDNA alterations are associated with aging and a variety of common diseases, including diabetes and cancers (5-10).
  • An embodiment of the invention is qualitative and quantitative evaluation of the entire mitochondrial genome used in assisting researchers and physicians in the identification of mtDNA alterations.
  • An embodiment of the invention is an accurate, comprehensive and cost-effective method for the molecular diagnosis of mtDNA-based diseases. Disclosed below is one embodied approach using massively parallel sequencing (MPS) that provides quantitative base calls, exact deletion junction sequences, and quantification of deletion heteroplasmy. In order to adapt this method for molecular diagnosis in a clinical diagnostic setting, qualitative and quantitative controls have been developed and may be included in the analysis of each indexed sample for quality assurance.
  • Deep sequencing and the quality control checks described here may also be used to test for nuclear genetic diseases.
  • Progressive Familial Intrahepatic Cholestasis is a group of genetic disorders which result in disruption of bile formation and flow, and are inherited in an autosomal recessive manner.
  • Four genes contribute to this genetic disease: ABCB11, ATP8B1, ABCB4, and JAG1.
  • Another example of a nuclear genetic based disease is glycogen storage diseases, which can cause hypoglycaemia, hepatomegaly, developmental delay and muscle cramps.
  • Early diagnosis with proper treatment can greatly improve the quality of life, reduce organ damage, and extend a subject's life span.
  • TK2, SUCLA2, SUCLG1, and RRM2B have been observed with mtDNA depletion and encephalomyopathy.
  • Myopathy with elevated creatine kinase is a frequent feature of TK2 deficiency.
  • Correctly selecting a group of genes to be analyzed can maximize the chance of successfully obtaining a molecular diagnosis.
  • many affected individuals do not fit into one particular category, which makes it a challenge for clinicians to select the candidate genes.
  • Coverage depth of greater than about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 fold may be used for the nuclear genome.
  • for mtDNA, the accuracy of calling a pathogenic mutation and its mutant load (mutation heteroplasmy) has a direct impact on the interpretation and on any correlations with disease management, clinical outcomes, and genetic counseling. Therefore, in an embodiment of the invention, a much higher coverage is recommended in a diagnostic laboratory.
  • a coverage depth of greater than about 10,000, 20,000, 30,000, 40,000, 50,000, or 100,000 fold may be used for the mitochondrial genome.
  • Quantification of heteroplasmy
  • heteroplasmy corresponds to 200 sequence reads, sufficient for the accurate measurement of mtDNA mutation heteroplasmy.
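The arithmetic linking coverage depth, read counts, and heteroplasmy can be written out as follows. This is a hedged sketch: the coverage and percentage values in the example are illustrative assumptions chosen to produce a count of 200 reads, not figures recovered from the text above.

```python
def heteroplasmy_percent(variant_reads, total_reads):
    """Percentage of reads at a position that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def expected_variant_reads(coverage, heteroplasmy_pct):
    """Number of variant reads expected at a given depth and heteroplasmy."""
    return coverage * heteroplasmy_pct / 100.0


# Illustrative values only: 200 variant reads out of 10,000 is 2%.
print(heteroplasmy_percent(200, 10000))
# At 20,000-fold coverage, 1% heteroplasmy also yields 200 reads.
print(expected_variant_reads(20000, 1.0))
```

The point of the calculation is that higher coverage leaves more supporting reads behind each low-level heteroplasmy call.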
  • Discrepancies in the degree of heteroplasmy measured by ARMS qPCR and deep sequencing have been observed, although most are consistent. This is most likely due to the fundamental differences in methodologies.
  • ARMS qPCR is based on the discrimination of the 3' end nucleotide of either the forward or the reverse primer for the extension of DNA synthesis (11). The PCR reactions containing the wild type primers or the mutant primer are carried out in different tubes under the same conditions. The reported percentage of heteroplasmy may be influenced by the efficiency of PCR due to different primers.
  • the primers used for LR-PCR of the wild type and mutant mtDNA molecules may be identical.
  • the primers used for sequencing of the wild type and mutant mtDNA may also be identical.
  • the discrepancy between these two methods may also be due to the presence of mtDNA homologous regions in the human nuclear genome (NUMTs), which are not amplified by LR-PCR.
  • Somatic mtDNA alterations have recently been found in all types of cancers and may play a role in tumorigenesis (5, 22-25). However, relatively few cancers harbor detectable deleterious somatic mtDNA mutations (6, 26, 27). Since increased reactive oxygen species (ROS) production and the resulting oxidative DNA damage is one of the hallmarks of cancer, one non-limiting hypothesis is that there may be numerous random mtDNA mutations in different mtDNA molecules, each at a heteroplasmic level below the detection limit of Sanger sequencing. Nevertheless, the sum of this damage may contribute to tumorigenesis (5). This hypothesis may be studied by using the deep sequencing approach that may reach the detection level equivalent to single molecule analysis.
  • a set of 14 nuclear polymorphic markers may be genotyped for each sample before further preparation for MPS. These polymorphic markers may then be amplified in a single multiplex PCR and the resulting DNA fragments mixed with LR-PCR-enriched DNA fragments from the same individual for indexing and MPS. The markers tested before and after MPS should match, a good quality control in a clinical diagnostic setting, particularly when patients' samples are mixed for analyses using complex procedures. InQC is platform independent. That is, this quality control may be used in any NGS sequencing protocols or systems. Examples of 14 polymorphic markers are found in the table below.
  • the nuclear polymorphic markers are chosen based on their random distribution throughout the nuclear DNA.
  • polymorphic markers may be chosen from published forensic lists, such as the list from Forensic Science International 149 (2005) 279-286. Additionally, the primer areas may be sequenced or genotyped prior to deep sequencing to be sure that there are no SNPs in the primer area. If there are SNPs in the primer area the SNPs will be lost. As such, a different primer may be used if SNPs are found in the original primer location.
  • DNA fragments with known sequences in various ratios were added to and indexed with each sample as a quality control measure, and processed exactly as the test sample.
  • This approach helps in identifying the source of any errors and in setting the guidelines for the secondary analysis of the clinical samples, including mapping the reads, base calling and calculation of heteroplasmy.
  • the results from Examples 3 and 4 demonstrate that the sequencing errors are largely stochastic, but some are not random.
  • the base line error rate is around 0.3%, with a standard deviation of 0.335%.
  • the inclusion of ExQC in samples may be used in a clinical diagnostic laboratory in order to assure the quality of performance and to determine the limit of detection for quantitative measurements of heteroplasmy, when used in mitochondrial DNA sequencing.
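One common way to turn a baseline error rate into a limit of detection is to add a multiple of its standard deviation. The sketch below uses that convention with a multiplier of 3, which is an assumption and not a rule stated in this document; the 0.3% mean and 0.335% standard deviation are the baseline figures given above.

```python
from statistics import mean, stdev


def lod_from_summary(mean_pct, sd_pct, k=3.0):
    """Limit of detection as mean error plus k standard deviations.

    k=3 is a conventional choice, assumed here for illustration.
    """
    return mean_pct + k * sd_pct


def limit_of_detection(error_rates_pct, k=3.0):
    """Same calculation starting from per-position error rates (percent)."""
    return lod_from_summary(mean(error_rates_pct), stdev(error_rates_pct), k)


def above_lod(observed_pct, lod_pct):
    """True when an observed heteroplasmy level exceeds the limit."""
    return observed_pct > lod_pct


lod = lod_from_summary(0.3, 0.335)
print(round(lod, 3))
```

Under this assumed convention, a variant would need to exceed roughly 1.3% to be distinguished from baseline error.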
  • nucleotide position A is at 10%
  • nucleotide position B is at 20%
  • nucleotide position C is at 30%
  • nucleotide position D is at 40%
  • nucleotide position E is at 50%
  • nucleotide position F is at 60%
  • nucleotide position G is at 70%
  • nucleotide position H is at 80%
  • nucleotide position I is at 90%
  • nucleotide position J is at 100% of the concentration of sample DNA.
  • sequences then form a ladder in which to measure machine error. For example, if the machine measures nucleotide position A at 5% instead of 10%, this is an indication that there may be something wrong with the run.
  • 3, 4, 5, 6, 7, 8, 9, 10 or more known sequences are used in the ladder.
  • a deviation of greater than about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% from the ladder is considered to be unacceptable error.
  • An example of an external quality control is to use seven synthetic DNA fragments, each 150 bp long with a known variant at a specific position. These DNA fragments are amplified and then diluted. The diluted PCR products are pooled by volume to give an ExQC solution with specific percentages of wild type and variants, such as 0, 2, 5, 20, 50, 80, and 100% of wild type. The ExQC DNA is then spiked into the sample DNA pool in order to measure machine error. ExQC may be used in nuclear DNA sequencing to determine machine error, but also provides the added benefit of an accurate ladder when determining heteroplasmy in mitochondrial DNA. ExQC is platform independent. That is, this quality control may be used in any NGS sequencing protocols or systems.
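Pooling by volume to reach a target wild-type percentage can be sketched as below. The sketch assumes the wild-type and variant PCR products have been diluted to the same concentration, so that volume fractions equal molecule fractions; the 50 µL total volume and the function name are hypothetical.

```python
def mixing_volumes(wild_type_pct, total_volume_ul):
    """Volumes (wild type, variant) in microliters for one ladder step.

    Assumes both stocks are at the same concentration.
    """
    if not 0 <= wild_type_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    wild_type_ul = total_volume_ul * wild_type_pct / 100.0
    return wild_type_ul, total_volume_ul - wild_type_ul


# Wild-type percentages from the example above; total volume is assumed.
LADDER_PCT = [0, 2, 5, 20, 50, 80, 100]
plan = {pct: mixing_volumes(pct, 50.0) for pct in LADDER_PCT}
for pct, (wt_ul, var_ul) in plan.items():
    print(f"{pct:>3}% wild type: {wt_ul:.1f} uL wild type + {var_ul:.1f} uL variant")
```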
  • heteroplasmy is added to a machine run in order to be sure that the heteroplasmy is detected. If the heteroplasmy of the known sample is not detected, all samples run may be considered unacceptable for clinical diagnostics.
  • the known sample has about 1% to 5% heteroplasmy. In a specific embodiment of the invention, the known sample has about 1% heteroplasmy.
  • Deep Sequencing with Long Range PCR (LR-PCR)
  • NUMTs may contribute to the variable coverage distribution for capture-based methods.
  • the depth of some of the well-covered regions may be vastly over-represented. This uneven coverage throughout the mitochondrial genome makes the detection of large deletions difficult.
  • the presence of NUMTs in the human genome not only interferes with the base calls, but also with the quantification of mutation heteroplasmy, which is another critical factor in the diagnosis of mtDNA disorders.
  • the multiplex PCR-based target gene enrichment can efficiently cover the entire mtDNA genome with overlapping PCR fragments, which may also be used with nuclear DNA amplification.
  • with overlapping PCR fragments, however, the highly polymorphic nature of the mtDNA means that the use of multiple pairs of primers increases the chance of a rare or novel variant at the primer binding sites, resulting in differential PCR efficiency among different amplicons and making the detection of large deletions almost impossible.
  • LR-PCR using one non-overlapping primer, is able to provide uniform coverage throughout the entire mitochondrial genome and, thus, allows for easy analysis of mtDNA large deletions (Fig. 4).
  • deep coverage is achieved by adjusting the number of multiplexed samples sequenced in one lane of an Illumina flowcell according to the capacity of the instrument and the sequencing chemistry. At least 10 million reads are expected for each sample to ensure 20,000X to 30,000X average coverage of the whole mitochondrial genome.
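The relationship between reads per sample and average coverage can be estimated with a back-of-the-envelope calculation. The sketch assumes every read maps to the mitochondrial genome and uses read lengths of 36 and 50 bases, neither of which is stated in this document; the genome length of 16,569 bases and the 10 million reads per sample are taken from the text, and the lane capacity is a made-up figure.

```python
MTDNA_LENGTH = 16569  # bases in the mitochondrial genome, as stated above


def average_coverage(mapped_reads, read_length, genome_length=MTDNA_LENGTH):
    """Average fold coverage = total sequenced bases / genome length."""
    return mapped_reads * read_length / genome_length


def samples_per_lane(lane_reads, reads_per_sample):
    """How many indexed samples fit in one lane at the target read count."""
    return lane_reads // reads_per_sample


# Assumed read lengths; 10 million reads per sample is from the text.
print(round(average_coverage(10_000_000, 36)))
print(round(average_coverage(10_000_000, 50)))
# Hypothetical lane capacity of 120 million reads.
print(samples_per_lane(120_000_000, 10_000_000))
```

Under these assumptions, 10 million reads give roughly 21,700-fold to 30,200-fold coverage, in line with the range quoted above.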
  • Nuclear DNA may also be amplified by methods other than LR-PCR, such as being amplified by PCR amplification in solution capture.
  • Fig. 6 represents an example of the work flow of a deep sequencing run.
  • a sample is first obtained from a subject and information on the sample and the subject may be entered into a database.
  • the sample may be from any type of cell or tissue, or a mixture of cells or tissues.
  • the sample may be from blood, muscle, liver, cultured cells, buccal swab, tumor sample, urine, and hair.
  • the sample is then processed to extract the DNA of interest.
  • the sample may also be separated so that two or more regions of interest are extracted separately.
  • the DNA of interest may be mitochondrial DNA, nuclear DNA, or a combination thereof. Nuclear DNA of interest may come from only one chromosome or may come from multiple chromosomes.
  • the DNA of interest may come from only one region, or may come from multiple regions.
  • the DNA of interest is then quantified by spectrophotometer, for example by NanoDrop. If necessary, the DNA may be separated into two or more different samples of the same subject's DNA.
  • LR-PCR is done on one sample in order to amplify all mitochondrial DNA of interest.
  • Multiplex PCR may be done on the second sample in order to amplify nuclear DNA regions of interest.
  • Genotyping may be done on a third sample as part of the InQC.
  • the LR-PCR and Multiplex PCR are then mixed together and the ExQC containing known sequences with known concentrations is spiked into the mix.
  • the mix is then fragmented, such as by ultrasound or by restriction enzyme digest.
  • the samples are then quality controlled by Bioanalyzer, a library is generated, then the sample is quality controlled again by Bioanalyzer and then processed with qPCR. At this point multiple subjects' samples may be mixed together forming a multiplex sequencing sample.
  • samples are sequenced using NGS on a machine such as an Illumina sequencer, Roche's 454 by pyro-sequencing, ABI SOLiD, Ion Torrent sequencer, Helicos Helioscope, or Pacific Bioscience's single molecular real time (SMRT) instrument.
  • the sequence data is collected and the DNA is demultiplexed and matched to a subject.
  • the InQC from the multiplex PCR nuclear DNA sample should match the nuclear DNA sequence. That is, the genotypes of the nuclear DNA for a subject should match the genotypes of the sequenced DNA for the nuclear DNA. Additionally, genotypes of the mitochondrial DNA for a subject should match the mitochondrial sequenced DNA of the subject.
  • the percentages of ExQC should be similar to the percentages of ExQC that were added to the sample prior to fragmentation and sequencing. As an added check, Sanger sequencing may be performed to verify specific base calls.
  • ExQC The procedure for generating ExQC is the same for mitochondrial DNA and nuclear DNA based runs. However, InQC may be generated differently. For mitochondrial genome sequencing by MPS, InQC may be generated by multiplex PCR and spiked in with the LR-PCR products for library preparation, whereas in nuclear DNA based MPS tests, probes for InQC regions are incorporated in the library, so that InQC regions are captured simultaneously with other target regions during hybridization, for example.
  • a "deep sequencing index" (DSI) is used to evaluate the end performance of a sequencing run and to compare the quality of sequencing results among different gene enrichment methods (example shown in Fig. 4A and Fig. 7).
  • This equation contains at least three of the following parameters, each with an empirically assigned weight.
  • the parameters consist of i) the average number of reads of ExQC DNA, ii) the average number of sample reads normalized to the average number of reads of ExQC DNA, iii) the correlation coefficient of the expected versus observed values of ExQC DNA heteroplasmy, iv) the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, v) the specificity and the sensitivity of the run determined from the reads mapped to mtDNA, vi) the number of unmapped reads generated from the same run, vii) the average read for each nucleotide, viii) the number of unmapped nucleotides, and ix) the percentage of genotypes corresponding correctly to the sequence.
  • the weighting can be empirically determined depending on the emphasis of experiments.
  • the DSI is defined as the sum of at least three of these weighted parameters normalized to the average DSI from previous runs.
  • the DSI may be constantly monitored as the number of experiments increases, and quality of each experiment is always evaluated and compared to the average quality of all previous experiments.
  • the DSI is useful in comparing the quality of performance among different platforms, laboratories, methods, runs, and even different technicians.
  • the depth of coverage is a parameter in one embodiment of the calculation of DSI.
  • a DSI of greater than 80 is an indication of a quality sequence.
  • Fig. 7 is an example of a DSI using six DSI parameters.
  • Example standards for an acceptable run are an average coverage depth > 10,000, an average coverage uniformity with a standard deviation of less than about 20%, coverage on all bases in the region of interest, a linearity of heteroplasmy, as given by the ExQC correlation coefficient, of greater than 95%, and a specific numeric value for the DSI against which to measure acceptable runs, for example.
  • Example 3 The results shown in Example 3, below, demonstrate that LR-PCR of the whole mitochondrial genome followed by deep sequencing may be considered as the new standard for the analysis of the mitochondrial genome.
  • This comprehensive approach makes qualitative and quantitative calls of every nucleotide position of the 16,569 bp mtDNA, it detects large and small mtDNA deletions and insertions, identifying both the breakpoints and the degree of heteroplasmy, and it is cost effective.
  • This novel approach will greatly facilitate the diagnosis of mitochondrial DNA diseases in a timely and cost effective fashion.
  • the deep sequencing method can be adapted to study somatic DNA alterations in cancer or aging tissues, and mosaicism, for example.
  • the deep sequencing techniques and quality controls can be used in cancer applications. Some types of tumors exhibit mosaicism, and as such, the ability to quantify the percentages of mutations throughout a tumor may be important in diagnosis and treatment.
  • kits may comprise a suitably aliquoted ExQC DNA and the components of the kits may be packaged either in aqueous media or in lyophilized form.
  • the container means of the kits will generally include at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted. Where there are more than one component in the kit, the kit also will generally contain a second, third or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a vial.
  • the kits of the present invention also will typically include a means for containing the ExQC and any other reagent containers in close confinement for commercial sale. Such containers may include injection or blow molded plastic containers into which the desired vials are retained.
  • Mitochondrial diseases are clinically and genetically heterogeneous, with variable penetrance, expressivity, and differing age of onset. Disease-causing point mutations and large deletions in the mitochondrial genome often exist in a heteroplasmic state. Current molecular analyses require multiple different and complementary methods for the detection and quantification of mitochondrial DNA mutations. The quantification of heteroplasmy is limited to only a few common mutations.
  • the entire mitochondrial genome was amplified using 24 pairs of sequence specific overlapping primers (28-30). Sequencing reactions were performed on purified PCR products using the BigDye Terminator Cycle Sequencing kit, and analyzed on an ABI3730XL automated DNA sequencer. Sequences were analyzed using Mutation Surveyor version 3.20. GenBank sequence NC_012920.1 was used as reference sequence for the mitochondrial genome (28-30).
  • DNA samples that were evaluated by conventional methods listed above with or without positive mutations were analyzed on Illumina HiSeq 2000 platform. Eighteen samples with known deleterious point mutations identified by Sanger or ASO (allele specific oligonucleotide) dot blot screening (31) and quantified by ARMS qPCR (11) were verified by HiSeq sequencing. Eleven samples without any identified deleterious mutations by Sanger sequencing were also analyzed by this method. In addition, 4 samples with known mtDNA deletions were included.
  • NGS Next Generation Sequence
  • the whole mitochondrial genome (16,569 bp) was amplified with a single pair of primers in the D-loop region, which is free of deletion events reported so far in the mtDB database (32) (Fig. 9).
  • the forward and reverse primers are (-): mt16426F- 5'ccgcacaagagtgctactctcctc3' (SEQ ID NO:1) and mt16425R- 5'gatattgatttcacggaggatggtg3' (SEQ ID NO:2).
  • PCR was performed using TaKaRa LA Taq Hot Start polymerase kit (TaKaRa Bio Inc., Madison, WI, USA) and 100 ng of total genomic DNA isolated from blood or 15 ng from skeletal muscle as template in a 50 µl PCR system. PCR product was analyzed on a 1.5% agarose gel with a 1 kb plus DNA ladder (Invitrogen, Carlsbad, CA, USA).
  • Primers were designed to amplify 14 different nuclear gene loci. PCR was performed with Roche FastStart Taq DNA polymerase (Roche Diagnostics, Indianapolis, IN, USA) using 7.5 ng of total genomic DNA template in a 50 µl PCR system.
  • ExQCs External quality controls, ExQCs, are seven synthetic DNA fragments, each 150 bp long with known variants at specific positions. These DNA fragments were cloned into an EcoRV digested pBluescript II SK(-) vector and amplified with primer pair MITNGS-F (5'-3': gagagtaatctgtgctctggc) (SEQ ID NO:3) and MITNGS-R (5'-3': accgttagcgtggcag) (SEQ ID NO:4). PCR was performed using Roche FastStart Taq DNA polymerase (Roche Diagnostics, Indianapolis, IN, USA).
  • PCR products were brought up to 500 µl with TE buffer (10 mM Tris-Cl, pH 7.5, 1 mM EDTA).
  • the diluted PCR products were pooled together according to the volumes listed in Fig. 11 to make 1 ml ExQC solution with 0, 2, 5, 20, 50, 80, and 100 % of wild type and variants 1, 2, 3, 4, 5, and 6 respectively.
  • Indexed paired-end DNA libraries were prepared following the Multiplexing Sample Preparation Guide provided by the manufacturer (PE-930-1002, Illumina, CA, USA) with some modifications. Briefly, PCR products were quantified using the Qubit dsDNA HS assay (Invitrogen, Carlsbad, CA, USA). LR-PCR products were fragmented using Covaris S2 (Covaris Inc., Woburn, MA, USA) with the default 200 bp program. Fragments were purified with AMPure XP beads (Beckman Coulter Inc., Brea, CA) and the size and distribution of the DNA fragments were evaluated using a BioAnalyzer (Agilent Technologies, Foster City, CA, USA).
  • NextGENe software (SoftGenetics, State College, PA). Reads with one unknown assigned base call were also removed. The data were processed with mutation filter percentages of 20% and 1%. The aligned reads were examined with NextGENe Viewer. The degree of deletion heteroplasmy was calculated based on the segmental average read depth of the deleted region and the undeleted region.
  • Fig. 16 shows the results of base calls for 12 samples analyzed by MPS compared to those of Sanger sequencing. Both methods identified concordant nucleotide changes with the exception of three heteroplasmic calls: the low heteroplasmy m.1630A>G described above, an m.16194insC variant present at 15% heteroplasmy detected by MPS but missed by Sanger sequencing, and a high heteroplasmy of m.303insC/CC identified by MPS, but identified as homoplasmic by Sanger sequencing.
  • Small insertions and deletions within a homopolymeric stretch, such as m.303_309insC, m.311_315insC and m.16194insC, are also accurately identified by MPS.
  • the MPS method not only detects all the changes detected by Sanger sequencing, but also identifies low heteroplasmic changes that are missed by the Sanger method (Fig. 16).
  • the MPS method provides quantitative information for each base call.
  • Amplification of the entire mitochondrial genome by LR-PCR using a single pair of primers provides uniform coverage and allows for easy detection of large deletions (Fig. 4A). Large mtDNA deletions with deletion breakpoints are clearly shown. The degree of deletion heteroplasmy can be calculated from the ratio of the average coverage at the deleted region to the average coverage at the non-deleted region.
  • Several samples with known mtDNA deletions have been analyzed by LR-PCR, followed by deep sequencing (Fig. 4B). The results of deletion junctions, size of deletion, and degree of heteroplasmy detected by MPS are consistent with the results obtained from target array CGH/ Southern blotting (16, 18) for each sample tested (Fig. 4C). Enhanced sensitivity was observed for detecting large mtDNA deletions, which is likely due to the preferential amplification of the shorter fragment.
  • the advantage of deep sequencing is the ability to accurately quantify each nucleotide.
  • Samples with known heteroplasmic mutations m.3243A>G, m.8993T>C, m.11778G>A, m.14484T>C, m.13513G>A, and m.10191T>C were evaluated by deep sequencing.
  • Fig. 2 shows the different levels of heteroplasmic point mutations determined by MPS across the mtDNA genome.
  • the results from MPS are in good agreement with those obtained using amplification refractory mutation system (ARMS) qPCR method (12).
  • the deviations between the two methods are likely due to the intrinsic differences in the design of the methodologies. Due to the presence of a large number of nuclear mitochondrial DNA segments (NUMT or mtDNA pseudogenes, see discussion), the MPS method is believed to more accurately reflect the true mtDNA heteroplasmic status.
  • control DNA sequences except the nucleotide positions marked for heteroplasmy measurements, were used to calculate error rates.
  • the instrument sequencing error was determined by the number of incorrect nucleotide reads of the control DNA sequences and the total number of nucleotide reads mapped to the control DNA sequences under the analytical setting.
  • the experimental errors defined as overall errors of a sample from sample preparation to sequence results, were calculated in a similar manner, except that the reference was the mtDNA sequence.
  • the error rates for the indexed control DNAs and the 12 indexed samples are depicted in Fig. 3A.
  • the sample DNA has an analytical error rate of 0.326+/-0.335%, as compared to 0.151+/-0.394% for the control DNAs.
  • the limit of detection is calculated to be 1.33% under the current experimental, instrument, and analytical setups.
  • the observed and the expected percentages of the variants at specific positions exhibit an excellent correlation.
  • FIG. 5 panel B represents the result of gene enrichment using SureSelect solution-based sequence capture with RNA probes, showing an uneven coverage from as low as 1,000X to as high as 50,000X throughout the mtDNA genome.
  • the coverage is uniform within each PCR fragment itself but varies from about 5,000X to about 40,000X among different PCR fragments, with overlapping regions covered excessively at >80,000X (Fig. 5C).
  • Fig. 5D shows the uniform coverage throughout the entire mitochondrial genome when the mtDNA genome was enriched by a single LR-PCR amplification. Assuming that each of the 6 parameters is given the weight of 1, 2, 2, -3, 1, and 5, respectively, the DSI for the SureSelect capture-based, 24 overlapping PCR fragment, and LR-PCR gene enrichment methods was calculated to be about 36, 74, and 90, respectively. If, in addition to detecting point mutations, the detection of a large mtDNA deletion is required, SureSelect and 24 PCR mixtures are not satisfactory methods due to uneven coverage of different regions, which makes the detection of large deletions unreliable unless sophisticated analytical algorithms are developed.
  • a set of 14 nuclear polymorphic markers was genotyped for each sample before further preparation for MPS. Primer sets used in the production of InQC are shown in Fig. 10. The polymorphic markers were amplified in a single multiplex and the resulting DNA fragments were mixed with the LR-PCR-enriched mtDNA fragments from the same individual for indexing and MPS. The markers tested before and after MPS need to match, an important quality control absolutely required in a clinical diagnostic setting, particularly when patients' samples are mixed for analyses using complex procedures.
  • Mitochondrial diseases result from dysfunction of the mitochondrial respiratory chain. They can be caused by mutations in mitochondrial DNA (mtDNA) or in nuclear genes that encode proteins that function in mitochondria. 80-95% of patients with clinically suspected primary mitochondrial disease do not harbor a pathogenic mutation in the initial screen of the mtDNA. These cases are normally further tested for mutations in nuclear-encoded mitochondrial genes that are associated with distinct clinical phenotypes. For example, POLG, DGUOK, MPV17, and C10orf2 mutations have been observed with mtDNA depletion and hepatoencephalopathy. TYMP mutations are associated with Mitochondrial NeuroGastroIntestinal Encephalomyopathy (MNGIE)
  • the Depletion Panel is a panel that may be performed using the deep sequencing technique described above. It contains 14 nuclear genes (C10ORF2, DGUOK, MPV17, OPA1, OPA3, POLG, POLG2, RRM2B, SLC25A4, SUCLA2, SUCLG1, SUCLG2, TK2 and TYMP) that are involved in the maintenance of mtDNA integrity and the deoxynucleotide salvage pathway. These genes are analyzed by the "deep sequencing technique" by applying Massive Parallel Sequencing (MPS) to clinical diagnosis. The results demonstrate that all of the targeted regions are fully covered with at least 100X coverage. The mutations called by MPS have 100% concordance with the list generated by Sanger sequencing. For quality assurance, proper qualitative and quantitative controls were instituted to be analyzed along with each sample. The controls allow the determination of experimental errors, which provide the estimation of the limit of detection. Table 1 below shows the results of a depletion-panel test.
  • the Progressive Familial Intrahepatic Cholestasis (PFIC) panel is another example of a nuclear genetic test which is prepared with the above deep sequencing technique.
  • PFIC is a group of genetic disorders which result in disruption of bile formation and flow, and are inherited in an autosomal recessive manner. The estimated incidence is 1 per 50,000-100,000 births.
  • the PFIC panel, consisting of four genes, ABCB11, ATP8B1, ABCB4, and JAG1, offers a single, one-step and convenient test for molecular diagnosis of patients presenting with cholestasis, among other symptoms.
  • MPS Massive Parallel Sequencing
  • GSDs Glycogen storage diseases
  • GSDs are a group of inherited genetic defects of glycogen metabolism. GSDs are categorized into 14 subtypes, based on the specific enzyme deficiency. Common symptoms include hypoglycemia, hepatomegaly, developmental delay and muscle cramps. The outcome for untreated patients with GSDs can be devastating if early diagnosis is not made. Early diagnosis with proper treatment can greatly improve the quality of life, reduce organ damage, and extend the patient's life span. Due to the genetic heterogeneity of GSDs and the limited availability of enzyme studies, sequencing one gene at a time by the Sanger method is expensive and time consuming.
  • Deep sequencing was used to analyze two panels of genes responsible for the liver and the muscle forms of GSD with massively parallel sequencing for effective molecular diagnosis of patients with suspected GSDs.
  • a total of 294 coding exons of 16 genes (GYS2, GYS1, G6PC, SLC37A4, GAA, AGL, GBE1, PYGM, PYGL, PFKM, PHKA2, PHKB, PHKG2, PHKA1, PGAM2, and PGM1) were included. All exons were covered at > 50X with an average coverage of 700X.
  • a total of 7 samples with known mutations were validated. The results demonstrated equal sensitivity and specificity compared to Sanger method. All disease causing mutations were identified correctly.
  • the mutation types include single nucleotide substitution, small deletions and duplications.
  • a homozygous intragenic deletion involving exons 3-5 of the G6PC gene (GSDIa) and a homozygous deletion of exon 16 in the GBE1 gene (GSDIV) were also detected.
  • mutations were identified in patients who had previously tested negative in the selected limited number of GSD genes by Sanger sequencing. The GSD panel testing provides a cost effective diagnosis with fast turnaround time for patients with clinical indications and/or biochemical evidence suggesting a GSD.
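The deletion heteroplasmy estimate described above (the ratio of the average coverage at the deleted region to the average coverage at the non-deleted region) can be illustrated with a minimal sketch. The function name, the 0-based half-open breakpoint convention, and the conversion of the coverage ratio to a deleted fraction as 1 minus the ratio are assumptions made for illustration; they are not stated in the source and this is not the NextGENe implementation.

```python
from statistics import mean


def deletion_heteroplasmy(depth, del_start, del_end):
    """Estimate the fraction of mtDNA molecules that carry a large deletion.

    depth     : list of per-base read depths along the amplified genome
    del_start : 0-based index of the first deleted base (hypothetical convention)
    del_end   : 0-based index one past the last deleted base
    """
    deleted = depth[del_start:del_end]
    undeleted = depth[:del_start] + depth[del_end:]
    if not deleted or not undeleted:
        raise ValueError("both deleted and non-deleted regions must be non-empty")
    # Ratio of segmental average depths, as described in the text.
    ratio = mean(deleted) / mean(undeleted)
    # Assumption: reads inside the deleted region come only from undeleted
    # molecules, so the deleted fraction is 1 - ratio.
    return 1.0 - ratio


# Toy depth profile (illustrative values only, not data from the source).
toy_depth = [20000] * 100 + [5000] * 50 + [20000] * 100
toy_fraction = deletion_heteroplasmy(toy_depth, 100, 150)
```

Because shorter, deleted templates may amplify preferentially in LR-PCR, as noted above, an estimate of this kind would likely need calibration against samples of known deletion heteroplasmy.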

Abstract

The present invention relates to a deep sequencing technique used to sequence nuclear and/or mitochondrial DNA. The invention also relates to techniques for quality control in a diagnostic laboratory environment. Specifically, approaches are disclosed to analyze the nuclear and/or mitochondrial genomes by massively parallel sequencing, which demonstrates superior sensitivity and specificity of base calls with quantitative information at each nucleotide position of the genome. This method is able to simultaneously detect large deletions, map breakpoints, and quantify deletion heteroplasmy. Sufficient qualitative and quantitative controls for each sample are disclosed for use in a diagnostic setting.

Description

A METHOD FOR COMPREHENSIVE SEQUENCE ANALYSIS USING DEEP SEQUENCING TECHNOLOGY
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 61/453,317 filed March 16, 2011, and U.S. Provisional Patent Application No. 61/598,439 filed February 14, 2012, both of which are hereby incorporated by reference in full.
BACKGROUND OF THE INVENTION
[0002] Traditional genetic laboratories have focused on the research aspects of the genetic code. These laboratories are directed to reviewing large amounts of genetic code from multiple individuals to uncover abnormalities that lead to disease. However, as more information is published regarding specific genetic abnormalities that lead to disease, individuals can be tested for the specific genetic abnormalities. Clinical genetic testing is one of the most rapidly expanding fields in laboratory testing and clinical practice. Special considerations must be made in these clinical laboratories that are not made in traditional research laboratories, as results affect the treatment and life decisions of patients and their families. The results of such genetic testing must be accurate and reliable with error rates as small as possible given the technology. Additional quality assurance checks at each step of the way provide assurance that the sample is the correct sample, that there has been no cross-contamination, and that the machine error is not prohibitive to an accurate diagnosis.
[0003] Described here is a clinically applicable "deep" sequencing technique using Next Generation Sequencing (NGS) in clinical diagnosis. The deep sequencing technique demonstrates uniform coverage of each of the 16,569 bases of the mitochondrial genome at over 10,000 fold, for example. The high coverage allows not only the detection of nucleotide changes, but also the degree of heteroplasmy at every single base in mitochondrial DNA. Moreover, the deep sequencing technique is able to simultaneously detect small indels and large deletions, map exact breakpoints, calculate deletion heteroplasmy, and monitor copy number changes. The embodiments described below demonstrate the superior sensitivity and specificity of base calling with quantitative information when compared to the gold standard Sanger sequencing. For quality assurance, additional qualitative and quantitative controls may be analyzed along with each sample. The controls also allow the determination of experimental errors which provide the estimation of limit of detection. The "deep" sequencing approach provides a comprehensive molecular analysis for patients with suspicion of genetic diseases in a timely, accurate, and cost-effective manner. Embodiments of the invention provide a multifaceted approach to deep sequencing nuclear and/or mitochondrial DNA, error checking and quality assurance, while enabling multiple subjects to be sequenced in the same run.
BRIEF SUMMARY OF THE INVENTION
[0004] The invention relates to deep sequencing and to additional quality control checks for use in a clinical diagnostic setting. Embodiments of the invention include deep sequencing methods, external and internal quality control methods and kits, and methods to determine the overall quality of a sequencing run.
[0005] An embodiment of the invention is a method of quality control comprising adding to an unsequenced sample of DNA at least three known sequences, wherein each sequence is at a different known concentration. At least four, five, six, seven, eight, nine or more known sequences with different known concentrations may be added to the sample. The embodiment may further comprise sequencing the DNA sample and comparing the percentages of the known sequences in the sequenced sample to the starting concentration of the known sequences. The sequence may be rejected if the correlation coefficient of the expected versus observed values of the concentrations is less than about 99%, less than about 98%, less than about 95%, or less than about 90%. Another embodiment of the invention is a kit comprising three or more known sequences of DNA, wherein each sequence is at a different known concentration.
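The external quality control check in paragraph [0005] can be sketched as follows, assuming the comparison is a Pearson correlation between the spiked-in (expected) and sequenced (observed) percentages. The expected values are the ExQC percentages given elsewhere in this document (0, 2, 5, 20, 50, 80, and 100%); the observed values, the function names, and the default 0.95 cut-off (one of the several thresholds the paragraph lists) are hypothetical choices for illustration.

```python
import math


def pearson_r(expected, observed):
    """Pearson correlation coefficient between expected and observed values."""
    n = len(expected)
    if n != len(observed) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("values must not all be identical")
    return sxy / math.sqrt(sxx * syy)


def exqc_passes(expected, observed, min_r=0.95):
    """Accept the run only if the correlation reaches the chosen threshold."""
    return pearson_r(expected, observed) >= min_r


# Expected percentages as spiked in; observed percentages are made up.
expected_pct = [0, 2, 5, 20, 50, 80, 100]
observed_pct = [0.3, 2.4, 5.6, 19.1, 48.7, 81.2, 99.5]
run_ok = exqc_passes(expected_pct, observed_pct)
```

A correlation coefficient alone does not reveal a constant offset or a slope different from 1, so a laboratory applying a check like this might also inspect the fitted slope and intercept.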
[0006] A general embodiment of the invention is a method of quality control comprising genotyping at least a first and a second sample; pooling the samples; sequencing the pooled samples; demultiplexing the samples; and comparing the genotype of the first sample to the demultiplexed sequence of the first sample. The method may further comprise rejecting the sequence if the sequence does not match with at least 50% of the sequence, at least 75% of the sequence, at least 80% of the sequence, at least 85% of the sequence, at least 90% of the sequence, at least 95% of the sequence, at least 98% of the sequence, at least 99% of the sequence, or at least 100% of the sequence.
[0007] Another general embodiment of the invention is a method for quality control of sequencing data comprising: receiving at least three parameters corresponding to DNA sequencing, wherein in specific embodiments the parameters comprise three or more of the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determining, using a processor, a weighted summed value based on the received parameters; and accepting results of the DNA sequencing if the value is over a predetermined number. The method may comprise receiving at least three parameters, receiving at least four parameters, receiving at least five parameters, or receiving all parameters. In an embodiment of the invention, the known sequences mimic heteroplasmy. In a specific embodiment of the invention, the sequences each have one nucleotide that is different from another of the known sequences.
[0008] Embodiments also include a system for quality control of sequencing data, the system comprising a processor in communication with a memory where: the memory stores processor-executable code; the processor is configured to be operable in conjunction with the processor-executable code to: receive at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determine a weighted summed value based on the received parameters; and transmit the weighted summed value. Additionally, the system may reject the sequence if the weighted summed value is above or below a predetermined number. Another embodiment is a non-transitory computer-readable medium comprising computer-usable program code executable to perform operations comprising: receiving at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads; determining a weighted summed value based on the received parameters; and transmitting the weighted summed value. Additionally, the sequence may be rejected if the weighted summed value is above or below a predetermined number.
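The weighted summed value of paragraphs [0007] and [0008] can be sketched in a few lines. The example weights (1, 2, 2, -3, 1, 5), the acceptance threshold of 80, and the index values of about 36 and 90 are taken from elsewhere in this document; the function names are hypothetical, and the sketch leaves out the normalization to the average of previous runs because the source does not specify how it is computed.

```python
def weighted_sum(parameters, weights):
    """Weighted sum of at least three run-quality parameters."""
    if len(parameters) != len(weights):
        raise ValueError("one weight is required per parameter")
    if len(parameters) < 3:
        raise ValueError("at least three parameters are required")
    return sum(p * w for p, w in zip(parameters, weights))


def accept_run(index_value, threshold=80):
    """Accept the sequencing results if the index is over the threshold."""
    return index_value > threshold


# Example weights as given in the document for a six-parameter index.
example_weights = [1, 2, 2, -3, 1, 5]

# Reported index values of about 36 (SureSelect capture) and 90 (LR-PCR).
capture_accepted = accept_run(36)
lr_pcr_accepted = accept_run(90)
```

A negative weight, such as the -3 above, lets a parameter for which larger values mean worse quality lower the index. Which weight belongs to which parameter, and how each parameter is scaled before weighting, would have to be fixed empirically by the laboratory.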
[0009] Another general embodiment of the invention is a method of quality control comprising genotyping at least a first and a second sample; pooling the samples; sequencing the pooled samples; demultiplexing the samples; and comparing the genotype of the first sample to the demultiplexed sequence of the first sample. In embodiments of the invention 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 75 or more, or 100 or more genotypes may be compared. In some embodiments of the invention, the genotypes are represented by SNPs. In embodiments of the invention 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more samples may be pooled. In embodiments of the invention, sequencing is done with next generation sequencers such as an Illumina sequencer, Roche's 454 by pyro-sequencing, ABI SOLiD, Ion Torrent sequencer, Helicos Helioscope, or Pacific Bioscience's single molecular real time (SMRT) instrument.
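The internal quality control comparison in paragraph [0009] (genotypes determined before pooling versus genotypes read from the demultiplexed sequence) can be sketched as below. The marker names, the "A/G" genotype notation, and the rule that a marker missing after sequencing counts as a mismatch are assumptions made for illustration only.

```python
def normalize(genotype):
    """Order alleles so that 'A/G' and 'G/A' compare as equal."""
    return tuple(sorted(genotype.split("/")))


def genotype_concordance(pre_run, post_run):
    """Fraction of pre-run marker genotypes reproduced after demultiplexing.

    pre_run, post_run : dicts mapping marker name -> genotype string.
    A marker absent from post_run is counted as a mismatch (assumption).
    """
    if not pre_run:
        raise ValueError("no pre-run genotypes supplied")
    matches = 0
    for marker, genotype in pre_run.items():
        called = post_run.get(marker)
        if called is not None and normalize(called) == normalize(genotype):
            matches += 1
    return matches / len(pre_run)


def sample_identity_ok(pre_run, post_run, min_fraction=1.0):
    """Reject the demultiplexed sequence if concordance is below the cut-off."""
    return genotype_concordance(pre_run, post_run) >= min_fraction


# Hypothetical marker names and genotypes.
before = {"marker_1": "A/G", "marker_2": "C/C", "marker_3": "T/T", "marker_4": "G/A"}
after = {"marker_1": "G/A", "marker_2": "C/C", "marker_3": "T/C", "marker_4": "A/G"}
concordance = genotype_concordance(before, after)
```

In a clinical setting the cut-off would likely be set at or near complete agreement, since a mismatch may indicate a sample swap or cross-contamination rather than a sequencing error.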
[0010] Another general embodiment of the invention is a method comprising receiving a plurality of DNA samples; pooling the samples; sequencing the sample on a next generation sequencer, wherein the sequencer has been adjusted to provide deep sequencing; demultiplexing the samples; and outputting the sequences of the demultiplexed samples. The samples may include only nuclear DNA, only mitochondrial DNA or both. The deep sequencing may comprise sequencing with greater than 1,000 fold average reads per nucleotide, greater than 10,000 fold average reads per nucleotide, greater than 20,000 fold average reads per nucleotide, greater than 30,000 fold average reads per nucleotide, greater than 40,000 fold average reads per nucleotide, greater than 50,000 fold average reads per nucleotide, greater than 75,000 fold average reads per nucleotide, or greater than 100,000 fold average reads per nucleotide, in the context of mitochondrial deep sequencing. In the context of nuclear deep sequencing, the deep sequencing may comprise greater than 100 fold average reads per nucleotide, greater than 200 fold average reads per nucleotide, greater than 300 fold average reads per nucleotide, greater than 400 fold average reads per nucleotide, greater than 500 fold average reads per nucleotide, greater than 600 fold average reads per nucleotide, greater than 700 fold average reads per nucleotide, or greater than 800 fold average reads per nucleotide.
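The fold-coverage figures in paragraph [0010] relate to read counts through simple arithmetic: average coverage is roughly the number of mapped bases divided by the target length. The sketch below uses the 16,569 bp mitochondrial genome length given in this document; the read length and mapped fraction are free parameters chosen by the user and are not specified by the source.

```python
import math

MT_GENOME_BP = 16_569  # mitochondrial genome length stated in the document


def mean_coverage(n_reads, read_length_bp, target_bp=MT_GENOME_BP, mapped_fraction=1.0):
    """Approximate average reads per nucleotide across the target."""
    return n_reads * read_length_bp * mapped_fraction / target_bp


def reads_needed(target_coverage, read_length_bp, target_bp=MT_GENOME_BP, mapped_fraction=1.0):
    """Approximate number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * target_bp / (read_length_bp * mapped_fraction))


def is_deep_mitochondrial(coverage, minimum_fold=1_000):
    """Check a coverage value against one of the listed mitochondrial thresholds."""
    return coverage > minimum_fold


# Illustrative call with a hypothetical 100 bp read length.
reads_for_10k = reads_needed(10_000, 100)
```

This estimate describes the average only. Uniformity across the genome still has to be checked per base, since an acceptable average can hide poorly covered regions.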
[0011] The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
[0013] FIG. 1 is an embodiment of the detection of mtDNA large deletions by massively parallel sequencing (MPS), also known as Next Generation Sequencing (NGS) compared to detection by aCGH or Southern analysis (ND is not determined).
[0014] FIG. 2 is a quantification of heteroplasmic point mutations by MPS. Samples with known heteroplasmic mutations, m.3243A>G, m.8993T>C, m.10191T>C, m.11778G>A, m.14484T>C, and m.13513G>A, were evaluated by MPS. Levels of heteroplasmy are listed and compared to the results obtained by ARMS qPCR.
[0015] FIG. 3 illustrates the assessment of analytical error and correlation of the observed and the expected data. Fig. 3A depicts the instrumental (open bars) and experimental (closed bars) errors for the 12 indexed ExQC DNAs and the corresponding samples. Fig. 3B depicts the correlation of the observed and the expected percentage heteroplasmy at a specific position in the spiked-in controls (ExQC). The correlation coefficient is 1.
[0016] Fig. 4 illustrates the detection of mtDNA deletions by whole mtDNA amplification followed by MPS. Fig. 4A illustrates a uniform coverage throughout the entire mitochondrial genome with the following subfigures - a: normal control without deletion; b-e: mtDNA large deletions with various sizes and percentages of deletion heteroplasmy. The deletion breakpoints are clearly shown. Fig. 4B is the agarose gel analysis of example LR-PCR products. Lanes M1 and M2 are size markers; lane a: normal control without deletion; lanes b and d: showing DNA bands of both intact and smaller deletion molecules; lanes c and e: showing smaller bands of deletion molecules (>90%), with the intact mtDNA barely detected (<10%). Fig. 4C illustrates the size of deletion and degree of heteroplasmy. The exact breakpoints determined by PCR/sequencing and targeted aCGH are listed in Fig. 1.
[0017] Fig. 5 illustrates performance evaluation of target enrichment methods by "deep sequencing index" (DSI). Three target enrichment methods (in solution capture, 24 PCR fragments, and single LR-PCR amplification of the entire mtDNA) were performed and evaluated by the DSI. Fig. 5A is an exemplary formula of the "deep sequencing index" (DSI). Fig. 5B is in solution enrichment of target genes using SureSelect RNA probes. Fig. 5C is gene enrichment by mixing 24 PCR fragments. Fig. 5D is long range PCR with a single pair of primers for the entire mitochondrial genome.
[0018] Fig. 6 is an exemplary flow chart for deep sequencing of the mitochondrial genome.
[0019] Fig. 7 is an overview of an example Deep Sequencing Index.
[0020] Fig. 8 is exemplary results from a patient sample run with mitochondrial whole genome deep sequencing. The top table represents standards and/or controls, while the bottom table is a subject sample.
[0021] Fig. 9 illustrates mitochondrial genome with example primers shown as F and R.
[0022] Fig. 10 is a list of exemplary primers for amplification of InQC genotype markers (SEQ ID NOS:6-33).
[0023] Fig. 11 is a list of exemplary external quality controls and proportions of nucleotide at each specific variant position.
[0024] Fig. 12 illustrates base calls by MPS and Sanger sequencing of an exemplary sample. The 3.7% heteroplasmy of the m.1630A>G mutation was not detected by the Sanger method (see Fig. 13 for sequence trace).
[0025] Fig. 13 illustrates the detection of the m.1630 mutation by NGS but not by Sanger sequencing and is the sequence trace of Sanger results. The m.1630A>G change was not detected by Sanger sequencing (SEQ ID NO:34).
[0026] Fig. 14 illustrates that low heteroplasmy of m.1630A>G is detected by MPS (SEQ ID NO:35 and 36).
[0027] Fig. 15 is an example of detailed information generated by the NextGENe program.
[0028] Fig. 16 is a comparison of next generation MPS with Sanger sequencing and shows sensitivity and specificity of variant detection by MPS using Sanger method as the standard.
[0029] Fig. 17 illustrates the reproducibility of mtDNA heteroplasmy detection by NGS with deep sequencing. The heteroplasmy of a sample with an admixture of haplogroups J and H was analyzed twice, in two independent Illumina runs, one paired-end and one single-end. Each measurement shows about 15% heteroplasmy for haplogroup H (hatched bars) and 85% for haplogroup J (open and closed bars). The Pearson's correlation coefficient for the two runs is 0.998 with p value <0.001. The numbers on the top of the histogram are shared homoplasmic mtDNA variant positions. The numbers at the bottom are mtDNA variant positions distinguishing haplotypes H (hatched bars) and J (open and closed bars).
DETAILED DESCRIPTION OF THE INVENTION
[0030] In keeping with long-standing patent law convention, the words "a" and "an" when used in the present specification in concert with the word comprising, including the claims, denote "one or more." Some embodiments of the invention may consist of or consist essentially of one or more elements, method steps, and/or methods of the invention. It is contemplated that any method or composition described herein can be implemented with respect to any other method or composition described herein.
[0031] As used herein, an "individual" is an appropriate individual for the method of the present invention. Individuals may also be referred to as "patients," or "subjects."
[0032] The term "essentially equal" or "about" as used herein, refers to equal values or values within the standard of error of measuring such values. The term "substantially," as used herein refers to an amount that is within 3%.
[0033] "Multiplex PCR," as used herein, refers to using multiple PCR primers to amplify the same pool of DNA. "Multiplex sequencing," as used herein, refers to pooling multiple subjects' DNA and sequencing the pool in one run.
[0034] As used herein, "demultiplexing" or "demultiplexed" refers to a sequence that has been assigned to a subject. For example, in multiplexed sequencing each fragment of a subject's DNA is tagged with an identifying DNA fragment that corresponds to the subject. After multiple subjects' DNA fragments are mixed together and sequenced, this ID tag is then used to identify which sequence belongs to which subject.
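The tag-based assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (pairs of index tag and read sequence), the function name `demultiplex`, and the example tags and subject labels are all hypothetical and are not taken from the specification.

```python
from collections import defaultdict


def demultiplex(reads, index_to_subject):
    """Group reads by subject using the index tag attached to each read.

    reads: iterable of (index_tag, sequence) pairs (hypothetical input format).
    index_to_subject: dict mapping each index tag to a subject identifier.
    Reads whose tag is not recognised are collected under the key None.
    """
    by_subject = defaultdict(list)
    for index_tag, sequence in reads:
        subject = index_to_subject.get(index_tag)  # None if the tag is unknown
        by_subject[subject].append(sequence)
    return dict(by_subject)


# Toy pooled run with two subjects and one read whose index is unreadable.
pooled_reads = [
    ("ACGT", "TTAGGC"),
    ("TGCA", "CCATGA"),
    ("ACGT", "GGATCC"),
    ("NNNN", "AAAAAA"),
]
index_to_subject = {"ACGT": "subject_1", "TGCA": "subject_2"}
assigned = demultiplex(pooled_reads, index_to_subject)
```

Keeping unassigned reads in their own group, rather than discarding them silently, makes it easy to see how much of a run could not be attributed to any subject.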
[0035] "Deep sequencing" or "deep coverage" refers to having a high amount of coverage for every nucleotide being sequenced. For example, deep sequencing has greater than 1,000 fold average reads per nucleotide, greater than 10,000 fold average reads per nucleotide, greater than 20,000 fold average reads per nucleotide, greater than 30,000 fold average reads per nucleotide, greater than 40,000 fold average reads per nucleotide, greater than 50,000 fold average reads per nucleotide, greater than 75,000 fold average reads per nucleotide, or greater than 100,000 fold average reads per nucleotide, in the context of mitochondrial deep sequencing. Additionally, in the context of mitochondrial deep sequencing the least read nucleotide in the run has at least 200 reads, at least 300 reads, at least 400 reads, at least 500 reads, at least 600 reads, at least 700 reads, at least 800 reads, at least 900 reads, at least 1000 reads, at least 1500 reads, at least 2000 reads or at least 3000 reads. In the context of nuclear deep sequencing, the deep sequencing has greater than 100 fold average reads per nucleotide, greater than 200 fold average reads per nucleotide, greater than 300 fold average reads per nucleotide, greater than 400 fold average reads per nucleotide, greater than 500 fold average reads per nucleotide, greater than 600 fold average reads per nucleotide, greater than 700 fold average reads per nucleotide, or greater than 800 fold average reads per nucleotide. Additionally, in the context of nuclear deep sequencing, the least read nucleotide has at least 40 reads, at least 50 reads, at least 60 reads, at least 70 reads, at least 80 reads, at least 90 reads, at least 100 reads, at least 200 reads, at least 300 reads, at least 400 reads, or at least 500 reads.
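The two-part definition above (a minimum average depth together with a minimum depth at the least read nucleotide) can be checked mechanically. The following Python sketch assumes per-nucleotide read depths are already available as a list; the function name, the parameter names, and the toy depth values are hypothetical, and the thresholds passed in are simply two of the alternatives listed for mitochondrial deep sequencing.

```python
def is_deep_coverage(depths, min_mean, min_floor):
    """Return True if the mean depth exceeds min_mean and the least read
    position has at least min_floor reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    return mean_depth > min_mean and min(depths) >= min_floor


# Toy per-nucleotide read depths (illustrative values, not real data).
mito_depths = [21000, 25000, 19500, 30000, 1200]

# Mean depth is 19,340 and the least read position has 1,200 reads.
passes_1000_floor = is_deep_coverage(mito_depths, min_mean=10_000, min_floor=1000)
passes_1500_floor = is_deep_coverage(mito_depths, min_mean=10_000, min_floor=1500)
```

With these toy values the run passes when the floor is 1,000 reads and fails when the floor is raised to 1,500 reads, which shows why the average alone does not characterise a run.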
[0036] As used herein, the phrase "predetermined number" refers to a number that is calculated prior to the experiment. For instance, values from experimental runs may be compared to a predetermined number to determine if the experimental value is acceptable. In a specific embodiment, the deep sequencing index is calculated and a run is considered acceptable if the value of the DSI is greater than about 80, a predetermined number.
[0037] Additionally, a processor or processors can be used to implement the invention in some embodiments. For example, a processor or processors can be used in performance of the operations driven by tangible computer-readable media disclosed herein. Alternatively, the processor or processors can perform those operations under hardware control, or under a combination of hardware and software control. For example, the processor may be a processor specifically configured to carry out one or more of those operations, such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The use of a processor or processors allows for the processing of information (e.g., data) that is not possible without the aid of a processor or processors, or at least not at the speed achievable with a processor or processors. Some embodiments of the performance of such operations may be achieved within a certain amount of time, such as an amount of time less than what it would take to perform the operations without the use of a computer system, processor, or processors, including no more than one hour, no more than 30 minutes, no more than 15 minutes, no more than 10 minutes, no more than one minute, no more than one second, and no more than every time interval in seconds between one second and one hour.
[0038] Some embodiments of the tangible computer-readable media may be, for example, a CD-ROM, a DVD-ROM, a flash drive, a hard drive, or any other physical storage device. Some embodiments of the present methods may include recording a tangible computer-readable medium with computer-readable code that, when executed by a computer, causes the computer to perform any of the operations discussed herein, including those associated with the present tangible computer-readable media. Recording the tangible computer-readable medium may include, for example, burning data onto a CD-ROM or a DVD-ROM, or otherwise populating a physical storage device with the data.
Mitochondrial diseases
[0039] Recent studies suggest that 1 in 200 healthy individuals is at risk of developing mitochondrial diseases due to mitochondrial DNA (mtDNA) mutations (1). Mitochondrial disorders are clinically and genetically heterogeneous, with variable penetrance, expressivity, and differing age of onset (2, 3). Deleterious mtDNA mutations usually exist in a mixed state (4). The proportion of mutant mtDNA, known as mutant heteroplasmy, varies in different tissues. The type of mtDNA mutation, the tissue distribution of the mtDNA mutation and its degree of heteroplasmy contribute to the clinical phenotype and the severity of the disease. A female carrying a pathogenic mtDNA mutation is at risk of passing the mutation to her offspring, even if she is asymptomatic with low level heteroplasmy in somatic cells, due to the possible involvement of the germline. Detection and quantification of the degree of heteroplasmy among different tissues is necessary for an accurate diagnosis, proper genetic counseling, and management of the affected individual. In addition, there is growing evidence that somatic mtDNA alterations are associated with aging and a variety of common diseases, including diabetes and cancers (5-10). An embodiment of the invention is qualitative and quantitative evaluation of the entire mitochondrial genome used in assisting researchers and physicians in the identification of mtDNA alterations.
[0040] Current molecular diagnostic methods for the detection and quantification of mtDNA mutations are based on the screening of a panel of common point mutations followed by the quantification of the mutant load if specific probes for the mutations are available (11-13). If common mutations are not detected, sequencing of the whole mitochondrial genome is performed to find causative rare variants or novel mutations. While the gold standard, Sanger sequencing, reliably detects most nucleotide changes, it does not provide quantitative information and is not adequately sensitive for detecting mutant heteroplasmy below 15% (14). Large mtDNA deletions are analyzed separately by Southern blotting (15) or by custom designed high density oligonucleotide array comparative genome hybridization (aCGH) (16, 17). Nevertheless, neither method by itself can provide detailed information on deletion breakpoints and mutant heteroplasmy. In order to obtain comprehensive information, a combination of these individual assays has to be performed. Even so, the inability to provide quantitative information regarding the degree of heteroplasmy at each nucleotide position, which is important in assessing disease correlations and genetic counseling, is still a serious limitation. An embodiment of the invention is an accurate, comprehensive and cost effective method for the molecular diagnosis of mtDNA-based diseases. Disclosed below is one embodied approach using massively parallel sequencing (MPS) that provides quantitative base calls, exact deletion junction sequences, and quantification of deletion heteroplasmy. In order to adapt this method for molecular diagnosis in a clinical diagnostic setting, qualitative and quantitative controls have been developed and may be included in the analysis of each indexed sample for quality assurance.
Nuclear genetic disease
[0041] Deep sequencing and the quality control checks described here may also be used to test for nuclear genetic diseases. For example, Progressive Familial Intrahepatic Cholestasis is a group of genetic disorders which result in disruption of bile formation and flow, and are inherited in an autosomal recessive manner. Four genes contribute to this genetic disease: ABCB11, ATP8B1, ABCB4, and JAG1. Another example of a nuclear genetic disease is the group of glycogen storage diseases, which can cause hypoglycaemia, hepatomegaly, developmental delay and muscle cramps. Early diagnosis with proper treatment can greatly improve the quality of life, reduce organ damage, and extend a subject's life span.
[0042] Additionally, while most mitochondrial diseases result from mutations in mitochondrial DNA (mtDNA), some are caused by mutations in nuclear genes that encode proteins which function in mitochondria. 80-95% of patients with clinically suspected primary mitochondrial disease do not harbour a pathogenic mutation in the initial screen of the mtDNA. These cases are normally further tested for mutations in nuclear-encoded mitochondrial genes that are associated with distinct clinical phenotypes. For example, POLG, DGUOK, MPV17, and C10orf2 mutations have been observed with mtDNA depletion and hepatoencephalopathy. TYMP mutations are associated with Mitochondrial NeuroGastroIntestinal Encephalomyopathy (MNGIE). TK2, SUCLA2, SUCLG1, and RRM2B mutations have been observed with mtDNA depletion and encephalomyopathy. Myopathy with elevated creatine kinase is a frequent feature of TK2 deficiency. Correctly selecting a group of genes to be analyzed can maximize the chance of successfully obtaining a molecular diagnosis. However, due to the clinical heterogeneity of the mitochondrial diseases, many affected individuals do not fit into one particular category, which makes it a challenge for clinicians to select the candidate genes.
Clinical Deep Sequencing with Subject Multiplexing
[0043] For a diagnostic laboratory, one very important factor is the cost of the test. Due to the high throughput and high capacity of the MPS instruments and the reagent cost of a run, it is often desirable to pool samples from different individuals together to be analyzed in a single lane (multiplex sequencing). When tens to hundreds of samples are pooled and sequenced simultaneously, the cost of analysis is greatly reduced, but the likelihood of sample mix-up and errors in demultiplexing (assigning a sequence to a subject) is also increased. Embodiments of the invention include additional quality checks to ensure the correct demultiplexing of subjects.
Deep Coverage
Deep coverage allows accurate base calls and quantification of mtDNA heteroplasmy
[0044] For the identification of mutations in nuclear genes, coverage of greater than 30X sequence reads would usually be considered adequate for making homozygous or heterozygous base calls and for detecting small indel variations for research purposes (20, 21). However, taking 30X coverage as the cutoff for nuclear DNA commonly leaves about 10-15% of regions poorly covered. Poorly covered or uncovered regions lead to false negative results, compromising the sensitivity of the clinical test. In low coverage regions, base calling and detection of indel variations are not completely accurate, which produces more false positive results. Given the sequencing errors of the specific platforms, it is unrealistic to claim high sensitivity for a test if the coverage is low.
[0045] Coverage depth of greater than about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 fold may be used for the nuclear genome. As for mtDNA, the accuracy of calling an mtDNA pathogenic mutation and its mutant load (mutation heteroplasmy) has direct impact on the interpretation and any correlations with disease management, clinical outcomes, and genetic counseling. Therefore, in an embodiment of the invention, a much higher coverage is recommended in a diagnostic laboratory. A coverage depth of greater than about 10,000, 20,000, 30,000, 40,000, 50,000, or 100,000 fold may be used for the mitochondrial genome.
Quantification of heteroplasmy
[0046] At 20,000 fold coverage of mtDNA, 1.0% heteroplasmy corresponds to 200 sequence reads, sufficient for the accurate measurement of mtDNA mutation heteroplasmy. Discrepancies in the degree of heteroplasmy measured by ARMS qPCR and deep sequencing have been observed, although most are consistent. This is most likely due to the fundamental differences in methodologies. ARMS qPCR is based on the discrimination of the 3' end nucleotide of either the forward or the reverse primer for the extension of DNA synthesis (11). The PCR reactions containing the wild type primers or the mutant primer are carried out in different tubes under the same conditions. The reported percentage of heteroplasmy may be influenced by the efficiency of PCR due to different primers. However, with MPS, the primers used for LR-PCR of the wild type and mutant mtDNA molecules may be identical. The primers used for sequencing of the wild type and mutant mtDNA may also be identical. The discrepancy between these two methods may also be due to the presence of mtDNA homologous regions in the human genome (NUMT), which is not amplified by LR-PCR.
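The arithmetic behind the figure of 200 reads quoted above, and the conversion of read counts back into a heteroplasmy percentage, can be written out as a short Python sketch. The function names are hypothetical. The standard error function is an added illustration that assumes reads are sampled independently (a simple binomial model) and ignores sequencing error and amplification bias; it is not part of the described method.

```python
import math


def expected_variant_reads(coverage, heteroplasmy_fraction):
    """Reads expected to carry the variant at a given depth of coverage."""
    return coverage * heteroplasmy_fraction


def heteroplasmy_percent(variant_reads, total_reads):
    """Observed heteroplasmy as a percentage of reads covering the position."""
    return 100.0 * variant_reads / total_reads


def binomial_standard_error_percent(variant_reads, total_reads):
    """Sampling-only standard error, in percentage points (binomial assumption)."""
    p = variant_reads / total_reads
    return 100.0 * math.sqrt(p * (1.0 - p) / total_reads)


reads_at_1_percent = expected_variant_reads(20_000, 0.01)    # about 200 reads
observed = heteroplasmy_percent(200, 20_000)                 # about 1.0 percent
sampling_se = binomial_standard_error_percent(200, 20_000)   # about 0.07 points
```

Under the binomial assumption, the sampling uncertainty at this depth is small compared with a 1% signal, so in practice other error sources are likely to matter more than the read count itself.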
[0047] The percentage of mtDNA deletion heteroplasmy calculated from the LR-PCR/MPS method is also not in perfect agreement with that estimated from oligonucleotide array CGH (16) (see Fig. 1). This is likely due to the preferential amplification of smaller molecules resulting in an over-estimation of deletion molecules, translating into a higher apparent heteroplasmy of mtDNA deletion. With deep coverage, examination of the breakpoint sequences should allow the detection of complex mtDNA rearrangements, including multiple deletions and partial duplications.
Deep coverage and limit of detection
[0048] Sanger sequencing analysis will not detect heteroplasmy below about 15% (14). With 100 bp single-end or paired-end sequencing using the Illumina HiSeq instrument and multiplexing of 12 indexed samples per lane, the depth of coverage of the whole mitochondrial genome is very high (>60,000X). The deep coverage allows the detection of very low mutation heteroplasmy (Fig. 2, sample #062 with 1.1%) at any of the 16,569 nucleotide positions, with a limit of detection at about 1.33% (Fig. 3). Thus, this deep sequencing method can be easily applied to study somatic mtDNA alterations in any tissues, such as muscle, blood, or tumors. Somatic mtDNA alterations have recently been found in all types of cancers and may play a role in tumorigenesis (5, 22-25). However, relatively few cancers harbor detectable deleterious somatic mtDNA mutations (6, 26, 27). Since increased reactive oxygen species (ROS) production and the resulting oxidative DNA damage is one of the hallmarks of cancer, one non-limiting hypothesis is that there may be numerous random mtDNA mutations in different mtDNA molecules, each at a heteroplasmic level below the detection limit of Sanger sequencing. Nevertheless, the sum of this damage may contribute to tumorigenesis (5). This hypothesis may be studied by using the deep sequencing approach that may reach the detection level equivalent to single molecule analysis.
Quality Control
Internal Quality Control ("InQC")
[0049] Due to the high throughput and high capacity of the MPS instruments and the reagent cost of a run, it is often desirable to pool samples from different patients together to be analyzed in a single run. However, when tens to hundreds of samples are pooled and sequenced simultaneously, the likelihood of sample mix-up is also increased, both during the experiment setup stage, which is carried out by laboratory technicians, and during downstream result analysis. Thus, an internal identity control system ("InQC") was designed and incorporated into the analysis of each sample. The InQC comprises genotyping each sample prior to pooling samples and then comparing each of the demultiplexed sequences to the genotype to ensure the final sequence contains the same genotype as the original sample.
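The InQC comparison described above amounts to a marker-by-marker concordance check between the genotypes recorded before pooling and those recovered from the demultiplexed reads. A minimal Python sketch follows; the marker names, the two-letter genotype encoding, and the function name are hypothetical and chosen only for illustration.

```python
def inqc_mismatches(pre_pool_genotypes, sequenced_genotypes):
    """Return the markers whose sequenced genotype differs from the genotype
    recorded before pooling. A marker missing after sequencing is also flagged."""
    mismatches = []
    for marker, expected in pre_pool_genotypes.items():
        observed = sequenced_genotypes.get(marker)
        # Compare as unordered allele pairs so that "AG" and "GA" agree.
        if observed is None or sorted(expected) != sorted(observed):
            mismatches.append(marker)
    return mismatches


# Hypothetical genotypes for one subject at three markers.
pre_pool = {"marker_1": "AG", "marker_2": "CC", "marker_3": "TT"}
after_mps = {"marker_1": "GA", "marker_2": "CC", "marker_3": "CT"}
flagged = inqc_mismatches(pre_pool, after_mps)
```

An empty result supports the assignment of the demultiplexed reads to the subject, while any flagged marker suggests a possible sample mix-up or demultiplexing error that should be investigated before results are reported.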
[0050] Ensuring that every indexed sample is scored and demultiplexed correctly is important for a diagnostic laboratory. As an example, a set of 14 nuclear polymorphic markers may be genotyped for each sample before further preparation for MPS. These polymorphic markers may then be amplified in a single multiplex PCR and the resulting DNA fragments mixed with LR-PCR-enriched DNA fragments from the same individual for indexing and MPS. The markers tested before and after MPS should match, a good quality control in a clinical diagnostic setting, particularly when patients' samples are mixed for analyses using complex procedures. InQC is platform independent. That is, this quality control may be used in any NGS sequencing protocols or systems. Examples of 14 polymorphic markers are found in the table below.
[Table: exemplary nuclear polymorphic markers for InQC; provided as image imgf000016_0001 in the original]
[0052] In one embodiment of the invention, the nuclear polymorphic markers are chosen based on their random distribution throughout the nuclear DNA. The nuclear polymorphic markers may be chosen from published forensic lists, such as the list from Forensic Science International 149 (2005) 279-286. Additionally, the primer areas may be sequenced or genotyped prior to deep sequencing to be sure that there are no SNPs in the primer area. If there are SNPs in the primer area the SNPs will be lost. As such, a different primer may be used if SNPs are found in the original primer location.
External Quality Control ("ExQC")
[0053] To evaluate the instrumental and experimental errors and to determine the limit of detection, DNA fragments with known sequences in various ratios were added to and indexed with each sample as a quality control measure, and processed exactly as the test sample. This approach helps in identifying the source of any errors and in setting the guidelines for the secondary analysis of the clinical samples, including mapping the reads, base calling and calculation of heteroplasmy. The results from Examples 3 and 4 demonstrate that the sequencing errors are largely stochastic, but some are not random. The baseline error rate is around 0.3%, with a standard deviation of 0.335%. The inclusion of ExQC in samples may be used in a clinical diagnostic laboratory in order to assure the quality of performance and to determine the limit of detection for quantitative measurements of heteroplasmy, when used in mitochondrial DNA sequencing.
[0054] This additional control is used to measure the machine error. Known artificial sequences are spiked into the pooled subject samples at different known concentrations. For example, different sequence fragments A, B, C, D, E, and so on are combined at designed proportions, such that nucleotide position A is at 10%, nucleotide position B is at 20%, nucleotide position C is at 30%, nucleotide position D is at 40%, nucleotide position E is at 50%, nucleotide position F is at 60%, nucleotide position G is at 70%, nucleotide position H is at 80%, nucleotide position I is at 90%, and nucleotide position J is at 100% of the concentration of sample DNA. These sequences then form a ladder against which to measure machine error. For example, if the machine measures nucleotide position A at 5% instead of 10%, this is an indication that there may be something wrong with the run. In some embodiments of the invention, 3, 4, 5, 6, 7, 8, 9, 10 or more known sequences are used in the ladder. In some embodiments of the invention, a deviation of greater than about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99% from the ladder is considered to be unacceptable error.
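The ladder comparison can be illustrated with a short Python sketch that subtracts the designed percentage from the measured percentage at each ladder position and flags large differences. The function names, the observed values, and the tolerance of 2 percentage points are hypothetical choices made for this example; they are not thresholds stated in the specification.

```python
def ladder_deviations(expected_percent, observed_percent):
    """Return {position: observed - expected}, in percentage points."""
    return {
        position: observed_percent[position] - expected_percent[position]
        for position in expected_percent
    }


def flag_positions(deviations, tolerance_points):
    """Return ladder positions whose absolute deviation exceeds the tolerance."""
    return sorted(
        position
        for position, deviation in deviations.items()
        if abs(deviation) > tolerance_points
    )


# Designed proportions for the first three ladder positions, with toy measurements.
expected = {"A": 10.0, "B": 20.0, "C": 30.0}
observed = {"A": 5.0, "B": 20.4, "C": 29.7}

deviations = ladder_deviations(expected, observed)
suspect = flag_positions(deviations, tolerance_points=2.0)  # hypothetical tolerance
```

Here position A, measured at 5% instead of the designed 10%, is the only position flagged, matching the example of a run that may have a problem.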
[0055] An example of an external quality control is to use seven synthetic DNA fragments, each 150 bp long with a known variant at a specific position. These DNA fragments are amplified and then diluted. The diluted PCR products are pooled together in volumes chosen to give an ExQC solution with specific percentages of wild type and variants, such as 0, 2, 5, 20, 50, 80, and 100% of wild type. The ExQC DNA is then spiked into the sample DNA pool in order to measure machine error. ExQC may be used in nuclear DNA sequencing to determine machine error, but also provides the added benefit of an accurate ladder when determining heteroplasmy in mitochondrial DNA. ExQC is platform independent. That is, this quality control may be used in any NGS sequencing protocols or systems.
Additional Error checking
[0056] Additional Sanger sequencing may be done for each subject sample in order to confirm mutations. As another error check, a sample with between 0.5% and 15% heteroplasmy is added to a machine run in order to be sure that the heteroplasmy is detected. If the heteroplasmy of the known sample is not detected, all samples in the run may be considered unacceptable for clinical diagnostics. In one embodiment of the invention the known sample has about 1% to 5% heteroplasmy. In a specific embodiment of the invention, the known sample has about 1% heteroplasmy.
Deep Sequencing with Long Range PCR ("LR-PCR")
Advantages of LR-PCR of the entire mitochondrial genome
[0057] The MITOMAP database (www.mitomap.org/bin/view.pl/MITOMAP/PseudogeneList) has documented an extensive list of mtDNA homologous regions in the human genome (NUMT). Approximately 650 distinctive regions in the human genome have greater than 60% sequence similarity to mtDNA. The average sequence similarity in these regions is 80% ± 7.4%, and the average length of homologous fragments is 944 bp ± 1569 bp (19). Although the nuclear gene sequences can be removed by filtering the input set of sequence reads with intermediate homologous sequences, it is difficult to filter those nuclear gene sequences that are nearly identical to mtDNA, such as the 9p24.3, 12q11, and Yp11.3 regions. These NUMTs may contribute to the variable coverage distribution for capture-based methods. Thus, in order to improve the poorly covered regions, the depth of some of the well-covered regions may be vastly over-represented. This uneven coverage throughout the mitochondrial genome makes the detection of large deletions difficult. The presence of NUMT in the human genome not only interferes with the base calls, but also with the quantification of mutation heteroplasmy, which is another critical factor in the diagnosis of mtDNA disorders.
[0058] The multiplex PCR-based target gene enrichment can efficiently cover the entire mtDNA genome with overlapping PCR fragments, and may also be used for nuclear DNA amplification. However, due to the highly polymorphic nature of the mtDNA, the use of multiple pairs of primers increases the chance of a rare or novel variant at the primer binding sites, resulting in differential PCR efficiency among different amplicons and making the detection of large deletions almost impossible. LR-PCR, using one non-overlapping primer pair, is able to provide uniform coverage throughout the entire mitochondrial genome and, thus, allows for easy analysis of mtDNA large deletions (Fig. 4). Depending on the GC content, the probe density, and the three-dimensional structure of DNA, the efficiency of capture in different regions of DNA differs widely, which may result in a very uneven coverage of the mitochondrial genome (Fig. 5). Due to the presence of NUMT in the human genome, target gene capture may also enrich the unwanted NUMT. These problems are solved by using only one pair of primers directed at a conserved region near the mtDNA origin of replication. If there are SNPs in one of the primer binding sites affecting the LR-PCR, there will be insufficient PCR product. If this is the case, an alternate pair of primers is used. Thus, in contrast to the target gene capture and PCR enrichment methods, LR-PCR amplification of the whole mitochondrial genome eliminates the interference of NUMT sequences and provides a uniform coverage of the entire mitochondrial genome for deletion detection.
[0059] In one embodiment of the invention, deep coverage is achieved by adjusting the number of multiplexed samples to be sequenced in one lane of an Illumina flowcell according to the capacity of the instrument and the sequencing chemistry. At least 10 million reads are expected for each sample to guarantee 20,000X to 30,000X average coverage of the whole mitochondrial genome. Nuclear DNA may also be amplified by methods other than LR-PCR, such as PCR amplification or in solution capture.
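The relationship between reads per sample, read length, and average coverage of the 16,569 bp mitochondrial genome can be sketched as follows. This Python example is illustrative only: the function names, the `on_target_fraction` parameter, and the lane capacity of 120 million reads are assumptions introduced here, and the calculation gives an idealised upper bound in which every sequenced base maps to the target.

```python
MTDNA_LENGTH_BP = 16_569  # mitochondrial genome length stated in the text


def average_coverage(num_reads, read_length_bp,
                     target_length_bp=MTDNA_LENGTH_BP,
                     on_target_fraction=1.0):
    """Idealised average depth: usable sequenced bases divided by target length."""
    return num_reads * read_length_bp * on_target_fraction / target_length_bp


def samples_per_lane(lane_reads, reads_per_sample):
    """Number of samples that fit in one lane at the desired reads per sample."""
    return lane_reads // reads_per_sample


# 10 million reads of 100 bp, with every base assumed to be on target.
ideal_depth = average_coverage(10_000_000, 100)

# Hypothetical lane capacity of 120 million reads.
lane_samples = samples_per_lane(120_000_000, 10_000_000)
```

The idealised figure of roughly 60,000X sits above the 20,000X to 30,000X stated in the text, which leaves room for reads lost to the spiked-in controls, nuclear amplicons, filtering, or unmapped sequence. With the assumed lane capacity, 12 samples fit in one lane.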
Elements of a Deep Sequencing Run
[0060] Fig. 6 represents an example of the work flow of a deep sequencing run. A sample is first obtained from a subject and information on the sample and the subject may be entered into a database. The sample may be from any type of cell or tissue, or a mixture of cells or tissues. For example, the sample may be from blood, muscle, liver, cultured cells, buccal swab, tumor sample, urine, or hair. The sample is then processed to extract the DNA of interest. The sample may also be separated so that two or more regions of interest are extracted separately. For example, the DNA of interest may be mitochondrial DNA, nuclear DNA, or a combination thereof. Nuclear DNA of interest may come from only one chromosome, or may come from multiple chromosomes. The DNA of interest may come from only one region, or may come from multiple regions. The DNA of interest is then quantified by spectrophotometer, for example by NanoDrop. If necessary, the DNA may be separated into two or more different samples of the same subject's DNA.
[0061] With whole mitochondrial deep sequencing, LR-PCR is done on one sample in order to amplify all mitochondrial DNA of interest. Multiplex PCR may be done on the second sample in order to amplify nuclear DNA regions of interest. Genotyping may be done on a third sample as part of the InQC. The LR-PCR and multiplex PCR products are then mixed together, and the ExQC containing known sequences with known concentrations is spiked into the mix. The mix is then fragmented, such as by ultrasound or by restriction enzyme digest. The samples are then quality controlled by Bioanalyzer, a library is generated, and the sample is quality controlled again by Bioanalyzer and then processed with qPCR. At this point multiple subjects' samples may be mixed together, forming a multiplex sequencing sample. The samples are then sequenced using NGS on a machine such as an Illumina sequencer, Roche's 454 (pyrosequencing), ABI SOLiD, Ion Torrent sequencer, Helicos HeliScope, or Pacific Biosciences' single molecule real-time (SMRT) instrument. The sequence data are collected, and the sequences are demultiplexed and matched to a subject.
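The final demultiplexing and subject-matching step can be pictured with a minimal sketch. It is not the vendor software's algorithm: it assumes each read already carries its index sequence and that exact index matches are required. The index sequences and subject labels in the usage are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_subject):
    """Group (index, sequence) reads by subject; reads with unknown indices are kept apart."""
    by_subject = defaultdict(list)
    unassigned = []
    for index, sequence in reads:
        subject = index_to_subject.get(index)
        if subject is None:
            unassigned.append(sequence)  # index not in the sample sheet
        else:
            by_subject[subject].append(sequence)
    return dict(by_subject), unassigned


if __name__ == "__main__":
    # Hypothetical index sequences and subject labels, for illustration only.
    sheet = {"ATCACG": "subject_A", "CGATGT": "subject_B"}
    reads = [("ATCACG", "ACGT"), ("CGATGT", "TTGA"), ("ATCACG", "GGCC"), ("NNNNNN", "AAAA")]
    print(demultiplex(reads, sheet))
```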
[0062] At this point the quality control measures are used to determine if a run is accurate. The InQC from the multiplex PCR nuclear DNA sample should match the nuclear DNA sequence. That is, the genotypes of the nuclear DNA for a subject should match the genotypes of the sequenced nuclear DNA. Additionally, genotypes of the mitochondrial DNA for a subject should match the mitochondrial sequenced DNA of the subject. The percentages of ExQC should be similar to the percentages of ExQC that were added to the sample prior to fragmentation and sequencing. As an added check, Sanger sequencing may be performed to verify specific base calls.
[0063] The procedure for generating ExQC is the same for mitochondrial DNA and nuclear DNA based runs. However, InQC may be generated differently. For mitochondrial genome sequencing by MPS, InQC may be done by multiplex PCR and spiked in with LR-PCR products for library preparation. In nuclear DNA based MPS tests, by contrast, probes for InQC regions are incorporated in the library, and therefore InQC regions are captured simultaneously with other target regions during hybridization, for example.
Deep sequencing index
[0064] A "deep sequencing index" (DSI) is used to evaluate the end performance of a sequencing run and to compare the quality of sequencing results among different gene enrichment methods (example shown in Fig. 4A and Fig. 7). The DSI equation contains at least three of the following parameters, each with an empirically assigned weight. The parameters consist of i) the average number of reads of ExQC DNA, ii) the average number of sample reads normalized to the average number of reads of ExQC DNA, iii) the correlation coefficient of the expected versus observed values of ExQC DNA heteroplasmy, iv) the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, v) the specificity and the sensitivity of the run determined from the reads mapped to mtDNA, vi) the number of unmapped reads generated from the same run, vii) the average number of reads for each nucleotide, viii) the number of unmapped nucleotides, and ix) the percentage of genotypes corresponding correctly to the sequence. The weighting can be empirically determined depending on the emphasis of experiments. In one embodiment, the DSI is defined as the sum of at least three of these weighted parameters normalized to the average DSI from previous runs. Thus, the DSI may be constantly monitored as the number of experiments increases, and the quality of each experiment is always evaluated and compared to the average quality of all previous experiments. The DSI is useful in comparing the quality of performance among different platforms, laboratories, methods, runs, and even different technicians.
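One way to read the embodiment described above is as a weighted sum divided by a running historical average. The sketch below follows that reading under stated assumptions: the parameter values are taken to be already scaled to comparable units, and the `scale` factor of 100 (so that a run equal to the historical average scores 100) is an illustrative choice, since the specification does not state how the normalization is scaled.

```python
from statistics import mean


def weighted_sum(parameters, weights):
    """Sum of weight * parameter over at least three DSI parameters."""
    if len(parameters) != len(weights):
        raise ValueError("one weight is needed per parameter")
    if len(parameters) < 3:
        raise ValueError("the DSI uses at least three parameters")
    return sum(p * w for p, w in zip(parameters, weights))


def deep_sequencing_index(parameters, weights, previous_raw_scores, scale=100.0):
    """Weighted sum normalized to the mean raw score of previous runs.

    With no history the raw weighted sum is returned unchanged.
    The scale factor is an assumption, not taken from the source.
    """
    raw = weighted_sum(parameters, weights)
    if not previous_raw_scores:
        return raw
    baseline = mean(previous_raw_scores)
    if baseline == 0:
        raise ValueError("historical average must be non-zero")
    return scale * raw / baseline


if __name__ == "__main__":
    # Hypothetical parameter values and history, for illustration only.
    print(deep_sequencing_index([1, 1, 1], [1, 2, 3], [3, 5]))
```

Because the baseline is recomputed as runs accumulate, each new run is scored against the average of everything that came before it, which matches the monitoring use described in the paragraph.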
[0065] The depth of coverage is a parameter in one embodiment of the calculation of DSI. The deeper the coverage, the higher the DSI in this embodiment. Thus, the coverage depth will be reflected in the quality measurement. Deep coverage also provides an unequivocal identification of mtDNA deletion breakpoints and the degree of heteroplasmy of any deletion (Fig. 3). In one embodiment, a DSI of greater than 80 is an indication of a quality sequence. Fig. 7 is an example of a DSI using six DSI parameters. Example standards for an acceptable run are an average coverage depth > 10,000, an average coverage uniformity with a standard deviation of less than about 20%, coverage on all bases in the region of interest, a linearity of heteroplasmy, as given by the ExQC correlation coefficient, of greater than 95%, and a specific numeric value for the DSI against which to measure acceptable runs, for example.
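These example acceptance standards lend themselves to a simple checklist. The sketch below encodes them under two assumptions: the "standard deviation of less than about 20%" is interpreted as the standard deviation of depth relative to mean depth, and the DSI cutoff defaults to the value of 80 mentioned for one embodiment. The `RunMetrics` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    mean_depth: float        # average coverage depth over the region of interest
    depth_cv: float          # standard deviation of depth divided by mean depth
    min_depth: int           # lowest per-base depth in the region of interest
    exqc_correlation: float  # expected vs observed ExQC heteroplasmy correlation
    dsi: float               # deep sequencing index of the run


def failed_criteria(m: RunMetrics, dsi_threshold: float = 80.0) -> list:
    """Return the names of the example acceptance standards that the run misses."""
    failures = []
    if not m.mean_depth > 10_000:
        failures.append("average depth")
    if not m.depth_cv < 0.20:
        failures.append("uniformity")
    if m.min_depth < 1:
        failures.append("uncovered bases")
    if not m.exqc_correlation > 0.95:
        failures.append("ExQC linearity")
    if not m.dsi > dsi_threshold:
        failures.append("DSI")
    return failures


if __name__ == "__main__":
    # Hypothetical run values, for illustration only.
    print(failed_criteria(RunMetrics(25_000, 0.12, 8_000, 0.99, 90)))
```

Returning the list of failed criteria, instead of a single pass/fail flag, keeps the reason for rejecting a run visible to the reviewer.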
Utility of the deep sequence index (DSI)
[0066] Various platforms for MPS and target gene enrichment methods have been used in different clinical diagnostic laboratories. Data quality may vary among laboratories and there are currently no standard guidelines for a performance comparison. Thus, it is important to develop a uniform approach to assess the performance of high throughput MPS in various laboratories. Such an approach will provide standardized evaluations to compare inter-laboratory results and intra-laboratory assessments. The DSI formula contains at least three parameters listed above for the evaluation of a sequence run and is a useful indicator to assure quality performance.
[0067] The results shown in Example 3, below, demonstrate that LR-PCR of the whole mitochondrial genome followed by deep sequencing may be considered as the new standard for the analysis of the mitochondrial genome. This comprehensive approach makes qualitative and quantitative calls at every nucleotide position of the 16,569 bp mtDNA, detects large and small mtDNA deletions and insertions, identifies both the breakpoints and the degree of heteroplasmy, and is cost effective. This novel approach will greatly facilitate the diagnosis of mitochondrial DNA diseases in a timely and cost effective fashion. The deep sequencing method can be adapted to study somatic DNA alterations in cancer or aging tissues, and mosaicism, for example. As an example, the deep sequencing techniques and quality controls can be used in cancer applications. Some types of tumors exhibit mosaicism, and as such, the ability to quantify the percentages of mutations throughout a tumor may be important in diagnosis and treatment.
Kits of the Invention
[0068] Any of the compositions described herein may be comprised in a kit. In a non-limiting example, the DNA for an ExQC is provided in a kit. The kits may comprise a suitably aliquoted ExQC DNA, and the components of the kits may be packaged either in aqueous media or in lyophilized form. The container means of the kits will generally include at least one vial, test tube, flask, bottle, syringe or other container means, into which a component may be placed, and preferably, suitably aliquoted. Where there is more than one component in the kit, the kit also will generally contain a second, third or other additional container into which the additional components may be separately placed. However, various combinations of components may be comprised in a vial. The kits of the present invention also will typically include a means for containing the ExQC and any other reagent containers in close confinement for commercial sale. Such containers may include injection or blow molded plastic containers in which the desired vials are retained.
EXAMPLE 1
EXEMPLARY WHOLE MITOCHONDRIAL GENOME SEQUENCING PROTOCOL
• Obtain sample from blood
• Sample quantitation and dilution
• Set up long-range PCR for each sample
• Set up multiplex PCR for each sample to be sequenced (InQC)
• Check LR-PCR product on agarose gel and quantify by Qubit
• Calculate concentration and mix LR-PCR, multiplex PCR and ExQC
• Fragment mixed sample
• Purify fragmented products
• Check fragmented products on Bioanalyzer
• Perform end repair and purification on fragments
• Perform "A" tailing and purification
• Ligate adapter and purify
• Check that the adapter is ligated properly on a Bioanalyzer
• Perform 1st-enrichment PCR
• Purify 1st-enrichment PCR product and perform 2nd-enrichment PCR
• Check enriched PCR products on Bioanalyzer
• Quantify samples by setting up regions on Bioanalyzer
• Dilute samples based on Bioanalyzer results
• Set up and run qPCR
• Analyze qPCR results and generate a sample mix table
• Mix twelve samples together
• Run samples on HiSeq2000 for 9 days (1 day for cluster generation and 8 days for 100 bp paired-end sequencing)
• Demultiplex
• Match sequence indices to subject
• Compare genotype from Sanger sequences to subject's demultiplexed sequence
• Calculate DSI
• Set up PCR for Sanger sequence validation (SaQC)
• Report mutation call list
EXAMPLE 2
EXEMPLARY NUCLEAR DNA SEQUENCING PROTOCOL
• Genotype samples
• Sample search and quantitation by NanoDrop
• Fragmentation
• Purify fragment products
• Check fragmented products
• Perform end repair and purification
• "A" tailing and purification
• Ligate adapter and purification
• Check adapter ligated products on Bioanalyzer
• Pre-capture PCR
• Purify pre-capture PCR product
• Check on Bioanalyzer
• Set up hybridization
• 47 °C hybridization
• Harvest hybridization product and wash
• Post-capture PCR
• Purify PCR product
• Check on Bioanalyzer
• qPCR
• Calculate sample concentration and pool multiple samples for multiplex sequencing
• Run samples on HiSeq2000
• Demultiplex sequences
• Compare demultiplexed sequences to genotypes
• Set up PCR for Sanger sequence validation
• Report mutation call list
EXAMPLE 3
EXAMPLE OF WHOLE MITOCHONDRIAL DNA SEQUENCING
[0069] Mitochondrial diseases are clinically and genetically heterogeneous, with variable penetrance, expressivity, and differing age of onset. Disease-causing point mutations and large deletions in the mitochondrial genome often exist in a heteroplasmic state. Current molecular analyses require multiple different and complementary methods for the detection and quantification of mitochondrial DNA mutations. The quantification of heteroplasmy is limited to only a few common mutations.
[0070] The following describes a comprehensive one-step approach to analyze the mitochondrial genome by massively parallel sequencing for clinical diagnostic applications, as disclosed above. The results demonstrate nearly 100% sensitivity and specificity of base calls compared to Sanger sequencing, with quantitative information at each nucleotide position of the 16,569 bp mitochondrial genome. This method is also able to simultaneously detect large deletions, map breakpoints, and quantify deletion heteroplasmy. This "deep" sequencing approach provides a one-step comprehensive molecular analysis of the whole mitochondrial genome for patients in whom a mitochondrial disease is suspected.
Patients and DNA extraction
[0071] Patients were referred to the Mitochondrial Diagnostic Laboratory at Medical Genetics Laboratories of Baylor College of Medicine for the mutational evaluation of mitochondrial disorders. Total DNA was isolated either from peripheral blood lymphocytes or muscle biopsy using a commercially available DNA isolation kit (Gentra Systems Inc., Minneapolis, MN) according to the manufacturer's protocols.
Molecular analytical methods
Sanger Sequencing
[0072] The entire mitochondrial genome was amplified using 24 pairs of sequence specific overlapping primers (28-30). Sequencing reactions were performed on purified PCR products using the BigDye Terminator Cycle Sequencing kit, and analyzed on an ABI3730XL automated DNA sequencer. Sequences were analyzed using Mutation Surveyor version 3.20. GenBank sequence NC_012920.1 was used as reference sequence for the mitochondrial genome (28-30).
Quantification of heteroplasmy
[0073] Common point mutations were quantified using ARMS qPCR according to published protocols (11, 12). Large mtDNA deletions were detected by Southern blot or targeted oligonucleotide array CGH, and deletion breakpoints and heteroplasmy were determined accordingly (16). Deletion breakpoints were confirmed by PCR across the breakpoints followed by Sanger sequence analysis.
Selection of DNA samples for massively parallel sequencing analysis
[0074] DNA samples that had been evaluated by the conventional methods listed above, with or without positive mutations, were analyzed on the Illumina HiSeq 2000 platform. Eighteen samples with known deleterious point mutations identified by Sanger or ASO (allele specific oligonucleotide) dot blot screening (31) and quantified by ARMS qPCR (11) were verified by HiSeq sequencing. Eleven samples without any identified deleterious mutations by Sanger sequencing were also analyzed by this method. In addition, 4 samples with known mtDNA deletions were included.
Target gene enrichment for Next Generation Sequence (NGS) Analysis
Long Range PCR (LR PCR)
[0075] The whole mitochondrial genome (16,569 bp) was amplified with a single pair of primers in the D-loop region, which is free of deletion events reported so far in the mtDB database (32) (Fig. 9). The forward and reverse primers are (-): mt16426F- 5'ccgcacaagagtgctactctcctc3' (SEQ ID NO:1) and mt16425R- 5'gatattgatttcacggaggatggtg3' (SEQ ID NO:2). PCR was performed using the TaKaRa LA Taq Hot Start polymerase kit (TaKaRa Bio Inc., Madison, WI, USA) and 100 ng of total genomic DNA isolated from blood or 15 ng from skeletal muscle as template in a 50 μl PCR system. The PCR product was analyzed on a 1.5% agarose gel with a 1 kb plus DNA ladder (Invitrogen, Carlsbad, CA, USA).
Multiplex PCR of 14 nuclear marker regions for internal quality controls (InQC)
[0076] Primers (see Fig. 10) were designed to amplify 14 different nuclear gene loci. PCR was performed with Roche FastStart Taq DNA polymerase (Roche Diagnostics, Indianapolis, IN, USA) using 7.5 ng of total genomic DNA template in a 50 μl PCR system.
PCR for external quality controls (ExQC)
[0077] External quality controls, ExQCs, are seven synthetic DNA fragments, each 150 bp long with a known variant at a specific position. These DNA fragments were cloned into an EcoRV digested pBluescript II SK(-) vector and amplified with primer pair MrfNGS-F (5'-3': gagagtaatctgtgctctggc) (SEQ ID NO:3) and MITNGS-R (5'-3': accgttagcgtggcag) (SEQ ID NO:4). PCR was performed using Roche FastStart Taq DNA polymerase (Roche Diagnostics, Indianapolis, IN, USA). For quantitative purposes, the PCR products were brought up to 500 μl with TE buffer (10 mM Tris-Cl, pH 7.5, 1 mM EDTA). The diluted PCR products were pooled together according to the volumes listed in Fig. 11 to make 1 ml ExQC solution with 0, 2, 5, 20, 50, 80, and 100% of wild type and variants 1, 2, 3, 4, 5, and 6 respectively.
DNA template library preparation for Illumina indexed sequencing
[0078] Indexed paired-end DNA libraries were prepared following the Multiplexing Sample Preparation Guide provided by the manufacturer (PE-930-1002, Illumina, CA, USA) with some modifications. Briefly, PCR products were quantified using the Qubit dsDNA HS assay (Invitrogen, Carlsbad, CA, USA). LR-PCR products were fragmented using a Covaris S2 (Covaris Inc., Woburn, MA, USA) with the default 200 bp program. Fragments were purified with AMPure XP beads (Beckman Coulter Inc., Brea, CA), and the size and distribution of the DNA fragments were evaluated using a BioAnalyzer (Agilent Technologies, Foster City, CA, USA). After end repair, 3'-adenylation, and Illumina InPE adapter ligation, DNA samples were enriched by two-step PCR using Herculase II polymerase (Agilent Technologies, Foster City, CA, USA). Five μl of adapter-ligated products were amplified by Illumina InPE1.0 and InPE2.0 primers in the first-enrichment PCR. One μl of purified PCR product was used as template in the second-enrichment PCR, amplified with 2nd-PCR-F1 (5'-3': aatgatacggcgacc) (SEQ ID NO:5) and Illumina indexing primers #1 to #12. Twelve indexed DNA libraries were pooled together at an equimolar ratio at 10 nM final concentration. Each pooled library was sequenced in a single lane of one flow cell on HiSeq2000 (Illumina Inc., San Diego, CA, USA) with 100 bp paired end (PE) or single end (SE) read chemistry.
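The equimolar pooling step can be sketched as two small calculations: converting a mass concentration to molarity, and diluting each library to the 10 nM target before mixing equal volumes. The conversion uses the common approximation of about 660 g/mol per base pair of double-stranded DNA; the helper names, the example concentrations, and the 20 μl final volume are assumptions made for illustration and do not come from the specification, which quantifies libraries by Bioanalyzer and qPCR.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def to_nanomolar(ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilute_to_target(stock_nm: float, target_nm: float = 10.0, final_ul: float = 20.0):
    """Return (library_ul, diluent_ul) that give final_ul at target_nm."""
    if stock_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = final_ul * target_nm / stock_nm
    return library_ul, final_ul - library_ul


if __name__ == "__main__":
    # Hypothetical library: 6.6 ng/ul with a 300 bp mean fragment size.
    stock = to_nanomolar(6.6, 300)
    print(round(stock, 2), dilute_to_target(stock))
```

Once every library sits at the same molarity, pooling equal volumes gives each index the same share of the lane.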
Data filtering and Analysis
[0079] After de-multiplexing with Illumina CASAVA software, the reads belonging to one index were filtered with NextGENe software (SoftGenetics, State College, PA) to remove any reads with a median quality score below 25. Reads with one unknown assigned base call were also removed. The data were processed with mutation filter percentages of 20% and 1%. The aligned reads were examined with NextGENe Viewer. The degree of deletion heteroplasmy was calculated based on the segmental average read depth of the deleted region and the undeleted region.
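The two read-level filters named above can be illustrated with a short sketch. It is not the NextGENe implementation; it assumes that quality scores are already decoded to integers and that "one unknown assigned base call" means any read containing at least one N.

```python
from statistics import median


def passes_filter(sequence: str, qualities, min_median_q: int = 25) -> bool:
    """Keep a read only if it has no unknown base and its median quality is at least the cutoff."""
    if not qualities:
        return False  # a read without quality values cannot be assessed
    if "N" in sequence.upper():
        return False  # at least one unknown base call (assumed interpretation)
    return median(qualities) >= min_median_q


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    reads = [("ACGT", [30, 30, 20, 26]), ("ACNT", [40, 40, 40, 40]), ("ACGT", [10, 20, 24, 40])]
    print([passes_filter(seq, q) for seq, q in reads])
```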
Deep coverage allows accurate quantitative base calls
[0080] In order to examine each nucleotide of the entire 16.6 kb mitochondrial genome, a long range PCR (LR-PCR) with a set of non-overlapping back-to-back primers was used to amplify the entire mitochondrial genome, providing uniform coverage and sufficient depth for the quantification of heteroplasmy. An m.1630A>G (tRNA Val) variant was detected by MPS at a level of 3.7% heteroplasmy, but was not detected by Sanger sequencing (Figs. 12-15), in the asymptomatic mother of a 2-year-old affected child who harbored 33% heteroplasmy for the same variant. Studies of matrilineal family members and clinical correlation revealed that the m.1630A>G variant co-segregated with the disease and was likely to be a causative mutation. The detection of low level heteroplasmy of the m.1630A>G mutation in the mother also implies a higher recurrence risk than a simplex case. Fig. 16 shows the results of base calls for 12 samples analyzed by MPS compared to those of Sanger sequencing. Both methods identified concordant nucleotide changes with the exception of three heteroplasmic calls: the low heteroplasmy m.1630A>G described above, an m.16194insC variant present at 15% heteroplasmy detected by MPS but missed by Sanger sequencing, and a high heteroplasmy of m.303insC/CC identified by MPS, but identified as homoplasmic by Sanger sequencing. Small insertions and deletions within a homopolymeric stretch, such as m.303_309insC, m.311_315insC and m.16194insC, are also accurately identified by MPS. The MPS method not only detects all the changes detected by Sanger sequencing, but also identifies low heteroplasmic changes that are missed by the Sanger method (Fig. 16). In addition, the MPS method provides quantitative information for each base call.
Detection of mtDNA large deletion, its breakpoint and heteroplasmy
[0081] Amplification of the entire mitochondrial genome by LR-PCR using a single pair of primers provides uniform coverage and allows for easy detection of large deletions (Fig. 4A). Large mtDNA deletions with deletion breakpoints are clearly shown. The degree of deletion heteroplasmy can be calculated from the ratio of the average coverage at the deleted region to the average coverage at the non-deleted region. Several samples with known mtDNA deletions have been analyzed by LR-PCR, followed by deep sequencing (Fig. 4B). The results of deletion junctions, size of deletion, and degree of heteroplasmy detected by MPS are consistent with the results obtained from targeted array CGH/Southern blotting (16, 18) for each sample tested (Fig. 4C). Enhanced sensitivity was observed for detecting large mtDNA deletions, which is likely due to the preferential amplification of the shorter fragment.
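The coverage-ratio calculation described in this paragraph can be sketched as follows. The conversion from the ratio to a deleted fraction (one minus the ratio) is an assumption that deleted molecules contribute no reads inside the deletion; the sketch also ignores the circular genome and the amplification bias toward shorter fragments noted above, so it is an illustration and not the laboratory's validated calculation.

```python
from statistics import mean


def deletion_heteroplasmy(depth, del_start: int, del_end: int) -> float:
    """Estimate the deleted fraction from per-base depth.

    depth is a list of per-base read depths; the deletion spans the
    half-open, 0-based interval [del_start, del_end).
    """
    inside = depth[del_start:del_end]
    outside = depth[:del_start] + depth[del_end:]
    if not inside or not outside:
        raise ValueError("both deleted and non-deleted regions must be non-empty")
    ratio = mean(inside) / mean(outside)  # coverage ratio described in the text
    return 1.0 - ratio                    # assumed conversion to deleted fraction


if __name__ == "__main__":
    # Hypothetical depth profile: 1000X flanks with a 400X trough.
    profile = [1000] * 10 + [400] * 10 + [1000] * 10
    print(deletion_heteroplasmy(profile, 10, 20))
```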
Quantification of heteroplasmic point mutations by MPS
[0082] The advantage of deep sequencing is the ability to accurately quantify each nucleotide. Samples with known heteroplasmic mutations (m.3243A>G, m.8993T>C, m.11778G>A, m.14484T>C, m.13513G>A, and m.10191T>C) were evaluated by deep sequencing. Fig. 2 shows the different levels of heteroplasmic point mutations determined by MPS across the mtDNA genome. In general, the results from MPS are in good agreement with those obtained using the amplification refractory mutation system (ARMS) qPCR method (12). The deviations between the two methods are likely due to the intrinsic differences in the design of the methodologies. Due to the presence of a large number of nuclear mitochondrial DNA segments (NUMT or mtDNA pseudogenes, see discussion), the MPS method is believed to more accurately reflect the true mtDNA heteroplasmic status.
Assessment of analytical error, limit of detection and reproducibility
[0083] To assure that the quantification of heteroplasmy by multiplex MPS is reliable, a set of cloned synthetic 150 bp control DNAs was spiked into each indexed sample as external quality controls ("ExQC"), as described above. To mimic a range of heteroplasmy, different proportions of control DNAs with different nucleotide changes at specific positions were mixed to form a series of mixtures. A single QC DNA mixture contained different proportions (similar to 1, 5, 20, and 50% heteroplasmy) at a specific nucleotide position prior to sample spiking. To evaluate the reliability of quantitative measurements, the spiked control DNA was indexed together with each individual sample with the same barcode during sample preparation, according to the manufacturer's instructions.
[0084] The control DNA sequences, except the nucleotide positions marked for heteroplasmy measurements, were used to calculate error rates. The instrument sequencing error was determined from the number of incorrect nucleotide reads of the control DNA sequences and the total number of nucleotide reads mapped to the control DNA sequences under the analytical setting. The experimental errors, defined as overall errors of a sample from sample preparation to sequence results, were calculated in a similar manner, except that the reference was the mtDNA sequence. The error rates for the indexed control DNAs and the 12 indexed samples are depicted in Fig. 3A. The sample DNA has an analytical error rate of 0.326 ± 0.335%, as compared to 0.151 ± 0.394% for the control DNAs. Thus, the limit of detection is calculated to be 1.33% under the current experimental, instrument, and analytical setups. As shown in Fig. 3B, the observed and the expected percentages of the variants at specific positions exhibit an excellent correlation. These results demonstrate that deep coverage MPS can provide reliable quantitative results.
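The reported limit of detection is numerically consistent with the mean sample error plus three standard deviations (0.326% + 3 × 0.335% ≈ 1.33%). The specification does not state the formula, so the multiplier of 3 in the sketch below is an inference from those numbers, and the function names are hypothetical.

```python
def error_rate_percent(incorrect_bases: int, total_bases: int) -> float:
    """Percentage of mapped nucleotide reads that disagree with the reference."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 100.0 * incorrect_bases / total_bases


def limit_of_detection(mean_error_pct: float, sd_error_pct: float, k: float = 3.0) -> float:
    """Mean error plus k standard deviations; k = 3 is inferred, not stated in the source."""
    return mean_error_pct + k * sd_error_pct


if __name__ == "__main__":
    # Error statistics reported for the sample DNA in the text.
    print(round(limit_of_detection(0.326, 0.335), 2))
```

A heteroplasmy call below this threshold cannot be separated from the combined experimental and sequencing error.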
[0085] An admixed sample of mitochondrial haplogroups H and J was analyzed in two independent Illumina runs, and the levels of heteroplasmy in these two runs were in excellent agreement; the Pearson's correlation coefficient for the two runs is 0.998 with p value <0.001 (Fig. 17). The quantitative reproducibility of this method has been demonstrated in multiple samples.
[0086] As listed in Fig. 2, discrepancies in the degree of heteroplasmy measured by ARMS qPCR and deep sequencing are observed, although most of them are consistent. The percentage of mtDNA deletion heteroplasmy calculated from the LR-PCR/MPS method is also not in perfect agreement with that estimated from oligonucleotide array CGH (16) (see Fig. 1).
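Agreement between two runs, or between expected and observed ExQC percentages, can be summarized with a Pearson correlation coefficient like the one reported in paragraph [0085]. The sketch below computes the coefficient only (no p value), and the heteroplasmy values in the usage are hypothetical.

```python
from math import sqrt


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical heteroplasmy percentages at the same positions in two runs.
    run_1 = [1.2, 5.1, 19.8, 50.3]
    run_2 = [1.0, 4.9, 20.4, 49.6]
    print(pearson_r(run_1, run_2))
```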
Performance evaluation by "deep sequencing index" (DSI)
[0087] Using the DSI described above, the quality of MPS performance among three different gene enrichment methods was evaluated. As shown in Fig. 5, panel B represents the result of gene enrichment using SureSelect solution-based sequence capture with RNA probes, showing uneven coverage from as low as 1,000X to as high as 50,000X throughout the mtDNA genome. By mixing 24 overlapping PCR products of the mtDNA, followed by deep sequencing, the coverage is uniform within each PCR fragment itself but varies from about 5,000X to about 40,000X among different PCR fragments, with overlapping regions covered excessively at >80,000X (Fig. 5C). In contrast, Fig. 5D shows the uniform coverage throughout the entire mitochondrial genome when the mtDNA genome was enriched by a single LR-PCR amplification. Assuming that the 6 parameters are given weights of 1, 2, 2, -3, 1, and 5, respectively, the DSI for the SureSelect capture-based, 24 overlapping PCR fragment, and LR-PCR gene enrichment methods was calculated to be about 36, 74, and 90, respectively. If, in addition to detecting point mutations, the detection of a large mtDNA deletion is required, SureSelect and 24 PCR mixtures are not satisfactory methods due to uneven coverage of different regions, which makes the detection of large deletions unreliable unless sophisticated analytical algorithms are developed. Amplification of the entire mitochondrial genome by a single LR-PCR method allows easy detection and quantification of the heteroplasmic mtDNA deletion (Fig. 4). This performance evaluation clearly demonstrates that the LR-PCR based method is superior to other methods for the preparation of mitochondrial genome samples for MPS.
Safeguard of indexed samples
[0088] A set of 14 nuclear polymorphic markers was genotyped for each sample before further preparation for MPS. Primer sets used in the production of InQC are shown in Fig. 10. The polymorphic markers were amplified in a single multiplex and the resulting DNA fragments were mixed with the LR-PCR-enriched mtDNA fragments from the same individual for indexing and MPS. The markers tested before and after MPS need to match, an important quality control absolutely required in a clinical diagnostic setting, particularly when patients' samples are mixed for analyses using complex procedures.
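The before-and-after marker comparison can be expressed as a small check. This sketch assumes genotypes are stored as unordered allele pairs keyed by marker name; the marker names and alleles in the usage are hypothetical and are not the loci of Fig. 10.

```python
def genotype_mismatches(before: dict, after: dict) -> list:
    """Return markers whose genotype differs, or is missing, after MPS."""
    mismatched = []
    for marker, alleles in before.items():
        observed = after.get(marker)
        # Compare allele pairs without regard to order.
        if observed is None or sorted(observed) != sorted(alleles):
            mismatched.append(marker)
    return sorted(mismatched)


if __name__ == "__main__":
    # Hypothetical markers and alleles, for illustration only.
    pre_mps = {"marker_01": ("A", "G"), "marker_02": ("C", "C")}
    post_mps = {"marker_01": ("G", "A"), "marker_02": ("C", "T")}
    print(genotype_mismatches(pre_mps, post_mps))
```

An empty result supports the conclusion that the indexed reads belong to the same individual as the pre-genotyped DNA; any listed marker would flag a possible sample swap or contamination.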
EXAMPLE 4
NUCLEAR DNA DEEP SEQUENCING - DEPLETION PANEL
[0089] Mitochondrial diseases result from dysfunction of the mitochondrial respiratory chain. They can be caused by mutations in mitochondrial DNA (mtDNA) or in nuclear genes that encode proteins that function in mitochondria. 80-95% of patients with clinically suspected primary mitochondrial disease do not harbor a pathogenic mutation in the initial screen of the mtDNA. These cases are normally further tested for mutations in nuclear-encoded mitochondrial genes that are associated with distinct clinical phenotypes. For example, POLG, DGUOK, MPV17, and C10orf2 mutations have been observed with mtDNA depletion and hepatoencephalopathy. TYMP mutations are associated with Mitochondrial NeuroGastroIntestinal Encephalomyopathy (MNGIE). TK2, SUCLA2, SUCLG1, and RRM2B mutations have been observed with mtDNA depletion and encephalomyopathy. Myopathy with elevated creatine kinase is a frequent feature of TK2 deficiency. Correctly selecting a group of genes to be analyzed can maximize the chance of successfully obtaining a molecular diagnosis. However, due to the clinical heterogeneity of the mitochondrial diseases, many affected individuals do not fit into one particular category, which makes it a challenge for clinicians to select the candidate genes.
[0090] The Depletion Panel is a panel that may be performed using the deep sequencing technique described above. It contains 14 nuclear genes (C10ORF2, DGUOK, MPV17, OPA1, OPA3, POLG, POLG2, RRM2B, SLC25A4, SUCLA2, SUCLG1, SUCLG2, TK2 and TYMP) that are involved in the maintenance of mtDNA integrity and the deoxynucleotide salvage pathway. These genes are analyzed by the "deep sequencing technique", applying Massive Parallel Sequencing (MPS) to clinical diagnosis. The results demonstrate that all of the targeted regions are fully covered with at least 100X coverage. The mutations called by MPS have 100% concordance with the list generated by Sanger sequencing. For quality assurance, proper qualitative and quantitative controls were instituted to be analyzed along with each sample. The controls allow the determination of experimental errors, which provide an estimate of the limit of detection. Table 1 below shows the results of a depletion-panel test.
Figure imgf000032_0001
EXAMPLE 5
NUCLEAR DNA DEEP SEQUENCING - PFIC
[0093] The Progressive Familial Intrahepatic Cholestasis (PFIC) panel is another example of a nuclear genetic test which is prepared with the above deep sequencing technique. PFIC is a group of genetic disorders which result in disruption of bile formation and flow, and are inherited in an autosomal recessive manner. The estimated incidence is 1 per 50,000-100,000 births. The PFIC panel, consisting of four genes (ABCB11, ATP8B1, ABCB4, and JAG1), offers a single, one-step and convenient test for molecular diagnosis of patients presenting with cholestasis, among other symptoms. By applying the Massive Parallel Sequencing (MPS) technique to the clinical diagnosis, all target regions were enriched from the human genome and analyzed simultaneously. All target regions are fully covered with at least 100X coverage. The results showed 99.99% specificity and sensitivity using Sanger as standard.
EXAMPLE 6
NUCLEAR DNA DEEP SEQUENCING - GSD
[0094] Glycogen storage diseases (GSDs) are a group of inherited genetic defects of glycogen metabolism. GSDs are categorized into 14 subtypes, based on the specific enzyme deficiency. Common symptoms include hypoglycemia, hepatomegaly, developmental delay and muscle cramps. The outcome for untreated patients with GSDs can be devastating if an early diagnosis is not made. Early diagnosis with proper treatment can greatly improve the quality of life, reduce organ damage, and extend the patient's life span. Due to the genetic heterogeneity of GSDs and the limited availability of enzyme studies, sequencing one gene at a time by the Sanger method is expensive and time consuming. Deep sequencing was used to analyze two panels of genes responsible for the liver and the muscle forms of GSD with massively parallel sequencing for effective molecular diagnosis of patients with suspected GSDs. A total of 294 coding exons of 16 genes (GYS2, GYS1, G6PC, SLC37A4, GAA, AGL, GBE1, PYGM, PYGL, PFKM, PHKA2, PHKB, PHKG2, PHKA1, PGAM2, and PGM1) were included. All exons were covered at > 50X with an average coverage of 700X. A total of 7 samples with known mutations were validated. The results demonstrated equal sensitivity and specificity compared to the Sanger method. All disease causing mutations were identified correctly. The mutation types include single nucleotide substitutions, small deletions and duplications. In addition, a homozygous intragenic deletion involving exons 3-5 of the G6PC gene (GSDIa) and a homozygous deletion of exon 16 in the GBE1 gene (GSDIV) were also detected. Furthermore, mutations were identified in patients who had previously tested negative in the selected limited number of GSD genes by Sanger sequencing. The GSD panel testing provides a cost effective diagnosis with fast turnaround time for patients with clinical indications and/or biochemical evidence suggesting a GSD.
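A per-exon coverage check of the kind implied by the "> 50X" statement can be sketched as below. The input layout (a mapping from exon name to per-base depths) and the exon names are assumptions made for illustration; they do not reflect the panel's actual data format.

```python
def coverage_summary(exon_depths: dict, min_required: int):
    """Return (exons not exceeding min_required at every base, average depth over all bases)."""
    below = sorted(
        name for name, depths in exon_depths.items() if min(depths) <= min_required
    )
    total_bases = sum(len(depths) for depths in exon_depths.values())
    average = sum(sum(depths) for depths in exon_depths.values()) / total_bases
    return below, average


if __name__ == "__main__":
    # Hypothetical exons with per-base depths, for illustration only.
    exons = {"exon_1": [60, 70, 80], "exon_2": [40, 900, 900]}
    print(coverage_summary(exons, 50))
```

Checking the minimum depth per exon, and not only the panel-wide average, is what catches a single poorly covered exon hidden behind a high mean.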
[0095] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
REFERENCES
All references listed below and throughout are herein incorporated in full.
1. Elliott, H. R., Samuels, D. C., Eden, J. A., Relton, C. L., and Chinnery, P. F. (2008) Pathogenic Mitochondrial DNA Mutations Are Common in the General Population, Am J Hum Genet. 83, 254-260.
2. Haas, R. H., Parikh, S., Falk, M. J., Saneto, R. P., Wolf, N. I., Darin, N., and Cohen, B. H. (2007) Mitochondrial Disease: A Practical Approach for Primary Care Physicians, Pediatrics 120, 1326-1333.
3. Wong, L. J. (2010) Molecular genetics of mitochondrial disorders, Dev Disabil Res Rev 16, 154-162.
4. Wong, L.-J. C., Scaglia, F., Graham, B. H., and Craigen, W. J. (2010) Current molecular diagnostic algorithm for mitochondrial disorders, Mol Genet Metab. 100, 111-117.
5. Tan, D.-J., Bai, R.-K., and Wong, L.-J. C. (2002) Comprehensive Scanning of Somatic Mitochondrial DNA Mutations in Breast Cancer, Cancer Res. 62, 972-976.
6. Tan, D.-J., Chang, J., Chen, W.-L., Agress, L. J., Yeh, K.-T., Wang, B., and Wong, L.-J. C. (2003) Novel heteroplasmic frameshift and missense somatic mitochondrial DNA mutations in oral cancer of betel quid chewers, Genes Chromosomes Cancer. 37, 186-194.
7. Arnold, R. S., Sun, C. Q., Richards, J. C., Grigoriev, G., Coleman, I. M., Nelson, P. S., Hsieh, C.-L., Lee, J. K., Xu, Z., Rogatko, A., Osunkoya, A. O., Zayzafoon, M., Chung, L., and Petros, J. A. (2009) Mitochondrial DNA mutation stimulates prostate cancer growth in bone stromal environment, Prostate. 69, 1-11.
8. Dasgupta, S., Hoque, M. O., Upadhyay, S., and Sidransky, D. (2008) Mitochondrial Cytochrome B Gene Mutation Promotes Tumor Growth in Bladder Cancer, Cancer Res. 68, 700-706.
9. Procaccio, V., Neckelmann, N., Paquis-Flucklinger, V., Bannwarth, S., Jimenez, R., Davila, A., Poole, J. C., and Wallace, D. C. (2006) Detection of Low Levels of the Mitochondrial tRNALeu(UUR) 3243A>G Mutation in Blood Derived from Patients with Diabetes, Mol Diagn Ther. 10, 381-389.
10. Wallace, D. C. (2010) Mitochondrial DNA mutations in disease and aging, Environ Mol Mutagen. 51, 440-450.
11. Wang, J., Venegas, V., Li, F., and Wong, L.-J. (2011) Analysis of Mitochondrial DNA Point Mutation Heteroplasmy by ARMS Quantitative PCR, John Wiley & Sons, Inc.
12. Bai, R.-K., and Wong, L.-J. C. (2004) Detection and Quantification of Heteroplasmic Mutant Mitochondrial DNA by Real-Time Amplification Refractory Mutation System Quantitative PCR Analysis: A Single-Step Approach, Clin Chem 50, 996-1001.
13. White, H. E., Durston, V. J., Seller, A., Fratter, C., Harvey, J. F., and Cross, N. C. P. (2005) Accurate Detection and Quantitation of Heteroplasmic Mitochondrial Point Mutations by Pyrosequencing, Genetic Testing 9, 190-199.
14. Rohlin, A., Wernersson, J., Engwall, Y., Wiklund, L., Bjork, J., and Nordling, M. (2009) Parallel sequencing used in detection of mosaic mutations: Comparison with four diagnostic DNA screening techniques, Hum Mutat. 30, 1012-1020.
15. Shanske, S., and Wong, L.-J. C. (2004) Molecular analysis for mitochondrial DNA disorders, Mitochondrion 4, 403-415.
16. Chinault, A. C., Shaw, C. A., Brundage, E. K., Tang, L.-Y., and Wong, L.-J. C. (2009) Application of dual-genome oligonucleotide array-based comparative genomic hybridization to the molecular diagnosis of mitochondrial DNA deletion and depletion syndromes, Genetics in Medicine 11, 518-526.
17. Wong, L.-J. C., Dimmock, D., Geraghty, M. T., Quan, R., Lichter-Konecki, U., Wang, J., Brundage, E. K., Scaglia, F., and Chinault, A. C. (2008) Utility of Oligonucleotide Array-Based Comparative Genomic Hybridization for Detection of Target Gene Deletions, Clin Chem 54, 1141-1148.
18. Sadikovic, B., Wang, J., El-Hattab, A., Landsverk, M., Douglas, G., Brundage, E. K., Craigen, W. J., Schmitt, E. S., and Wong, L.-J. C. Sequence Homology at the Breakpoint and Clinical Phenotype of Mitochondrial DNA Deletion Syndromes, PLoS ONE 5, e15687.
19. Ruiz-Pesini, E., Lott, M. T., Procaccio, V., Poole, J. C., Brandon, M. C., Mishmar, D., Yi, C., Kreuziger, J., Baldi, P., and Wallace, D. C. (2006) An enhanced MITOMAP with a global mtDNA mutational phylogeny, Nucleic Acids Res. 35, D823-D828.
20. Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E. E., Bamshad, M., Nickerson, D. A., and Shendure, J. (2009) Targeted capture and massively parallel sequencing of 12 human exomes, Nature 461, 272-276.
21. Roach, J. C., Glusman, G., Smit, A. F. A., Huff, C. D., Hubley, R., Shannon, P. T., Rowen, L., Pant, K. P., Goodman, N., Bamshad, M., Shendure, J., Drmanac, R., Jorde, L. B., Hood, L., and Galas, D. J. (2010) Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing, Science 328, 636-639.
22. Fliss, M. S., Usadel, H., Caballero, O. v. L., Wu, L., Buta, M. R., Eleff, S. M., Jen, J., and Sidransky, D. (2000) Facile Detection of Mitochondrial DNA Mutations in Tumors and Bodily Fluids, Science 287, 2017-2019.
23. Polyak, K., Li, Y., Zhu, H., Lengauer, C., Willson, J. K. V., Markowitz, S. D., Trush, M. A., Kinzler, K. W., and Vogelstein, B. (1998) Somatic mutations of the mitochondrial genome in human colorectal tumours, Nat Genet 20, 291-293.
24. Wong, L.-J. C., Lueth, M., Li, X.-N., Lau, C. C., and Vogel, H. (2003) Detection of Mitochondrial DNA Mutations in the Tumor and Cerebrospinal Fluid of Medulloblastoma Patients, Cancer Res 63, 3866-3871.
25. Kurtz, A., Lueth, M., Kluwe, L., Zhang, T., Foster, R., Mautner, V.-F., Hartmann, M., Tan, D.-J., Martuza, R. L., Friedrich, R. E., Driever, P. H. L, and Wong, L.-J. C. (2004) Somatic Mitochondrial DNA Mutations in Neurofibromatosis Type 1-Associated Tumors, Mol Cancer Res. 2, 433-441.
26. Habano, W., Sugai, T., Yoshida, T., and Nakamura, S.-i. (1999) Mitochondrial gene mutation, but not large-scale deletion, is a feature of colorectal carcinomas with mitochondrial microsatellite instability, Int J Cancer. 83, 625-629.
27. Wong, L. J. C., Tan, D. J., Bai, R. K., Yeh, K. T., and Chang, J. (2004) Molecular alterations in mitochondrial DNA of hepatocellular carcinomas: is there a correlation with clinicopathological profile?, J Med Genet. 41, e65.
28. Brautbar, A., Wang, J., Abdenur, J. E., Chang, R. C., Thomas, J. A., Grebe, T. A., Lim, C., Weng, S.-W., Graham, B. H., and Wong, L.-J. (2008) The mitochondrial 13513G>A mutation is associated with Leigh disease phenotypes independent of complex I deficiency in muscle, Mol Genet Metab. 94, 485-490.
29. Wang, J., Brautbar, A., Chan, A. K., Dzwiniel, T., Li, F.-y., Waters, P. J., Graham, B. H., and Wong, L.-J. (2009) Two mtDNA mutations 14487T>C (M63V, ND6) and 12297T>C (tRNA Leu) in a Leigh syndrome family, Mol Genet Metab. 96, 59-65.
30. Ware, S. M., El-Hassan, N., Kahler, S. G., Zhang, Q., Y, W., Miller, E., Wong, B., Spicer, R. L., Craigen, W. J., Kozel, B. A., Grange, D. K., and Wong, L. J. (2009) Infantile cardiomyopathy caused by a mutation in the overlapping region of mitochondrial ATPase 6 and 8 genes, J Med Genet. 46, 308-314.
31. Wong, L.-J. C., and Senadheera, D. (1997) Direct detection of multiple point mutations in mitochondrial DNA, Clin Chem 43, 1857-1861.
32. Ingman, M., and Gyllensten, U. (2006) mtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences, Nucleic Acids Res. 34, D749-D751.

Claims

CLAIMS What is claimed is:
1. A method of quality control comprising:
adding to an unsequenced sample of DNA at least three known sequences, wherein each sequence is at a different known concentration.
2. The method of claim 1, wherein at least five known sequences with different known concentrations are added to the sample.
3. The method of claim 2, wherein at least seven known sequences with different known concentrations are added to the sample.
4. The method of claim 1, further comprising sequencing the DNA sample and comparing the percentages of the known sequences in the sequenced sample to the starting concentration.
5. The method of claim 4, further comprising rejecting the sequence if the correlation coefficient of the expected versus observed values of the concentrations is less than about 98%.
6. The method of claim 1, wherein the known sequences mimic heteroplasmy.
7. The method of claim 1, wherein the known sequences vary by one nucleotide to a standard sequence.
8. A kit comprising: at least three known sequences of DNA, wherein each sequence is at a different known concentration.
9. The kit of claim 8, comprising at least five different known sequences of DNA, wherein each sequence is at a different known concentration.
10. The kit of claim 8, wherein the known sequences mimic heteroplasmy.
11. The kit of claim 8, wherein the known sequences vary by one nucleotide to a standard sequence.
12. A method of quality control comprising:
genotyping at least a first and second samples;
pooling the samples;
sequencing the pooled samples;
demultiplexing the samples; and
comparing the genotype of the first sample to the demultiplexed sequence of the first sample.
13. The method of claim 12, additionally comprising rejecting the demultiplexed sequence if the sequence does not match with at least 50% of the demultiplexed sequence.
14. The method of claim 13, comprising rejecting the demultiplexed sequence if the genotype does not match with at least 90% of the demultiplexed sequence.
15. The method of claim 14, comprising rejecting the demultiplexed sequence if the genotype does not match with at least 98% of the demultiplexed sequence.
16. The method of claim 12, comprising rejecting the demultiplexed sequence if the genotype does not match.
17. A method for quality control of sequencing data comprising: receiving at least three parameters corresponding to DNA sequencing, wherein the parameters comprise the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads;
determining, using a processor, a weighted summed value based on the received parameters, and
accepting results of the DNA sequencing if the value is over a predetermined number.
18. The method of claim 17, wherein at least six parameters are received.
19. A system for quality control of sequencing data, the system comprising a processor in communication with a memory where: the memory stores processor-executable code; and the processor is configured to be operable in conjunction with the processor-executable code to:
receive at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads;
determine a weighted summed value based on the received parameters,
transmit the weighted summed value.
20. A non-transitory computer readable-medium comprising computer-usable program code executable to perform operations comprising:
receiving at least three parameters corresponding to DNA sequencing, wherein the parameters are the average number of reads of external control DNA, the average number of sample reads, the average number of sample reads normalized to the average number of reads of external control, the correlation coefficient of the expected versus observed values of external control, the ratio of the standard deviation of sequence reads to the average number of sequence reads per sample, the specificity determined from reads mapped and the sensitivity of reads mapped, or the number of unmapped reads;
determining a weighted summed value based on the received parameters,
transmitting the weighted summed value.
21. A method comprising:
receiving a plurality of DNA samples;
pooling the samples;
sequencing the sample on a next generation sequencer, wherein the sequencer has been adjusted to provide deep sequencing;
demultiplexing the samples; and
outputting the sequences of the demultiplexed samples.
22. The method of claim 21, wherein the DNA sample comprises mitochondrial DNA and the deep sequencing provides at least 10,000 average reads per nucleotide.
23. The method of claim 21, wherein the DNA sample comprises nuclear DNA and the deep sequencing provides at least 200 fold average reads per nucleotide.
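As a non-authoritative illustration of the quality-control logic recited in claims 5 and 17, the sketch below computes a Pearson correlation between expected and observed spike-in percentages and a weighted sum of received parameters. The parameter names, weights, 0-to-1 scaling, acceptance threshold, and example values are all hypothetical; the claims specify only the roughly 98% correlation cutoff, the minimum of three parameters, and acceptance when the weighted value exceeds a predetermined number.

```python
from math import sqrt


def pearson_r(expected, observed):
    """Pearson correlation coefficient of paired expected/observed values."""
    n = len(expected)
    if n != len(observed) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return sxy / sqrt(sxx * syy)


def spike_in_passes(expected, observed, min_r=0.98):
    """Reject (return False) when the correlation falls below about 98%."""
    return pearson_r(expected, observed) >= min_r


def weighted_qc_score(params, weights):
    """Weighted sum of at least three received QC parameters.

    Both dictionaries use hypothetical parameter names; values are
    assumed to be pre-scaled to a comparable range.
    """
    if len(params) < 3:
        raise ValueError("at least three parameters are required")
    return sum(weights[name] * value for name, value in params.items())


def accept_run(params, weights, threshold):
    """Accept the run only if the weighted value is over the threshold."""
    return weighted_qc_score(params, weights) > threshold


# Hypothetical spike-in percentages (expected vs. observed after sequencing).
expected_pct = [1.0, 5.0, 10.0, 20.0, 50.0]
observed_pct = [1.2, 4.8, 10.5, 19.0, 51.0]
controls_ok = spike_in_passes(expected_pct, observed_pct)

# Hypothetical scaled parameters and weights; the weights are assumptions.
qc_params = {"spike_in_r": 0.999, "mapped_fraction": 0.95, "read_depth_ratio": 0.8}
qc_weights = {"spike_in_r": 0.5, "mapped_fraction": 0.3, "read_depth_ratio": 0.2}
run_accepted = accept_run(qc_params, qc_weights, threshold=0.9)
```

The weighted sum keeps each parameter's contribution explicit, so a laboratory could tune the weights and the threshold against its own validation runs; the values shown here are placeholders only.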
PCT/US2012/029266 2011-03-16 2012-03-15 A method for comprehensive sequence analysis using deep sequencing technology WO2012125848A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161453317P 2011-03-16 2011-03-16
US61/453,317 2011-03-16
US201261598439P 2012-02-14 2012-02-14
US61/598,439 2012-02-14

Publications (2)

Publication Number Publication Date
WO2012125848A2 true WO2012125848A2 (en) 2012-09-20
WO2012125848A3 WO2012125848A3 (en) 2014-04-24

Family

ID=46831356

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/029266 WO2012125848A2 (en) 2011-03-16 2012-03-15 A method for comprehensive sequence analysis using deep sequencing technology

Country Status (1)

Country Link
WO (1) WO2012125848A2 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015095840A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of cdk and erk inhibitors
WO2015095819A2 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatment using combinations of erk and raf inhibitors
WO2015095829A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of pi3k/akt pathway and erk inhibitors
WO2015095825A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of type 2 mek and erk inhibitors
CN105095686A (en) * 2014-05-15 2015-11-25 中国科学院青岛生物能源与过程研究所 High-flux transcriptome sequencing data quality control method based on multi-core CPU (Central Processing Unit) hardware
WO2016205671A1 (en) * 2015-06-17 2016-12-22 The Trustees Of Columbia University In The City Of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial dna depletion syndromes
CN109371119A (en) * 2018-11-12 2019-02-22 北京纳诺基生物医药科技有限公司 Primer pair, kit and its method deleting mutation for detecting human mtdna and exhausting
CN112280873A (en) * 2020-11-23 2021-01-29 江苏省家禽科学研究所 Mitochondrial microsatellite marker primer for pigeon genetic diversity analysis and detection method
CN112397143A (en) * 2020-10-30 2021-02-23 深圳思勤医疗科技有限公司 Method for predicting tumor risk value based on plasma multi-omic multi-dimensional features and artificial intelligence
US11104948B2 (en) 2014-02-10 2021-08-31 Vela Operations Singapore Pte. Ltd. NGS systems control and methods involving the same

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI ET AL.: 'Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes.' AM J HUM GENET vol. 87, no. 2, 13 August 2010, pages 237 - 249 *
TANG ET AL.: 'Characterization of mitochondrial DNA heteroplasmy using a parallel sequencing system.' BIOTECHNIQUES vol. 48, no. 4, April 2010, pages 287 - 296 *
ZHANG ET AL.: 'Comprehensive 1-Step Molecular Analyses of Mitochondrial Genome by Massively Parallel Sequencing.' CLIN CHEM 09 July 2012, *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10668055B2 (en) 2013-12-20 2020-06-02 Biomed Valley Discoveries, Inc. Cancer treatment using combinations of ERK and RAF inhibitors
WO2015095819A2 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatment using combinations of erk and raf inhibitors
WO2015095829A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of pi3k/akt pathway and erk inhibitors
WO2015095825A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of type 2 mek and erk inhibitors
EP4062917A1 (en) 2013-12-20 2022-09-28 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of pi3k/akt pathway and erk inhibitors
EP4043017A1 (en) 2013-12-20 2022-08-17 Biomed Valley Discoveries, Inc. Cancer treatment using combinations of erk and raf inhibitors
EP3984537A1 (en) 2013-12-20 2022-04-20 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of type 2 mek and erk inhibitors
WO2015095840A1 (en) 2013-12-20 2015-06-25 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of cdk and erk inhibitors
US11007184B2 (en) 2013-12-20 2021-05-18 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of type 2 MEK and ERK inhibitors
US11007183B2 (en) 2013-12-20 2021-05-18 Biomed Valley Discoveries, Inc. Cancer treatments using combinations of PI3K/Akt pathway and ERK inhibitors
US11104948B2 (en) 2014-02-10 2021-08-31 Vela Operations Singapore Pte. Ltd. NGS systems control and methods involving the same
CN105095686A (en) * 2014-05-15 2015-11-25 中国科学院青岛生物能源与过程研究所 High-flux transcriptome sequencing data quality control method based on multi-core CPU (Central Processing Unit) hardware
RU2721492C2 (en) * 2015-06-17 2020-05-19 Зе Трастис Оф Коламбия Юниверсити Ин Зе Сити Оф Нью-Йорк Deoxynucleoside therapy of diseases caused by unbalanced nucleotide pools, including mitochondrial dna depletion syndromes
US10471087B2 (en) 2015-06-17 2019-11-12 The Trustees Of Columbia University In The City Of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial DNA depletion syndromes
EP3505174A1 (en) * 2015-06-17 2019-07-03 The Trustees of Columbia University in the City of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial dna depletion syndromes
US11110111B2 (en) 2015-06-17 2021-09-07 The Trustees Of Columbia University In The City Of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial DNA depletion syndromes
WO2016205671A1 (en) * 2015-06-17 2016-12-22 The Trustees Of Columbia University In The City Of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial dna depletion syndromes
US11666592B2 (en) 2015-06-17 2023-06-06 The Trustees Of Columbia University In The City Of New York Deoxynucleoside therapy for diseases caused by unbalanced nucleotide pools including mitochondrial DNA depletion syndromes
CN109371119A (en) * 2018-11-12 2019-02-22 北京纳诺基生物医药科技有限公司 Primer pair, kit and its method deleting mutation for detecting human mtdna and exhausting
CN112397143A (en) * 2020-10-30 2021-02-23 深圳思勤医疗科技有限公司 Method for predicting tumor risk value based on plasma multi-omic multi-dimensional features and artificial intelligence
CN112397143B (en) * 2020-10-30 2022-06-21 深圳思勤医疗科技有限公司 Method for predicting tumor risk value based on plasma multi-omic multi-dimensional features and artificial intelligence
CN112280873A (en) * 2020-11-23 2021-01-29 江苏省家禽科学研究所 Mitochondrial microsatellite marker primer for pigeon genetic diversity analysis and detection method



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12757759

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 12757759

Country of ref document: EP

Kind code of ref document: A2