WO2010076789A1 - Detection and use of recurrent mutation combination pattern - Google Patents

Publication number
WO2010076789A1
Authority
WO
WIPO (PCT)
Prior art keywords
cancer
mutations
hnscc
combinations
mitochondrial genome
Application number
PCT/IL2009/001224
Other languages
French (fr)
Inventor
Dan Mishmar
Eitan Rubin
Ilia Zhidkov
Erez Livneh
Original Assignee
Ben Gurion University Of The Negev R&D Authority
Application filed by Ben Gurion University Of The Negev R&D Authority
Publication of WO2010076789A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • C12Q2600/172: Haplotypes

Definitions

  • the present invention answers to the need by providing methods for confirming the diagnosis of a cancer in a subject with a neoplasm, which comprise comparing the mitochondrial genome (mtDNA) of an affected tissue isolated from the subject to the mitochondrial genome of a healthy tissue isolated from the same subject, and detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue, wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue isolated from the subject indicates that the subject has cancer.
  • mtDNA mitochondrial genome
  • the human mitochondrial genome presents the advantage that it is small (16.5 kb) and, because of its simple uniparental inheritance pattern, its variations are the result of mutation, rather than recombination. Accordingly, the invention conveniently provides fast and reliable methods for characterizing cancer or confirming a cancer diagnosis in a subject with a neoplasm based on whole mitochondrial genome sequences.
  • As referenced herein, by “mitochondrial DNA” or “mtDNA” it is intended the DNA located in the mitochondria and consisting of about 16,500 base pairs.
  • By “mutation” it is intended a change in a specific nucleotide position in the genome of an affected tissue isolated from a subject as compared to the same nucleotide position in the genome of a healthy tissue isolated from the same subject.
  • By “coding sub-genome of the nuclear genome” it is intended all protein-encoding regions in the genome.
  • By “contig” it is meant a contiguous sequence constructed from many sequence fragments, which align in a tiled manner to form a longer desired sequence of interest.
  • By “phylogenetic tree” it is meant a tree that reflects the phylogenetic relationships of known mtDNA sequences sharing a common ancestor. Each branch in the tree represents a random inherited mutation or polymorphism and each node in the tree represents the most recent common ancestor of the descendants. A leaf node or external node has no children, whereas a non-leaf node or internal node has child nodes. The root node in the tree has no parents. The tip represents the most recent descendant. The depth of a node is the median length of the path from that node to all its descendants.
  • By “homoplasmic mutation” it is meant a mutation affecting all mtDNA copies in the cell.
  • By “heteroplasmic mutation” it is meant a mutation that affects a fraction of mtDNA copies in the cell.
  • By “haplogroup” it is meant a cluster of individuals with a similar spectrum of polymorphisms. The set of polymorphisms of each individual is defined as a “haplotype”.
  • By “maximum parsimony” it is meant a method of identifying the potential phylogenetic tree that requires the smallest total number of evolutionary events to explain the observed sequence data.
  • By “MDF” it is meant the maximal depth of mutation fixation.
  • a non-variable position or a position mutated in the tree tips has an MDF value of zero.
  • the present inventors have developed an efficient and unique method to identify and detect the presence of recurring combinations of mutations in the mitochondrial genome. Specifically, the inventors have made the unexpected discovery that the detection of combinations of recurring mutations in the mitochondrial genome of two or more individuals may be used to determine whether the individuals belong to the same subpopulation within a defined population.
  • the invention provides a method for identifying subpopulations of individuals with common combinations of mutations within a defined population comprising: (a) comparing the mitochondrial genome of a target tissue in two or more subjects with the mitochondrial genome of a reference tissue in the same subjects; (b) detecting the presence of mutations in the mitochondrial genome of the target tissue; (c) converting the mutations into a list of mutated positions; (d) identifying the positions in the mitochondrial genome that are mutated in two or more subjects; and (e) determining recurrent combinations of mutations in the mitochondrial genome of the target tissue in two or more subjects, wherein the presence of recurring combinations of mutations in the target tissue of two or more subjects indicates that the subjects belong to the same subpopulation.
  • the method of the invention is particularly useful in the identification of subpopulations in subjects affected by a disease.
  • One aspect of the present invention is the identification of subpopulations sharing common combinations of mutations among subjects with a neoplasm.
  • the disease is cancer.
  • the invention provides a method of confirming the diagnosis of a cancer in a subject with a neoplasm comprising the steps of (a) comparing the mitochondrial genome of an affected tissue isolated from the subject with the mitochondrial genome of a healthy tissue isolated from the same subject; and (b) detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue; wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue indicates that the subject has cancer.
  • the recurring combination of mutations may localize in different nodes of a phylogenetic tree.
  • the recurring combinations of mutations localize in the inner nodes of the phylogenetic tree.
  • the recurring combinations of mutations on the phylogenetic tree occur at mtDNA nucleotide positions having a maximal depth of mutation fixation (MDF) value which is significantly higher than the MDF value of non-COM mutations.
  • the recurring combinations of mutations comprise a K number of recurring combinations of mutations as defined herein.
  • the recurring combinations of mutations comprise at least five recurring combinations of mutations.
  • the recurring combinations of mutations comprise at least seven combinations of mutations.
  • the recurring combinations of mutations may comprise heteroplasmic or homoplasmic mutations.
  • each recurring combination of mutations comprises at least one homoplasmic mutation.
  • each recurring combination of mutations comprises between one and seven homoplasmic mutations. Even more preferably, each recurring combination of mutations comprises at least six homoplasmic mutations. Most preferably, each recurring combination of mutations comprises at least seven homoplasmic mutations.
  • The recurring combinations of mutations may be neutral mutations or mutations that cause the replacement of one or more amino acids in the encoded protein.
  • the methods of the present invention allow characterizing or confirming the diagnosis of a cancer.
  • the cancer may be an adenocarcinoma or a squamous cell carcinoma.
  • the adenocarcinoma may be selected from the group consisting of colorectal cancer, lung cancer, prostate cancer, breast cancer, pancreas cancer, stomach cancer and esophageal cancer.
  • the adenocarcinoma comprises pancreas cancer.
  • the squamous cell carcinoma may be selected from the group consisting of cancer of the skin, head and neck cancer, esophageal cancer, lung cancer, prostate cancer and vaginal cancer.
  • the squamous cell carcinoma comprises head and neck cancer.
  • NJ: Neighbor-Joining
  • NC_001643 Pan troglodytes
  • PHYLIP package distance matrix program
  • DNAPARS [Felsenstein, J., PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 1989. 5: p. 164-166] was used to infer mtDNA sequences at the inner nodes of the NJ phylogenetic tree described above.
  • the DNAPARS program was chosen as it provides a model that minimizes the number of mutational events.
  • The results of this analysis were processed using a Perl/BioPerl script developed for this purpose in our group (MDFCALC) that parses the DNAPARS output, identifies mutational events, assigns them to specific branches, and calculates their MDF values. This script first resolves under-determined mutational events.
  • DNAPARS uses IUPAC codes for multiple nucleotides to represent all possible solutions (i.e. ambiguous bases). These ambiguities are resolved by randomly assigning one of the possible bases to the first mutational event, and adjusting the tree to replace all other IUPAC codes, accordingly.
  • MDF was defined as the depth of the deepest inner node in which a nucleotide position was mutated.
  • the depth of all the nodes in which a tested position mutated was calculated as the median distance from the node to its terminal descendants.
  • the MDF was taken to be the maximal depth for any node in which the position was mutated.
  • COM identification was performed using ComsFinder, a PERL script written for this purpose.
  • ComsFinder takes a table of mutation-individual pairs as input, finds COMs, and reports all the mutations found at COM positions (Figure 1).
  • Statistical testing is also carried out by ComsFinder, applying the following procedure: The input file was converted to a binary matrix, $D$, comprising 299 columns over 56 rows, in which the columns represent mtDNA positions that were de novo mutated in cancer and the rows represent the samples constituting our tumor compendium.
  • The value of each cell, either 1 or 0, reflects the presence or absence, respectively, of a de novo mutation in the tumor sample.
  • $l_{\max}(D)$ and $k_{\max}(D)$ were calculated, representing the maximal number of positions and the maximal number of samples involved in any COM within $D$, respectively.
  • A collection of $10^7$ randomized matrices ($D'$) was created, in which each matrix $d_i$ is derived from $D$ by column reshuffling.
  • For each $d_i$, $l_{\max}(d_i)$ and $k_{\max}(d_i)$ were calculated as for the un-shuffled matrix.
  • The probability of the null hypothesis was estimated from the fraction of reshuffled matrices, $d_i$, for which $l_{\max}(D) \le l_{\max}(d_i)$ or $k_{\max}(D) \le k_{\max}(d_i)$.
  • co-fixation events will increase the estimated rate of individual fixation, and thus possibly the expected rate of co-fixation.
  • the human evolution compendium was created by assembling 2400 publicly available non-identical mtDNA sequences, containing the entire coding region, into a phylogenetic tree and subsequently testing for the most parsimonious pattern of mutations that fits the tree.
  • the literature was scanned for appropriate screens in which such mutations were documented, comparing whole mtDNA sequences from tumors and corresponding healthy tissues.
  • Two mtDNA mutation surveys - one involving Head and Neck Squamous Cell Carcinoma (HNSCC) (Zhou et al. 2007), and another involving pancreatic cancer (PANC) (Kassauei et al. 2006) - were identified as the only studies that had directly compared numerous tumor and adjacent normal tissues over the entire mitochondrial genome, reporting all de novo mutations, including those occurring in known polymorphic sites.
  • HNSCC Head and Neck Squamous Cell Carcinoma
  • PANC pancreatic cancer
  • COM α1 involving 7 de novo mutations (8701G, 9540C, 10398G, 10873C, 12705T, 15301A and 16223T).
  • This COM is the exact mirror image (i.e. it presents mutations in the opposite direction) of the combination of variants defining macro-haplogroup R, a lineage encompassing all western Eurasian mtDNA haplogroups.
  • other COMs also recapitulate major events in human mitochondrial phylogeny.
  • COM ⁇ i perfectly recapitulates those mutations defining haplogroup U4, while COM ⁇ & reconstructs the exact path from haplogroup H to haplogroup V, skipping the intermediate stage observed in the HV cluster.
  • a T A T C G C 727 B4a(35/37), B4b(28/29), Bs1(19/24), Bs3(1/1), F(56/56), H1(75/76),
  • a T G T C G C 186 Bs1(4/24), Bs2(29/38), J(95/98), K(50/70), P(3/18), Rs10(2/2), U2(1/11),
  • a T G T T G T 19 I(17/18), Ns6(2/2)
  • a T A T G C 7 N1b(1/14), N9a(2/35), N9s(2/19), Ns4(1/6), Ns5(1/3)
  • T T 772 A(52/54), Bs2(1/38), C(19/20), D1(2/2), D2s1(40/42), D2s11(2/2), D2s12(1/1), D2s3(1/1), D2s4(1/1), D2s5(1/1), D2s6(7/9), D2s7(5/5), D2s8(1/1), D2s9(1/1), D4a(41/43), D4b(19/19), D4s10(4/4), D4s11(1/1), D4s12(3/3), D4s13(1/1), D4s14(2/2), D4s2(12/12), D4s4(3/3), D4s5(51/53), D4s6(3/4), D4s7(8/8), D4s8(32/33), D4s9(9/10), D5(33/36), Ds1(1/1), Ds2(1/2), E(13/13),
  • G T A G 316 Bs1(4/25), Bs2(38/38), D4s2(12/12), I(31/31), J(130/141), K(84/109), M7a(1/47), Ns3(1/1), Ns6(2/2), P(3/18), Rs10(2/2), U2(1/18), U6(1/41), Us2(1/2), Y(5/5)
  • G T C 110 D4a(1/43), J(2/98), K(70/70), Ns4(1/6), R31(1/3), U2(11/11), U3(6/6), U7(7/7), U8(3/3), Us1(3/3), Us2(2/2), Us3(3/3)
  • Table 3 shows sequence variants found in the evolutionary compendium at positions involved in COMs.
  • the different sequence variants found in the evolutionary compendium at the same positions involved in each of the COMs (A-O) are reported.
  • the first 2-5 columns describe the nucleotide bases found at each COM position, with bold base names indicating variants observed in the cancer compendium.
  • the total number of sequences that contain each variant is reported (Occ), as well as the haplogroup assignments of all the sequences carrying this variant.
  • For each haplogroup, the number of sequences found to carry the particular variant and the total number of sequences in the compendium that were assigned to this haplogroup are provided (in parentheses).
  • COMs naming is as in Figure 2B. * For COMs involving D-loop nucleotide positions, only 1917 sequences for which the D-loop sequence was determined were included in the analysis.
  • haplogroups that overlap with COMs were associated with phenotypes. Cybrids carrying haplogroup N (overlapping with COM a ⁇ ) differed from haplogroup M cybrids in calcium uptake, while haplogroups H (COM c%) and U4 (COM ⁇ ) are associated with high and low sperm motility, respectively (Montiel-Sosa et al. 2006; Ruiz-Pesini et al. 2000). These results support our working hypothesis, suggesting that similar selective forces act on cancer and during human mitochondrial evolution. Moreover, the observation that several COMs occur at positions defining haplogroups associated with pathological phenotypes further suggests that COMs possess functional potential.
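
The ComsFinder significance procedure described above (binary matrix, column reshuffling, comparison of the maximal COM sizes) can be illustrated with a minimal Python sketch. This is not the ComsFinder script itself, which the text describes as a PERL program. The function names `com_stats`, `shuffle_columns` and `com_p_value` are hypothetical, the names `l_max` and `k_max` are shorthand for the two maxima described in the text, and "column reshuffling" is assumed here to mean permuting the entries within each column independently, so that the number of mutated samples per position is preserved. The default of 1000 permutations is an illustrative choice, far smaller than the number of randomized matrices stated in the text.

```python
import random
from itertools import combinations


def com_stats(matrix):
    """Return (l_max, k_max) for a binary samples-by-positions matrix.

    l_max: largest number of positions mutated together in two samples.
    k_max: largest number of samples sharing any pair of mutated positions.
    Both are 0 when no two samples share at least two positions (no COM).
    """
    rows = [frozenset(j for j, v in enumerate(r) if v) for r in matrix]
    l_max = max((len(a & b) for a, b in combinations(rows, 2)), default=0)
    if l_max < 2:
        return 0, 0
    n_cols = len(matrix[0])
    # Any COM contains a pair of positions, so the widest-shared pair gives k_max.
    k_max = max(
        sum(1 for r in rows if i in r and j in r)
        for i, j in combinations(range(n_cols), 2)
    )
    return l_max, k_max


def shuffle_columns(matrix, rng):
    """Assumed null model: permute the entries of every column independently."""
    n_rows = len(matrix)
    cols = []
    for j in range(len(matrix[0])):
        col = [matrix[i][j] for i in range(n_rows)]
        rng.shuffle(col)
        cols.append(col)
    return [[col[i] for col in cols] for i in range(n_rows)]


def com_p_value(matrix, n_perm=1000, seed=0):
    """Fraction of shuffled matrices whose l_max or k_max reaches the observed value."""
    rng = random.Random(seed)
    l_obs, k_obs = com_stats(matrix)
    hits = 0
    for _ in range(n_perm):
        l_d, k_d = com_stats(shuffle_columns(matrix, rng))
        if l_obs <= l_d or k_obs <= k_d:
            hits += 1
    return hits / n_perm


# Toy matrix (hypothetical): 3 samples (rows) by 4 positions (columns).
D = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(com_stats(D), com_p_value(D, n_perm=200))
```

The pairwise scan over positions is quadratic in the number of columns, which is acceptable for a toy matrix but would likely need a faster formulation for many permutations of a matrix with hundreds of positions.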

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides methods for the detection of recurrent combinations of mutations in the mitochondrial genome and their use for tumor characterization and/or confirmation.

Description

DETECTION AND USE OF RECURRENT MUTATION COMBINATION PATTERN
CROSS REFERENCE TO RELATED APPLICATIONS
[001] This application claims the benefit of U.S. Serial No. 61/193,817, filed on December 29, 2008 (and entitled mtDNA Mutation Pattern in Tumors and Human Evolution Are Shaped by Similar Selective Constraints), which is incorporated in its entirety herein by reference.
FIELD OF THE INVENTION
[002] The present invention relates to methods for the detection of recurrent mutation combinations in the mitochondrial genome for tumor characterization and/or confirmation.
BACKGROUND OF THE INVENTION
[003] Mitochondria are the only extra-nuclear organelles harboring their own independent genome, which is strictly maternally inherited in vertebrates. Each mitochondrion contains 2-10 mtDNA copies. Human cells contain 100-100,000 separate copies of mtDNA, with each double stranded mtDNA consisting of ~16,570 base pairs. The two strands of mtDNA have different nucleotide content: the heavy strand, enriched in guanine, encodes 28 genes, and the light strand, enriched in cytosine, encodes 9 genes. The human mitochondrial genome comprises a circular double-stranded DNA molecule that encodes 13 polypeptides of the respiratory chain, 22 transfer RNAs (tRNAs), and two ribosomal RNAs (rRNAs) required for protein synthesis. Expression of the entire mitochondrial genome is necessary for the maintenance of mitochondrial functions.
[004] Although the mitochondrion has some DNA repair capability, the mtDNA is more susceptible to oxidative stress than nuclear DNA. Mutations accumulate to a greater extent in mtDNA than in nuclear DNA, mainly because mitochondria lack histones and harbor poor DNA repair systems.
[005] Oxygen radicals can damage or change mtDNA molecules, leading to deletions, rearrangements, or mutations. Deletions and mutations due to free radicals have been associated with the aging process and degenerative diseases.
[006] Single mtDNA mutations have been associated with primary human cancers, including colorectal, bladder and head and neck cancers, lung tumors, pancreas and hepato-cellular carcinoma. However, while some of the unique mutational patterns of tumors have been studied, little attention has been paid to comparing malignant and normal conditions, and detecting similarities between the two groups.
[007] The main hurdle in such a study is the difficulty in performing such a comparison with the nuclear genome, because of the presence of a mosaic of multiple haplo-blocks, each having evolved independently in different lineages. Moreover, the tremendous efforts required for re-sequencing the nuclear genome make it difficult to test whether selection acts preferentially on specific mutations or on combinations of mutations, i.e. on de novo genetic backgrounds. There are currently not enough data to compare the nuclear genomic landscape of cancer cells to that of corresponding normal individuals.
[008] Accordingly, there is an urgent need in the art to devise new approaches that allow comparison of mutation combination patterns between cancer and healthy populations and a non-invasive diagnosis confirmation. The present application satisfies this need.
SUMMARY OF THE INVENTION
[009] It is therefore an object of the invention to provide solutions to the aforementioned deficiencies in the art.
[0010] Further to this object, the invention provides a method of confirming the diagnosis of a cancer in a subject with a neoplasm comprising the steps of (a) comparing the mitochondrial genome of an affected tissue isolated from the subject with the mitochondrial genome of a healthy tissue isolated from the same subject; and (b) detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue; wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue isolated from the subject indicates that the subject has cancer.
[0011] In one embodiment, the recurring combinations of mutations localize in the inner nodes of the phylogenetic tree.
[0012] In a preferred embodiment, the recurring combinations of mutations comprise a K number of recurring combinations of mutations as defined herein.
[0013] In a preferred embodiment, each recurring combination of mutations comprises at least one homoplasmic mutation. Preferably, each recurring combination of mutations comprises between two and seven homoplasmic mutations. Even more preferably, each recurring combination of mutations comprises at least six homoplasmic mutations. Most preferably, each recurring combination of mutations comprises at least seven homoplasmic mutations.
[0014] In one aspect of the invention, the recurring combinations of mutations are neutral. In a different aspect of the invention, the recurring combinations of mutations cause the replacement of one or more amino acids in the encoded protein.
[0015] In one embodiment, the methods of the invention allow confirming the diagnosis of a cancer selected from the group consisting of adenocarcinoma and squamous cell carcinoma.
[0016] The adenocarcinoma may be selected from the group consisting of colorectal cancer, lung cancer, prostate cancer, breast cancer, pancreas cancer, stomach cancer and esophageal cancer. In a specific embodiment, the adenocarcinoma comprises pancreas cancer.
[0017] The squamous cell carcinoma may be selected from the group consisting of cancer of the skin, head and neck cancer, esophageal cancer, lung cancer, prostate cancer and vaginal cancer. In a specific embodiment, the squamous cell carcinoma comprises head and neck cancer.
[0018] In a different embodiment, the invention provides a method for identifying subpopulations of individuals with common combinations of mutations within a defined population comprising: (a) comparing the mitochondrial genome of a target tissue in two or more subjects with the mitochondrial genome of a reference tissue in the same subjects; (b) detecting the presence of mutations in the mitochondrial genome of the target tissue; (c) converting the mutations into a list of mutated positions; (d) identifying the positions in the mitochondrial genome that are mutated in two or more subjects; and (e) determining recurrent combinations of mutations in the mitochondrial genome of the target tissue in two or more subjects, wherein the presence of recurring combinations of mutations in the mitochondrial genome of the target tissue of two or more subjects indicates that the subjects belong to the same subpopulation. In a preferred embodiment, the subjects have a disease. In the most preferred embodiment, the disease is cancer.
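
Steps (c) to (e) of the subpopulation method can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the inventors' implementation: the function name `find_coms`, the input layout (one `(subject, position)` tuple per detected mutation) and the toy data are all hypothetical. The sketch reports, for every pair of subjects, the full set of recurrent positions they share, and treats any such set of at least two positions as a recurrent combination.

```python
from itertools import combinations


def find_coms(pairs, min_positions=2):
    """Find mutation combinations shared by at least two subjects.

    pairs: iterable of (subject_id, mtDNA_position) tuples, one per mutation
    detected in the target tissue relative to the reference tissue.
    Returns {frozenset(positions): frozenset(subjects carrying all of them)}.
    """
    # Step (c): one set of mutated positions per subject.
    mutated = {}
    for subject, position in pairs:
        mutated.setdefault(subject, set()).add(position)

    # Step (d): positions mutated in two or more subjects.
    counts = {}
    for positions in mutated.values():
        for position in positions:
            counts[position] = counts.get(position, 0) + 1
    recurrent = {p for p, n in counts.items() if n >= 2}

    # Step (e): for every pair of subjects, the recurrent positions they share.
    coms = {}
    for a, b in combinations(sorted(mutated), 2):
        shared = frozenset(mutated[a] & mutated[b] & recurrent)
        if len(shared) >= min_positions:
            carriers = frozenset(s for s, pos in mutated.items() if shared <= pos)
            coms[shared] = carriers
    return coms


# Toy input (hypothetical subjects and mutations, not data from the application).
toy_pairs = [
    ("P1", 8701), ("P1", 9540), ("P1", 10398),
    ("P2", 8701), ("P2", 9540), ("P2", 150),
    ("P3", 10398), ("P3", 150),
]
for positions, subjects in find_coms(toy_pairs).items():
    print(sorted(positions), sorted(subjects))
```

In the toy input, subjects P1 and P2 share positions 8701 and 9540 and would therefore be grouped into the same subpopulation, while P3 shares only single positions with each of them and is not grouped.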
[0019] The foregoing general description and following brief description of the drawings and the detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed. Other objects, advantages, and novel features will be readily apparent to those skilled in the art from the following detailed description of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
[0021] Figure 1 is a schematic representation of the distribution of maximal depth of fixation (MDF) values in mtDNA positions harboring cancer mutations (A) and in healthy mtDNA positions (B). An MDF value was calculated for 3328 mtDNA positions that vary in their coding region sequences. To control for differences in the variability levels of the two types of positions, a sample of non-cancer positions was used, sampled such that their variability levels precisely matched the variability level of the cancer positions. To estimate the statistical significance of the bias toward higher MDF values in cancer positions as compared to non-cancer positions (p), the p value estimate from a one-sided Mann-Whitney U test was averaged over 1000 independently generated variability-matched samples. To ensure that the repeated sampling is meaningful, highly variable positions (assigned to 13 or more inner nodes) were ignored, since these could not be matched between the cancer and non-cancer sets.
[0022] Figure 2 shows recurrent tumor-related mutation combinations in head and neck squamous cell carcinoma (HNSCC) and pancreatic cancer (PANC) patients. Tumor-related mutations reported in HNSCC and pancreatic cancer patients were scanned for recurrent mutation combinations, defined as recurrent mutations of at least two positions in two different patients. (A) Mutations involved in the recurrent mutation combinations. For each tumor (left column), the cancer type and patient identifier are provided. Nucleotide positions of mutations are according to the revised Cambridge reference sequence (GenBank AC_000021.2). Heteroplasmic mutations are marked with an asterisk (*). (B) A proposed naming convention for recurrent mutation combinations (COMs). Overlapping COMs are considered to be part of a larger COM group, designated using Greek letters (α, β, γ, δ). In particular, COMs that belong to a certain group are named using a Greek letter in combination with a number.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0023] The characterization of genomic mutations in a human cancer population, as compared to the genome of the natural human population, should provide, in theory, a useful diagnostic tool. This comparison, however, is difficult to perform with the nuclear genome, because of the presence of multiple haploblocks and the problems involved in human genome sequencing. Thus, there is a great need for an effective method for characterizing or confirming the diagnosis of cancer in a subject with a neoplasm. The present invention answers this need by providing methods for confirming the diagnosis of a cancer in a subject with a neoplasm, which comprise comparing the mitochondrial genome (mtDNA) of an affected tissue isolated from the subject to the mitochondrial genome of a healthy tissue isolated from the same subject, and detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue, wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue isolated from the subject indicates that the subject has cancer.
[0024] The human mitochondrial genome presents the advantage that it is small (16.5 kb) and, because of its simple uniparental inheritance pattern, its variations are the result of mutation, rather than recombination. Accordingly, the invention conveniently provides fast and reliable methods for characterizing cancer or confirming a cancer diagnosis in a subject with a neoplasm based on whole mitochondrial genome sequences.
DEFINITIONS
[0025] As used in the specification and claims, the forms "a," "an" and "the" include singular as well as plural references unless the context clearly dictates otherwise.
[0026] Further, as used herein, the term "comprising" is intended to mean that the system includes the recited elements, but not excluding others which may be optional. By the phrase "consisting essentially of" it is meant a method that includes the recited elements but excludes other elements that may have an essential significant effect on the performance of the method. "Consisting of" shall thus mean excluding more than traces of other elements. Embodiments defined by each of these transition terms are within the scope of this invention.
[0027] As referenced herein, by "mitochondrial DNA" or "mtDNA" it is intended the DNA located in the mitochondria and consisting of about 16,500 base pairs.
[0028] By "mutation" it is intended a change in a specific nucleotide position in the genome of an affected tissue isolated from a subject as compared to the same nucleotide position in the genome of a healthy tissue isolated from the same subject.
[0029] By "combinations of mutations" or "COMs" it is intended a K number of any position in the genome that is mutated in two or more subjects, wherein K is larger than the expected number of mutated positions in a random shuffling test. [0030] By "common genetic signatures" it is meant combinations of mutations or COMs.
[0031] By "coding sub-genome of the nuclear genome" it is intended all protein-encoding regions in the genome.
[0032] By "contig" it is meant a contiguous sequence constructed from many sequence fragments, which align in a tile manner to form a longer desired sequence of interest. [0033] By "phylogenetic tree" it is meant a tree that reflects the phylogenetic relationships of known mtDNA sequences sharing a common ancestor. Each branch in the tree represents a random inherited mutation or polymorphism and each node in the tree represents the most recent common ancestor of the descendants. A leaf node or external node has no children, whereas a non-leaf node or internal node has child nodes. The root node in the tree has no parents. The tip represents the most recent descendant. The depth of a node is the median length of the path from that node to all its descendants. [0034] By "homoplasmic mutation" it is meant a mutation affecting all mtDNA copies in the cell.
[0035] By "heteroplasmic mutation" it is meant a mutation that affects a fraction of mtDNA copies in the cell. [0036] By "haplogroup" it is meant a cluster of individuals with a similar spectrum of polymorphisms. The set of polymorphisms of each individual is defined as "haplotype".
[0037] By "maximum parsimony" it is meant a method of identifying the potential phylogenetic tree that requires the smallest total number of evolutionary events to explain the observed sequence data. [0038] "Maximal depth of mutation fixation" or "MDF" is defined as the depth of the deepest inner node in which a given nucleotide position is mutated. A non-variable position or a position mutated in the tree tips has an MDF value of zero.
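The node depth and MDF definitions above can be illustrated with a minimal Python sketch. It is not the MDFCALC script; it assumes a rooted tree given as a hypothetical parent-to-children dictionary, measures path length in number of branches to the terminal descendants (tips), and takes a hypothetical mapping from each node to the positions mutated on the branch leading to it. All names and the toy tree are illustrative assumptions.

```python
from statistics import median

# Toy rooted tree (assumption): node -> list of children; tips have no entry.
EXAMPLE_TREE = {
    "root": ["n1", "n2"],
    "n1": ["t1", "t2"],
    "n2": ["t3", "n3"],
    "n3": ["t4", "t5"],
}

# Hypothetical assignment of mutated positions to the branch leading to each node.
EXAMPLE_MUTATIONS = {
    "n2": {12705},
    "n3": {16223},
    "t1": {12705, 16519},
}


def tip_distances(tree, node):
    """Distances, in branches, from a node to each of its terminal descendants."""
    children = tree.get(node, [])
    if not children:
        return [0]  # a tip is at distance zero from itself
    return [d + 1 for child in children for d in tip_distances(tree, child)]


def node_depth(tree, node):
    """Depth of a node: median path length to its terminal descendants."""
    return median(tip_distances(tree, node))


def mdf(tree, branch_mutations, position):
    """Maximal depth over all nodes at which the position is mutated.

    Tips have depth zero, so positions that are non-variable or mutated
    only at the tips get an MDF of zero.
    """
    depths = [
        node_depth(tree, node)
        for node, positions in branch_mutations.items()
        if position in positions
    ]
    return max(depths, default=0)
```

In this toy tree, position 12705 is mutated both at a tip and at the inner node `n2`, so its MDF is the depth of `n2`; position 16519 is mutated only at a tip and therefore gets zero.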
[0039] The present inventors have developed an efficient and unique method to identify and detect the presence of recurring combinations of mutations in the mitochondrial genome. Specifically, the inventors have made the unexpected discovery that the detection of combinations of recurring mutations in the mitochondrial genome of two or more individuals may be used to determine whether the individuals belong to the same subpopulation within a defined population. [0040] Thus, the invention provides a method for identifying subpopulations of individuals with common combinations of mutations within a defined population comprising: (a) comparing the mitochondrial genome of a target tissue in two or more subjects with the mitochondrial genome of a reference tissue in the same subjects; (b) detecting the presence of mutations in the mitochondrial genome of the target tissue; (c) converting the mutations into a list of mutated positions; (d) identifying the positions in the mitochondrial genome that are mutated in two or more subjects; and (e) determining recurrent combinations of mutations in the mitochondrial genome of the target tissue in two or more subjects, wherein the presence of recurring combinations of mutations in the target tissue of two or more subjects indicates that the subjects belong to the same subpopulation. The method of the invention is particularly useful in the identification of subpopulations in subjects affected by a disease. One aspect of the present invention is the identification of subpopulations sharing common combinations of mutations among subjects with a neoplasm. In the most preferred embodiment, the disease is cancer. 
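Steps (c) to (e) of the method can be sketched in Python as follows. This is an illustrative sketch, not the inventors' implementation: it assumes the input is a list of (sample, position) pairs and treats a recurrent combination as a set of at least two positions mutated in at least two subjects, following the definition given for Figure 2. The function name, the threshold parameter and the toy input are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def find_coms(pairs, min_positions=2):
    """Map each shared set of mutated positions to the samples carrying it."""
    # Step (c): convert the mutations into a list of mutated positions per sample.
    by_sample = defaultdict(set)
    for sample, position in pairs:
        by_sample[sample].add(position)

    # Steps (d) and (e): intersect every pair of samples and keep
    # intersections that contain at least `min_positions` positions.
    coms = defaultdict(set)
    for a, b in combinations(sorted(by_sample), 2):
        shared = frozenset(by_sample[a] & by_sample[b])
        if len(shared) >= min_positions:
            coms[shared].update((a, b))
    return dict(coms)


# Toy input (assumption): a handful of sample/position pairs for illustration.
EXAMPLE_PAIRS = [
    ("Case14", 12308), ("Case14", 12372), ("Case14", 14620),
    ("1836", 12308), ("1836", 12372), ("1836", 14620),
    ("2043", 12308), ("2043", 12372),
    ("1493", 12406),
]

EXAMPLE_COMS = find_coms(EXAMPLE_PAIRS)
```

Subjects that appear together under the same key of the returned dictionary share that combination and would, under the method, be assigned to the same subpopulation.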
[0041] In a different aspect, the invention provides a method of confirming the diagnosis of a cancer in a subject with a neoplasm comprising the steps of (a) comparing the mitochondrial genome of an affected tissue isolated from the subject with the mitochondrial genome of a healthy tissue isolated from the same subject; and (b) detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue; wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue indicates that the subject has cancer.
[0042] The recurring combinations of mutations may localize in different nodes of a phylogenetic tree. Preferably, the recurring combinations of mutations localize in the inner nodes of the phylogenetic tree. Even more preferably, the recurring combinations of mutations on the phylogenetic tree occur at mtDNA nucleotide positions having a maximal depth of mutation fixation (MDF) value which is significantly higher than the MDF value of non-COM mutations.
[0043] The recurring combinations of mutations comprise a K number of recurring combinations of mutations as defined herein. Preferably, the recurring combinations of mutations comprise at least five recurring combinations of mutations. Even more preferably, the recurring combinations of mutations comprise at least seven combinations of mutations.
[0044] The recurring combinations of mutations may comprise heteroplasmic or homoplasmic mutations. Preferably, each recurring combination of mutations comprises at least one homoplasmic mutation. More preferably, each recurring combination of mutations comprises between one and seven homoplasmic mutations. Even more preferably, each recurring combination of mutations comprises at least six homoplasmic mutations. Most preferably, each recurring combination of mutations comprises at least seven homoplasmic mutations.
[0045] The recurring combinations of mutations may be neutral mutations or mutations that cause the replacement of one or more amino acids in the encoded protein.
[0046] The methods of the present invention allow characterizing or confirming the diagnosis of a cancer. The cancer may be an adenocarcinoma or a squamous cell carcinoma.
[0047] The adenocarcinoma may be selected from the group consisting of colorectal cancer, lung cancer, prostate cancer, breast cancer, pancreas cancer, stomach cancer and esophageal cancer. In a specific embodiment, the adenocarcinoma comprises pancreas cancer.
[0048] The squamous cell carcinoma may be selected from the group consisting of cancer of the skin, head and neck cancer, esophageal cancer, lung cancer, prostate cancer and vaginal cancer. In a specific embodiment, the squamous cell carcinoma comprises head and neck cancer.
[0049] In the following examples, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
EXAMPLES
Material and Methods
Sequences and haplotype assignment
[0050] A dataset of 2655 human mtDNA coding-region sequences (~15.5 kb, nucleotide positions 577-16023) was compiled from Mitomaster (Brandon, M.C., et al., MITOMASTER: a bioinformatics tool for the analysis of mitochondrial DNA sequences. Hum Mutat, 2008.) and from in-house generated sequences (Feder et al "Differences in mtDNA haplogroup distribution among three Jewish populations alter susceptibility to T2DM complications", BMC Genomics, 2008). For newly generated sequences, haplogroup assignment was performed as described previously [Torroni, A. and D.C. Wallace, Mitochondrial DNA variation in human populations and implications for detection of mitochondrial DNA mutations of pathological significance. J Bioenerg Biomembr, 1994. 26(3): p. 261-71.]. Upon removing redundant (i.e. 100% identical) sequences (n=245), 2400 sequences were retained. The D-loop region was clipped from sequences for which it was available to ensure equivalent coverage of the mtDNA.
Multiple sequence alignment and phylogenetic reconstruction
[0051] A multiple sequence alignment was generated, after removal of insertions and deletions, followed by the creation of a Neighbor-Joining (NJ) phylogenetic tree using the Pan troglodytes (NC_001643) mtDNA sequence as an outgroup, utilizing the distance matrix program (PHYLIP package) [Felsenstein, J., PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 1989. 5: p. 164-166]. Manual inspection of the resulting tree showed no significant deviations from accepted human mtDNA phylogeny at the haplogroup level, and only minor deviations from other published trees [Mishmar, D., et al., Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A, 2003. 100(1): p. 171-6; Ruiz-Pesini, E., et al., Effects of purifying and adaptive selection on regional variation in human mtDNA. Science, 2004. 303(5655): p. 223-6].
Assigning sequence changes to specific tree branches
[0052] DNAPARS [Felsenstein, J., PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 1989. 5: p. 164-166] was used to infer mtDNA sequences at the inner nodes of the NJ phylogenetic tree described above. The DNAPARS program was chosen as it provides a model that minimizes the number of mutational events. The DNAPARS output was processed using a Perl/BioPerl script, MDFCALC, developed for this purpose in our group. MDFCALC parses the DNAPARS output, identifies mutational events, assigns them to specific branches, and calculates their MDF values. The script first resolves under-determined mutational events, i.e. events in which multiple patterns equally explain the underlying sequences. In these cases, DNAPARS uses IUPAC codes for multiple nucleotides to represent all possible solutions (i.e. ambiguous bases). These ambiguities are resolved by randomly assigning one of the possible bases to the first mutational event, and adjusting the tree to replace all other IUPAC codes accordingly.
Calculation of the Maximal Depth of Fixation (MDF)
[0053] MDF was defined as the depth of the deepest inner node in which a nucleotide position was mutated. The depth of all the nodes in which a tested position mutated was calculated as the median distance from the node to its terminal descendants. The MDF was taken to be the maximal depth for any node in which the position was mutated. These calculations were carried out by the MDFCALC script described above.
Identification of mutational combinations and their statistical analysis
[0054] COM identification was performed using ComsFinder, a PERL script written for this purpose. ComsFinder takes a table of mutation-individual pairs as input, finds COMs, and reports all the mutations found at COM positions (Figure 1). Statistical testing is also carried out by ComsFinder, applying the following procedure: The input file was converted to a binary matrix, D, comprising 299 columns over 56 rows, in which the columns represent mtDNA positions that were de novo mutated in cancer and the rows represent the samples constituting our tumor compendium. The value of each cell, either 1 or 0, reflects the presence or absence, respectively, of a de novo mutation in the tumor sample. From this matrix, $l_{\max}(D)$ and $k_{\max}(D)$ were calculated, representing the maximal number of positions and the maximal number of samples involved in any COM within D, respectively. To test the null hypothesis that $l_{\max}(D)$ and $k_{\max}(D)$ are not larger than expected by chance, assuming independence of fixation events, a collection of $10^7$ randomized matrices (D') was created, in which each matrix $d_i$ is derived from D by column reshuffling. For each of the reshuffled matrices, $d_i$, $l_{\max}(d_i)$ and $k_{\max}(d_i)$ were calculated as for the un-shuffled matrix. The probability of the null hypothesis was estimated from the fraction of reshuffled matrices, $d_i$, for which $l_{\max}(D) \le l_{\max}(d_i)$ or $k_{\max}(D) < k_{\max}(d_i)$. As the background fixation rate for each position is directly derived from the data, co-fixation events will increase the estimated rate of individual fixation, and thus possibly the expected rate of co-fixation.
[0055] The mutational landscapes of mtDNA in cancer and natural populations were compared using two mutational compendia, one including mutations that occurred during human evolution and the other containing de novo cancer mutations.
The human evolution compendium was created by assembling 2400 publicly available non-identical mtDNA sequences, containing the entire coding region, into a phylogenetic tree and subsequently testing for the most parsimonious pattern of mutations that fits the tree. To assemble the de novo cancer mutations compendium, the literature was scanned for appropriate screens in which such mutations were documented, comparing whole mtDNA sequences from tumors and corresponding healthy tissues. Two mtDNA mutation surveys - one involving Head and Neck Squamous Cell Carcinoma (HNSCC) (Zhou et al. 2007), and another involving pancreatic cancer (PANC) (Kassauei et al. 2006) - were identified as the only studies that had directly compared numerous tumor and adjacent normal tissues over the entire mitochondrial genome, reporting all de novo mutations, including those occurring in known polymorphic sites.
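The column-reshuffling test described in paragraph [0054] can be sketched in Python as below. This is a simplified illustration, not ComsFinder: it assumes the binary matrix is a list of rows (samples) over columns (positions), it computes only the maximal COM length (the largest number of positions shared by any two samples, with a minimum of two), and it omits the analogous statistic for the number of samples. The function names, the default number of shuffles and the toy matrix are assumptions.

```python
import random
from itertools import combinations


def l_max(matrix):
    """Largest number of positions co-mutated in any two samples (0 if fewer than 2)."""
    best = 0
    for row_a, row_b in combinations(matrix, 2):
        shared = sum(a & b for a, b in zip(row_a, row_b))
        if shared >= 2:
            best = max(best, shared)
    return best


def shuffle_columns(matrix, rng):
    """Permute each column independently, preserving per-position mutation counts."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def reshuffle_p_value(matrix, n_shuffles=1000, seed=0):
    """Fraction of reshuffled matrices whose l_max is at least the observed value."""
    rng = random.Random(seed)
    observed = l_max(matrix)
    hits = sum(
        l_max(shuffle_columns(matrix, rng)) >= observed
        for _ in range(n_shuffles)
    )
    return hits / n_shuffles


# Toy matrix (assumption): 4 samples by 4 positions.
EXAMPLE_D = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
```

Shuffling within columns keeps the observed mutation rate of every position fixed, which is why, as noted in the text, the test may overestimate the COM length expected by chance.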
[0056] In this study a single cancer compendium of de novo mtDNA mutations, including 299 de novo point mutations identified in 83 tumor-normal tissue pairs from the HNSCC and 15 pairs from the PANC surveys, was used (Table 1). Each de novo mutational event is reported in a row; mutations that occurred in the same position in different samples are reported in consecutive rows. The following details are provided for each mutation: Position - the mtDNA nucleotide position in the revised Cambridge Reference Sequence (rCRS: GenBank AC_000021.2) corresponding to the mutated base; Sample ID - the sample ID as provided in the source papers; Cancer type - the type of tumor from which the sample was obtained (HNSCC = Head and Neck Squamous Cell Carcinoma; PANC = Pancreatic Cancer); mtDNA locus - the mtDNA locus in which the mutation is located; From - the nucleotide observed in the non-cancerous sample; To - the nucleotide observed in the matching tumor sample.
Position  Sample ID  Cancer type  mtDNA locus  From  To  Position  Sample ID  Cancer type  mtDNA locus  From  To
12308 Case14 PANC TRNA A G 15369 1809 HNSCC CytB C A
12308 1836 HNSCC tRNALeu A G 15409 Case8 PANC CYTB C G
12308 2043 HNSCC tRNALeu A G 15433 1017 HNSCC CytB C T
12361 1836 HNSCC ND5 A G 15452 1493 HNSCC CytB C A
12372 Case14 PANC ND5 G A 15487 2714 HNSCC CytB A T
12372 1836 HNSCC ND5 G A 15607 1493 HNSCC CytB A G
12372 2043 HNSCC ND5 G A 15693 Case14 PANC CYTB T C
12406 1493 HNSCC ND5 G A 15693 Case11 PANC CYTB T C
12406 1736 HNSCC ND5 G A 15693 1836 HNSCC CytB T C
12414 Case12 PANC ND5 T C 15766 2828 HNSCC CytB A G
12603 2555 HNSCC ND5 C T 15884 Case12 PANC D-loop G C
12612 Case9 PANC ND5 A G 15904 1691 HNSCC tRNAThr C T
12612 1691 HNSCC ND5 A G 15904 2455 HNSCC tRNAThr C T
12633 1493 HNSCC ND5 C A 15928 1493 HNSCC tRNAThr G A
12705 1736 HNSCC ND5 C T 15937 1817 HNSCC D-loop A G
12705 2051 HNSCC ND5 T C 16069 1691 HNSCC D-loop C T
12705 2714 HNSCC ND5 C T 16086 2714 HNSCC D-loop T C
12705 2828 HNSCC ND5 C T 16093 2039 HNSCC D-loop T C
12937 Case14 PANC ND5 A G 16124 2828 HNSCC D-loop T C
13105 2828 HNSCC ND5 A G 16126 1493 HNSCC D-loop T C
13241 Case10 PANC ND5 T C 16126 1691 HNSCC D-loop T C
13263 2714 HNSCC ND5 A G 16129 Case1 PANC D-loop G A
13368 1493 HNSCC ND5 G A 16134 Case14 PANC D-loop C T
13414 2043 HNSCC ND5 G A 16134 Case11 PANC D-loop C T
13563 1836 HNSCC ND5 A G 16147 1858 HNSCC D-loop C T
13617 2043 HNSCC ND5 T C 16153 Case7 PANC D-loop G A
13708 1691 HNSCC ND5 G A 16153 2828 HNSCC D-loop G A
13789 1809 HNSCC ND5 T G 16172 1736 HNSCC D-loop T C
13886 2828 HNSCC ND5 T C 16223 1736 HNSCC D-loop C T
13970 1356 HNSCC ND5 G A 16223 2714 HNSCC D-loop C T
14284 2828 HNSCC ND6 C T 16223 2828 HNSCC D-loop C T
14318 2714 HNSCC ND6 T C 16261 1535 HNSCC D-loop C T
14353 1836 HNSCC ND6 T C 16270 2043 HNSCC D-loop C T
14524 2714 HNSCC ND6 A G 16278 1836 HNSCC D-loop C T
14543 Case8 PANC ND6 A G 16278 2714 HNSCC D-loop C T
14543 Case5 PANC ND6 A G 16298 2455 HNSCC D-loop T C
14620 Case14 PANC ND6 C T 16298 2714 HNSCC D-loop T C
14620 1836 HNSCC ND6 C T 16311 Case12 PANC D-loop T C
14766 Case15 PANC CYTB T C 16311 1736 HNSCC D-loop T C
14793 2043 HNSCC CytB A G 16311 2051 HNSCC D-loop C T
14798 1691 HNSCC CytB T C 16319 1565 HNSCC D-loop G A
14815 1809 HNSCC CytB C A 16320 1565 HNSCC D-loop C T
14884 Case11 PANC CYTB C T 16320 1736 HNSCC D-loop C T
14905 1736 HNSCC CytB G A 16325 2714 HNSCC D-loop T C
14969 1493 HNSCC CytB T C 16356 Case14 PANC D-loop T C
15017 Case10 PANC CYTB T C 16356 1836 HNSCC D-loop T C
15043 2714 HNSCC CytB G A 16428 2043 HNSCC D-loop G A
15058 2828 HNSCC CytB C T 16488 3538 HNSCC D-loop C T
15218 2043 HNSCC CytB A G 16519 Case12 PANC D-loop T C
15257 1565 HNSCC CytB G A 16519 1493 HNSCC D-loop T C
15301 1736 HNSCC CytB G A 16519 2555 HNSCC D-loop T C
15301 2714 HNSCC CytB G A
15301 2828 HNSCC CytB G A
15307 2760 HNSCC CytB C T
Table 1
[0057] Remarkable similarities were observed between the cancer and human mitochondrial evolution mutational landscapes. When considering individual positions (i.e., a one-dimensional analysis), de novo mutations in the cancer compendium preferentially co-localized with mutations in the inner nodes rather than the external branches (tips) of the human phylogenetic tree (p < 1.4 × 10^-25, Chi square test). Hence, cancer mutations preferentially co-localize with variants that were fixed during ancient times (inner nodes) as opposed to recent changes (tips). Moreover, positions mutated in cancer preferentially coincide with ancient variants, as reflected in their Maximal Depth of mutation Fixation (MDF) values. MDF at a given mtDNA nucleotide position was defined as the maximal depth of any branch in which it was mutated (based on a maximum parsimony model), with non-variable positions and positions mutated in the tree tips having an MDF value of zero (for a complete description of the MDF calculation, see Methods). Comparing the MDF values of all positions harboring de novo cancer mutations in the assembled cancer compendium with those of positions not included in the compendium revealed that de novo cancer mutations preferentially occur at positions involving deeper branches (p ≤ 1.7 × 10^-31, Mann-Whitney test). Moreover, when non-cancer positions were sampled such that they presented precisely the same distribution of natural variability as the nucleotide positions mutated in our cancer compendium, a significant bias in MDF values toward deeper branches in the cancer positions was observed (Figure 1). This observation, in agreement with previous predictions (Brandon et al. 2006), suggests that de novo somatic mtDNA mutations in the cancer compendium correlate with ancient mutations in human phylogeny.
[0058] In this study a complex pattern of combinations was revealed, with 15 partially overlapping COMs, involving 25 mtDNA positions and 14 individual samples, being identified (Figure 2a). These COMs could be clustered into three defined patterns (i.e., α, β and γ) (Figure 2b) by combining any two COMs that overlap in at least one position. Most strikingly, two COMs, α1 and β1, involved 7 different homoplasmic mtDNA mutations, with COM α1 comprising mutations seen in two HNSCC tumors and β1 including mutations seen in HNSCC and pancreatic cancer samples. Six of the 7 mutations comprising COM α1 were shared by an additional HNSCC tumor, thus defining a pattern of six de novo mutations that recurred in 3 individuals. These patterns represent 'ridges' in a multi-dimensional mutational landscape which are highly unlikely to have occurred by chance. Indeed, reshuffling tests suggest that randomly obtaining even one COM of length 7 is highly unlikely (p = 2.9 × 10^-4). In addition, the number of positions involved in the COMs (n=25) was significantly larger than the expected value (p = 4 × 10^-3). Hence, rejecting the null hypothesis (i.e. that the observed COMs pattern is a coincidence, potentially resulting from an elevated mutation rate at COM positions) suggests that the observed correlation is due to selection. It is noteworthy that the reshuffling test was performed based on the observed mutation rates in cancer, and may, therefore, overestimate the length of COMs that are expected to occur by chance. We thus conclude that the bias in the mutational landscape of mtDNA toward non-random recurrence of COMs is best explained by an elevated fitness of cells harboring particular combinations of mutations during the development and/or progression of cancer (Mithani et al. 2007).
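The clustering rule stated above, combining any two COMs that overlap in at least one position, amounts to finding connected groups of COMs. A minimal Python sketch follows; the function name and the toy COMs are illustrative assumptions and do not reproduce the COMs of Figure 2.

```python
def group_coms(coms):
    """Cluster COMs so that COMs sharing at least one position end up together."""
    groups = []  # each group is a list of COMs (sets of positions)
    for com in coms:
        com = set(com)
        # Find every existing group that has a COM overlapping the new one.
        overlapping = [g for g in groups if any(com & member for member in g)]
        merged = [com]
        for g in overlapping:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


# Hypothetical COMs for illustration only.
EXAMPLE_COM_LIST = [
    {8701, 9540},
    {9540, 10398},
    {12308, 12372},
    {16223, 8701},
    {1811, 15904},
]

EXAMPLE_GROUPS = group_coms(EXAMPLE_COM_LIST)
```

A new COM that overlaps two previously separate groups merges them into one, which matches the transitive reading of the clustering rule; each resulting group would then receive one Greek letter under the naming convention of Figure 2b.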
[0059] Moreover, these results suggest that when a multi-dimensional mutational landscape is considered, a significant fraction of the reported mutations are probably under selection, and hence bear functional potential. In other studies, the conservation index (CI) was used to assess the functionality of amino acid changes (Ruiz-Pesini et al. 2004) and lineage-defining mutations in mitochondrial RNA genes (Ruiz-Pesini and Wallace 2006). However, only 5/25 and 6/25 of the COM mutations replaced amino acids or occurred in RNA genes, respectively. Intriguingly, 9 of the 25 COM mutations are synonymous, and an additional 5 involve non-coding D-loop mutations. Of these positions, synonymous and D-loop mutations also occurred in the 7-position-long α1 and β1 COMs. Hence, in evolutionary terms, a large proportion of the functional potential of COMs could affect fitness without altering protein sequence. This observation supports previous suggestions that synonymous mutations should not be regarded as non-functional. Thus the current results call for a re-evaluation of the functional potential of common genetic variants, at least in the case of the mtDNA.
[0060] The basic hypothesis in this study was that similarities in the selective forces acting on the two genetic landscapes (i.e. in cancerous and natural populations) would be reflected in a fixation of variants at the same nucleotide positions in both systems. When the compendium of mutations that occurred during human evolution was screened for the presence of mutations comprising the cancer mtDNA COMs, it was revealed that most (24/25) of the cancer mtDNA mutations corresponding to mtDNA COMs matched parts of natural mtDNA haplogroups (Tables 2, 3). Perhaps the most striking example is COM α1, involving 7 de novo mutations (8701G, 9540C, 10398G, 10873C, 12705T, 15301A and 16223T). This COM is the exact mirror image (i.e. it presents mutations in the opposite direction) of the combination of variants defining macro-haplogroup R, a lineage encompassing all western Eurasian mtDNA haplogroups. Moreover, it appears that other COMs also recapitulate major events in human mitochondrial phylogeny. COM β1 perfectly recapitulates those mutations defining haplogroup U4, while COM Ω& reconstructs the exact path from haplogroup H to haplogroup V, skipping the intermediate stage observed in the HV cluster.
Table 2
Mutation  Region  Wild-type Amino Acid*  Mutated Amino Acid*  Occurrence in Control Population
T94A D-loop - - 14
A630T tRNA-Phe - - 0
C1811G 16S RNA - - 172
G2352C 16S RNA - - 38
C2706G 16S RNA - - 1914
T4580A ND2 M M 57
G4639C ND2 I T 16
A4639T ND2 T I 2384
G4646C ND2 Y Y 15
C7028T COI A A 1930
G7028C COI A A 469
C7424G COI E D 12
C8701G ATP6 A T 784
G9540C COIII L L 791
C10398G ND3 T A 1093
G10873C ND4 P P 788
C12308G tRNA-Leu(CUN) - - 287
T12372A ND5 L L 329
G12705C ND5 I I 1353
A12705T ND5 I I 1046
A14620T ND6 G G 13
T15301A Cytb L L 740
G15693C Cytb M T 13
A15904T tRNA-Thr - - 58
A16134T D-loop - - 2
A16223T D-loop - - 939
G16311C D-loop - - 377
A16311T D-loop - - 1539
G16356C D-loop - - 26
Table 3
A. α1*
8701 9540 10398 10873 12705 15301 16223  Occ  Occurs in haplogroups (# with variant/total)
A T A T C G C  727  B4a(35/37), B4b(28/29), Bs1(19/24), Bs3(1/1), F(56/56), H1(75/76), H10(6/6), H13(12/12), H2(11/11), H4(12/12), H5(14/15), H6(7/7), H7(12/12), HV1(2/2), HVs1(1/1), HVs2(13/13), HVs3(2/2), Hs1(20/20), Hs10(2/2), Hs11(7/7), Hs12(7/7), Hs13(2/2), Hs14(1/1), Hs16(2/2), Hs17(32/32), Hs18(1/1), Hs20(7/7), Hs21(2/2), Hs22(1/1), Hs23(1/1), Hs24(1/1), Hs25(3/3), Hs26(6/6), Hs28(2/2), Hs29(1/1), Hs3(1/1), Hs6(1/1), Hs7(2/2), Hs8(2/2), Hs9(1/1), J(3/98), K(14/70), P(14/18), R31(3/3), R5(10/10), Rs1(5/5), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(1/2), Rs6(1/2), Rs7(5/5), Rs8(3/3), Rs9(1/1), T1(19/20), T2(51/51), Ts(2/2), U1(4/5), U2(10/11), U3(6/6), U4(6/6), U5(45/46), U6(38/39), U7(7/7), U8(3/3), Us1(3/3), Us2(1/2), Us3(3/3), V(52/52)
G C G C T A T  654  C(20/20), D1(2/2), D2s1(40/42), D2s10(1/1), D2s11(2/2), D2s12(1/1), D2s2(5/5), D2s3(1/1), D2s4(1/1), D2s5(1/1), D2s6(8/9), D2s7(5/5), D2s8(1/1), D2s9(1/1), D4a(42/43), D4b(18/19), D4s10(2/4), D4s11(1/1), D4s12(3/3), D4s13(1/1), D4s14(2/2), D4s3(1/1), D4s4(3/3), D4s5(52/53), D4s6(3/4), D4s7(8/8), D4s8(21/33), D4s9(8/10), D5(33/36), Ds1(1/1), Ds2(2/2), E(13/13), G(64/64), L1(2/32), L2a(33/34), L2b(7/7), L2c(8/8), L2d(2/2), L3a(18/19), L3s1(17/18), L3s2(5/5), M1(47/50), M10(7/7), M11(8/8), M2(7/7), M7a(45/47), M7b(34/36), M7c(6/7), M8a(11/11), Ms1(2/2), Ms2(1/1), Ms3(4/4), Ms4(2/2), Qs1(3/3), Qs2(3/6), Z(15/15)
A T G T C G C  186  Bs1(4/24), Bs2(29/38), J(95/98), K(50/70), P(3/18), Rs10(2/2), U2(1/11), U6(1/39), Us2(1/2)
A T A T T G T  176  A(53/54), HVs4(1/2), N1b(13/14), N9a(32/35), N9s(16/19), Ns1(1/1), Ns2(1/1), Ns4(5/6), Ns5(2/3), W(27/27), X(25/25)
G C G C T G T  51  L0(24/28), L1(27/32)
G C G C T A C  37  D2s1(2/42), D2s6(1/9), D4a(1/43), D4b(1/19), D4s10(2/4), D4s8(12/33), D4s9(2/10), D5(3/36), L0(1/28), L2a(1/34), L2s(1/1), M1(3/50), M7a(1/47), M7b(2/36), M7c(1/7), Qs2(3/6)
A T G T T G T  19  I(17/18), Ns6(2/2)
A C G C T A T  12  D4s2(11/12), M7a(1/47)
A T G T C G T  12  Bs2(7/38), K(5/70)
A T A T C G T  7  B4a(2/37), H5(1/15), N9s(1/19), P(1/18), Rs5(1/2), T1(1/20)
A T A T T G C  7  N1b(1/14), N9a(2/35), N9s(2/19), Ns4(1/6), Ns5(1/3)
A T G T T G C  7  Bs2(1/38), I(1/18), Y(5/5)
A T A T C A C  3  B4b(1/29), Bs1(1/24), U5(1/46)
G C A C T G C  3  L1(3/32)
G T A T C G C  3  H1(1/76), Rs6(1/2), U1(1/5)
G C G C T G C  2  L0(2/28)
G C G T T A T  2  D4s5(1/53), L3a(1/19)
G T A T T G T  2  A(1/54), N9a(1/35)
A C G C T A C  1  D4s2(1/12)
A C G T C G T  1  Bs2(1/38)
A T G T C A C  1  K(1/70)
G C A C T G T  1  L0(1/28)
G C G C A A T  1  D4s6(1/4)
G C G C C A T  1  L3s1(1/18)
G T A T T G C  1  HVs4(1/2)
B. α2*
O N>
-J ^o -J U) K) Occ Occurs in haplogroups (# with variant/total)
O J-. O O to
O
A T A C G C 727B4a(35/37), B4b(28/29), BsI (19/24), Bs3(l/1), F(56/56), Hl(75/76), H10(6/6), H13(12/12), H2(l l/l l), H4(12/12), H5(14/15), H6(7/7), H7(12/12), HVl(2/2), HVsl(l/l), HVs2(13/13), HVs3(2/2), Hsl(20/20), HslO(2/2), HsI 1(7/7), Hsl2(7/7), Hsl3(2/2), Hsl4(l/1), Hsl6(2/2), Hsl7(32/32), Hsl8(l/1), Hs20(7/7), Hs21(2/2), Hs22(l/1), Hs23(l/1), Hs24(l/1), Hs25(3/3), Hs26(6/6), Hs28(2/2), Hs29(l/1), Hs3(l/1), Hs6(l/1), Hs7(2/2), Hs8(2/2), Hs9(l/1), J(3/98), K( 14/70), P(14/18), R31(3/3), R5(10/10), Rsl(5/5), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(l/2), Rs6(l/2), Rs7(5/5), Rs8(3/3), Rs9(l/1), Tl(19/20), T2(51/51), Ts(2/2), Ul(4/5), U2(10/l l), U3(6/6), U4(6/6), U5(45/46), U6(38/39), U7(7/7), U8(3/3), Us 1(3/3), Us2(l/2), Us3(3/3), V(52/52)
G C G T A T 656C(20/20), Dl(2/2), D2sl(40/42), D2sl0(l/l), D2sl 1(2/2), D2sl2(l/l), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(8/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(42/43), D4b(18/19), D4slO(2/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(3/4), D4s7(8/8), D4s8(21/33), D4s9(8/10), D5(33/36), Dsl(l/1), Ds2(2/2), E(13/13), G(64/64), Ll(2/32), L2a(33/34), L2b(7/7), L2c(8/8), L2d(2/2), L3a(19/19), L3sl(17/18), L3s2(5/5), M 1(47/50), M10(7/7), Ml 1(8/8), M2(7/7), M7a(45/47), M7b(34/36), M7c(6/7), M8a(l 1/11), Ms 1(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Qsl(3/3), Qs2(3/6), Z(15/15) A T G C G C 186Bsl(4/24), Bs2(29/38), J(95/98), K(50/70), P(3/18), Rsl0(2/2), U2(l/l l),
U6(1/39), Us2(1/2)
A T A T G T 176 A(53/54), HVs4(1/2), N1b(13/14), N9a(32/35), N9s(16/19), Ns1(1/1), Ns2(1/1), Ns4(5/6), Ns5(2/3), W(27/27), X(25/25)
G C G T G T 51 L0(24/28), L1(27/32)
G C G T A C 37 D2s1(2/42), D2s6(1/9), D4a(1/43), D4b(1/19), D4s10(2/4), D4s8(12/33), D4s9(2/10), D5(3/36), L0(1/28), L2a(1/34), L2s(1/1), M1(3/50), M7a(1/47), M7b(2/36), M7c(1/7), Qs2(3/6)
A T G T G T 19 I(17/18), Ns6(2/2)
A C G T A T 12 Bs2(7/38), D4s2(11/12), K(5/70), M7a(1/47)
A T A C G T 7 B4a(2/37), H5(1/15), N9s(1/19), P(1/18), Rs5(1/2), T1(1/20)
A T A T G C 7 N1b(1/14), N9a(2/35), N9s(2/19), Ns4(1/6), Ns5(1/3)
A T G T G C 7 Bs2(1/38), I(1/18), Y(5/5)
A T A C A C 3 B4b(1/29), Bs1(1/24), U5(1/46)
G C A T G C 3 L1(3/32)
G T A C G C 3 H1(1/76), Rs6(1/2), U1(1/5)
G C G T G C 2 L0(2/28)
G T A T G T 2 A(1/54), N9a(1/35)
A C G C G T 1 Bs2(1/38)
A C G T A C 1 D4s2(1/12)
A T G C A C 1 K(1/70)
G C A T G T 1 L0(1/28)
G C G A A T 1 D4s6(1/4)
G C G C A T 1 L3s1(1/18)
G T A T G C 1 HVs4(1/2)
C. α3
Occ Occurs in haplogroups (# with variant/total)
T A A 1292 A(76/77), B4a(37/37), B4b(45/45), Bsl(21/25), Bs3(l/1), F(56/56), Hl(136/138), HKXIO/IO), H13(16/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(17/18), HVl(3/3), HVsl(3/3), HVs2(19/19), HVs3(2/2), HVs4(l/2), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), Hsl5(2/2), Hsl6(2/2), Hsl7(52/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs29(l/1), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), J(7/141), K(25/109), Nlb(15/15), N9a(34/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns4(6/6), Ns5(3/3), P(15/18), R31(3/3), R5(10/10), Rsl(5/5), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(l/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl (29/30), T2(85/85), Ts(2/2), Ul(5/6), U2(17/18), U3(9/9), U4(12/12), U5(73/73), U6(38/41), U7(9/9), U8(3/3), Usl(3/3), Us2(l/2), Us3(3/3), V(58/58), W(35/35), X(25/25)
T G G 739 C(31/31), D1(11/11), D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), G(64/64), L0(27/28), Ll(14/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3sl(18/18), L3s2(5/5), Ml(51/51), M10(7/7), Ml 1(8/8), M2(7/7), M7a(46/47), M7b(36/36), M7c(8/8), M8a(12/12), Ms 1(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Qsl(3/3), Qs2(6/6), Z(15/15)
T A G 319 Bs1(4/25), Bs2(38/38), D4s2(12/12), I(31/31), J(134/141), K(83/109), M7a(1/47), Ns3(1/1), Ns6(2/2), P(3/18), Rs10(2/2), U2(1/18), U6(1/41), Us2(1/2), Y(5/5)
C G G 34 L1(15/32), L3a(19/19)
T G A 11 A(1/77), H1(1/138), HVs4(1/2), L0(1/28), L1(3/32), N9a(1/35), Rs6(1/2), T1(1/30), U1(1/6)
C A A 3 H1(1/138), U6(2/41)
C A G 1 K(1/109)
T A T 1 H7(1/18)
D. α4
Occ Occurs in haplogroups (# with variant/total)
A C 1350B4a(37/37), B4b(45/45), Bsl(25/25), Bs2(37/38), Bs3(l/1), F(56/56), Hl(138/138), H10(10/10), H13(16/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(18/18), HVl(3/3), HVslQ/3), HVs2(19/19), HVs3(2/2), Hsl(33/33), Hsl0(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), Hsl5(2/2), Hsl6(2/2), Hsl7(51/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs29(l/1), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), J(141/141), K(109/109), L3sl(l/18), N9s(l/19), P(18/18), R31(3/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(l/2), Tl(30/30), T2(85/85), Ts(2/2), Ul(6/6), U2(18/18), U3(9/9), U4(12/12), U5(73/73), U6(41/41), U7(9/9), U8(3/3), Usl(3/3), Us2(2/2), Us3(3/3), V(58/58)
A T 1037 A(77/77), Bs2(l/38), C(31/31), Dl(11/11), D2sl(42/42), D2slO(l/l), D2sl 1(2/2),
D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(3/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(l/2), E(15/15), G(64/64), HVs4(2/2), 1(31/31), L0(28/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(9/18), L3s2(5/5), Ml(51/51), M10(7/7), Ml 1(8/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(15/15), N9a(35/35), N9s(18/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), Qsl(3/3), Qs2(6/6), W(35/35), X(25/25), Y(5/5), Z(15/15)
G T 9 Ds2(1/2), L3s1(8/18)
G C 3 Hs17(1/52), Hs24(1/1), Rs9(1/2)
A A 1 D4s6(1/4)
E. α5*
Occ Occurs in haplogroups (# with variant/total)
C T 766 B4a(30/37), B4b(29/29), Bsl(9/24), Bs2(37/38), Bs3(l/1), F(30/56), Hl(74/76), H10(6/6), H13(9/12), H2(10/l l), H4(12/12), H5(14/15), H6(7/7), H7(12/12), Hsl(20/20), HslO(2/2), HsI 1(3/7), Hsl2(7/7), Hsl3(2/2), Hsl4(l/1), Hsl6(2/2), Hsl7(28/32), Hsl8(l/1), Hs20(6/7), Hs21(2/2), Hs22(l/1), Hs23(l/1), Hs25(3/3), Hs26(6/6), Hs28(l/2), Hs29(l/1), Hs3(l/1), Hs6(l/1), Hs7(2/2), Hs8(2/2), Hs9(l/1), HVl(2/2), HVsl(l/l), HVs2(8/13), HVs3(l/2), J(95/98), K(2/70), L3sl(l/18), N9s(l/19), P(15/18), R31(3/3), R5(8/10), Rsl(4/5), Rs2(l/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(4/5), Rs8(3/3), Tl(18/20), T2(51/51), Ts(2/2), Ul(5/5), U2(10/l l), U3(6/6), U4(6/6), U5(42/46), U6(29/39), U7(7/7), U8(3/3), Usl(3/3), Us2(l/2), Us3(2/3), V(50/52)
T T 772 A(52/54), Bs2(l/38), C(19/20), Dl(2/2), D2sl(40/42), D2sl 1(2/2), D2sl2(l/1), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(7/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(41/43), D4b(19/19), D4s 10(4/4), D4sl 1(1/1), D4s 12(3/3), D4s 13(1/1), D4s 14(2/2), D4s2(12/12), D4s4(3/3), D4s5(51/53), D4s6(3/4), D4s7(8/8), D4s8(32/33), D4s9(9/10), D5(33/36), Dsl(l/1), Ds2(l/2), E(13/13), G(63/64), HVs4(2/2), 1(7/18), L0(l/28), Ll(2/32), L2a(33/34), L2b(6/7), L2c(8/8), L2d(l/2), L3a(13/19), L3sl(13/18), Ml 1(6/8), M2(6/7), M7a(47/47), M7b(35/36), M7c(7/7), M8a(7/l l), Msl(l/2), Ms4(2/2), Nlb(13/14), N9a(34/35), N9s(17/19), Ns2(l/1), Ns4(6/6), Ns5(2/3), W(26/27), X(25/25), Y(3/5), Z(14/15)
T C 203 A(2/54), C(l/20), D2sl(2/42), D2slO(l/l), D2s2(5/5), D2s6(2/9), D4a(2/43), D4s3(l/1), D4s5(2/53), D4s8(l/33), D4s9(l/10), D5(3/36), Ds2(l/2), G(l/64), 1(11/18), L0(27/28), Ll(30/32), L2a(l/34), L2b(l/7), L2d(l/2), L2s(l/1), L3a(6/19), L3sl(4/18), L3s2(5/5), M1(5O/5O), M10(7/7), Ml 1(2/8), M2(l/7), M7b(l/36), M8a(4/l l), Msl(l/2), Ms2(l/1), Ms3(4/4), Nlb(l/14), N9a(l/35), N9s(l/19), Nsl(l/1), Ns5(l/3), Ns6(2/2), Qsl(3/3), Qs2(6/6), W(l/27), Y(2/5), Z(1/15)
C C 174 B4a(7/37), Bsl(15/24), F(26/56), Hl(2/76), H13(3/12), H2(l/11), H5(l/15), HsI 1(4/7), Hsl7(4/32), Hs20(l/7), Hs24(l/1), Hs28(l/2), HVs2(5/13), HVs3(l/2), J(3/98), K(68/70), P(3/18), R5(2/10), Rsl(l/5), RslO(2/2), Rs2(l/2), Rs7(l/5), Rs9(l/1), Tl(2/20), U2(l/l l), U5(4/46), U6(10/39), Us2(l/2), Us3(l/3), V(l/52)
A T 1 D4s6(1/4)
F. α6
Occ Occurs in haplogroups (# with variant/total)
G G T C 1849 A(77/77), B4a(37/37), B4b(45/45), Bs 1(24/25), Bs2(38/38), Bs3(l/1), C(31/31),
Dl(11/11), D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s5(52/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), F(55/56), G(64/64), HVl(3/3), HVsl(3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), 1(31/31), J(137/141), K(109/109), L0(19/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(49/51), M10(7/7), Ml 1(4/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(15/15), N9a(35/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(2/3), Qs2(6/6), R31(3/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl (30/30), T2(83/85), Ts(2/2), Ul(6/6), U2(18/18), U3(8/9), U4(12/12), U5(73/73), U6(41/41), U7(9/9), U8(3/3), Us2(2/2), Us3(3/3), W(35/35), X(25/25), Y(5/5), Z(15/15) A G C C 463 Hl (138/138), H10(10/10), H13(15/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(18/18), Hs 1(33/33), Hs 10(2/2), HsI 1(10/10), Hs 12(8/8), HsI 3(4/4), Hs 14(3/3), Hsl5(2/2), Hs 16(2/2), Hsl7(52/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), Ml 1(4/8)
G A T T 57 V(57/58)
A G T C 23 Bs1(1/25), D4s4(3/3), D4s5(1/53), F(1/56), J(3/141), L0(9/28), M1(2/51), Us1(3/3)
G G C C 6 H13(1/16), Hs29(1/1), J(1/141), Qs1(1/3), T2(1/85), U3(1/9)
G G T T 1 V(1/58)
G. α7
Occ Occurs in haplogroups (# with variant/total)
G T A A 828A(76/77), B4a(37/37), B4b(45/45), Bsl(20/25), Bs3(l/1), F(55/56), HVl (3/3),
HVsI (3/3), HVs2( 19/19), HVs3(2/2), HVs4(l/2), J(7/141), K(25/109), Nlb(15/15), N9a(34/35), N9s(19/19), NsI(Ul), Ns2(l/1), Ns4(6/6), Ns5(3/3), P(15/18), R31(3/3), R5(10/10), Rsl(5/5), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(l/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl(29/30), T2(83/85), Ts(2/2), Ul(5/6), U2(17/18), U3(8/9), U4( 12/12), U5(73/73), U6(40/41), U7(9/9), U8(3/3), Us2(l/2), Us3(3/3), V(58/58), W(35/35), X(25/25)
G T G G 754C(31/31), Dl(11/11), D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4s 10(4/4), D4sl 1(1/1), D4s 12(3/3), D4sl3(l/1), D4sl4(2/2), D4s3(l/1), D4s5(52/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), G(64/64), L0(19/28), Ll(29/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(49/51), M 10(7/7), Ml 1(4/8), M2(7/7), M7a(46/47), M7b(36/36), M7c(8/8), M8a(12/12), Ms 1(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Qsl(2/3), Qs2(6/6), Z(15/15)
A C A A 457Hl(137/138),H10(10/10), H13(15/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(17/18), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), HsI 5(2/2), Hsl6(2/2), Hsl7(52/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5)
G T A G 316 Bs1(4/25), Bs2(38/38), D4s2(12/12), I(31/31), J(130/141), K(84/109), M7a(1/47), Ns3(1/1), Ns6(2/2), P(3/18), Rs10(2/2), U2(1/18), U6(1/41), Us2(1/2), Y(5/5)
A T G G 14 D4s4(3/3), D4s5(1/53), L0(8/28), M1(2/51)
G T G A 9 A(1/77), HVs4(1/2), L1(3/32), N9a(1/35), Rs6(1/2), T1(1/30), U1(1/6)
A T A A 5 Bs1(1/25), F(1/56), Us1(3/3)
A C G G 4 M11(4/8)
G C A A 4 H13(1/16), Hs29(1/1), T2(1/85), U3(1/9)
A T A G 3 J(3/141)
A C A T 1 H7(1/18)
A C G A 1 H1(1/138)
A T G A 1 L0(1/28)
G C A G 1 J(1/141)
G C G G 1 Qs1(1/3)
H. α8
Occ Occurs in haplogroups (# with variant/total)
G T G 1070Bsl(4/25),Bs2(38/38),C(31/31), Dl(11/11), D2sl(42/42),D2sl0(l/l),D2sl 1(2/2),
D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s5(52/53), D4s6(4/4), D4s7(8/8), D4s8(33/33),
ZOZ. D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), G(64/64), 1(31/31), J(130/141), K(84/109), L0(19/28), Ll(29/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), M 1(49/51), M 10(7/7), Ml 1(4/8), Ml(JH), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Ns3(l/1), Ns6(2/2), P(3/18), Qsl(2/3), Qs2(6/6), RslO(2/2), U2(l/18), U6(l/41), Us2(l/2), Y(5/5), Z(15/15)
G T A 837 A(77/77), B4a(37/37), B4b(45/45), Bs 1(20/25), Bs3(l/1), F(55/56), HV 1(3/3), HVs 1(3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), J(7/141), K(25/109), Ll(3/32), Nlb(15/15), N9a(35/35), N9s(19/19), Ns 1(1/1), Ns2(l/1), Ns4(6/6), Ns5(3/3), P(15/18), R31(3/3), R5(10/10), Rsl(5/5), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl(30/30), T2(83/85), Ts(2/2), Ul(6/6), U2(17/18), U3(8/9), U4(12/12), U5(73/73), U6(40/41), U7(9/9), U8(3/3), Us2(l/2), Us3(3/3), V(58/58), W(35/35), X(25/25) A C A 458 Hl(138/138), H10(10/10), H13(15/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(17/18), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), Hsl5(2/2), Hsl6(2/2), Hsl7(52/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5)
A T G 17 D4s4(3/3), D4s5(1/53), J(3/141), L0(8/28), M1(2/51)
A T A 6 Bs1(1/25), F(1/56), L0(1/28), Us1(3/3)
A C G 4 M11(4/8)
G C A 4 H13(1/16), Hs29(1/1), T2(1/85), U3(1/9)
G C G 2 J(1/141), Qs1(1/3)
A C T 1 H7(1/18)
I. α9
Occ Occurs in haplogroups (# with variant/total)
G T 1907 A(77/77), B4a(37/37), B4b(45/45), Bsl(24/25), Bs2(38/38), Bs3(l/1), C(31/31), Dl(11/1 1),
D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s5(52/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), F(55/56), G(64/64), HVl(3/3), HVsl(3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), 1(31/31), J(137/141), K(109/109), L0(19/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(49/51), M10(7/7), Ml 1(4/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(15/15), N9a(35/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(2/3), Qs2(6/6), R31(3/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), T1(3O/3O), T2(83/85), Ts(2/2), Ul(6/6), U2(18/18), U3(8/9), U4(12/12), U5(73/73), U6(41/41), U7(9/9), U8(3/3), Us2(2/2), Us3(3/3), V(58/58), W(35/35), X(25/25), Y(5/5), Z(15/15)
A C 463 Hl(138/138), H10(10/10), H13(15/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16),
H7(18/18), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), Hsl5(2/2), Hsl6(2/2), Hsl7(52/52), Hsl8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), Ml 1(4/8)
A T 23 Bs1(1/25), D4s4(3/3), D4s5(1/53), F(1/56), J(3/141), L0(9/28), M1(2/51), Us1(3/3)
G C 6 H13(1/16), Hs29(1/1), J(1/141), Qs1(1/3), T2(1/85), U3(1/9)
J. α10
Occ Occurs in haplogroups (# with variant/total)
T T 1916A(77/77), B4a(37/37), B4b(44/45), Bsl(25/25), Bs2(38/38), Bs3(l/1), C(31/31), Dl(l l/11), D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4s 14(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), F(56/56), G(64/64), HVl(3/3), HVsl(3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), 1(31/31), J(140/141), K(109/109), L0(28/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(51/51), M10(7/7), Ml 1(4/8), Ml(IH), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(15/15), N9a(35/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(l/3), Qs2(6/6), R31(3/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl(30/30), T2(83/85), Ts(2/2), Ul(6/6), U2(18/18), U3(8/9), U4(12/12), U5(73/73), U6(41/41), U7(9/9), U8(3/3), Usl(3/3), Us2(2/2), Us3(3/3), V(47/58), W(35/35), X(25/25), Y(5/5), Z(15/15)
T C 467 Hl(136/138), H10(10/10), H13(16/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16),
H7(18/18), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), Hsl3(4/4), Hsl4(3/3), Hsl5(2/2), Hs 16(2/2), Hs 17(52/52), Hs 18(2/2), Hs 19(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs29(l/1), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), J(l/141), Ml 1(4/8), Qsl(l/3), T2(l/85), U3(l/9)
C T 14 B4b(1/45), D2s3(1/1), Qs1(1/3), V(11/58)
C C 2 H1(2/138)
K. β1*
Occ Occurs in haplogroups (# with variant/total)
A T A G C T T 1648 A(53/54), B4a(37/37), B4b(29/29), Bsl(24/24), Bs2(38/38), Bs3(l/l),
C( 18/20), D 1 (2/2), D2s 1 (42/42), D2s 10( 1 /1 ), D2s 11 (2/2), D2s 12( 1 /1 ), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(42/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(52/53), D4s6(4/4), D4s7(4/8), D4s8(32/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(13/13), F(56/56), G(64/64), Hl(73/76), H10(6/6), H13(12/12), H2(l 1/11), H4(12/12), H5(15/15), H6(7/7), H7(12/12), HVl(2/2), HVsl(l/l), HVs2(12/13), HVs3(2/2), HVs4(2/2), Hsl(20/20), HslO(2/2), HsI 1(7/7), Hsl2(7/7), Hsl3(2/2), Hsl4(l/1), Hsl6(2/2), Hsl7(32/32), Hsl8(l/1), Hs20(7/7), Hs21(2/2), Hs22(l/1), Hs23(l/1), Hs24(l/1), Hs25(3/3), Hs26(5/6), Hs28(2/2), Hs29(l/1), Hs3(l/1), Hs6(l/1), Hs7(2/2), Hs8(2/2), Hs9(l/1), 1(18/18), J(96/98), L0(27/28), Ll(32/32), L2a(34/34), UIb(IH), L2c(8/8), L2d(2/2), L2s(l/1), L3a(18/19), L3sl(18/18), L3s2(5/5), Ml(50/50), M10(7/7), Ml 1(8/8), Ml(IH), M7a(47/47), M7b(36/36), UIc(IH), M8a(l 1/11), Msl(l/2), Ms3(4/4), Ms4(2/2), Nlb(13/14), N9s(19/19), Nsl(l/1), Ns4(4/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(3/3), Qs2(6/6), R31(2/3), R5(2/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(l/1), Tl(20/20), T2(51/51), Ts(2/2), Ul(l/5), V(52/52), W(27/27), X(25/25), Y(5/5), Z(15/15)
G T G A C T T 104 K(70/70), U2(11/11), U3(5/6), U7(7/7), U8(3/3), Us1(3/3), Us2(2/2), Us3(3/3)
A T G A C T T 89 U1(4/5), U5(46/46), U6(39/39)
A T A A C T T 41 D4s7(4/8), Ms1(1/2), N1b(1/14), N9a(35/35)
A T A G C T C 19 A(1/54), C(2/20), D4s8(1/33), H1(3/76), HVs2(1/13), Ms2(1/1), Ns2(1/1), Ns4(1/6), R5(8/10)
G C G A T C C 6 U4(6/6)
G T A G C T T 5 D4a(1/43), J(2/98), Ns4(1/6), R31(1/3)
A C A G C T T 2 D4s5(1/53), Hs26(1/6)
G T G A C T C 1 U3(1/6)
A T A G C C T 1 L3a(1/19)
A T A G T T T 1 L0(1/28)
L. β2
Occ Occurs in haplogroups (# with variant/total)
A T 2227 A(77/77), B4a(37/37), B4b(45/45), Bsl(25/25), Bs2(38/38), Bs3(l/1), C(31/31), Dl(11/1 1),
D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(42/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), F(56/56), G(64/64), Hl(138/138), H10(10/10), H13(16/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(18/18), HVl (3/3), HVsI (3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hsl2(8/8), HsI 3(4/4), Hsl4(3/3), HsI 5(2/2), HsI 6(2/2), Hsl7(52/52), HsI 8(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs29(l/1), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), 1(31/31), J(139/141), K(l/109), L0(28/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(18/19), L3sl(18/18), L3s2(5/5), Ml(51/51), M10(7/7), Ml 1(8/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Ms 1(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(15/15), N9a(35/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(5/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(3/3), Qs2(6/6), R31(2/3), R5(10/10), Rsl(5/5), Rsl0(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl(30/30), T2(85/85), Ts(2/2), Ul(6/6), U5(73/73), U6(41/41), V(58/58), W(35/35), X(25/25), Y(5/5), Z(15/15)
G T 160 D4a(1/43), J(2/141), K(108/109), Ns4(1/6), R31(1/3), U2(18/18), U3(9/9), U7(9/9), U8(3/3), Us1(3/3), Us2(2/2), Us3(3/3)
G C 12 U4(12/12)
A C 1 L3a(1/19)
M. β3
Occ Occurs in haplogroups (# with variant/total)
A G 2071 A(77/77), B4a(37/37), B4b(45/45), Bsl(25/25), Bs2(38/38), Bs3(l/l), C(30/31), Dl(l l/l l),
D2sl(42/42), D2slO(l/l), D2sl 1(2/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4slO(4/4), D4sl 1(1/1), D4sl 2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(4/8), D4s8(33/33), D4s9(10/10), D5(36/36), Dsl(l/1), Ds2(2/2), E(15/15), F(56/56), G(64/64), Hl(138/138), H10(10/10), H13(16/16), H2(28/28), H4(22/22), H5(26/26), H6(16/16), H7(18/18), HVl(3/3), HVsl(3/3), HVs2(19/19), HVs3(2/2), HVs4(2/2), Hsl(33/33), HslO(2/2), HsI 1(10/10), Hs 12(8/8), Hs 13(4/4), Hs 14(3/3), Hs 15(2/2), Hs 16(2/2), Hs 17(52/52), Hs 18(2/2), Hsl9(2/2), Hs2(2/2), Hs20(21/21), Hs21(4/4), Hs22(l/1), Hs23(2/2), Hs24(l/1), Hs25(6/6), Hs26(10/10), Hs27(l/1), Hs28(3/3), Hs29(l/1), Hs3(2/2), Hs6(l/1), Hs7(4/4), Hs8(3/3), Hs9(5/5), 1(31/31), J(141/141), L0(28/28), Ll(32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(51/51), M 10(7/7), Ml 1(8/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(8/8), M8a(12/12), Msl(l/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(14/15), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns3(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(3/3), Qs2(6/6), R31(3/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(2/2), Tl (30/30), T2(85/85), Ts(2/2), Ul (1/6), V(58/58), W(35/35), X(25/25), Y(5/5), Z(15/15)
G A 287 K(109/109), U1(5/6), U2(18/18), U3(9/9), U4(12/12), U5(73/73), U6(41/41), U7(9/9), U8(3/3), Us1(3/3), Us2(2/2), Us3(3/3)
A A 42 C(1/31), D4s7(4/8), Ms1(1/2), N1b(1/15), N9a(35/35)
N. β4*
Occ Occurs in haplogroups (# with variant/total)
A T C 1798 A(54/54), B4a(37/37), B4b(29/29), Bs 1(24/24), Bs2(38/38), Bs3(l/1), C(20/20), Dl (2/2),
D2sl(42/42), D2slO(l/l), D2s 11(2/2), D2sl 2(1/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(42/43), D4b(19/19), D4s 10(4/4), D4sl 1(1/1), D4sl2(3/3), D4sl3(l/1), D4s 14(2/2), D4s2(12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9( 10/10), D5(36/36), DsI(Vl), Ds2(2/2), E(13/13), F(56/56), G(64/64), Hl (76/76), H 10(6/6), Hl 3(12/12), H2(l 1/11), H4( 12/12), H5(15/15), H6(7/7), H7(12/12), HVl (2/2), HVsl(l/l), HVs2(13/13), HVs3(2/2), HVs4(2/2), Hs 1(20/20), HslO(2/2), HsI 1(7/7), HsI 2(7/7), Hsl3(2/2), Hsl4(l/1), Hs 16(2/2), Hsl7(32/32), Hsl8(l/1), Hs20(7/7), Hs21(2/2), Hs22(l/1), Hs23(l/1), Hs24(l/1), Hs25(3/3), Hs26(6/6), Hs28(2/2), Hs29(l/1), Hs3(l/1), Hs6(l/1), Hs7(2/2), Hs8(2/2), Hs9(l/1), 1(18/18), J(96/98), L0(28/28), Ll (32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(18/19), L3sl(18/18), L3s2(5/5), M 1(50/50), M10(7/7), Ml 1(8/8), Ml(IH), M7a(47/47), M7b(36/36), M7c(7/7), M8a(l l/l l), Msl(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(14/14), N9a(35/35), N9s(19/19), Nsl(l/1), Ns2(l/1), Ns4(5/6), Ns5(3/3), Ns6(2/2), P(18/18), Qsl(3/3), Qs2(6/6), R31(2/3), R5(10/10), Rsl(5/5), RslO(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(l/1), Tl (20/20), T2(51/51), Ts(2/2), Ul(5/5), U5(46/46), U6(37/39), V(52/52), W(27/27), X(25/25), Y(5/5), Z(15/15)
G T C 110 D4a(1/43), J(2/98), K(70/70), Ns4(1/6), R31(1/3), U2(11/11), U3(6/6), U7(7/7), U8(3/3), Us1(3/3), Us2(2/2), Us3(3/3)
G C C 6 U4(6/6)
A T T 2 U6(2/39)
A C C 1 L3a(1/19)
O. γ1*
Occ Occurs in haplogroups (# with variant/total)
G C 1903 A(54/54), B4a(37/37), B4b(29/29), Bs 1(24/24), Bs2(38/38), Bs3(l/1), C(20/20), D 1(2/2),
D2sl(36/42), D2slO(l/l), D2sl 1(1/2), D2sl2(l/1), D2s2(5/5), D2s3(l/1), D2s4(l/1), D2s5(l/1), D2s6(9/9), D2s7(5/5), D2s8(l/1), D2s9(l/1), D4a(43/43), D4b(19/19), D4s 10(4/4), D4sl 1(1/1), D4sl 2(3/3), D4sl3(l/1), D4sl4(2/2), D4s2( 12/12), D4s3(l/1), D4s4(3/3), D4s5(53/53), D4s6(4/4), D4s7(8/8), D4s8(33/33), D4s9( 10/10), D5(36/36), DsI(Ul), Ds2(2/2), E(13/13), F(55/56), G(64/64), Hl (76/76), H 10(6/6), H13(12/12), H2(l l/l l), H4(12/12), H5(15/15), H6(7/7), H7(12/12), HV 1(2/2), HVsI(Ul), HVs2(13/13), HVs3(2/2), HVs4(2/2), HsI (20/20), Hs 10(2/2), HsI 1(7/7), Hs 12(7/7), HsI 3(2/2), HsH(Ul), Hs 16(2/2), Hsl7(32/32), Hsl8(l/1), Hs20(7/7), Hs21(2/2), Hs22(l/1), Hs23(l/1), Hs24(l/1), Hs25(3/3), Hs26(6/6), Hs28(2/2), Hs29(l/1), Hs3(l/1), Hs6(l/1), Hs7(2/2), Hs8(2/2), Hs9(l/1), 1(18/18), J(98/98), K(69/70), L0(28/28), Ll (32/32), L2a(34/34), L2b(7/7), L2c(8/8), L2d(2/2), L2s(l/1), L3a(19/19), L3sl(18/18), L3s2(5/5), Ml(50/50), M10(7/7), Ml 1(8/8), M2(7/7), M7a(47/47), M7b(36/36), M7c(7/7), M8a(l 1/11), Ms 1(2/2), Ms2(l/1), Ms3(4/4), Ms4(2/2), Nlb(14/14), N9a(35/35), N9s(14/19), NsI(Ul), Ns2(l/1), Ns4(6/6), Ns5(3/3), Ns6(2/2), P(18/18), QsI (3/3), Qs2(6/6), R31(3/3), R5(10/10), RsI (5/5), Rs 10(2/2), Rs2(2/2), Rs3(2/2), Rs4(2/2), Rs5(2/2), Rs6(2/2), Rs7(5/5), Rs8(3/3), Rs9(l/1), Tl (20/20), T2(51/51), Ts(2/2), U 1(5/5), U2(l 1/11), U3(6/6), U4(6/6), U5(46/46), U6(39/39), U7(7/7), U8(3/3), UsI (3/3), Us2(2/2), Us3(3/3), V(52/52), W(27/27), X(25/25), Y(5/5), Z(15/15)
A C 14 D2s1(6/42), D2s11(1/2), F(1/56), K(1/70), N9s(5/19)
[0061] Specifically, Table 3 shows the sequence variants found in the evolutionary compendium at the positions involved in each of the COMs (A-O). For each variant, the first 2-5 columns give the nucleotide bases found at each COM position, with bold base names indicating variants observed in the cancer compendium. The total number of sequences that contain each variant is reported (Occ), together with the haplogroup assignments of all sequences carrying the variant. For each haplogroup, the number of sequences carrying the particular variant and the total number of sequences in the compendium assigned to that haplogroup are given in parentheses. COM naming is as in Figure 2B. *For COMs involving D-loop nucleotide positions, only the 1917 sequences for which the D-loop sequence was determined were included in the analysis.
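The row layout described above lends itself to a simple consistency check. The following is a minimal sketch, not part of the original disclosure, assuming a row has been cleaned to the form `<bases> <Occ> <haplogroup>(<with variant>/<total>), ...`. The helper names `parse_row` and `occ_is_consistent` are hypothetical. The check confirms that Occ equals the sum of the per-haplogroup "with variant" counts.

```python
import re

# Hypothetical helpers for one cleaned Table 3 row of the assumed form
# "<bases> <Occ> <haplogroup>(<with variant>/<total>), ..."
ROW_RE = re.compile(r"^((?:[ACGT] )+)(\d+) (.+)$")
ENTRY_RE = re.compile(r"([A-Za-z0-9]+)\((\d+)/(\d+)\)")


def parse_row(row):
    """Split a row into its bases, its Occ value, and per-haplogroup counts."""
    match = ROW_RE.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {row!r}")
    bases = match.group(1).split()
    occ = int(match.group(2))
    counts = {
        name: (int(with_variant), int(total))
        for name, with_variant, total in ENTRY_RE.findall(match.group(3))
    }
    return bases, occ, counts


def occ_is_consistent(row):
    """Return True when Occ equals the sum of the 'with variant' counts."""
    _, occ, counts = parse_row(row)
    return occ == sum(with_variant for with_variant, _ in counts.values())


# One row of sub-table B, written in the cleaned form assumed above.
row = "G C G T G T 51 L0(24/28), L1(27/32)"
print(parse_row(row))
print(occ_is_consistent(row))
```

A check of this kind is also a convenient way to catch transcription errors in the Occ column, since a mismatch between Occ and the summed counts points at a damaged row.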
[0062] Interestingly, some of the haplogroups that overlap with COMs are associated with phenotypes. Cybrids carrying haplogroup N (overlapping with COM a{) differed from haplogroup M cybrids in calcium uptake, while haplogroups H (COM c%) and U4 (COM β{) are associated with high and low sperm motility, respectively (Montiel-Sosa et al. 2006; Ruiz-Pesini et al. 2000). These results support our working hypothesis that similar selective forces act in cancer and during human mitochondrial evolution. Moreover, the observation that several COMs occur at positions defining haplogroups associated with pathological phenotypes further suggests that COMs possess functional potential. While it is tempting to propose that some human haplogroups carry mutations that alter the propensity for cancer, the functional consequences of COMs are most likely affected by the genetic backgrounds upon which they occurred, a hypothesis that will be tested upon the creation of the relevant datasets.
[0063] Finally, tumors containing COMs comprise only 24% of the tumors for which any de novo mtDNA mutations were reported in our tested compendium, and only 14% of all tumors tested. What could be the reason for this apparent paucity? One possible explanation is that some tumors have followed adaptation routes that do not involve mtDNA changes at all. Accordingly, 41 of the 83 tumors analyzed from the HNSCC group completely lacked de novo mtDNA mutations.
[0064] In summary, the comparison provided herein of de novo cancer mtDNA mutations in two types of tumors to mutations that have been fixed during human evolution has revealed notable similarity in their mutational landscapes. Thus, a novel approach is presented for the analysis of mutational landscapes, i.e., the search for COMs, which reveals complex and non-random patterns in the de novo cancer mutational pattern. This mutational pattern is more likely to reflect a response to positive selection pressures in cancer than random non-adaptive processes. Many of the COMs recapitulate combinations of mutations that occurred during human mitochondrial evolution and led to the establishment of mtDNA haplogroups, suggesting a similarity between the patterns of mutations in normal human evolution and in cancer. These findings thus pave the way towards the development of new approaches for assessing the functional potential of naturally occurring changes and for investigating basic principles in cancer and natural genomic evolution.
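The search for COMs summarized above can be illustrated with a short sketch. This is an illustrative simplification and not the inventors' implementation. It assumes each tumor is represented by the set of mtDNA positions that differ from healthy tissue of the same subject, and it treats any set of at least two positions shared by at least two tumors as a recurring combination. The function name, its parameters, and the position numbers in the example are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def find_recurrent_combinations(mutations_by_tumor, min_tumors=2, min_size=2):
    """Return {combination: tumors carrying it} for recurring combinations.

    mutations_by_tumor maps a tumor ID to the set of mtDNA positions that
    differ between the tumor and healthy tissue from the same subject.
    """
    # Count in how many tumors each position is mutated.
    position_counts = Counter(
        pos for positions in mutations_by_tumor.values() for pos in set(positions)
    )
    recurrent = {p for p, n in position_counts.items() if n >= min_tumors}

    # Restrict each tumor to positions mutated in at least min_tumors tumors.
    filtered = {
        tumor: frozenset(positions) & recurrent
        for tumor, positions in mutations_by_tumor.items()
    }

    # Positions shared by a pair of tumors form a candidate combination.
    candidates = set()
    for (_, a), (_, b) in combinations(filtered.items(), 2):
        shared = a & b
        if len(shared) >= min_size:
            candidates.add(shared)

    # Keep candidates carried in full by at least min_tumors tumors.
    result = {}
    for combo in candidates:
        carriers = sorted(t for t, pos in filtered.items() if combo <= pos)
        if len(carriers) >= min_tumors:
            result[tuple(sorted(combo))] = carriers
    return result


# Made-up tumor IDs and positions, for illustration only.
example = {
    "T1": {100, 200, 300},
    "T2": {100, 200},
    "T3": {300, 400},
    "T4": {500},
}
print(find_recurrent_combinations(example))
```

In this example, positions 100 and 200 are both mutated in tumors T1 and T2, so they form the only recurring combination. Position 300 recurs on its own in T1 and T3 but is not part of any shared combination of two or more positions.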
[0065] While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.

Claims

CLAIMS
What is claimed is:
1. A method of confirming the diagnosis of a cancer selected from the group consisting of adenocarcinoma and squamous cell carcinoma in a subject with a neoplasm comprising the steps of (a) comparing the mitochondrial genome of an affected tissue isolated from the subject with the mitochondrial genome of a healthy tissue isolated from the same subject; and (b) detecting the presence of recurring combinations of mutations in the mitochondrial genome of the affected tissue; wherein the presence of one or more recurring combinations of mutations in the mitochondrial genome of the affected tissue indicates that the subject has cancer.
2. The method of claim 1, wherein the recurring combinations of mutations localize in the inner nodes of a phylogenetic tree.
3. The method of claim 1, wherein the recurring combinations of mutations comprise a K number of recurring mtDNA combinations of mutations, and each recurring mtDNA combination of mutation comprises at least one homoplasmic mutation.
4. The method of claim 1, wherein one or more recurring combinations of mutations are selected from the group consisting of neutral mutations and mutations causing the replacement of one or more amino acids in the encoded protein.
5. The method of claim 1, wherein the adenocarcinoma is selected from the group consisting of colorectal cancer, lung cancer, prostate cancer, breast cancer, pancreas cancer, stomach cancer and esophageal cancer.
6. The method of claim 1, wherein the squamous cell carcinoma is selected from the group consisting of cancer of the skin, head and neck cancer, esophageal cancer, lung cancer, prostate cancer and vaginal cancer.
7. The method of claim 6, wherein the adenocarcinoma is pancreas cancer.
8. The method of claim 7, wherein the squamous cell carcinoma is head and neck cancer.
9. A method for identifying subpopulations of individuals with common combinations of mutations within a defined population comprising: (a) comparing the mitochondrial genome of a target tissue in two or more subjects with the mitochondrial genome of a reference tissue in the same subjects; (b) detecting the presence of mutations in the mitochondrial genome of the target tissue; (c) converting the mutations into a list of mutated positions; (d) identifying the positions in the mitochondrial genome that are mutated in two or more subjects; and (e) determining recurrent combinations of mutations in the mitochondrial genome of the target tissue in two or more subjects, wherein the presence of recurring combinations of mutations in the mitochondrial genome of the target tissue in two or more subjects indicates that the subjects belong to the same subpopulation.
10. The method of claim 9, wherein the two or more subjects have a disease.
11. The method of claim 10, wherein the disease is cancer.
12. The method of claim 11, wherein the cancer is selected from the group consisting of adenocarcinoma and squamous cell carcinoma.
13. The method of claim 12, wherein the adenocarcinoma is selected from the group consisting of colorectal cancer, lung cancer, prostate cancer, breast cancer, pancreas cancer, stomach cancer and esophageal cancer.
14. The method of claim 12, wherein the squamous cell carcinoma is selected from the group consisting of cancer of the skin, head and neck cancer, esophageal cancer, lung cancer, prostate cancer and vaginal cancer.
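
By way of illustration only, steps (a) to (e) of claim 9 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicants' implementation: the function names `mutated_positions` and `recurrent_combinations`, the parameters `size` and `min_subjects`, and the short toy sequences are all hypothetical. It assumes that each subject's target and reference sequences are already aligned to the same length, and it treats any differing base as a mutation, ignoring heteroplasmy, sequencing error, and insertions or deletions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def mutated_positions(reference: str, target: str) -> FrozenSet[int]:
    """Steps (a)-(c): 1-based positions where the target tissue sequence
    differs from the same subject's reference tissue sequence."""
    if len(reference) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return frozenset(
        i for i, (r, t) in enumerate(zip(reference, target), start=1) if r != t
    )


def recurrent_combinations(
    subjects: Dict[str, FrozenSet[int]],
    size: int = 2,
    min_subjects: int = 2,
) -> Dict[Tuple[int, ...], List[str]]:
    """Steps (d)-(e): combinations of mutated positions shared by subjects."""
    # Step (d): keep only positions mutated in at least `min_subjects` subjects.
    counts = Counter(pos for positions in subjects.values() for pos in positions)
    shared = {pos for pos, n in counts.items() if n >= min_subjects}

    # Step (e): record which subjects carry each combination of shared positions.
    carriers: Dict[Tuple[int, ...], List[str]] = {}
    for name, positions in subjects.items():
        for combo in combinations(sorted(positions & shared), size):
            carriers.setdefault(combo, []).append(name)

    # A combination is recurrent if enough subjects carry it.
    return {c: names for c, names in carriers.items() if len(names) >= min_subjects}


# Toy, made-up sequences (not real mtDNA): (reference tissue, target tissue).
pairs = {
    "S1": ("ACGTACGT", "ATGTACCT"),
    "S2": ("ACGTACGT", "ATGTACCA"),
    "S3": ("ACGTACGT", "ACGTACGA"),
}
positions_by_subject = {
    name: mutated_positions(ref, tgt) for name, (ref, tgt) in pairs.items()
}
result = recurrent_combinations(positions_by_subject)
print(result)  # {(2, 7): ['S1', 'S2']}
```

In this toy example, subjects S1 and S2 share the combination of positions 2 and 7 and would be grouped into the same subpopulation, whereas S3 carries only a single shared position and joins no combination. Enumerating combinations of a fixed `size` is a simplification chosen for brevity; it grows quickly with the number of shared positions and is not suggested to be how the claimed method determines recurrence.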
PCT/IL2009/001224 2008-12-29 2009-12-29 Detection and use of recurrent mutation combination pattern WO2010076789A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US19381708P 2008-12-29 2008-12-29
US61/193,817 2008-12-29

Publications (1)

Publication Number Publication Date
WO2010076789A1 (en) 2010-07-08

Family

ID=42309888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2009/001224 WO2010076789A1 (en) 2008-12-29 2009-12-29 Detection and use of recurrent mutation combination pattern

Country Status (1)

Country Link
WO (1) WO2010076789A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070083334A1 (en) * 2001-09-14 2007-04-12 Compugen Ltd. Methods and systems for annotating biomolecular sequences
US7255993B1 (en) * 2002-11-06 2007-08-14 The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services Detection of mutational frequency and related methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09836174

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09836174

Country of ref document: EP

Kind code of ref document: A1