WO2006048266A2 - Gene expression profiling of leukemias with MLL gene rearrangements

Info

Publication number
WO2006048266A2
Authority
WO
WIPO (PCT)
Prior art keywords
mll
genes
expression
networks
cell
Prior art date
Application number
PCT/EP2005/011732
Other languages
French (fr)
Other versions
WO2006048266A3 (en)
Inventor
Torsten Haferlach
Martin Dugas
Wolfgang Kern
Alexander Kohlmann
Susanne Schnittger
Claudia Schoch
Original Assignee
Roche Diagnostics Gmbh
F. Hoffmann-La Roche Ag
Application filed by Roche Diagnostics GmbH, F. Hoffmann-La Roche AG
Publication of WO2006048266A2
Publication of WO2006048266A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C12Q2600/158: Expression markers

Definitions

  • the present invention relates to the detection and classification of leukemia and accordingly, provides diagnostic and/or prognostic information in certain embodiments.
  • Leukemias are generally classified into four different groups or types: acute myeloid (AML), acute lymphatic (ALL), chronic myeloid (CML) and chronic lymphatic leukemia (CLL). Within these groups, several subcategories or subtypes can be identified using various approaches. These different subcategories of leukemia are associated with varying clinical outcomes and therefore can serve as guides to the selection of different treatment strategies. The importance of highly specific classification may be illustrated for AML as a very heterogeneous group of diseases. Effort has been aimed at identifying biological entities and to distinguish and classify subgroups of AML that are associated with, e.g., favorable, intermediate or unfavorable prognoses. In 1976, for example, the FAB classification was proposed by the French-American-British co-operative group that utilizes cytomorphology and cytochemistry to separate AML subgroups according to the morphological appearance of blasts in the blood and bone marrow.
  • Genetic abnormalities occurring in leukemic blasts were recognized as having a major impact on the morphological picture and on prognosis.
  • the karyotype of leukemic blasts is commonly used as an independent prognostic factor regarding response to therapy as well as survival.
  • a combination of methods is typically used to obtain the diagnostic information in leukemia.
  • the analysis of the morphology and cytochemistry of bone marrow blasts and peripheral blood cells is commonly used to establish a diagnosis.
  • immunophenotyping is also utilized to separate an undifferentiated AML from acute lymphoblastic leukemia and from CLL.
  • leukemia subtypes can be diagnosed by cytomorphology alone, but this typically requires that an expert review sample smears.
  • genetic analysis based on, e.g., chromosome analysis, fluorescence in situ hybridization (FISH), or reverse transcription PCR (RT-PCR) and immunophenotyping is also generally used to accurately assign cases to the correct category.
  • An aim of these techniques, aside from diagnosis, is to determine the prognosis of the leukemia under consideration.
  • viable cells are generally necessary, as the cells used for genetic analysis need to divide in vitro in order to obtain metaphases for the analysis.
  • Another exemplary problem is the long lag period (e.g., 72 hours) that typically occurs between the receipt of the materials to be analyzed in the laboratory and the generation of results.
  • great experience in preparing chromosomes and analyzing karyotypes is generally needed to obtain correct results in most cases.
  • hematological malignancies can be separated into CML, CLL, ALL, and AML.
  • prognostically relevant subtypes have been identified. This further sub-classification commonly relies on genetic abnormalities of leukemic blasts and is associated with different prognoses.
  • the sub-classification of leukemias is used increasingly as a guide to the selection of appropriate therapies.
  • the development of new, specific drugs and treatment approaches often includes the identification of specific subtypes that may benefit from a distinct therapeutic protocol and thus, improve the outcomes of distinct subsets of leukemia.
  • the therapeutic drug (STI571) inhibits the CML-specific chimeric tyrosine kinase BCR-ABL generated from the genetic defect observed in CML, the BCR-ABL rearrangement due to the translocation between chromosomes 9 and 22 (t(9;22)(q34;q11)).
  • the therapy response is dramatically higher as compared to other drugs that have previously been used.
  • Another example is a subtype of acute myeloid leukemia,
  • AML M3 and its variant M3v, which both include the karyotype t(15;17)(q22;q11-12).
  • ATRA all-trans retinoic acid
  • Golub et al. Science, 1999, 286, 531-7, which is incorporated by reference
  • gene expression profiles can be used for class prediction and discriminating AML from ALL samples.
  • the selection of the two different subgroups was performed using exclusively morphologic-phenotypical criteria.
  • the present invention relates to rapid, cost effective, and reliable approaches to detecting and genotyping leukemia.
  • methods are provided for genotyping acute leukemia cells with t(11q23)/MLL.
  • the invention also provides methods for distinguishing acute myeloid leukemia (AML) cells with t(11q23)/MLL from acute lymphoblastic leukemia (ALL) cells with t(11q23)/MLL in some embodiments. Aside from providing diagnostic information to patients, these distinctions can also assist in selecting appropriate therapies and in prognostication.
  • these methods include profiling the expression of selected populations of genes using real-time
  • the invention also provides related kits and systems.
  • the invention provides a method of genotyping a leukemia cell.
  • the method includes detecting an expression level of at least one set of genes in or derived from at least one target human leukemia cell.
  • the target human leukemia cell is generally obtained from a subject.
  • the set of genes is selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • the set of genes in or derived from the target human leukemia cell comprises at least about 10, 100, 1000, 10000, or more members.
  • the method also includes correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell having t(11q23)/MLL; correlating a detected substantially identical expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell having t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; or correlating a detected substantially identical expression of one or more genes of the target human
  • the reference human leukemia cell lacking t(11q23)/MLL comprises a precursor B-ALL cell with t(9;22), a precursor B-ALL cell with t(8;14), a precursor T-ALL cell, an AML cell with t(8;21), an AML cell with t(15;17), an AML cell with inv(16), or an AML cell with a complex aberrant karyotype.
  • the detected differential expression of the genes comprises at least about a 5% difference, whereas the detected substantially identical expression of the genes comprises less than about a 5% difference.
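The 5% criterion above can be sketched in a few lines of Python. This is an illustrative sketch only: the text does not say how the percentage difference is computed, so the relative-difference formula and the function names `relative_difference` and `compare_expression` are assumptions, not the applicants' method.

```python
def relative_difference(target: float, reference: float) -> float:
    """Relative difference of a target expression value versus a reference value.

    Assumed definition: |target - reference| / |reference|.
    """
    if reference == 0:
        raise ValueError("reference expression must be non-zero")
    return abs(target - reference) / abs(reference)


def compare_expression(target: float, reference: float, threshold: float = 0.05) -> str:
    """Label a gene as 'differential' at or above the threshold, else 'substantially identical'."""
    if relative_difference(target, reference) >= threshold:
        return "differential"
    return "substantially identical"


# Hypothetical expression values, for illustration only.
print(compare_expression(110.0, 100.0))  # 10% difference
print(compare_expression(102.0, 100.0))  # 2% difference
```

The threshold is a parameter so that a stricter cutoff can be substituted without changing the comparison logic.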
  • the method includes correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an AML cell; or correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an AML cell.
  • the method includes correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell; or correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell.
  • markers described herein are also optionally used for cross-lineage comparisons, such as ALL with t(11q23)/MLL compared to AML without t(11q23)/MLL, AML with t(11q23)/MLL compared to ALL without t(11q23)/MLL, and the like.
  • Expression levels are detected using essentially any gene expression profiling technique.
  • the expression level is detected using an array, a robotics system, and/or a microfluidic device.
  • the expression level of the set of genes is detected by amplifying nucleic acid sequences associated with the genes to produce amplicons and detecting the amplicons.
  • the amplicons are generally detected using a process that comprises one or more of: hybridizing the amplicons to an oligonucleotide array, digesting the amplicons with a restriction enzyme, or real-time polymerase chain reaction (PCR) analysis.
  • the expression level of the set of genes is detected by, e.g., measuring quantities of transcribed polynucleotides (e.g., mRNAs, cDNAs, etc.) or portions thereof expressed or derived from the genes. In some embodiments, the expression level is detected by, e.g., contacting polynucleotides or polypeptides expressed from the genes with compounds (e.g., aptamers, antibodies or fragments thereof, etc.) that specifically bind the polynucleotides or polypeptides.
  • the invention provides a method of producing a reference data bank for genotyping leukemia cells.
  • the method includes (a) compiling a gene expression profile of a patient sample by detecting the expression level of one or more genes of at least one human leukemia cell, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14, and (b) classifying the gene expression profile using a machine learning algorithm.
  • the invention provides a kit that includes one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • at least one solid support comprises the probes.
  • the kit also includes one or more additional reagents to perform real-time PCR analyses.
  • the kit also includes instructions for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
  • the invention provides a system that includes one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • at least one solid support comprises the probes.
  • the system includes one or more additional reagents and/or components to perform real-time PCR analyses.
  • the system also includes at least one reference data bank for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
  • the reference data bank is generally produced by, e.g., (a) compiling a gene expression profile of a patient sample by detecting the expression level of at least one of the genes, and (b) classifying the gene expression profile using a machine learning algorithm.
  • the machine learning algorithm is generally selected from, e.g., a weighted voting algorithm, a K-nearest neighbors algorithm, a decision tree induction algorithm, a support vector machine, a feed-forward neural network, etc.
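As an illustration of one of the listed algorithm families, the following is a minimal K-nearest neighbors sketch in plain Python. The reference data bank layout (a list of expression vectors paired with class labels), the Euclidean distance metric, the value of `k`, and all numeric values are assumptions made for the example; they are not taken from the tables of the application.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(profile, reference_bank, k=3):
    """Assign the majority label among the k reference profiles closest to `profile`.

    reference_bank: list of (expression_vector, label) pairs (hypothetical layout).
    """
    neighbors = sorted(reference_bank, key=lambda entry: euclidean(profile, entry[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up three-gene profiles, for illustration only.
reference_bank = [
    ([8.1, 2.0, 5.5], "t(11q23)/MLL"),
    ([7.9, 2.3, 5.1], "t(11q23)/MLL"),
    ([8.4, 1.8, 5.9], "t(11q23)/MLL"),
    ([3.0, 6.5, 1.2], "other"),
    ([2.7, 6.9, 1.0], "other"),
    ([3.3, 6.1, 1.5], "other"),
]

print(knn_classify([8.0, 2.1, 5.4], reference_bank))
```

An odd `k` avoids tied votes when only two classes are present; with more classes a tie-breaking rule would be needed.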
  • Figure 1 is a schematic that provides a biological network node shape description.
  • Figure 2 is a schematic that provides biological network edge labels.
  • Figure 3 is a schematic that shows biological network edge types.
  • Figure 5 is a graphical display of biological network 1 referred to in Figure 4.
  • the network is graphically displayed with genes/gene products as nodes and the biological relationships between the nodes as edges.
  • the intensity of the node color indicates the degree of differential gene expression.
  • Green intensities correspond to a lower expression (downregulated fold change) in t(11q23)/MLL cases compared to AML subtypes (inv(16), t(8;21), t(15;17), complex karyotypes) or ALL subtypes (t(9;22), t(8;14), T-ALL), respectively.
  • Red intensities correspond to a higher expression in t(11q23)/MLL cases (upregulated fold change), respectively.
  • Nodes are displayed using various shapes that represent the functional class of the gene product.
  • Edges are displayed with various labels that describe the nature of the relationship between the nodes (e.g., B for binding, T for transcription). The length of an edge reflects the evidence supporting that node-to-node relationship, in that edges supported by more articles from the literature are shorter.
  • Focus genes were included in the original text format file derived from the list of differentially expressed genes. Non-focus genes were derived from queries for interactions between focus genes and all other gene objects stored in the Ingenuity knowledge database. Gene expression raw data is provided in Tables 2-6.
  • Figure 6 is a graphical display of biological network 2 referred to in Figure 4.
  • Figure 7 is a graphical display of biological network 3 referred to in Figure 4.
  • Figure 8 is a graphical display of biological network 4 referred to in Figure 4.
  • Figure 10 graphically shows differentially expressed genes between ALL with t(11q23)/MLL and AML with t(11q23)/MLL, and corresponds to network 1 referred to in Figure 9.
  • a biological network is displayed graphically. Additional details for the legend are provided in Figures 1-3.
  • Green intensities correspond to a lower expression in ALL with t(11q23)/MLL cases compared to AML with t(11q23)/MLL samples (downregulated fold change).
  • Red intensities correspond to a higher expression in ALL with t(l Iq23)/MLL cases compared to
  • Figure 11 is a graphically displayed biological network corresponding to network 2 referred to in Figure 9.
  • Figure 12 is a graphically displayed biological network corresponding to network 3 referred to in Figure 9.
  • Figure 13 is a graphically displayed biological network corresponding to network
  • Figure 14 is a graphically displayed biological network corresponding to network
  • Figure 15 is a graphically displayed biological network corresponding to network
  • Figure 16 is a graphically displayed biological network corresponding to network
  • Figure 17 is a graphically displayed biological network corresponding to network 8 referred to in Figure 9.
  • Figures 18 A and B are principal component analyses including various acute leukemia subtypes.
  • the leukemia samples are plotted in a three-dimensional space using the three components capturing most of the variance in the original data set. Each patient sample is represented by a single color-coded sphere. The labels and coloring of the classes were added after the analysis for better visualization.
  • the normalized expression value for each gene (given in rows) is coded by color (standard deviation from mean). Red cells indicate high expression and green cells indicate low expression.
  • the coloring of the groups is identical to Figures 18 A and B.
  • the t(11q23)/MLL leukemias are highlighted by arrows.
  • Figures 20 A and B show an unsupervised analysis of adult ALL and AML t(11q23)/MLL samples. The unsupervised analysis used a selection of 5,000 genes that showed the largest variance across all samples. (Panel A) In the three-dimensional PCA plot, data points with similar characteristics will cluster together.
  • each patient's expression pattern is represented by a single color-coded sphere.
  • ALL with t(11q23)/MLL samples are labeled mauve
  • AML with t(11q23)/MLL are labeled turquoise, respectively.
  • the labels and coloring of the classes were added after the analysis for better visualization.
  • (Panel B) Enlarged dendrogram of ALL and AML t(11q23)/MLL samples when analyzed by unsupervised hierarchical clustering (cluster algorithm: Ward; selected coefficient: Euclidean distance).
  • Figure 21 graphically shows the supervised identification of differentially expressed genes.
  • the right plot shows a supervised comparison of ALL with t(11q23)/MLL versus AML with t(11q23)/MLL. Red dots indicate genes with higher expression in AML with t(11q23)/MLL and green dots indicate higher expressed genes in ALL with t(11q23)/MLL.
  • Figure 22 shows an unsupervised analysis of adult ALL and AML t(11q23)/MLL samples.
  • the unsupervised analysis is based on 5,000 genes that showed the largest variance across all samples. For better visualization the labels and coloring of the classes were added after the analysis.
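The gene pre-selection step described above (keeping the genes with the largest variance across all samples before an unsupervised analysis) can be sketched as follows. The data layout (a dictionary mapping a gene identifier to its expression values across samples), the helper name `top_variance_genes`, and the toy values are assumptions for illustration; the source does not state which variance estimator was used, and the ranking is the same for population and sample variance when every gene has the same number of samples.

```python
from statistics import pvariance


def top_variance_genes(expression, n):
    """Return the n gene identifiers with the largest variance across samples.

    expression: dict mapping gene id -> list of expression values, one per sample.
    """
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    return ranked[:n]


# Toy data with hypothetical gene names; a real analysis would pass n=5000.
expression = {
    "gene_a": [1.0, 1.1, 0.9, 1.0],
    "gene_b": [0.5, 4.0, 0.2, 5.1],
    "gene_c": [2.0, 3.0, 2.5, 1.5],
}

print(top_variance_genes(expression, 2))
```

The selected genes would then be passed to the PCA or hierarchical clustering step, which is outside the scope of this sketch.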
  • in the three-dimensional PCA plot, data points with similar characteristics will cluster together.
  • each patient's expression pattern is represented by a single color-coded sphere.
  • MLL fusion partner gene as confirmed by FISH and/or PCR-based molecular analyses is given.
  • MLL-X indicates samples with unknown partner genes.
  • 11q23/MLL refers to an 11q23 rearrangement of the human MLL gene.
  • An “antibody” refers to a polypeptide substantially encoded by at least one immunoglobulin gene or fragments of at least one immunoglobulin gene, which can participate in specific binding with a ligand.
  • the term “antibody” includes polyclonal and monoclonal antibodies and biologically active fragments thereof including among other possibilities "univalent” antibodies (Glennie et al. (1982)
  • Fab proteins including Fab' and F(ab')2 fragments, whether covalently or non-covalently aggregated; light or heavy chains alone, typically variable heavy and light chain regions (VH and VL regions), and more typically including the hypervariable regions (otherwise known as the complementarity determining regions (CDRs)) of the VH and VL regions; Fc proteins; "hybrid" antibodies capable of binding more than one antigen; constant-variable region chimeras; "composite" immunoglobulins with heavy and light chains of different origins; "altered" antibodies with improved specificity and other characteristics as prepared by standard recombinant techniques, by mutagenic techniques, or other directed evolutionary techniques known in the art.
  • scFvs chimeric and humanized antibodies. See, e.g., Harlow and Lane, Antibodies, a laboratory manual, CSH Press (1988), which is incorporated by reference.
  • For the detection of polypeptides using antibodies or fragments thereof, there are a variety of methods known to a person skilled in the art, which are optionally utilized. Examples include immunoprecipitations, Western blottings, enzyme-linked immunosorbent assays (ELISA), radioimmunoassays (RIA), dissociation-enhanced lanthanide fluoroimmunoassays (DELFIA), and scintillation proximity assays (SPA).
  • an antibody is typically labeled by one or more of the labels described herein or otherwise known to persons skilled in the art.
  • an "array" or "microarray" refers to a linear or two- or three-dimensional arrangement of preferably discrete nucleic acid or polypeptide probes which comprises an intentionally created collection of nucleic acid or polypeptide probes of any length spotted onto a substrate/solid support.
  • a collection of nucleic acids or polypeptides spotted onto a substrate/solid support also falls under the term "array".
  • a microarray usually refers to a miniaturized array arrangement, with the probes being attached at a density of at least about 10, 20, 50, 100 nucleic acid molecules referring to different or the same genes per cm².
  • an array can be referred to as a "gene chip".
  • the array itself can have different formats, e.g., libraries of soluble probes or libraries of probes tethered to resin beads, silica chips, or other solid supports.
  • complementary and “complementarity”, respectively, can be described by the percentage, i.e., proportion, of nucleotides that can form base pairs between two polynucleotide strands or within a specific region or domain of the two strands.
  • complementary nucleotides are, according to the base pairing rules, adenine and thymine (or adenine and uracil), and cytosine and guanine.
  • Complementarity may be partial, in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be a complete or total complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has effects on the efficiency and strength of hybridization between nucleic acid strands.
  • Two nucleic acid strands are considered to be 100% complementary to each other over a defined length if in a defined region all adenines of a first strand can pair with a thymine (or a uracil) of a second strand, all guanines of a first strand can pair with a cytosine of a second strand, all thymines (or uracils) of a first strand can pair with an adenine of a second strand, and all cytosines of a first strand can pair with a guanine of a second strand, and vice versa.
  • the degree of complementarity is determined over a stretch of about 20 or 25 nucleotides, i.e., a 60% complementarity means that within a region of 20 nucleotides of two nucleic acid strands 12 nucleotides of the first strand can base pair with 12 nucleotides of the second strand according to the above base pairing rules, either as a stretch of 12 contiguous nucleotides or interspersed by non-pairing nucleotides, when the two strands are attached to each other over the region of 20 nucleotides.
  • the degree of complementarity can range from at least about 50% to full, i.e., 100% complementarity.
  • Two single nucleic acid strands are said to be "substantially complementary" when they are at least about 80% complementary, and more typically about 90% complementary or higher.
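The percentage definition above lends itself to a short worked example. The sketch below assumes both strands are written 5' to 3', have equal length, and are compared position by position in antiparallel alignment without gaps; the function names and the 80% default cutoff parameter are illustrative choices based on the figures given in the text.

```python
# Base pairs permitted by the pairing rules stated above (A-T, A-U, G-C).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(first: str, second: str) -> float:
    """Percentage of positions that can base pair between two equal-length strands.

    Both strands are given 5'->3'; the second is reversed for antiparallel alignment.
    """
    if len(first) != len(second) or not first:
        raise ValueError("strands must be non-empty and of equal length")
    aligned = second[::-1]
    matches = sum((a, b) in PAIRS for a, b in zip(first.upper(), aligned.upper()))
    return 100.0 * matches / len(first)


def is_substantially_complementary(first: str, second: str, cutoff: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion for substantial complementarity."""
    return percent_complementarity(first, second) >= cutoff


# 12 of 20 positions can pair, matching the 60% example in the text.
first = "A" * 20
second = "T" * 12 + "A" * 8
print(percent_complementarity(first, second))
print(is_substantially_complementary(first, second))
```

A gapped or sliding-window comparison would be needed for strands of unequal length; that is deliberately left out of this sketch.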
  • substantial complementarity is generally utilized.
  • Two nucleic acids “correspond” when they have substantially identical or complementary sequences, when one nucleic acid is a subsequence of the other, or when one sequence is derived naturally or artificially from the other.
  • differential gene expression refers to a gene or set of genes whose expression is activated to a higher or lower level in a subject suffering from a disease, (e.g., cancer) relative to its expression in a normal or control subject. Differential gene expression can also occur between different types or subtypes of diseased cells. The term also includes genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product.
  • Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between, e.g., normal subjects and subjects suffering from a disease, various stages of the same disease, different types or subtypes of diseased cells, etc.
  • Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.
  • “differential gene expression” is considered to be present when there is at least an about two-fold, typically at least about four-fold, more typically at least about six-fold, most typically at least about ten-fold difference between, e.g., the expression of a given gene in normal and diseased subjects, in various stages of disease development in a diseased subject, different types or subtypes of diseased cells, etc.
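The fold-difference thresholds above can be checked with a small helper. This is a sketch under the assumption that expression values are positive, linear-scale quantities and that the fold difference is the ratio of the larger to the smaller value; the function names are hypothetical.

```python
def fold_difference(expr_a: float, expr_b: float) -> float:
    """Unsigned fold difference (>= 1) between two positive expression values."""
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    return max(expr_a, expr_b) / min(expr_a, expr_b)


def is_differentially_expressed(expr_a: float, expr_b: float, min_fold: float = 2.0) -> bool:
    """True when the fold difference meets the chosen threshold (two-fold by default)."""
    return fold_difference(expr_a, expr_b) >= min_fold


# Hypothetical values: a four-fold and a 1.5-fold difference.
print(is_differentially_expressed(50.0, 200.0))
print(is_differentially_expressed(100.0, 150.0))
print(is_differentially_expressed(50.0, 200.0, min_fold=10.0))
```

Passing `min_fold=4.0`, `6.0`, or `10.0` corresponds to the progressively stricter thresholds named in the definition.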
  • expression refers to the process by which mRNA or a polypeptide is produced based on the nucleic acid sequence of a gene, i.e., "expression” also includes the formation of mRNA in the process of transcription.
  • the term “determining the expression level” refers to the determination of the level of expression of one or more markers.
  • genotype refers to a description of the alleles of a gene or genes contained in an individual or a sample. As used herein, no distinction is made between the genotype of an individual and the genotype of a sample originating from the individual. Although, typically, a genotype is determined from samples of diploid cells, a genotype can be determined from a sample of haploid cells, such as a sperm cell.
  • gene refers to a nucleic acid sequence encoding a gene product.
  • the gene optionally comprises sequence information required for expression of the gene (e.g., promoters, enhancers, etc.).
  • gene expression data refers to one or more sets of data that contain information regarding different aspects of gene expression.
  • the data set optionally includes information regarding: the presence of target-transcripts in cell or cell- derived samples; the relative and absolute abundance levels of target transcripts; the ability of various treatments to induce expression of specific genes; and the ability of various treatments to change expression of specific genes to different levels.
  • Such conditions are, for example, hybridization in 6x SSC, pH 7.0 / 0.1 % SDS at about 45°C for 18-23 hours, followed by a washing step with 2x SSC/1 % SDS at 50°C.
  • the salt concentration in the washing step can, for example, be chosen between 2x SSC/0.1 % SDS at room temperature for low stringency and 0.2x SSC/0.1 % SDS at 50°C for high stringency.
  • the temperature of the washing step can be varied between room temperature (ca. 22°C), for low stringency, and 65°C to 70°C for high stringency.
  • polynucleotides that hybridize at lower stringency hybridization conditions.
  • Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation of, e.g., formamide concentration (lower percentages of formamide result in lowered stringency), salt conditions, or temperature.
  • washes performed following stringent hybridization can be done at higher salt concentrations (e.g., 5x SSC).
  • Variations in the above conditions may be accomplished through the inclusion and/or substitution of alternate blocking reagents used to suppress background in hybridization experiments.
  • the inclusion of specific blocking reagents may require modification of the hybridization conditions described herein, due to problems with compatibility. An extensive guide to the hybridization of nucleic acids is found in
  • label refers to a moiety attached (covalently or non-covalently), or capable of being attached, to a molecule (e.g., a polynucleotide, a polypeptide, etc.), which moiety provides or is capable of providing information about the molecule (e.g., descriptive, identifying, etc. information about the molecule) or another molecule with which the labeled molecule interacts (e.g., hybridizes, etc.).
  • Exemplary labels include fluorescent labels (including, e.g., quenchers or absorbers), non-fluorescent labels, colorimetric labels, chemiluminescent labels, bioluminescent labels, radioactive labels (such as ³H, ³⁵S, ³²P, ¹²⁵I, ⁵⁷Co or ¹⁴C), mass-modifying groups, antibodies, antigens, biotin, haptens, digoxigenin, enzymes (including, e.g., peroxidase, phosphatase, etc.), and the like.
  • non-fluorescent labels include colorimetric labels, chemiluminescent labels, bioluminescent labels, radioactive labels (such as ³H, ³⁵S, ³²P, ¹²⁵I, ⁵⁷Co or ¹⁴C), mass-modifying groups, antibodies, antigens, biotin, haptens, digoxigenin, enzymes (including, e.g., peroxidase, phosphatase, etc.), and the like.
  • fluorescent labels may include dyes that are negatively charged, such as dyes of the fluorescein family, or dyes that are neutral in charge, such as dyes of the rhodamine family, or dyes that are positively charged, such as dyes of the cyanine family.
  • Dyes of the fluorescein family include, e.g., FAM, HEX, TET, JOE, NAN and ZOE.
  • Dyes of the rhodamine family include, e.g., Texas Red, ROX, R110, R6G, and TAMRA.
  • FAM, HEX, TET, JOE, NAN, ZOE, ROX, R110, R6G, and TAMRA are commercially available from, e.g., Perkin-Elmer, Inc. (Wellesley, MA, USA), and Texas Red is commercially available from, e.g., Molecular Probes, Inc. (Eugene,
  • Dyes of the cyanine family include, e.g., Cy2, Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, and are commercially available from, e.g., Amersham Biosciences Corp. (Piscataway, NJ, USA). Suitable methods include the direct labeling (incorporation) method, an amino-modified (amino-allyl) nucleotide method (available, e.g., from Ambion, Inc. (Austin, TX, USA)), and the primer tagging method (DNA dendrimer labeling, available as a kit, e.g., from Genisphere, Inc. (Hatfield, PA, USA)).
  • biotin or biotinylated nucleotides are used for labeling, with the latter generally being directly incorporated into, e.g., the cRNA polynucleotide by in vitro transcription.
  • the term “lower expression” refers to an expression level of one or more markers from a target that is less than a corresponding expression level of the markers in a reference. In certain embodiments, “lower expression” is assigned to all polynucleotides definable by number and Affymetrix ID whose t-values and fold change (fc) values are negative.
  • the term “higher expression” refers to an expression level of one or more markers from a target that is more than a corresponding expression level of the markers in a reference.
  • “higher expression” is assigned to all polynucleotides definable by number and Affymetrix ID whose t-values and fold change (fc) values are positive.
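The sign-based assignment described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `expression_direction`, the probe identifiers, and the statistics are hypothetical, and the "undetermined" label for mixed or zero signs is an assumption, since the text only defines the cases where both values share a sign.

```python
def expression_direction(t_value, fold_change):
    """Label a probe set by the signs of its t-value and fold change (fc)."""
    if t_value > 0 and fold_change > 0:
        return "higher"
    if t_value < 0 and fold_change < 0:
        return "lower"
    # Assumption: the text does not define mixed-sign or zero cases.
    return "undetermined"


# Hypothetical probe-set statistics: (t-value, fold change).
probe_stats = {
    "probe_A": (5.2, 2.1),
    "probe_B": (-4.7, -1.8),
    "probe_C": (0.3, -1.1),
}

labels = {pid: expression_direction(t, fc) for pid, (t, fc) in probe_stats.items()}
```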
  • a “machine learning algorithm” refers to a computational-based prediction methodology, also known to persons skilled in the art as a “classifier”, employed for characterizing a gene expression profile.
  • the signals corresponding to certain expression levels which are obtained by, e.g., microarray-based hybridization assays, are typically subjected to the algorithm in order to classify the expression profile.
  • Supervised learning generally involves "training” a classifier to recognize the distinctions among classes and then “testing" the accuracy of the classifier on an independent test set. For new, unknown samples the classifier can be used to predict the class in which the samples belong.
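The train, test, and predict cycle described above can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule as a stand-in; it is not presented as the classifier used in this document, and the two-gene expression profiles and class names are invented for illustration.

```python
from statistics import mean


def train_centroids(profiles, classes):
    """Training step: compute the mean expression vector of each class."""
    grouped = {}
    for profile, cls in zip(profiles, classes):
        grouped.setdefault(cls, []).append(profile)
    return {cls: [mean(col) for col in zip(*rows)] for cls, rows in grouped.items()}


def predict(centroids, profile):
    """Assign a profile to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls], profile))


def accuracy(centroids, profiles, classes):
    """Testing step: fraction of independent test samples classified correctly."""
    hits = sum(predict(centroids, p) == c for p, c in zip(profiles, classes))
    return hits / len(classes)


# Hypothetical expression profiles (two genes per sample).
train_x = [[8.0, 2.0], [7.5, 2.5], [2.0, 9.0], [2.5, 8.5]]
train_y = ["MLL", "MLL", "other", "other"]
test_x = [[7.0, 3.0], [3.0, 8.0]]
test_y = ["MLL", "other"]

model = train_centroids(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
new_class = predict(model, [7.8, 2.2])  # class prediction for a new, unknown sample
```

Keeping the test samples out of `train_centroids` is what makes the accuracy estimate independent of the training data.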
  • markers refers to a genetically controlled difference that can be used in the genetic analysis of a test or target versus a control or reference sample for the purpose of assigning the sample to a defined genotype or phenotype.
  • markers refer to genes, polynucleotides, polypeptides, or fragments or portions thereof that are differentially expressed in, e.g., different leukemia types and/or subtypes.
  • the markers can be defined by their gene symbol name, their encoded protein name, their transcript identification number (cluster identification number), the data base accession number, public accession number and/or GenBank identifier. Markers can also be defined by their Affymetrix identification number, chromosomal location, UniGene accession number and cluster type, and/or LocusLink accession number.
  • the Affymetrix identification number (affy id) is accessible for anyone and the person skilled in the art by entering the "gene expression omnibus" internet page of the National Center for Biotechnology Information (NCBI) on the world wide web at ncbi.nlm.nih.gov/geo/ as of 11/4/2004.
  • the affy id's of the polynucleotides used for certain embodiments of the methods described herein are derived from the so-called human genome U133 chip (Affymetrix, Inc., Santa Clara, CA, USA).
  • sequence data of each identification number can be viewed on the world wide web at, e.g., ncbi.nlm.nih.gov/projects/geo/ as of 11/4/2004 using the accession number GPL96 for U133A annotational data and accession number GPL97 for U133B annotational data.
  • the expression level of a marker is determined by determining the expression of its corresponding polynucleotide.
  • normal karyotype refers to a state of those cells lacking any visible karyotype abnormality detectable with chromosome banding analysis.
  • nucleic acid refers to a polymer of monomers that can be corresponded to a ribose nucleic acid (RNA) or deoxyribose nucleic acid (DNA) polymer, or analog thereof. This includes polymers of nucleotides such as RNA and DNA, as well as modified forms thereof, peptide nucleic acids (PNAs), locked nucleic acids (LNA™s), and the like.
  • the nucleic acid can be a polymer that includes multiple monomer types, e.g., both RNA and DNA subunits.
  • a nucleic acid can be or include, e.g., a chromosome or chromosomal segment, a vector (e.g., an expression vector), an expression cassette, a naked DNA or RNA polymer, the product of a polymerase chain reaction (PCR) or other nucleic acid amplification reaction, an oligonucleotide, a probe, a primer, etc.
  • a nucleic acid can be e.g., single-stranded or double-stranded. Unless otherwise indicated, a particular nucleic acid sequence optionally comprises or encodes complementary sequences, in addition to any sequence explicitly indicated.
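As an illustration of what a complementary sequence means in practice, the following sketch derives the reverse complement of a DNA sequence written 5' to 3'. The helper name `reverse_complement` is hypothetical, and the treatment of uracil (paired with adenine, with output always written in the DNA alphabet) is a simplifying assumption.

```python
# Watson-Crick pairing table; U is read as pairing with A (assumption for RNA input).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the complementary strand of `seq`, written 5' to 3' in the DNA alphabet."""
    return seq.translate(_COMPLEMENT)[::-1]


explicit_sequence = "GATTACA"
complementary_sequence = reverse_complement(explicit_sequence)
```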
  • Oligonucleotides e.g., probes, primers, etc.
  • Oligonucleotides of a defined sequence may be produced by techniques known to those of ordinary skill in the art, such as by chemical or biochemical synthesis, and by in vitro or in vivo expression from recombinant nucleic acid molecules, e.g., bacterial or retroviral vectors.
  • Oligonucleotides which are primer and/or probe sequences, as described below, may comprise DNA, RNA or nucleic acid analogs such as uncharged nucleic acid analogs including but not limited to peptide nucleic acids (PNAs) which are disclosed in International Patent Application WO 92/20702 or morpholino analogs which are described in U.S. Pat. Nos. 5,185,444, 5,034,506, and 5,142,047 all of which are incorporated by reference.
  • Such sequences can routinely be synthesized using a variety of techniques currently available. For example, a sequence of DNA can be synthesized using conventional nucleotide phosphoramidite chemistry and the instruments available from Applied Biosystems, Inc. (Foster City, CA, USA);
  • a nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil) and/or bases other than the five biologically occurring bases.
  • a polynucleotide of the invention can contain one or more modified, non-standard, or derivatized base moieties, including, but not limited to, N6-methyl-adenine, N6-tert-butyl-benzyl-adenine, imidazole, substituted imidazoles, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetyl cytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine,
  • modified, non-standard, or derivatized base moieties may be found in U.S. Patent Nos. 6,001,611, 5,955,589, 5,844,106, 5,789,562, 5,750,343, 5,728,525, and 5,679,785, each of which is incorporated by reference.
  • nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise one or more modified sugar moieties including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and hexose.
  • a nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise phosphodiester linkages or modified linkages including, but not limited to phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages.
  • polynucleotide refers to a DNA, in particular cDNA, or RNA, in particular a cRNA, or a portion thereof. In the case of RNA (or cDNA), the polynucleotide is formed upon transcription of a nucleotide sequence that is capable of expression.
  • Polynucleotide fragments refer to fragments of between at least 8, such as 10, 12, 15 or 18 nucleotides and at least 50, such as 60, 80, 100, 200 or 300 nucleotides in length, or a complementary sequence thereto, e.g., representing a consecutive stretch of nucleotides of a gene, cDNA or mRNA.
  • polynucleotides also include any fragment (or complementary sequence thereto) of a sequence corresponding to or derived from any of the markers defined herein.
  • primer refers to an oligonucleotide having a hybridization specificity sufficient for the initiation of an enzymatic polymerization under predetermined conditions, for example in an amplification technique such as polymerase chain reaction (PCR), in a process of sequencing, in a method of reverse transcription and the like.
  • probe refers to an oligonucleotide having a hybridization specificity sufficient for binding to a defined target sequence under predetermined conditions, for example in an amplification technique such as a 5'-nuclease reaction, in a hybridization-dependent detection method, such as a Southern or Northern blot, and the like.
  • probes correspond at least in part to selected markers.
  • Primers and probes may be used in a variety of ways and may be defined by the specific use.
  • a probe can be immobilized on a solid support by any appropriate means, including, but not limited to: by covalent bonding, by adsorption, by hydrophobic and/or electrostatic interaction, or by direct synthesis on a solid support (see in particular patent application WO 92/10092).
  • a probe may be labeled by means of a label chosen, for example, from radioactive isotopes, enzymes, in particular enzymes capable of acting on a chromogenic, fluorescent or luminescent substrate (in particular a peroxidase or an alkaline phosphatase), chromophoric chemical compounds, chromogenic, fluorigenic or luminescent compounds, analogues of nucleotide bases, and ligands such as biotin.
  • Illustrative fluorescent compounds include, for example, fluorescein, carboxyfluorescein, tetrachlorofluorescein, hexachlorofluorescein, Cy3, tetramethylrhodamine, Cy3.5, carboxy-x-rhodamine, Texas Red, Cy5, and Cy5.5.
  • Illustrative luminescent compounds include, for example, luciferin and 2,3- dihydrophthalazinediones, such as luminol. Other suitable labels are described herein or are otherwise known to those of skill in the art.
  • Oligonucleotides may be modified with chemical groups to enhance their performance or to facilitate the characterization of amplification products.
  • chemical groups e.g., backbone-modified oligonucleotides such as those having phosphorothioate or methylphosphonate groups which render the oligonucleotides resistant to the nucleolytic activity of certain polymerases or to nuclease enzymes may allow the use of such enzymes in an amplification or other reaction.
  • Non-nucleotide linkers (e.g., Arnold, et al., "Non-Nucleotide Linking Reagents for Nucleotide Probes", EP 0 313 219, which is incorporated by reference)
  • Amplification oligonucleotides may also contain mixtures of the desired modified and natural nucleotides.
  • a “reference” in the context of gene expression profiling refers to a cell and/or genes in or derived from the cell (or data derived therefrom) relative to which a target is compared. In some embodiments, for example, the expression of one or more genes from a target cell is compared to a corresponding expression of the genes in or derived from a reference cell.
  • sample refers to any biological material containing genetic information in the form of nucleic acids or proteins obtainable or obtained from one or more subjects or individuals.
  • samples are derived from subjects having leukemia, e.g., AML.
  • Exemplary samples include tissue samples, cell samples, bone marrow, and/or bodily fluids such as blood, saliva, semen, urine, and the like. Methods of obtaining samples and of isolating nucleic acids and proteins from sample are generally known to persons of skill in the art.
  • a "set” refers to a collection of one or more things. For example, a set may include 1, 2, 3, 4, 5, 10, 20, 50, 100, 1,000 or another number of genes or other types of molecules.
  • a “solid support” refers to a solid material that can be derivatized with, or otherwise attached to, a chemical moiety, such as an oligonucleotide probe or the like.
  • Exemplary solid supports include plates (e.g., multi-well plates, etc.), beads, microbeads, tubes, fibers, whiskers, combs, hybridization chips (including microarray substrates, such as those used in GeneChip® probe arrays (Affymetrix, Inc., Santa Clara, CA, USA) and the like), membranes, single crystals, ceramic layers, self-assembling monolayers, and the like.
  • “Specifically binding” means that a compound is capable of discriminating between two or more polynucleotides or polypeptides.
  • the compound binds to the desired polynucleotide or polypeptide, but essentially does not bind to a non-target polynucleotide or polypeptide.
  • the compound can be an antibody, or a fragment thereof, an enzyme, a so-called small molecule compound, or a protein-scaffold (e.g., an anticalin).
  • a "subject” refers to an organism.
  • the organism is a mammalian organism, particularly a human organism.
  • substantially identical in the context of gene expression refers to levels of expression of two or more genes that are approximately equal to one another. In some embodiments, for example, the expression levels of two or more genes are substantially identical to one another when they differ by less than about 5% (e.g., about 4%, about 3%, about 2%, about 1%, etc.).
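The "less than about 5%" criterion can be expressed as a small comparison function. This is a sketch under stated assumptions: the text does not say which value the percentage is taken against, so the relative difference here is measured against the larger of the two levels, and the function name and default tolerance parameter are illustrative.

```python
def substantially_identical(level_a, level_b, tolerance=0.05):
    """True when two expression levels differ by less than `tolerance` (default 5%).

    Assumption: the difference is taken relative to the larger absolute level.
    """
    reference = max(abs(level_a), abs(level_b))
    if reference == 0:
        return True  # both levels are zero, so they are trivially equal
    return abs(level_a - level_b) / reference < tolerance


close_pair = substantially_identical(100.0, 97.0)    # 3% apart
distant_pair = substantially_identical(100.0, 90.0)  # 10% apart
```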
  • t(15;17) refers to AML with translocation (15;17) according to the WHO classification of haematological malignancies.
  • t(8;21) refers to AML with translocation (8;21) according to the WHO classification of haematological malignancies.
  • targets refers to an object that is the subject of analysis.
  • targets are specific nucleic acid sequences (e.g., mRNAs of expressed genes, etc.), the presence, absence or abundance of which are to be determined.
  • targets include polypeptides (e.g., proteins, etc.) of expressed genes.
  • sequences subjected to analysis are in or derived from "target cells", such as a particular type of leukemia cell.
  • the present invention provides methods, reagents, systems, and kits for detecting and genotyping leukemia.
  • methods are provided for genotyping acute leukemia cells with t(11q23)/MLL.
  • certain methods described herein include detecting an expression level of a set of genes in or derived from a target human acute leukemia cell, e.g., obtained from a subject.
  • the set of genes is generally selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • these methods also include:
  • the reference human acute leukemia cell lacking t(11q23)/MLL is a precursor B-ALL cell with t(9;22), a precursor B-ALL cell with t(8;14), a precursor T-ALL cell (Table 13), an AML cell with t(8;21), an AML cell with t(15;17), an AML cell with inv(16), or an AML cell with a complex aberrant karyotype (Table 14).
  • Other aspects of the invention include methods for distinguishing acute myeloid leukemia (AML) cells with t(11q23)/MLL from acute lymphoblastic leukemia (ALL) cells with t(11q23)/MLL.
  • Tables 15-20 are versions of the probe lists provided in Tables 8-10 that support the statistical data provided therein with annotations. More specifically, Table 15 annotates the top 50-downregulated or lower expressed genes in ALL with 11q23 that are listed in Table 8, whereas Table 16 annotates the top 50-upregulated or higher expressed genes in ALL with 11q23 that are provided in Table 8.
  • Table 17 annotates the lower expressed genes in AML with 11q23 that are listed in Table 9, while Table 18 annotates the higher expressed genes in AML with 11q23 that are provided in Table 9.
  • Table 19 annotates the lower expressed genes in 11q23 leukemias that are listed in Table 10, while Table 20 annotates the higher expressed genes in 11q23 leukemias that are provided in Table 10.
  • kits and systems are also described further below.
  • Samples are collected and prepared for analysis using essentially any technique known to those of skill in the art.
  • blood samples are obtained from subjects via venipuncture.
  • Whole blood specimens are optionally collected in EDTA, Heparin or ACD vacutainer tubes.
  • the samples utilized for analysis comprise bone marrow aspirates, which are optionally processed, e.g., by erythrocyte lysis techniques, Ficoll density gradient centrifugations, or the like. Samples are typically either analyzed immediately following acquisition or stored frozen at, e.g., -80°C until being subjected to analysis.
  • the cell lines or sources containing the target nucleic acids and/or expression products thereof are optionally subjected to one or more specific treatments that induce changes in gene expression, e.g., as part of processes to identify candidate modulators of gene expression.
  • a cell or cell line can be treated with or exposed to one or more chemical or biochemical constituents, e.g., pharmaceuticals, pollutants, DNA damaging agents, oxidative stress-inducing agents, pH-altering agents, membrane-disrupting agents, metabolic blocking agents, chemical inhibitors, cell surface receptor ligands, antibodies, transcription promoters/enhancers/inhibitors, translation promoters/enhancers/inhibitors, protein-stabilizing or destabilizing agents, various toxins, carcinogens or teratogens, characterized or uncharacterized chemical libraries, proteins, lipids, or nucleic acids.
  • the treatment comprises an environmental stress, such as a change in one or more environmental parameters including, but not limited to, temperature (e.g. heat shock or cold shock), humidity, oxygen concentration (e.g., hypoxia), radiation exposure, culture medium composition, or growth saturation.
  • Responses to these treatments may be followed temporally, and the treatment can be imposed for various times and at various concentrations.
  • Target sequences can also be derived from cells exposed to multiple specific treatments as described above, either concurrently or in tandem (e.g., a cancerous cell or tissue sample may be further exposed to a DNA damaging agent while grown in an altered medium composition).
  • total RNA is isolated from samples for use as target sequences.
  • Cellular samples are lysed once culture with or without the treatment is complete by, for example, removing growth medium and adding a guanidinium-based lysis buffer containing several components to stabilize the RNA.
  • the lysis buffer also contains purified RNAs as controls to monitor recovery and stability of RNA from cell cultures. Examples of such purified RNA templates include the Kanamycin Positive Control RNA from Promega (Madison, WI, USA), and 7.5 kb Poly(A)-Tailed RNA from Life Technologies (Rockville, MD, USA). Lysates may be used immediately or stored frozen at, e.g., -80°C.
  • RNA is purified from cell lysates (or other types of samples) using silica-based isolation in an automation-compatible, 96-well format, such as the RNeasy® purification platform (Qiagen, Inc. (Valencia, CA, USA)).
  • RNA is isolated using solid-phase oligo-dT capture using oligo-dT bound to microbeads or cellulose columns. This method has the added advantage of isolating mRNA from genomic DNA and total RNA, and allowing transfer of the mRNA-capture medium directly into the reverse transcriptase reaction.
  • Other RNA isolation methods are contemplated, such as extraction with silica-coated beads or guanidinium. Further methods for RNA isolation and preparation can be devised by one skilled in the art.
  • RNAse inhibitors are optionally added to the crude samples.
  • genomic DNA could contribute one or more copies of target sequence, depending on the sample.
  • the signal arising from genomic DNA may not be significant.
  • the background can be eliminated by treating the samples with DNAse, or by using primers that target splice junctions.
  • One skilled in the art can design a variety of specialized priming applications that would facilitate use of crude extracts as samples for the purposes of this invention.
  • the determination of gene expression levels may be effected at the transcriptional and/or translational level, i.e., at the level of mRNA or at the protein level.
  • any method of gene expression profiling can be used or adapted for use in performing the methods described herein including, e.g., methods based on hybridization analysis of polynucleotides, and methods based on sequencing of polynucleotides.
  • RNAse protection assays (Hod, Biotechniques 13:852-854 (1992))
  • RT-PCR reverse transcription polymerase chain reaction
  • antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes.
  • sequencing-based gene expression analysis includes Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS).
  • molecular species such as antibodies, aptamers, etc. that can specifically bind to proteins or fragments thereof are used for analysis (see, e.g., Beilharz et al., Brief Funct Genomic Proteomic 3(2):103-111 (2004)).
  • the methods described herein include determining the expression levels of transcribed polynucleotides.
  • the transcribed polynucleotide is an mRNA, a cDNA and/or a cRNA. Transcribed polynucleotides are typically isolated from a sample, reverse transcribed and/or amplified, and labeled by techniques referred to above or otherwise known to persons skilled in the art.
  • the methods of the invention generally include hybridizing transcribed polynucleotides to a complementary polynucleotide, or a portion thereof, under a selected hybridization condition (e.g., a stringent hybridization condition), as described herein.
  • the detection and quantification of amounts of polynucleotides to determine the level of expression of a marker are performed according to those described by, e.g., Sambrook et al., supra, or real time methods known in the art as 5'-nuclease methods disclosed in, e.g., WO 92/02638, U.S. Pat. No. 5,210,015, U.S. Pat.
  • 5'-nuclease methods utilize the exonuclease activity of certain polymerases to generate signals.
  • target nucleic acids are detected in processes that include contacting a sample with an oligonucleotide containing a sequence complementary to a region of the target nucleic acid component and a labeled oligonucleotide containing a sequence complementary to a second region of the same target nucleic acid component sequence strand, but not including the nucleic acid sequence defined by the first oligonucleotide, to create a mixture of duplexes during hybridization conditions, wherein the duplexes comprise the target nucleic acid annealed to the first oligonucleotide and to the labeled oligonucleotide such that the 3'-end of the first oligonucleotide is adjacent to the 5'-end of the labeled oligonucleotide. Then this mixture is treated with a template-dependent nucleic acid polymerase having a 5' to 3' nuclease activity under conditions sufficient to permit the 5' to 3' nuclease activity of the polymerase to cleave the annealed, labeled oligonucleotide and release labeled fragments. The signal generated by the hydrolysis of the labeled oligonucleotide is detected and/or measured. 5'-nuclease technology eliminates the need for a solid phase bound reaction complex to be formed and made detectable.
  • exemplary methods include, e.g., fluorescence resonance energy transfer between two adjacently hybridized probes as used in the LightCycler® format described in, e.g., U.S. Pat. No. 6,174,670, which is incorporated by reference.
  • the marker i.e., the polynucleotide
  • the marker is in the form of a transcribed nucleotide, where total RNA is isolated, cDNA and, subsequently, cRNA is synthesized and biotin is incorporated during the transcription reaction.
  • the purified cRNA is applied to commercially available arrays that can be obtained from, e.g., Affymetrix, Inc. (Santa Clara, CA, USA).
  • the hybridized cRNA is optionally detected according to the methods described in the examples provided below.
  • the arrays are produced by photolithography or other methods known to persons skilled in the art. Some of these techniques are also described in, e.g. U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,744,305, U.S. Pat. No. 5,700,637, U.S. Pat. No. 5,945,334, EP 0 619 321, and EP 0 373 203, which are each incorporated by reference.
  • the polynucleotide or at least one of the polynucleotides is in the form of a polypeptide (e.g., expressed from the corresponding polynucleotide).
  • the expression level of the polynucleotides or polypeptides is optionally detected using a compound that specifically binds to target polynucleotides or target polypeptides.
  • Some of the earliest expression profiling methods are based on the detection of a label in RNA hybrids or protection of RNA from enzymatic degradation (see, e.g.,
  • Methods based on detecting hybrids include northern blots and slot/dot blots. These two techniques differ in that the components of the sample being analyzed are resolved by size in a northern blot prior to detection, which enables identification of more than one species simultaneously.
  • Slot blots are generally carried out using unresolved mixtures or sequences, but can be easily performed in serial dilution, enabling a more quantitative analysis.
  • In situ hybridization is a technique that monitors transcription by directly visualizing RNA hybrids in the context of a whole cell. This method provides information regarding subcellular localization of transcripts (see, e.g., Suzuki et al.,
  • RNAse protection assays employ a labeled nucleic acid probe, which is hybridized to the RNA species being analyzed, followed by enzymatic degradation of single-stranded regions of the probe. Analysis of the amount and length of probe protected from degradation is used to determine the quantity and endpoints of the transcripts being analyzed.
  • RT-PCR Reverse Transcriptase PCR
  • Real-Time Detection RT-PCR can be used to compare, e.g., mRNA levels in different sample populations, in normal and tumor tissues, with or without drug treatment, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure.
  • assays are derivatives of PCR in which amplification is preceded by reverse transcription of mRNA into cDNA. Accordingly, an initial step in these processes is generally the isolation of mRNA from a target sample (e.g., leukemia cells).
  • the starting material is typically total RNA isolated from cancerous tissues or cells (e.g., bone marrow, peripheral blood aliquots, etc.), and in certain embodiments, from corresponding normal tissues or cells.
  • Methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., supra. Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995), which are each incorporated by reference.
  • RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions.
  • total RNA from cells in culture can be isolated using Qiagen RNeasy® mini-columns (referred to above).
  • Other commercially available RNA isolation kits include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE™, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.
  • Because RNA generally cannot serve as a template for PCR, the process of gene expression profiling by RT-PCR typically includes the reverse transcription of the RNA template into cDNA followed by its exponential amplification in a PCR reaction.
  • Two commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT).
  • the reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the particular circumstances of expression profiling analysis. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, CA, USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.
  • Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading endonuclease activity.
  • TaqMan® PCR typically utilizes the 5'-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used.
  • Pairs of primers are generally used to generate amplicons in PCR reactions.
  • a third oligonucleotide, or probe, is designed to bind to a nucleotide sequence located between the PCR primer pairs.
  • Probes are generally non-extendible by Taq DNA polymerase enzyme, and are typically labeled with, e.g., a reporter fluorescent dye and a quencher fluorescent dye. Laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, such as in an intact probe.
  • the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore.
  • One molecule of reporter dye is typically liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
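  • To illustrate how the unquenched reporter signal can be read quantitatively, the following minimal sketch estimates the cycle at which a fluorescence trace first crosses a detection threshold. The function name, the linear interpolation between cycles, and the example values are assumptions made for illustration and are not taken from any instrument's software.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the signal first reaches threshold.

    fluorescence[i] is the background-subtracted reporter signal measured
    after cycle i + 1. Returns None if the threshold is never reached.
    """
    previous = 0.0  # assumed signal before the first cycle
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= threshold:
            if signal == previous:
                return float(cycle)
            # Linear interpolation between the previous cycle and this one.
            return (cycle - 1) + (threshold - previous) / (signal - previous)
        previous = signal
    return None


if __name__ == "__main__":
    trace = [0.0, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical readings
    print(threshold_cycle(trace, threshold=0.6))
```

  • A sample containing more starting template reaches the threshold in fewer cycles, which is why the crossing point can serve as a measure of relative abundance.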
  • TaqMan® RT-PCR can be performed using commercially available equipment, such as, for example, a LightCycler® system (Roche Molecular Biochemicals, Mannheim, Germany) or an ABI PRISM 7700™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, CA, USA).
  • RT-PCR is typically performed using an internal standard.
  • An ideal internal standard is expressed at a relatively constant level among different cells or tissues, and is unaffected by the experimental treatment.
  • Exemplary RNAs frequently used to normalize patterns of gene expression are mRNAs transcribed from the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
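  • Normalization against an internal standard can be sketched as follows. The example uses the widely used comparative threshold-cycle calculation, which is an assumption here (the text does not prescribe a particular formula), and it further assumes that both amplicons roughly double in each cycle. All names and values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a sample relative to a control.

    Each threshold cycle (Ct) of the target gene is first normalized to the
    Ct of a housekeeping gene (e.g., GAPDH) measured in the same specimen.
    Assumes approximately 100% amplification efficiency.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold two cycles
    # earlier in the sample than in the control, at equal reference Ct.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
```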
  • exemplary methods for targeted mRNA analysis include differential display reverse transcriptase PCR (DDRT-PCR) and RNA arbitrarily primed PCR (RAP-PCR) (see, e.g., U.S. Patent No. 5,599,672; Liang and Pardee (1992) Science
  • the 5' and 3' termini of molecular beacons collectively comprise a pair of moieties, which confers the detectable properties of the molecular beacon.
  • One of the termini is attached to a fluorophore and the other is attached to a quencher molecule capable of quenching a fluorescent emission of the fluorophore.
  • a fluorophore such as EDANS or fluorescein, e.g., on the 5'-end and a quencher, such as Dabcyl, e.g., on the 3'-end.
  • the stem of the molecular beacon is stabilized by complementary base pairing.
  • This self-complementary pairing results in a "hairpin loop" structure for the molecular beacon in which the fluorophore and the quenching moieties are proximal to one another. In this conformation, the fluorescent moiety is quenched by the quenching moiety.
  • the loop of the molecular beacon typically comprises the oligonucleotide probe and is accordingly complementary to a sequence to be detected in the target microbial nucleic acid, such that hybridization of the loop to its complementary sequence in the target forces disassociation of the stem, thereby distancing the fluorophore and quencher from each other. This results in unquenching of the fluorophore, causing an increase in fluorescence of the molecular beacon.
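  • The stem and loop relationships described above can be checked computationally. The sketch below assumes a DNA beacon written 5' to 3' with arms of equal length; the function names and the example sequences are hypothetical and serve only to illustrate the base-pairing logic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_beacon(seq, stem_length):
    """Split a beacon into (5' arm, loop, 3' arm).

    Raises ValueError if the two arms cannot base-pair to form the stem.
    """
    if stem_length < 1 or 2 * stem_length >= len(seq):
        raise ValueError("stem length does not fit the sequence")
    seq = seq.upper()
    five_arm = seq[:stem_length]
    loop = seq[stem_length:-stem_length]
    three_arm = seq[-stem_length:]
    if reverse_complement(five_arm) != three_arm:
        raise ValueError("arms are not self-complementary")
    return five_arm, loop, three_arm


def loop_binds_target(loop, target):
    """True if the target contains a site fully complementary to the loop."""
    return reverse_complement(loop) in target.upper()


if __name__ == "__main__":
    beacon = "GCGAGC" + "TTTACGGATCCA" + "GCTCGC"  # hypothetical beacon
    arms_and_loop = split_beacon(beacon, stem_length=6)
    print(arms_and_loop)
```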
  • kits which utilize molecular beacons are also commercially available, such as the Sentinel™ Molecular Beacon Allelic Discrimination Kits from Stratagene (La Jolla, CA, USA) and various kits from
  • oligonucleotides e.g., microarrays
  • polynucleotide sequences of interest e.g., probes, such as cDNAs, mRNAs, oligonucleotides, etc.
  • microchip substrate or other type of solid support
  • Sequences of interest can be obtained, e.g., by creating a cDNA library from an mRNA source or by using publicly available databases, such as GenBank, to annotate the sequence information of custom cDNA libraries or to identify cDNA clones from previously prepared libraries.
  • the arrayed sequences are then hybridized with target nucleic acids from cells or tissues of interest.
  • the source of mRNA typically is total RNA isolated from a sample.
  • high-density oligonucleotide arrays are produced using a light-directed chemical synthesis process (i.e., photolithography). Unlike common cDNA arrays, oligonucleotide arrays (according, e.g., to the Affymetrix technology) typically use a single-dye technology. Given the sequence information of the probes or markers, the sequences are typically synthesized directly onto the array, thus, bypassing the need for physical intermediates, such as PCR products, commonly utilized in making cDNA arrays.
  • markers, or partial sequences thereof, can be represented by, e.g., between about 14 and 20 features, typically by fewer than 14 features, more typically by fewer than about 10 features, even more typically by about 6 features or fewer, with each feature generally being a short sequence of nucleotides (oligonucleotide), which is typically a perfect match (PM) to a segment of the respective gene.
  • the PM oligonucleotides are paired with mismatch (MM) oligonucleotides, which have a single mismatch at the central base of the oligonucleotide and are used as "controls".
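  • The pairing of PM and MM features can be sketched as follows. Two points are assumptions rather than statements of the text: the mismatch is produced here by substituting the complementary base at the central position, and the feature intensities are summarized by a simple mean of PM minus MM differences. Other summary statistics are possible.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm_probe):
    """Derive an MM probe by changing the central base of a PM probe."""
    middle = len(pm_probe) // 2
    return pm_probe[:middle] + COMPLEMENT[pm_probe[middle]] + pm_probe[middle + 1:]


def average_difference(pm_intensities, mm_intensities):
    """Mean PM minus MM intensity over the paired features of one marker."""
    if len(pm_intensities) != len(mm_intensities) or not pm_intensities:
        raise ValueError("need the same non-zero number of PM and MM values")
    differences = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(differences) / len(differences)


if __name__ == "__main__":
    print(mismatch_probe("ACGTACG"))  # hypothetical 7-mer for readability
    print(average_difference([100.0, 200.0, 300.0], [50.0, 100.0, 150.0]))
```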
  • the chip exposure sites are typically defined by masks and are de-protected by the use of light, followed by a chemical coupling step resulting in the synthesis of one nucleotide.
  • the masking, light deprotection, and coupling process can then be repeated to synthesize the next nucleotide, until the nucleotide chain is of the specified length.
  • PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. In some embodiments, for example, at least 10,000 different cDNA probe sequences are applied to a given solid support.
  • Fluorescently labeled cDNA targets may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from the samples of interest. Labeled cDNA targets applied to the chip hybridize with corresponding probes on the array. After washing (e.g., under stringent conditions) to remove non-specifically bound probes, the chip is typically scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, for example, separately labeled cDNA probes generated from two sources of RNA can be hybridized concurrently to the arrayed probes.
  • the relative abundance of the transcripts from the two sources corresponding to each specified gene can thus be determined simultaneously.
  • the miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(20):10614-10619 (1996), which is incorporated by reference).
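  • For a dual color experiment, the relative abundance of a transcript in two RNA sources can be expressed as a log ratio of the two channel intensities. The sketch below is illustrative only: the intensity floor, the function names, and the use of a two-fold cut-off as a calling rule are assumptions.

```python
import math


def log2_ratio(sample_intensity, reference_intensity, floor=1.0):
    """Log2 ratio of two channel intensities for one arrayed element.

    Intensities below `floor` are raised to it to avoid dividing by zero.
    """
    sample = max(sample_intensity, floor)
    reference = max(reference_intensity, floor)
    return math.log2(sample / reference)


def call_change(sample_intensity, reference_intensity, fold=2.0):
    """Label an element as 'up', 'down', or 'unchanged' at a fold cut-off."""
    ratio = log2_ratio(sample_intensity, reference_intensity)
    limit = math.log2(fold)
    if ratio >= limit:
        return "up"
    if ratio <= -limit:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    print(log2_ratio(800.0, 200.0), call_change(800.0, 200.0))
```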
  • Other microarray-based assay formats are also optionally utilized. Microarray analysis can be performed using commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip® technology, or Agilent's microarray technology.
  • cDNA may be prepared into which a detectable label, as exemplified herein, is incorporated.
  • labeled cDNA in single-stranded form, may then be hybridized (e.g., under stringent or highly stringent conditions) to a panel of single-stranded oligonucleotides representing different genes and affixed to a solid support, such as a chip.
  • those cDNAs that have a counterpart in the oligonucleotide panel or array will be detected (e.g., quantitatively detected).
  • Various advantageous embodiments of this general method are feasible.
  • mRNA or cDNA may be amplified, e.g., by a polymerase chain reaction or another nucleic acid amplification technique.
  • cDNAs are transcribed into cRNAs prior to hybridization steps in a given assay.
  • labels can be attached to or incorporated into cRNAs during or after the transcription step.
  • one exemplary embodiment of the methods of the invention includes the following steps: (1) obtaining a sample, e.g., bone marrow or peripheral blood aliquots, from a patient; (2) extracting RNA, e.g., mRNA, from the sample; (3) reverse transcribing the RNA into cDNA; (4) in vitro transcribing the cDNA into cRNA; (5) fragmenting the cRNA; (6) hybridizing the fragmented cRNA on selected microarrays (e.g., the HG-U133 microarray set available from Affymetrix, Inc. (Santa Clara, CA, USA)); and (7) detecting hybridization.
  • Serial analysis of gene expression is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need for providing an individual hybridization probe for each transcript.
  • a short sequence tag e.g., about 10-14 bp
  • many transcripts are linked together to form long serial molecules, that can be sequenced, revealing the identity of the multiple tags simultaneously.
  • the expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag.
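  • The tag-counting step of such an analysis can be sketched as follows, assuming the tags have already been extracted from the sequenced serial molecules. The tag sequences, the gene labels, and the lookup table are hypothetical placeholders.

```python
from collections import Counter


def tag_abundance(tags, tag_to_gene):
    """Return the fraction of all observed tags assigned to each gene.

    tags        -- list of short sequence tags read from serial molecules
    tag_to_gene -- mapping from tag sequence to gene label
    Tags missing from the mapping are pooled under 'unassigned'.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    abundance = {}
    for tag, count in counts.items():
        gene = tag_to_gene.get(tag, "unassigned")
        abundance[gene] = abundance.get(gene, 0.0) + count / total
    return abundance


if __name__ == "__main__":
    observed = ["AAACCCGGGT"] * 3 + ["TTTGGGCCCA"]  # hypothetical 10 bp tags
    lookup = {"AAACCCGGGT": "GENE_A", "TTTGGGCCCA": "GENE_B"}
    print(tag_abundance(observed, lookup))
```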
  • SAGE-based assays are also described in, e.g., Velculescu et al., Science 270:484-487 (1995) and Velculescu et al., Cell 88:243-51 (1997), which are both incorporated by reference.
  • a microbead library of DNA templates is constructed by in vitro cloning. This is generally followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3 x 10^6 microbeads/cm^2). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method can be used to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from cDNA libraries. MPSS is also described in, e.g., Brenner et al., (2000) Nature Biotechnology 18:630-634, which is incorporated by reference.
  • Immunoassays and proteomics
  • Essentially any available technique for the detection of proteins is optionally utilized in the methods of the invention. Exemplary protein analysis technologies include, e.g., one- and two-dimensional SDS-PAGE-based separation and detection, immunoassays (e.g., western blotting, etc.), aptamer-based detection, mass spectrometric detection, and the like. These and other techniques are generally well-known in the art.
  • antibodies or antisera e.g., polyclonal antisera
  • monoclonal antibodies specific for particular targets are used to detect expression.
  • antibodies are directly labeled, e.g., with radioactive labels, fluorescent labels, haptens, chemiluminescent dyes, enzyme substrates or co-factors, enzyme inhibitors, free radicals, enzymes (e.g., horseradish peroxidase or alkaline phosphatase), or the like.
  • Such labeled reagents may be used in a variety of well known assays, such as radioimmunoassays, enzyme immunoassays, e.g., ELISA, fluorescent immunoassays, and the like. See, e.g., U.S. Pat. Nos. 3,766,162;
  • the binding of proteins indicative of a given leukemia type or subtype is optionally verified by binding to a detectably labeled secondary antibody or aptamer.
  • the labeling of antibodies is also described in, e.g., Harlow and Lane, Antibodies, a laboratory manual, CSH Press (1988), which is incorporated by reference.
  • a minimum set of proteins necessary for detecting various leukemia types or subtypes may be selected for the creation of a protein array for use in making diagnoses with, e.g., protein lysates of bone marrow samples directly.
  • Protein array systems for the detection of specific protein expression profiles are commercially available from various suppliers, including the Bio-Plex™ platform available from BIO-RAD Laboratories (Munich, Germany).
  • antibodies against the target proteins are produced and immobilized on a solid support, e.g., a glass slide or a well of a microtiter plate.
  • the immobilized antibodies can be labeled with a reactant that is specific for the target proteins.
  • reactants can include, e.g., enzyme substrates, DNA, receptors, antigens or antibodies to create for example a capture sandwich immunoassay.
  • Target proteins can also be detected using aptamers including photoaptamers.
  • Aptamers generally are single-stranded oligonucleotides (e.g., typically DNA for diagnostic applications) that assume a specific, sequence-dependent shape and bind to target proteins based on a "lock-and-key" fit between the two molecules. Aptamers can be identified using the SELEX process (Gold (1996) "The SELEX process: a surprising source of therapeutic and diagnostic compounds," Harvey
  • the detection of proteins via mass includes various formats that can be adapted for use in the methods of the invention.
  • Exemplary formats include matrix assisted laser desorption/ionization- (MALDI) and surface enhanced laser desorption/ionization-based (SELDI) detection.
  • MALDI- and SELDI-based detection are also described in, e.g., Weinberger et al. (2000) "Recent trends in protein biochip technology," Pharmacogenomics 1(4):395-416, Forde et al. (2002) "Characterization of transcription factors by mass spectrometry and the role of SELDI-MS," Mass Spectrom. Rev. 21(6):419-439, and Leushner (2001) "MALDI TOF mass spectrometry: an emerging platform for genomics and diagnostics," Expert Rev. Mol. Diagn. 1(1):11-18, which are each incorporated by reference. Protein chips and related instrumentation are available from commercial suppliers, such as Ciphergen Biosystems, Inc. (Fremont, CA, USA).
  • oligonucleotides for use as probes and/or primers.
  • the DNAstar software package available from DNASTAR, Inc. can be used for sequence alignments.
  • target nucleic acid sequences and non-target nucleic acid sequences can be uploaded into DNAstar EditSeq program as individual files, e.g., as part of a process to identify regions in these sequences that have low sequence similarity.
  • pairs of sequence files can be opened in the DNAstar MegAlign sequence alignment program and the Clustal W method of alignment can be applied. The parameters used for Clustal W alignments are optionally the default settings in the software.
  • MegAlign typically does not provide a summary of the percent identity between two sequences. This is generally calculated manually. From the alignments, regions having, e.g., less than 85% identity with one another are typically identified and oligonucleotide sequences in these regions can be selected. Many other sequence alignment algorithms and software packages are also optionally utilized. Sequence alignment algorithms are also described in, e.g., Mount, Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press (2001), and Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press (1998), which are both incorporated by reference.
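  • Because the percent identity is generally calculated manually, a short script can stand in for that step. The sketch below operates on two already aligned sequences of equal length and also lists window start positions whose identity falls below a cut-off such as 85%. The identity definition used (matches divided by alignment columns that are not gaps in both sequences), the window scan, and the example sequences are assumptions; other definitions of identity are in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def low_identity_windows(aligned_a, aligned_b, window, cutoff=85.0):
    """Start positions of windows whose identity is below the cut-off."""
    starts = []
    for start in range(len(aligned_a) - window + 1):
        identity = percent_identity(aligned_a[start:start + window],
                                    aligned_b[start:start + window])
        if identity < cutoff:
            starts.append(start)
    return starts


if __name__ == "__main__":
    target = "AAAAAAAACCCC"      # hypothetical aligned target sequence
    non_target = "AAAAAAAAGGGG"  # hypothetical aligned non-target sequence
    print(percent_identity(target, non_target))
    print(low_identity_windows(target, non_target, window=4))
```

  • Windows reported by the scan mark regions of low sequence similarity from which candidate oligonucleotide sequences can be selected.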
  • optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman (1981) Adv. Appl. Math. 2:482, by the homology alignment algorithm of Needleman & Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity method of Pearson & Lipman (1988) Proc. Nat'l. Acad. Sci. USA 85:2444, which are each incorporated by reference, and by computerized implementations of these algorithms (e.g., GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin
  • Another example algorithm that is suitable for determining percent sequence identity is the BLAST algorithm, which is described in, e.g., Altschul et al. (1990) J. Mol. Biol. 215:403-410, which is incorporated by reference. Software for performing versions of BLAST analyses is publicly available through the National Center for Biotechnology Information on the world wide web at ncbi.nlm.nih.gov/ as of 11/4/2004.
  • PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle (1987) J. Mol. Evol. 35:351-360, which is incorporated by reference. Oligonucleotide probes and primers are optionally prepared using essentially any technique known in the art.
  • the oligonucleotide probes and primers are synthesized chemically using essentially any nucleic acid synthesis method, including, e.g., according to the solid phase phosphoramidite method described by Beaucage and Caruthers (1981) Tetrahedron Letts. 22(20): 1859-1862, which is incorporated by reference.
  • oligonucleotides can also be synthesized using a triester method (see, e.g., Capaldi et al. (2000) "Highly efficient solid phase synthesis of oligonucleotide analogs containing phosphorodithioate linkages" Nucleic Acids Res.
  • primer nucleic acids optionally include various modifications.
  • primers include restriction site linkers, e.g., to facilitate subsequent amplicon cloning or the like.
  • primers are also optionally modified to improve the specificity of amplification reactions as described in, e.g., U.S. Pat. No. 6,001,611, entitled "MODIFIED NUCLEIC ACID
  • Probes and/or primers utilized in the methods and other aspects of the invention are typically labeled to permit detection of probe-target hybridization duplexes.
  • a label can be any moiety that can be attached to a nucleic acid and provide a detectable signal (e.g., a quantifiable signal).
  • Labels may be attached to oligonucleotides directly or indirectly by a variety of techniques known in the art. To illustrate, depending on the type of label used, the label can be attached to a terminal (5' or 3' end of an oligonucleotide primer and/or probe) or a non-terminal nucleotide, and can be attached indirectly through linkers or spacer arms of various sizes and compositions.
  • oligonucleotides containing functional groups e.g., thiols or primary amines
  • functional groups e.g., thiols or primary amines
  • such oligonucleotides can be labeled using protocols described in, e.g., Innis et al. (Eds.) PCR Protocols: A Guide to Methods and Applications, Elsevier Science & Technology Books (1990) (Innis), which is incorporated by reference.
  • labels comprise a fluorescent dye (e.g., a rhodamine dye (e.g., R6G, R110, TAMRA, ROX, etc.), a fluorescein dye (e.g., JOE, VIC, TET, HEX, FAM, etc.), a halofluorescein dye, a cyanine dye (e.g., CY3, CY3.5, CY5, CY5.5, etc.), a BODIPY® dye (e.g., FL, 530/550, TR, TMR, etc.), an ALEXA FLUOR® dye (e.g., 488, 532, 546, 568, 594, 555, 653, 647, 660, 680, etc.), a dichlororhodamine dye, an energy transfer dye (e.g., BIGDYE™
  • labels include, e.g., biotin, weakly fluorescent labels (Yin et al. (2003) Appl Environ Microbiol. 69(7):3938, Babendure et al. (2003) Anal. Biochem. 317(1):1, and Jankowiak et al. (2003) Chem Res Toxicol. 16(3):304), non-fluorescent labels, colorimetric labels, chemiluminescent labels (Wilson et al. (2003) Analyst. 128(5):480 and Roda et al. (2003) Luminescence 18(2):72), Raman labels, electrochemical labels, bioluminescent labels (Kitayama et al. (2003) Photochem Photobiol. 77(3):333, Arakawa et al.
  • nucleic acid labeling is also described further below.
  • labeling is achieved using synthetic nucleotides (e.g., synthetic ribonucleotides, etc.) and/or recombinant phycoerythrin (PE).
  • whether a fluorescent dye is a label or a quencher is generally defined by its excitation and emission spectra, and the fluorescent dye with which it is paired.
  • Fluorescent molecules commonly used as quencher moieties in probes and primers include, e.g., fluorescein, FAM, JOE, rhodamine, R6G, TAMRA, ROX, DABCYL, and EDANS. Many of these compounds are available from the commercial suppliers referred to above.
  • Exemplary non-fluorescent or dark quenchers that dissipate energy absorbed from a fluorescent dye include the Black Hole Quenchers™ or BHQ™, which are commercially available from Biosearch Technologies, Inc. (Novato, CA, USA).
  • nucleic acid and virtually any labeled nucleic acid, whether standard or non-standard
  • nucleic acid can be custom or standard ordered from any of a variety of commercial sources, such as The Midland Certified Reagent
  • modified nucleotides are included in probes and primers.
  • the introduction of modified nucleotide substitutions into oligonucleotide sequences can, e.g., increase the melting temperature of the oligonucleotides. In some embodiments, this can yield greater sensitivity relative to corresponding unmodified oligonucleotides even in the presence of one or more mismatches in sequence between the target nucleic acid and the particular oligonucleotide.
  • modified nucleotides that can be substituted or added in oligonucleotides include, e.g., C5-ethyl-dC, C5-methyl-dU, C5-ethyl-dU, 2,6-diaminopurines, C5-propynyl-dC, C7-propynyl-dA, C7-propynyl-dG, C5-propargylamino-dC, C5-propargylamino-dU, C7-propargylamino-dA, C7-propargylamino-dG, 7-deaza-2-deoxyxanthosine, pyrazolopyrimidine analogs, pseudo-dU, nitro pyrrole, nitro indole, 2'-O-methyl Ribo-U, 2'-O-methyl Ribo-C, an 8-aza-dA, an 8-aza-dG, a 7-deaza-dA, a 7-d-
  • modified oligonucleotides include those having one or more LNA™ monomers.
  • Nucleotide analogs such as these are also described in, e.g., U.S. Pat. No. 6,639,059, entitled "SYNTHESIS OF [2.2.1]BICYCLO NUCLEOSIDES," issued October 28, 2003 to Kochkine et al., U.S. Pat. No. 6,303,315, entitled "ONE STEP SAMPLE PREPARATION AND
  • oligonucleotide probes designed to hybridize with target nucleic acids are covalently or noncovalently attached to solid supports.
  • labeled amplicons derived from patient samples are typically contacted with these solid support-bound probes to effect hybridization and detection.
  • amplicons are attached to solid supports and contacted with labeled probes.
  • antibodies, aptamers, or other probe biomolecules utilized in a given assay are similarly attached to solid supports.
  • substrates are fabricated from silicon, glass, or polymeric materials (e.g., glass or polymeric microscope slides, silicon wafers, wells of microwell plates, etc.). Suitable glass or polymeric substrates, including microscope slides, are available from various commercial suppliers, such as Fisher
  • solid supports utilized in the invention are membranes.
  • Suitable membrane materials are optionally selected from, e.g., polyaramide membranes, polycarbonate membranes, porous plastic matrix membranes (e.g., POREX® Porous Plastic, etc.), nylon membranes, ceramic membranes, polyester membranes, polytetrafluoroethylene (TEFLON®) membranes, nitrocellulose membranes, or the like. Many of these membranous materials are widely available from various commercial suppliers, such as P.J. Cobert Associates, Inc. (St. Louis, MO, USA), Millipore Corporation (Bedford, MA, USA), or the like.
  • Other exemplary solid supports that are optionally utilized include, e.g., ceramics, metals, resins, gels, plates, beads (e.g., magnetic microbeads, etc.), whiskers, fibers, combs, single crystals, self-assembling monolayers, and the like.
  • Nucleic acids are directly or indirectly (e.g., via linkers, such as bovine serum albumin (BSA) or the like) attached to the supports, e.g., by any available chemical or physical method.
  • a wide variety of linking chemistries are available for linking molecules to a wide variety of solid supports. More specifically, nucleic acids may be attached to the solid support by covalent binding, such as by conjugation with a coupling agent or by non-covalent binding, such as electrostatic interactions, hydrogen bonds or antibody-antigen coupling, or by combinations thereof.
  • Typical coupling agents include biotin/avidin, biotin/streptavidin, Staphylococcus aureus protein A/IgG antibody Fc fragment, and streptavidin/protein A chimeras (Sano et al. (1991) Bio/Technology 9:1378, which is incorporated by reference), or derivatives or combinations of these agents.
  • Nucleic acids may be attached to the solid support by a photocleavable bond, an electrostatic bond, a disulfide bond, a peptide bond, a diester bond or a combination of these bonds. Nucleic acids are also optionally attached to solid supports by a selectively releasable bond such as
  • Cleavable attachments can be created by attaching cleavable chemical moieties between the probes and the solid support including, e.g., an oligopeptide, oligonucleotide, oligopolyamide, oligoacrylamide, oligoethylene glycerol, alkyl chains of between about 6 to 20 carbon atoms, and combinations thereof. These moieties may be cleaved with, e.g., added chemical agents, electromagnetic radiation, or enzymes.
  • Exemplary attachments cleavable by enzymes include peptide bonds, which can be cleaved by proteases, and phosphodiester bonds which can be cleaved by nucleases.
  • Chemical agents such as β-mercaptoethanol, dithiothreitol (DTT) and other reducing agents cleave disulfide bonds.
  • Other agents which may be useful include oxidizing agents, hydrating agents and other selectively active compounds.
  • Electromagnetic radiation, such as ultraviolet, infrared and visible light, cleaves photocleavable bonds. Attachments may also be reversible, e.g., using heat or enzymatic treatment, or reversible chemical or magnetic attachments. Release and reattachment can be performed using, e.g., magnetic or electrical fields.
  • the length of the complementary region or sequence between primers or probes and their binding partners should generally be sufficient to allow selective or specific hybridization of the primers or probes to the targets at the selected annealing temperatures used for a particular nucleic acid amplification protocol, expression profiling assay, etc.
  • complementary regions of, for example, between about 10 and about 50 nucleotides (e.g., about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more nucleotides) are typically used in a given application.
  • “Stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments, such as Southern and northern hybridizations, are sequence dependent, and are different under different environmental parameters.
  • highly stringent hybridization and wash conditions are selected to be about 5 °C or less below the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH (as noted below, highly stringent conditions can also be referred to in comparative terms).
  • the Tm is the temperature (under defined ionic strength and pH) at which 50% of the test sequence hybridizes to a perfectly matched primer or probe.
  • Very stringent conditions are selected to be equal to the Tm for a particular primer or probe.
  • the Tm of a nucleic acid duplex indicates the temperature at which the duplex is 50% denatured under the given conditions, and it represents a direct measure of the stability of the nucleic acid hybrid.
  • the Tm corresponds to the temperature at the midpoint in the transition from helix to random coil; it depends on length, nucleotide composition, and ionic strength for long stretches of nucleotides.
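  • As a rough illustration of how the melting temperature depends on length and nucleotide composition, the sketch below applies the Wallace rule of thumb for short oligonucleotides (2 °C per A or T and 4 °C per G or C). This rule is an assumption introduced here for illustration; it is not the method of the text, it ignores ionic strength, and it is only a coarse estimate. The wash temperature helper simply subtracts an offset such as the approximately 5 °C mentioned above.

```python
def wallace_tm(oligo):
    """Estimate the melting temperature (in degrees C) of a short DNA oligo.

    Uses the Wallace rule of thumb: 2 * (A + T) + 4 * (G + C).
    """
    sequence = oligo.upper()
    at_count = sequence.count("A") + sequence.count("T")
    gc_count = sequence.count("G") + sequence.count("C")
    if at_count + gc_count != len(sequence):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2.0 * at_count + 4.0 * gc_count


def wash_temperature(tm, offset=5.0):
    """Temperature lying `offset` degrees C below the estimated Tm."""
    return tm - offset


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGT"  # hypothetical 16-mer
    tm = wallace_tm(probe)
    print(tm, wash_temperature(tm))
```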
  • unhybridized nucleic acid material can be removed by a series of washes, the stringency of which can be adjusted depending upon the desired results.
  • Low stringency washing conditions (e.g., using higher salt and lower temperature)
  • Higher stringency conditions (e.g., using lower salt and higher temperature that is closer to the hybridization temperature) lower the background signal, typically with only the specific signal remaining. See, e.g., Rapley et al. (Eds.), Molecular Biomethods Handbook (Humana Press, Inc. 1998), which is incorporated by reference.
  • one measure of stringent hybridization is the ability of the primer or probe to hybridize to one or more of the target nucleic acids (or complementary polynucleotide sequences thereof) under highly stringent conditions. Stringent hybridization and wash conditions can easily be determined empirically for any test nucleic acid.
  • the hybridization and wash conditions are gradually increased (e.g., by increasing temperature, decreasing salt concentration, increasing detergent concentration and/or increasing the concentration of organic solvents, such as formalin, in the hybridization or wash), until a selected set of criteria is met.
  • the hybridization and wash conditions are gradually increased until a target nucleic acid, and complementary polynucleotide sequences thereof, binds to a perfectly matched complementary nucleic acid.
  • a target nucleic acid is said to specifically hybridize to a primer or probe nucleic acid when it hybridizes at least ½ as well to the primer or probe as to a perfectly matched complementary target, i.e., with a signal to noise ratio at least ½ as high as hybridization of the primer or probe to the target under conditions in which the perfectly matched primer or probe binds to the perfectly matched complementary target with a signal to noise ratio that is at least about 2.5x-10x, typically 5x-10x as high as that observed for hybridization to any of the unmatched target nucleic acids.
  • RNA is converted to cDNA in a reverse-transcription (RT) reaction using, e.g., a target-specific primer complementary to the RNA for each gene target being monitored.
  • Methods of reverse transcribing RNA into cDNA are well known, and described in Sambrook, supra.
  • Alternative methods for reverse transcription utilize thermostable DNA polymerases, as described in the art.
  • avian myeloblastosis virus reverse transcriptase (AMV-RT) or Moloney murine leukemia virus reverse transcriptase (MoMLV-RT) is used, although other enzymes are also optionally utilized.
  • An advantage of using target-specific primers in the RT reaction is that only the desired sequences are converted into a PCR template.
  • RNA targets are reverse transcribed using non-specific primers, such as an anchored oligo-dT primer, or random sequence primers.
  • An advantage of this embodiment is that the "unfractionated" quality of the mRNA sample is maintained because the sites of priming are non-specific, i.e., the products of this RT reaction will serve as template for any desired target in the subsequent PCR amplification. This allows samples to be archived in the form of DNA, which is more stable than RNA.
  • transcription-based amplification systems are used, such as that first described by Kwoh et al. (Proc. Natl. Acad. Sci.
  • mRNA target of interest is copied into cDNA by a reverse transcriptase.
  • the primer for cDNA synthesis includes the promoter sequence of a designated DNA-dependent RNA polymerase 5' to the primer's region of homology with the template.
  • the resulting cDNA products can then serve as templates for multiple rounds of transcription by the appropriate RNA polymerase. Transcription of the cDNA template rapidly amplifies the signal from the original target mRNA. The isothermal reactions bypass the need for denaturing cDNA strands from their RNA templates by including RNase H to degrade RNA hybridized to DNA.
  • amplification is accomplished by use of the ligase chain reaction (LCR), disclosed in European Patent Application No. 320,308 (Backman and Wang), or by the ligase detection reaction (LDR), disclosed in U.S. Patent No. 4,883,750 (Whiteley et al.), which are each incorporated by reference.
  • In LCR, two probe pairs are typically prepared, which are complementary to each other and to adjacent sequences on both strands of the target. Each pair will bind to opposite strands of the target such that they abut. Each of the two probe pairs can then be linked to form a single unit, using a thermostable ligase. By temperature cycling, as in PCR, bound ligated units dissociate from the target, and both molecules can then serve as "target sequences" for ligation of excess probe pairs, providing for an exponential amplification.
  • the LDR is very similar to LCR.
  • oligonucleotides complementary to only one strand of the target are used, resulting in a linear amplification of ligation products, since only the original target DNA can serve as a hybridization template. It is used following a PCR amplification of the target in order to increase signal.
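  • By way of illustration only, the contrast between the exponential amplification of LCR and the linear amplification of LDR can be sketched as follows. The function names and the per-cycle `efficiency` parameter are assumptions introduced for this sketch; the model is idealized and ignores probe depletion and plateau effects.

```python
def lcr_ligated_units(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized LCR: ligated units become new templates, so growth is geometric."""
    templates = targets
    for _ in range(cycles):
        # Every template present (original or ligated) can direct one new ligation.
        templates += templates * efficiency
    return templates - targets  # count ligated products only


def ldr_ligated_units(targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized LDR: only the original target is a template, so growth is linear."""
    return targets * efficiency * cycles
```

  • Under these assumptions, ten cycles starting from a single target give on the order of a thousand ligated units for LCR but only ten for LDR.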
  • Amplicons are optionally recovered and purified from other reaction components by any of a number of methods well known in the art, including electrophoresis, chromatography, precipitation, dialysis, filtration, and/or centrifugation. Aspects of nucleic acid purification are described in, e.g., Douglas et al., DNA
  • amplicons are not purified prior to detection, such as when amplicons are detected simultaneously with amplification.
  • the number of species that can be detected within a mixture depends primarily on the resolution capabilities of the separation platform used, and the detection methodology employed. In some embodiments, separation steps are based upon size-based separation technologies. Once separated, individual species are detected and quantitated by either inherent physical characteristics of the molecules themselves, or detection of an associated label.
  • Embodiments employing other separation methods are also described.
  • certain types of labels allow resolution of two species of the same mass through deconvolution of the data.
  • Non-size based differentiation methods allow pooling of a plurality of multiplexed reactions to further increase throughput.
  • Certain embodiments of the invention incorporate a step of separating the products of a reaction based on their size differences.
  • the PCR products generated during an amplification reaction typically range from about 50 to about 500 bases in length, which can be resolved from one another by size.
  • Any one of several devices may be used for size separation, including mass spectrometry, any of several electrophoretic devices, including capillary, polyacrylamide gel, or agarose gel electrophoresis, or any of several chromatographic devices, including column chromatography, HPLC, or FPLC.
  • sample analysis includes the use of mass spectrometry.
  • modes of separation that determine mass include Time-of-Flight (TOF), Fourier Transform Mass Spectrometry (FTMS), and quadrupole mass spectrometry. Possible methods of ionization include Matrix-Assisted Laser Desorption and Ionization (MALDI) and Electrospray Ionization (ESI).
  • a preferred embodiment for the uses described in this invention is MALDI-TOF (Wu, et al. (1993) Rapid Communications in Mass Spectrometry 7:142-146, which is incorporated by reference).
  • This method may be used to provide unfragmented mass spectra of mixed-base oligonucleotides containing between about 1 and about 1000 bases.
  • the analyte is mixed into a matrix of molecules that resonantly absorb light at a specified wavelength. Pulsed laser light is then used to desorb oligonucleotide molecules out of the absorbing solid matrix, creating free, charged oligomers and minimizing fragmentation.
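  • By way of illustration only, the way a time-of-flight analyzer separates ions by mass can be sketched with the ideal linear relation t = L * sqrt(m / (2 z e U)). The default accelerating voltage and drift length below are assumed values chosen for the sketch, not instrument parameters taken from this disclosure.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def tof_flight_time(mass_da: float, charge: int = 1,
                    accel_voltage: float = 20_000.0,
                    drift_length_m: float = 1.0) -> float:
    """Flight time in seconds for an ideal linear TOF analyzer.

    Energy balance z*e*U = m*v**2 / 2 gives t = L * sqrt(m / (2*z*e*U)).
    accel_voltage and drift_length_m are hypothetical defaults.
    """
    mass_kg = mass_da * DALTON_KG
    kinetic_term = 2 * charge * ELEMENTARY_CHARGE_C * accel_voltage
    return drift_length_m * math.sqrt(mass_kg / kinetic_term)
```

  • Because flight time scales with the square root of mass, an oligonucleotide of four times the mass arrives after twice the time, which is the basis for resolving amplification products of different lengths.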
  • microcapillary is used for analysis of nucleic acids obtained from the sample.
  • Microcapillary electrophoresis generally involves the use of a thin capillary or channel, which may optionally be filled with a particular medium to improve separation, and employs an electric field to separate components of the mixture as the sample travels through the capillary.
  • RNA will separate based on size.
  • the high surface to volume ratio of these capillaries allows application of very high electric fields across the capillary without substantial thermal variation, consequently allowing very rapid separations.
  • these methods provide sensitivity in the range of attomoles, comparable to the sensitivity of radioactive sequencing methods.
  • the use of microcapillary electrophoresis in size separation of nucleic acids has been reported in Woolley and Mathies (Proc. Natl. Acad. Sci. USA (1994) 91:11348-11352), which is incorporated by reference.
  • Capillaries are optionally fabricated from fused silica, or etched, machined, or molded into planar substrates.
  • the capillaries are filled with an appropriate separation/sieving matrix.
  • sieving matrices are known in the art that may be used for this application, including, e.g., hydroxyethyl cellulose, polyacrylamide, agarose, and the like.
  • the specific gel matrix, running buffers and running conditions are selected to obtain the separation required for a particular application. Factors that are considered include, e.g., sizes of the nucleic acid fragments, level of resolution, or the presence of undenatured nucleic acid molecules.
  • running buffers may include agents such as urea to denature double-stranded nucleic acids in a sample.
  • Microfluidic systems for separating molecules such as DNA and RNA are commercially available and are optionally employed in the methods of the present invention.
  • chromatographic techniques may be employed for resolving amplification products.
  • Many types of physical or chemical characteristics may be used to effect chromatographic separation in the present invention, including adsorption, partitioning (such as reverse phase), ion-exchange, and size exclusion.
  • Many specialized techniques have been developed for their application including methods utilizing liquid chromatography or HPLC (Katz and Dong (1990) BioTechniques 8(5):546-55; Gaus et al. (1993) J. Immunol. Methods 158:229-236).
  • cDNA products are captured by their affinity for certain substrates, or other incorporated binding properties.
  • cDNA products labeled with, e.g., biotin or an antigen can be captured with beads bearing avidin or antibody, respectively.
  • Affinity capture is utilized on a solid support to enable physical separation.
  • solid supports include beads (e.g. solid, porous, magnetic), surfaces (e.g. plates, dishes, wells, flasks, dipsticks, membranes), or chromatographic materials (e.g. fibers, gels, screens).
  • Certain separation embodiments entail the use of microfluidic techniques. Technologies include separation on a microcapillary platform, such as designed by ACLARA BioSciences Inc. (Mountain View, CA), or the LabChip™ microfluidic devices made by Caliper Lifesciences Corp. Another technology developed by Nanogen, Inc. (San Diego, CA), utilizes microelectronics to move and concentrate biological molecules on a semiconductor microchip.
  • Chip which provides for parallel processing of hundreds of reactions, can also be used in certain embodiments.
  • These microfluidic platforms require only nanoliter sample volumes, in contrast to the microliter volumes required by other conventional separation technologies.
  • Some of the processes usually involved in genetic analysis have been miniaturized using microfluidic devices.
  • PCT publication WO 94/05414 reports an integrated micro-PCR apparatus for collection and amplification of nucleic acids from a specimen.
  • U.S. Patent Nos. 5,304,487 (Wilding et al.) and 5,296,375 (Kricka et al.) discuss devices for collection and analysis of cell-containing samples.
  • the duration of the current decrease was shown to be proportional to polymer length.
  • Primers are useful both as reagents for hybridization in solution, such as priming PCR amplification, as well as for embodiments employing a solid phase, such as microarrays.
  • sample nucleic acids such as mRNA or DNA are fixed on a selected matrix or surface.
  • PCR products may be attached to the solid surface via one of the amplification primers, then denatured to provide single-stranded DNA. This spatially-partitioned, single-stranded nucleic acid is then subject to hybridization with selected probes under conditions that allow a quantitative determination of target abundance.
  • amplification products from each individual reaction are not physically separated, but are differentiated by hybridizing with a set of probes that are differentially labeled.
  • unextended amplification primers may be physically immobilized at discrete positions on the solid support, then hybridized with the products of a nucleic acid amplification for quantitation of distinct species within the sample.
  • amplification products are separated by way of hybridization with probes that are spatially separated on the solid support. Separation platforms may optionally be coupled to utilize two different separation methodologies, thereby increasing the multiplexing capacity of reactions beyond that which can be obtained by separation in a single dimension.
  • RT-PCR primers of a multiplex reaction may be coupled with a moiety that allows affinity capture, while other primers remain unmodified.
  • Samples are then passed through an affinity chromatography column to separate PCR products arising from these two classes of primers. Flow-through fractions are collected and the bound fraction eluted. Each fraction may then be further separated based on other criteria, such as size, to identify individual components.
  • Detection Methods
  • Following separation of the different products of a multiplex amplification, one or more of the amplicons are detected and/or quantitated. Some embodiments of the methods of the present invention enable direct detection of products. Other embodiments detect reaction products via a label associated with one or more of the amplification primers. Many types of labels suitable for use in the present invention are known in the art, including chemiluminescent, isotopic, fluorescent, electrochemical, infrared, or mass labels, or enzyme tags. In further embodiments, separation and detection may be a multi-step process in which samples are fractionated according to more than one property of the products, and detected at one or more stages during the separation process.
  • An exemplary embodiment of the invention that does not use labeling or modification of the molecules being analyzed is detection of the mass-to-charge ratio of the molecule itself. This detection technique is optionally used when the separation platform is a mass spectrometer.
  • An embodiment for increasing resolution and throughput with mass detection is in mass-modifying the amplification products. Nucleic acids can be mass-modified through either the amplification primer or the chain-elongating nucleoside triphosphates. Alternatively, the product mass can be shifted without modification of the individual nucleic acid components, by instead varying the number of bases in the primers.
  • moieties have been shown to be compatible with analysis by mass spectrometry, including polyethylene glycol, halogens, alkyl, aryl, or aralkyl moieties, peptides (described in, for example, U.S. Patent No. 5,691,141, which is incorporated by reference).
  • Isotopic variants of specified atoms such as radioisotopes or stable, higher mass isotopes, are also used to vary the mass of the amplification product. Radioisotopes can be detected based on the energy released when they decay, and numerous applications of their use are generally known in the art.
  • Stable (non-decaying) heavy isotopes can be detected based on the resulting shift in mass, and are useful for distinguishing between two amplification products that would otherwise have similar or equal masses.
  • Other embodiments of detection that make use of inherent properties of the molecule being analyzed include ultraviolet light absorption (UV) or electrochemical detection. Electrochemical detection is based on oxidation or reduction of a chemical compound to which a voltage has been applied. Electrons are either donated
  • Some embodiments of the invention include identifying molecules indirectly by detection of an associated label.
  • a number of labels may be employed that provide a fluorescent signal for detection. If a sufficient quantity of a given species is generated in a reaction, and the mode of detection has sufficient sensitivity, then some fluorescent molecules may be incorporated into one or more of the primers used for amplification, generating a signal strength proportional to the concentration of DNA molecules.
  • fluorescent moieties including Alexa 350, Alexa 430, AMCA, BODIPY 630/650, BODIPY 650/665, BODIPY-FL, BODIPY-R6G, BODIPY-TMR, BODIPY-TRX, carboxyfluorescein, Cascade Blue, Cy3, Cy5, 6-FAM, Fluorescein, HEX, 6-JOE, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green, Rhodamine Red, ROX, TAMRA, TET, Tetramethylrhodamine, and Texas Red, are generally known in the art and routinely used for identification of discrete nucleic acid species, such as in sequencing reactions.
  • The signal strength obtained from fluorescent dyes can be enhanced through use of related compounds called energy transfer (ET) fluorescent dyes.
  • After absorbing light, ET dyes have emission spectra that allow them to serve as "donors" to a secondary "acceptor" dye that will absorb the emitted light and emit a lower energy fluorescent signal.
  • Use of these coupled-dye systems can significantly amplify fluorescent signal.
  • ET dyes include the ABI PRISM BigDye terminators, recently commercialized by Perkin-Elmer Corporation (Foster City, CA, USA) for applications in nucleic acid analysis.
  • chromophores incorporate the donor and acceptor dyes into a single molecule and an energy transfer linker couples a donor fluorescein to a dichlororhodamine acceptor dye, and the complex is attached, e.g., to a primer.
  • Fluorescent signals can also be generated by non-covalent intercalation of fluorescent dyes into nucleic acids after their synthesis and prior to separation.
  • This type of signal will vary in intensity as a function of the length of the species being detected, and thus signal intensities must be normalized based on size.
  • Several applicable dyes are known in the art, including, but not limited to, ethidium bromide and Vistra Green.
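  • By way of illustration only, the size normalization of intercalating-dye signals noted above can be sketched as follows. The function name and the assumption that signal is directly proportional to fragment length are introduced for this sketch; real dye binding is only approximately proportional to length.

```python
def normalize_by_length(intensity: float, length_bases: int) -> float:
    """Return signal per base, assuming dye signal is proportional to fragment length."""
    if length_bases <= 0:
        raise ValueError("fragment length must be positive")
    return intensity / length_bases


# Two hypothetical amplicons present at the same molar amount: the longer one
# binds more dye, but the per-base signal is the same after normalization.
short_amplicon = normalize_by_length(500.0, 100)
long_amplicon = normalize_by_length(2000.0, 400)
```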
  • both electrochemical and infrared methods of detection can be amplified over the levels inherent to nucleic acid molecules through attachment of EC or IR labels.
  • Their characteristics and use as labels are described in, for example, PCT publication WO 97/27327, which is incorporated by reference.
  • Enzyme-linked reactions are also employed in the detecting step of the methods of the present invention. Enzyme-linked reactions theoretically yield an infinite signal, due to amplification of the signal by enzymatic activity.
  • an enzyme is linked to a secondary group that has a strong binding affinity to the molecule of interest. Following separation of the nucleic acid products, enzyme is bound via this affinity interaction. Nucleic acids are then detected by a chemical reaction catalyzed by the associated enzyme.
  • a primer may be synthesized containing a biotin molecule. After amplification, amplicons are separated by size, and those made with the biotinylated primer are detected by binding with streptavidin that is covalently coupled to an enzyme, such as alkaline phosphatase. A subsequent chemical reaction is conducted, detecting bound enzyme by monitoring the reaction product.
  • the secondary affinity group may also be coupled to an enzymatic substrate, which is detected by incubation with unbound enzyme.
  • Exploitation of known high-affinity biological interactions can provide a mechanism for physical capture.
  • Some examples of high-affinity interactions include those between a hormone with its receptor, a sugar with a lectin, avidin and biotin, or an antigen with its antibody.
  • affinity capture molecules are retrieved by cleavage, denaturation, or eluting with a competitor for binding, and then detected as usual by monitoring an associated label.
  • the binding interaction providing for capture may also serve as the mechanism of detection.
  • an amplification product or products are optionally changed, or "shifted," in order to better resolve the amplification products from other products prior to detection.
  • chemically cleavable primers can be used in the amplification reaction.
  • one or more of the primers used in amplification contains a chemical linkage that can be broken, generating two separate fragments from the primer. Cleavage is performed after the amplification reaction, removing a fixed number of nucleotides from the 5' end of products made from that primer. Design and use of such primers is described in detail in, for example, PCT publication WO 96/37630, which is incorporated by reference.
  • the statistical significance of markers as expressed in q or p values based on the concept of the false discovery rate is optionally determined. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate (see, e.g., Storey et al. (2003) Proc. Natl. Acad. Sci. 100:9440-5, which is incorporated by reference).
  • the markers described herein have q-values of less than about 3E-06, typically less than about 1.5E-09, more typically less than about 1.5E-11, even more typically less than about 1.5E-20, and still more typically less than about 1.5E-30.
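  • By way of illustration only, a false-discovery-rate adjustment of per-marker p values can be sketched as follows. This sketch uses the Benjamini-Hochberg procedure, a simpler FDR-based adjustment; it is not the q value procedure of Storey et al., which additionally estimates the proportion of true null hypotheses. The function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

  • Markers could then be retained when the adjusted value falls below a chosen threshold, analogous to the thresholds recited above.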
  • the expression level of at least about two, typically of at least about ten, more typically of at least about 25, and even more typically of at least about 50 of these markers is determined as described herein or by another technique known to those of skill in the art.
  • expression levels of one or more genes selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14 are determined in a given sample.
  • expression levels of each of these genes in a sample is determined and compared with expression levels detected in one or more reference leukemia cells.
  • the International Publication No. WO 03/039443 which is incorporated by reference, discloses certain marker genes the expression levels of which are characteristic for certain leukemia. Certain of the markers and/or methods disclosed therein are optionally utilized in performing the methods described herein.
  • the level of the expression of a marker is indicative of the genotype of the target cell.
  • the level of expression of a marker or group of markers is measured and is generally compared with the level of expression of the same marker or the same group of markers from other cells or samples. The comparison may be effected in an actual experiment or in silico. There is a meaningful difference in these levels of expression, e.g., when these expression levels (also referred to as expression pattern, expression signature, or expression profile) are measurably different. In some embodiments, the difference is typically at least about 5%, 10% or 20%, more typically at least about 50% or may even be as high as 75% or 100%.
  • the difference in the level of expression is optionally at least about 200%, i.e., two fold, at least about 500%, i.e., five fold, or at least about 1000%, i.e., 10 fold in some embodiments.
  • the expression level of markers expressed lower in a first subtype than in at least one second subtype, which differs from the first subtype is at least about 5%, 10% or 20%, more typically at least about 50% or may even be about 75% or about 100%, more typically at least about 10-fold, even more typically at least 50-fold, and still more typically at least about 100-fold lower in the first subtype.
  • the expression level of markers expressed higher in a first subtype than in at least one second subtype, which differs from the first subtype, is generally at least about 5%, 10% or 20%, more generally at least about 50% or may even be about 75% or about 100%, more generally at least 10-fold, still more generally at least about 50-fold, and even more generally at least about 100-fold higher in the first subtype.
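  • By way of illustration only, a comparison of expression levels against a fold-change threshold can be sketched as follows. The function names and the default two-fold threshold are assumptions chosen for the sketch; the thresholds recited above range from a few percent to 100-fold.

```python
def fold_change(level_first: float, level_second: float) -> float:
    """Ratio of expression in a first subtype to expression in a second subtype."""
    if level_first <= 0 or level_second <= 0:
        raise ValueError("expression levels must be positive")
    return level_first / level_second


def is_meaningful_difference(level_first: float, level_second: float,
                             min_fold: float = 2.0) -> bool:
    """True if expression is at least min_fold higher or lower in the first subtype."""
    ratio = fold_change(level_first, level_second)
    return ratio >= min_fold or ratio <= 1.0 / min_fold
```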
  • the classification accuracy of a given gene list for a set of microarray experiments is preferably estimated using Support Vector Machines (SVM), because there is evidence that SVM-based prediction slightly outperforms other classification techniques, such as k-Nearest Neighbors (k-NN).
  • the LIBSVM software package version 2.36, for example, is optionally used (SVM-type: SVC, linear kernel (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)).
  • Machine learning algorithms are also described in, e.g., Brown et al. (2000) Proc. Natl. Acad. Sci. 97:262-267, Furey et al.
  • the classification accuracy of a given gene list for a set of microarray experiments can be estimated using Support Vector Machines (SVM) as supervised learning techniques.
  • SVMs are trained using differentially expressed genes, which were identified on a subset of the data and then this trained model is employed to assign new samples to those trained groups from a second and different data set.
  • Differentially expressed genes are optionally identified, e.g., applying analysis of variance (ANOVA) and t-test-statistics (Welch t-test).
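  • By way of illustration only, the Welch t statistic for one gene measured in two groups of samples can be sketched as follows. The function name is an assumption; the sketch returns the statistic and the Welch-Satterthwaite degrees of freedom and does not compute a p value, which would require the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for unequal variances."""
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof
```

  • Genes could then be ranked by the absolute value of the statistic to select a list of differentially expressed genes for classifier training.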
  • respective training sets consisting of, e.g., 2/3 of cases and test sets with 1/3 of cases to assess classification accuracies can be designated. Assignment of cases to training and test sets is optionally randomized and balanced by diagnosis.
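  • By way of illustration only, a randomized assignment of cases to training and test sets that is balanced by diagnosis can be sketched as follows. The function name, the fixed random seed, and the example labels are assumptions introduced for the sketch.

```python
import random
from collections import defaultdict


def stratified_split(diagnoses, train_fraction=2 / 3, seed=0):
    """Split case indices into training and test sets, balanced by diagnosis."""
    rng = random.Random(seed)
    by_diagnosis = defaultdict(list)
    for index, diagnosis in enumerate(diagnoses):
        by_diagnosis[diagnosis].append(index)

    train, test = [], []
    for diagnosis in sorted(by_diagnosis):
        indices = by_diagnosis[diagnosis]
        rng.shuffle(indices)  # randomize within each diagnosis
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical cohort: 9 cases with an MLL rearrangement and 6 without.
labels = ["MLL"] * 9 + ["other"] * 6
train_idx, test_idx = stratified_split(labels)
```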
  • the apparent accuracy of prediction, i.e., the overall rate of correct predictions of the complete data set, can be estimated by, e.g., 10-fold cross validation.
  • This process typically includes dividing the data set into 10 approximately equally sized subsets, training an SVM-model on 9 subsets, and generating predictions for the remaining subset. This training and prediction process can be repeated 10 times to include predictions for each subset. Subsequently the data set can be split into a training set, consisting of two thirds of the samples, and a test set with the remaining one third. Apparent accuracy for the training set can also be estimated by 10-fold cross validation (analogous to apparent accuracy for complete set).
  • An SVM-model of the training set is optionally built to predict diagnosis in the independent test set, thereby estimating true accuracy of the prediction model.
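  • By way of illustration only, the 10-fold cross validation procedure can be sketched as follows. To keep the sketch self-contained, a nearest-centroid classifier stands in for the SVM; this substitution and all function names are assumptions, and an SVM implementation such as LIBSVM would take the place of the fit and predict helpers.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k approximately equal subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(profiles, labels):
    """Stand-in classifier: mean expression profile per class."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))
    return min(centroids, key=squared_distance)


def cross_validated_accuracy(profiles, labels, k=10, seed=0):
    """Train on k-1 subsets, predict the held-out subset, repeat for every subset."""
    correct = 0
    for fold in k_fold_indices(len(profiles), k, seed):
        held_out = set(fold)
        keep = [i for i in range(len(profiles)) if i not in held_out]
        model = fit_centroids([profiles[i] for i in keep], [labels[i] for i in keep])
        correct += sum(predict(model, profiles[i]) == labels[i] for i in fold)
    return correct / len(profiles)
```

  • The same helpers can be applied to the two-thirds training set alone, with the fitted model then used once on the independent test set to estimate true accuracy.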
  • Sensitivity = (number of positive samples predicted)/(number of true positives)
  • Specificity = (number of negative samples predicted)/(number of true negatives).
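  • By way of illustration only, sensitivity and specificity can be computed from true and predicted diagnoses as follows. The function name and the example labels are assumptions; here the numerator counts correctly predicted samples of a class and the denominator counts all samples truly belonging to that class.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Return (sensitivity, specificity) for the class named by `positive`."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # positive sample correctly predicted
            else:
                fn += 1  # positive sample missed
        else:
            if predicted == positive:
                fp += 1  # negative sample wrongly called positive
            else:
                tn += 1  # negative sample correctly predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```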
  • the present invention also provides systems for analyzing gene expression.
  • the system includes one or more probes that correspond to at least portions of genes or expression products thereof.
  • the genes are generally selected from the markers listed in, e.g., Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • the probes are nucleic acids (e.g., oligonucleotides, cDNAs, cRNAs, etc.), whereas in other embodiments, the probes are biomolecules (e.g., antibodies, aptamers, etc.) designed to detect expression products of the genes (e.g., proteins or fragments thereof).
  • the probes are arrayed on a solid support, whereas in others, they are provided in one or more containers, e.g., for assays performed in solution.
  • the system also includes at least one reference data bank or database for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
  • the reference data bank is backed up on a computational data memory chip or other computer readable medium, which can be inserted in as well as removed from the system of the present invention, e.g., like an interchangeable module, in order to use another data memory chip containing a different reference data bank.
  • the systems also include detectors (e.g., spectrometers, etc.) that detect binding between the probes and targets. Other detectors are described further below.
  • the systems also generally include at least one controller operably connected to the reference data bank and/or to the detector. In some embodiments, for example, the controller is integral with the reference data bank.
  • the systems of the present invention that include a desired reference data bank can be used in a way such that an unknown sample is, first, subjected to gene expression profiling, e.g., by microarray analysis in a manner as described herein or otherwise known to a person skilled in the art, and the expression level data obtained by the analysis are, second, fed into the system and compared with the data of the reference data bank obtainable by the above method.
  • the apparatus suitably contains a device for entering the expression level data, for example, a control panel such as a keyboard.
  • the results, i.e., whether and how the data of the unknown sample fit into the reference data bank, can be made visible on a monitor or display screen and, if desired, printed out on an incorporated or connected printer.
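  • By way of illustration only, the comparison of an unknown expression profile with a reference data bank can be sketched as follows. The sketch matches by Pearson correlation against one stored profile per class; this matching rule, the function names, and the example data are assumptions, and a trained classifier such as an SVM could be used for the comparison instead.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)


def best_matching_class(profile, reference_bank):
    """Return the reference class whose stored profile correlates best with the sample."""
    return max(reference_bank, key=lambda name: pearson(profile, reference_bank[name]))


# Hypothetical reference data bank: one mean profile per class over four markers.
reference_bank = {
    "t(11q23)/MLL": [1.0, 2.0, 3.0, 4.0],
    "other": [4.0, 3.0, 2.0, 1.0],
}
unknown_sample = [1.1, 2.0, 2.9, 4.2]
assigned = best_matching_class(unknown_sample, reference_bank)
```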
  • Computer components are described further below.
  • a system optionally further includes a thermal modulator operably connected to containers to modulate temperature in the containers (e.g., to effect thermocycling when target nucleic acids are amplified in the containers), and/or fluid transfer components (e.g., automated pipettors, etc.) that transfer fluid to and/or from the containers.
  • these systems also include robotic components for translocating solid supports, containers, and the like, and/or separation components (e.g., microfluidic devices, chromatography columns, etc.) for separating the products of amplification reactions from one another.
  • the invention further provides a computer or computer readable medium that includes a data set that comprises a plurality of character strings that correspond to a plurality of sequences (or subsequences thereof) that correspond to genes selected from, e.g., the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
  • the computer or computer readable medium further includes an automatic synthesizer coupled to an output of the computer or computer readable medium.
  • the automatic synthesizer accepts instructions from the computer or computer readable medium, which instructions direct synthesis of, e.g., one or more probe nucleic acids that correspond to one or more character strings in the data set.
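  • By way of illustration only, deriving a probe sequence from a character string in such a data set can be sketched as follows. The sketch assumes the stored string is a target sequence written 5' to 3' and that the probe is its reverse complement; the function name and the example sequence are hypothetical and are not markers from the tables.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    try:
        return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))
    except KeyError as error:
        raise ValueError(f"unexpected character in sequence: {error}") from None


# Hypothetical target string; the result would be passed to the synthesizer.
probe = reverse_complement("ATGCCGTA")
```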
  • Detectors are structured to detect detectable signals produced, e.g., in or proximal to another component of the system (e.g., in container, on a solid support, etc.). Suitable signal detectors that are optionally utilized, or adapted for use, in these systems detect, e.g., fluorescence, phosphorescence, radioactivity, absorbance, refractive index, luminescence, or the like. Detectors optionally monitor one or a plurality of signals from upstream and/or downstream of the performance of, e.g., a given assay step. For example, the detector optionally monitors a plurality of optical signals, which correspond in position to "real time" results.
  • Example detectors or sensors include photomultiplier tubes, CCD arrays, optical sensors, temperature sensors, pressure sensors, pH sensors, conductivity sensors, scanning detectors, or the like. Each of these as well as other types of sensors is optionally readily incorporated into the systems described herein. Optionally, the systems of the present invention include multiple detectors.
  • More specific exemplary detectors that are optionally utilized in these systems include, e.g., a resonance light scattering detector, an emission spectroscope, a fluorescence spectroscope, a phosphorescence spectroscope, a luminescence spectroscope, a spectrophotometer, a photometer, and the like.
  • Various synthetic components are also utilized, or adapted for use, in the systems of the invention including, e.g., automated nucleic acid synthesizers, e.g., for synthesizing the oligonucleotide probes described herein.
  • Detectors and synthetic components that are optionally included in the systems of the invention are described further in, e.g., Skoog et al., Principles of Instrumental Analysis, 5th Ed., Harcourt Brace College Publishers (1998) and Currell, Analytical Instrumentation: Performance Characteristics and Quality, John Wiley & Sons, Inc. (2000), both of which are incorporated by reference.
  • the systems of the invention also typically include controllers that are operably connected to one or more components (e.g., detectors, synthetic components, thermal modulator, fluid transfer components, etc.) of the system to control operation of the components. More specifically, controllers are generally included either as separate or integral system components that are utilized, e.g., to receive data from detectors, to effect and/or regulate temperature in the containers, to effect and/or regulate fluid flow to or from selected containers, or the like. Controllers and/or other system components is/are optionally coupled to an appropriately programmed processor, computer, digital device, or other information appliance
  • Suitable controllers are generally known in the art and are available from various commercial sources.
  • Any controller or computer optionally includes a monitor which is often a cathode ray tube ("CRT") display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display, etc.), or others.
  • Computer circuitry is often placed in a box, which includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others.
  • the box also optionally includes a hard disk drive, a floppy disk drive, a high capacity removable drive such as a writeable CD-ROM, and other common peripheral elements.
  • Inputting devices such as a keyboard or mouse optionally provide for input from a user.
  • the computer typically includes appropriate software for receiving user instructions, either in the form of user input into a set of parameter fields, e.g., in a GUI, or in the form of preprogrammed instructions, e.g., preprogrammed for a variety of different specific operations.
  • the software then converts these instructions to appropriate language for instructing the operation of one or more controllers to carry out the desired operation.
  • the computer then receives the data from, e.g., sensors/detectors included within the system, interprets the data, and either provides it in a user-understood format or uses that data to initiate further controller instructions, in accordance with the programming, e.g., such as controlling fluid flow regulators in response to fluid weight data received from weight scales or the like.
  • the computer can be, e.g., a PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™, WINDOWS95™, WINDOWS98™, WINDOWS2000™, WINDOWS XP™, LINUX-based machine, a MACINTOSH™, Power PC, or a UNIX-based (e.g., SUN™ work station) machine) or other common commercially available computer which is known to one of skill.
  • Standard desktop applications such as word processing software (e.g., Microsoft Word™ or Corel WordPerfect™) and database software (e.g., spreadsheet software such as Microsoft Excel™, Corel Quattro Pro™, or database programs such as Microsoft Access™ or Paradox™) can be adapted to the present invention.
  • Software for performing, e.g., controlling temperature modulators and fluid flow regulators is optionally constructed by one of skill using a standard programming language such as Visual Basic, Fortran, Basic, Java, or the like.
  • Reference data banks can be produced by, e.g., (a) compiling a gene expression profile of a patient sample by determining the expression level of at least one marker selected from those listed in, e.g., Table 8, Table 9, Table 10, Table 13, and/or Table 14, and (b) classifying the gene expression profile using a machine learning algorithm.
  • Exemplary machine learning algorithms are optionally selected from, e.g., Weighted Voting, K-Nearest Neighbors, Decision Tree Induction, Support Vector Machines (SVM), and Feed-Forward Neural Networks.
  • the machine learning algorithm is an SVM, such as polynomial kernel, linear kernel, and Gaussian Radial Basis Function-kernel SVM models.
  • kits that include at least one probe as described herein for genotyping leukemia cells.
  • the kits also include instructions for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
  • the kit includes suitable auxiliaries, such as buffers, enzymes, labeling compounds, and/or the like.
  • probes are attached to solid supports, e.g. the wells of microtiter plates, nitrocellulose membrane surfaces, glass surfaces, to particles in solution, etc.
  • kits also contain at least one reference for a leukemia that, e.g., lacks or comprises t(11q23)/MLL.
  • the reference can be a sample, a database, or the like.
  • the kit includes primers and other reagents for amplifying target nucleic acids.
  • kits also include at least one container for packaging the probes, the set of instructions, and any other included components.
  • the MLL gene (also termed ALL-1, HRX, and TRX1) located at chromosome band 11q23 is a recurrent target of chromosomal translocations in acute leukemias, particularly prevalent in infant leukemias and treatment-related secondary leukemias, and associated with dismal prognosis. Reciprocal translocations associated with the MLL gene result in in-frame fusion transcripts with various partner genes from at least 50 distinct gene loci. In addition, a partial tandem duplication of the MLL gene has been reported. 5
  • the class of oncogenic MLL fusion proteins consists of the N-terminal portion of the MLL protein fused to C-terminal portions of a fusion partner.
  • Experimental systems in which MLL fusion proteins were generated to induce leukemia in mice demonstrated that this fusion to a C-terminal partner is necessary for immortalization.
  • Two critical regions within MLL were identified: a region with three AT hook DNA-binding motifs and the DNA methyltransferase homology region. 6
  • the MLL fusion partners act via dominant gain of function and seem to play a role in two main functional categories, namely signaling molecules that normally localize to the cytoplasm/cell junctions or nuclear factors implicated in regulatory processes of transcription. 7
  • MLL-AF4 expressing leukemias are mainly diagnosed as pro-B ALL, whereas e.g. fusion partners AF9, AF6, or AF10 are common in myelomonocytic or monoblastic AML subtypes.
  • analyses were performed to (i) identify t(11q23)/MLL gene signatures compared to numerous specific subtypes of acute leukemias, (ii) discriminate t(11q23)/MLL positive AML from t(11q23)/MLL ALL samples, (iii) investigate signatures correlated to MLL-AF9 and other MLL partner genes, and (iv) decipher common biological networks.
  • the analysis addressed how the differing MLL partner genes influence the global gene expression signature and whether pathways could be identified to explain the molecular determination of MLL leukemias occurring in both the myeloid and lymphoid lineages.
  • Affymetrix software package (Microarray Suite 5.0) extracted fluorescence intensities from each element on the microarrays as detected by confocal laser scanning. 17 Detection calls (present, marginal, or absent) were determined by default parameters. Signal intensity values were calculated by scaling the raw data intensities to a common target intensity (U133 mask file; TGT value: 5000).
  • Each human GeneChip expression array features 100 human maintenance genes that serve as a tool to normalise and scale the data before performing data comparisons. As recommended by the manufacturer, these 100 probe sets were used for normalization (on the world wide web at affymetrix.com/support/technical/mask_files.affx as of October 27, 2004).
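The scaling step described above can be illustrated with a minimal sketch. It assumes a plain arithmetic mean over the reference probe sets and a single multiplicative scale factor; the actual Microarray Suite 5.0 computation (which may, for instance, use a trimmed mean) is not reproduced here. The function name `scale_to_target` and the probe set identifiers are hypothetical.

```python
def scale_to_target(signals, reference_ids, target=5000.0):
    """Scale every probe-set signal so the reference probe sets average `target`.

    signals: dict mapping probe-set id -> raw signal intensity
    reference_ids: ids of the maintenance (normalization) probe sets
    """
    reference_values = [signals[pid] for pid in reference_ids]
    reference_mean = sum(reference_values) / len(reference_values)
    factor = target / reference_mean  # one scale factor per array
    return {pid: value * factor for pid, value in signals.items()}


# Toy array with two hypothetical reference probe sets and one gene of interest.
raw = {"ref_1": 900.0, "ref_2": 1100.0, "gene_x": 250.0}
scaled = scale_to_target(raw, ["ref_1", "ref_2"])
print(scaled["gene_x"])  # 1250.0
```

Because every array is multiplied by its own factor, signals from different arrays become comparable on the common target scale.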
  • the minimal quality control parameters for inclusion of an expression profile in this analysis took into account more than 30% present calls (U133A microarray) and a low 3'/5' ratio of the represented glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH) probe sets.
  • the complete data set was randomly split into respective training and independent test cohorts. Then differentially expressed genes were identified in the training set, and a learning model was built including the top differentially expressed genes. Using this approach, the algorithm learns to discriminate between the respective subtypes based on gene expression data in the given training patient cohort. Having learned the expression features of the classes, the algorithm could recognize and predict new samples as class members based on their expression patterns in the test cohort. The prediction accuracy was estimated by 10-fold cross-validation and assessed for robustness in a resampling approach (additional description is provided below). As an additional method to extract differentially expressed genes the SAM software program (MS Excel application) was used. 24 Microarray signal intensities were transformed as described above and subsequently imputed into the software. A stringent cutoff for significance (tuning parameter delta) for ⁇ 1 false positive rated gene was chosen.
  • the resulting gene expression data was visualized using hierarchical cluster analysis and principal component analysis (GeneMaths XT, Applied Maths, Belgium). For visualization of unsupervised data analyses a variation filter was applied. To remove probe sets that demonstrated minimal variation across samples, the complete data matrix was filtered by variance, and the probe sets demonstrating the largest variance were selected for analysis.
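A variation filter of this kind can be sketched as follows. This is an illustration under assumptions: the population variance is used as the variation measure, and the number of probe sets to keep (`n_keep`) is a hypothetical parameter, since the text does not state the cutoff that was applied.

```python
from statistics import pvariance


def top_variance_probes(matrix, n_keep):
    """Return the ids of the `n_keep` probe sets with the largest variance.

    matrix: dict mapping probe-set id -> list of signals across all samples
    """
    ranked = sorted(matrix, key=lambda pid: pvariance(matrix[pid]), reverse=True)
    return ranked[:n_keep]


# Toy data matrix: three probe sets measured in three samples.
data = {
    "probe_a": [1.0, 1.0, 1.0],  # no variation, filtered out
    "probe_b": [1.0, 5.0, 9.0],  # largest variation
    "probe_c": [2.0, 3.0, 4.0],
}
print(top_variance_probes(data, 2))  # ['probe_b', 'probe_c']
```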
  • the false discovery rate is an accepted methodology to calculate statistical significance in microarray studies. 64, 65 A measure of statistical significance called the q value is associated with each tested feature and automatically takes into account the fact that thousands of genes are being tested simultaneously. The q value of a particular feature in a microarray data set is the expected proportion of false positives incurred when calling that feature significant.
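To make the multiple-testing idea concrete, the sketch below computes Benjamini-Hochberg adjusted p-values. This is a swapped-in, closely related procedure chosen for brevity; it is not claimed to be the q value estimator used in the analysis described here, and the adjusted values it returns should be read only as a conservative stand-in for q values.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Four hypothetical per-gene p-values from a differential expression test.
p = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(p)
print(adj)
```

A gene would then be called significant when its adjusted value falls below the chosen false discovery rate, e.g. 0.05.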
  • a Support Vector Machine is a supervised learning algorithm developed over the past decade by Vapnik et al. 66 and has also recently been used for gene expression data analysis. 67-70
  • the SVM algorithm operates by mapping the given training set of samples into a possibly high-dimensional feature space and attempting to locate in that space a plane that separates the positive from the negative examples. Having found such a plane, the SVM can then predict the classification of an unlabeled example by mapping it into the feature space and asking on which side of the separating plane the example lies.
  • multi-class SVM classifiers were built with linear kernels based on class-specific genes using library LIBSVM version 2.36 (on the world wide web at csie.ntu.edu.tw/~cjlin/libsvm/ as of October 27, 2004).
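The prediction step of a linear-kernel SVM, i.e. asking on which side of the separating plane a sample lies, can be sketched as below. With a linear kernel the feature space is the input space itself, so the plane is described by a weight vector and a bias. The weights, bias, and expression values shown are hypothetical and were not fitted to any data; training (as performed by LIBSVM) is outside the scope of this sketch.

```python
def decision_value(weights, bias, sample):
    """Signed score of a sample relative to the plane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, sample)) + bias


def predict(weights, bias, sample):
    """Return +1 for the positive side of the plane, -1 for the negative side."""
    return 1 if decision_value(weights, bias, sample) >= 0 else -1


# Hypothetical plane over two genes, and two unlabeled samples.
weights = [0.8, -0.5]
bias = -0.1
print(predict(weights, bias, [2.0, 1.0]))  # 1
print(predict(weights, bias, [0.5, 2.0]))  # -1
```

A multi-class problem such as the one described here is typically reduced to several such binary decisions, for example one per class in a one-versus-all scheme.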
  • Apparent accuracy of the complete data set was estimated by 10-fold cross validation. This means that the data set was divided into 10 equally sized subsets, balanced by diagnosis; an SVM model was trained on 9 subsets and predictions were generated for the remaining subset. This training/prediction process was repeated 10 times to include predictions for each subset. Apparent accuracy is the overall rate of correct predictions. Sensitivity and specificity were calculated as follows:
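The fold assignment and the two performance measures can be sketched as follows. The formulas referred to above are not reproduced in this text, so the sketch assumes the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names, the class labels, and the toy predictions are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, balanced by class label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[index] = position % k  # deal samples round-robin into folds
    return folds


def sensitivity_specificity(truth, predicted, positive):
    """Standard sensitivity and specificity for one class versus the rest."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


labels = ["MLL"] * 20 + ["other"] * 30
folds = stratified_folds(labels)  # each fold holds 2 "MLL" and 3 "other" samples

truth = ["MLL", "MLL", "other", "other", "other"]
calls = ["MLL", "other", "other", "other", "MLL"]
sens, spec = sensitivity_specificity(truth, calls, positive="MLL")
print(sens, spec)
```

In a full cross validation, the model would be trained on the samples of nine folds and used to predict the tenth, cycling through all ten folds before the pooled predictions are scored.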
  • a resampling approach was applied to assess robustness of class prediction: The data set was randomly, but balanced by the respective subtypes, split into a training set, consisting of two thirds of samples, and an independent test set with the remaining one third. Differentially expressed genes were identified in the training set in a one-versus-all (OVA) approach (t-test-statistic), an SVM-model was built from the training set and predictions were made in the test set. This complete process was repeated 100 times. By this means 95% confidence intervals were estimated for accuracy, sensitivity and specificity.
  • OVA one-versus-all
  • Biological networks were generated through the use of Ingenuity Pathways Analysis (January 2004 release version), a web-delivered application that generates networks using differentially expressed genes from expression array data analyses. Networks were generated addressing two different questions, (i) discrimination of t(11q23)/MLL from other genetically defined acute leukemia subtypes, and (ii) discrimination of ALL with t(11q23)/MLL from AML with t(11q23)/MLL samples.
  • the networks are displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes).
  • the intensity of the node color indicates the degree of up- (green) or down- (red) regulation.
  • nodes are displayed using various shapes that represent the functional class of the gene product.
  • Edges are displayed with various labels that describe the nature of the relationship between the nodes (e.g., B for binding, T for transcription).
  • the length of an edge reflects the evidence supporting that node-to-node relationship, in that edges supported by more articles from the literature are shorter. Details relating to Ingenuity Pathways Analysis are also available on the world wide web at ingenuity.com as of 11/4/2004.
  • Figure 1 is a schematic that provides a biological network node shape description
  • Figure 2 is a schematic that provides biological network edge labels
  • Figure 3 is a schematic that shows biological network edge types.
  • t(11q23)/MLL samples were combined into one single group.
  • Differentially expressed genes were identified between t(11q23)/MLL and all other classes, i.e. AML with t(8;21), inv(16), t(15;17), or complex chromosomal aberrations and distinct precursor B-ALL subtypes with t(8;14), t(9;22), or precursor T-ALL, in a supervised analysis approach (OVA, one-versus-all).
  • the application queries the Ingenuity Pathways Knowledge Base for interactions between focus genes and all other gene objects stored in the knowledge base, and generates a set of networks with a network size of 35 genes/gene products.
  • the application then computes a score for each network according to the fit of the user's set of significant genes.
  • the score is derived from a p-value and indicates the likelihood of the focus genes in a network being found together due to random chance.
  • a score of 2 indicates that there is a 1 in 100 chance that the focus genes are together in a network due to random chance. Therefore, scores of 2 or higher have at least a 99% confidence of not being generated by random chance alone.
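The relationship between score and p-value stated above is consistent with a negative base-10 logarithm, which the sketch below assumes; the text itself says only that the score is "derived from a p-value", so the exact formula is an assumption, and the function names are hypothetical.

```python
import math


def network_score(p_value):
    """Score assumed to be the negative base-10 logarithm of the p-value."""
    return -math.log10(p_value)


def p_value_from_score(score):
    """Inverse mapping: the chance-alone probability implied by a score."""
    return 10 ** (-score)


print(network_score(0.01))      # a 1-in-100 chance corresponds to a score of 2
print(p_value_from_score(3.0))  # a score of 3 corresponds to 1 in 1000
```

Under this reading, each additional score point makes a chance-only explanation ten times less likely.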
  • Biological functions are then calculated and assigned to each network. Four networks were evaluated more closely (see, Figure 4).
  • Biological networks were generated that are based on genes discriminating ALL with t(11q23)/MLL samples from AML with t(11q23)/MLL.
  • ALL with t(11q23)/MLL samples were compared against AML with t(11q23)/MLL samples using a supervised approach and differentially expressed genes were identified.
  • Focus genes are genes from the analysis input data file that meet both of the following criteria: These genes have been designated as being of interest, i.e. discriminating ALL with t(11q23)/MLL samples statistically significantly from AML with t(11q23)/MLL.
  • Data set 1 This data set contains the data provided in Tables 7-10.
  • the differentially expressed genes depicted in the tables are listed according to the corresponding Affymetrix probe set identifier, fold change, q-value, and t-test statistic, respectively.
  • Table 7 Detailed information on the t(11q23)/MLL patient samples (age, sex, MLL translocation partner, immunophenotype, karyotype)
  • Table 8 Top 50 lower/higher expressed genes in ALL with t(11q23)/MLL compared to precursor B-ALL cases with t(9;22), t(8;14), and precursor T-ALL
  • Table 9 Top 50 lower/higher expressed genes in AML with t(11q23)/MLL compared to AML with t(8;21), t(15;17), inv(16), and samples with complex aberrant karyotypes
  • Table 10 Top 50 lower/higher expressed genes in t(11q23)/MLL leukemias (ALL and AML) compared to precursor B-ALL cases with t(9;22), t(8;14), and precursor T-ALL as well as AML with t(8;21), t(15;17), inv(16), and samples with complex aberrant karyotypes.
  • This data set supports the networks visualizing genes distinguishing t(11q23)/MLL leukemias from other acute leukemia subtypes. It contains gene expression information on all genes depicted in one of the four t(11q23)/MLL specific networks (see, Figure 4). Values in the columns reflect signal intensities and a call of "Present", "Absent", or "Marginal" for each probe set. This corresponds to Tables 2-6, which provide the raw expression intensities of the genes contained in the networks (termed as MLL targets).
  • This data set supports the networks visualizing differentially expressed genes between ALL with t(11q23)/MLL and AML with t(11q23)/MLL. It contains gene expression information on all genes depicted in one of the eight t(11q23)/MLL specific networks (see, Figure 9). Values in the columns reflect signal intensities and a call of "Present", "Absent", or "Marginal" for each probe set. This corresponds to Table 1.
  • a robust set of differentially expressed genes was identified which accurately stratified the samples according to their underlying cytogenetic and immunophenotypic characteristics, i.e. myeloid subclasses, precursor B-lineage or precursor T-lineage ALL.
  • FIG. 18A displays a principal component analysis of 111 ALL samples based on the differential expression of 262 genes (Table 13).
  • ALL subclasses accurately cluster together.
  • the top 50 genes with higher expression or lower expression, respectively, in ALL with t(11q23)/MLL are given in Table 8.
  • t(11q23)/MLL positive samples are clearly distinct from other subtypes of the same cell lineage, i.e. myeloid or lymphoblastic. They have a characteristic underlying expression signature compared to other distinct acute leukemia subclasses. Subsequently, all samples were included into one comprehensive analysis. A supervised data analysis algorithm was applied to identify genes that separate each of the nine subtypes from the remaining classes. As shown in Figure 19, the nine distinct acute leukemia subtypes can accordingly be separated. The hierarchical clustering algorithm identified common expression signatures and ordered the patient samples accurately by similarities.
  • t(11q23)/MLL positive samples are not found to cluster together but rather according to the lineage they are derived from, i.e. a lymphoblastic t(11q23)/MLL cluster and a myeloid t(11q23)/MLL cluster can be observed.
  • ALL samples with t(11q23)/MLL are grouped next to ALL with t(9;22) and t(8;14), and AML with t(11q23)/MLL are grouped next to AML with t(15;17) or AML with t(8;21) cases.
  • Downregulated genes included, for example, TNF-receptor superfamily members TNFRSF10A and TNFRSF10D, or MADH1, functioning downstream of TGF-beta receptor serine/threonine kinases.
  • Three additional networks are available in Figures 6-8. They visualize networks containing other genes with known relationship with t(11q23)/MLL leukemias, e.g. HOXA cluster genes (HOXA5, HOXA10), as well as the Hox coregulator PBX3, or the tyrosine kinase FLT3.
  • Other target genes with higher expression in t(11q23)/MLL leukemias included HIP1, so far associated with prostate cancer progression, proto-oncogene FRAT1, TAF1B, playing a role in the tumorigenesis of colorectal carcinomas, and ZFHX1B, a transcriptional corepressor.
  • the top 50 genes with higher expression or lower expression, respectively, in both leukemias with t(11q23)/MLL combined are given in Table 10.
  • the analysis next directly compared expression signatures of ALL with t(11q23)/MLL against AML with t(11q23)/MLL in a supervised algorithm.
  • Among the differentially expressed genes, upregulated candidates in lymphoblastic t(11q23)/MLL leukemias demonstrated a dominant pattern according to B-lineage commitment.
  • PAX5, the B-cell lineage specific activator, was designated as one of the top-ranked differentially expressed genes.
  • PAX5 target genes BLK and CD19 could also be confirmed upregulated in ALL with t(11q23)/MLL by microarray analysis.
  • IGHM encoding the IgM heavy chain
  • VPREB1 surrogate light-chain, important for forming the pre-B cell receptor
  • CD22 or CD79A further elucidates the B-lineage commitment of ALL with t(11q23)/MLL.
  • genes with higher expression in t(11q23)/MLL positive AML included the transcriptional activator CEBPB, protein tyrosine kinase KIT, MADH2, a transcription factor binding protein and MITF, a transcriptional regulator.
  • Seven additional networks are provided in Figures 11-17. They visualize networks containing genes that further separate t(11q23)/MLL leukemias. A myeloid commitment through higher expression in AML with t(11q23)/MLL could be demonstrated by differential expression of CEBPA (CCAAT/enhancer binding protein-alpha), a transcription factor required for differentiation of myeloid progenitors, as well as SPI1 (PU.1), a critical player in myeloid development, or GM-CSFR, and G-CSFR genes.
  • CEBPA CCAAT/enhancer binding protein-alpha
  • SPI1 PU.1
  • differentially expressed candidate genes with higher expression in t(11q23)/MLL positive ALL include BCL11A, also involved in lymphoid malignancies, transcription regulator ETS2, chromatin binding proteins CBX2 and CBX4, and early B cell factor EBF, which can restrict lymphopoiesis to the B cell lineage and works in concert with PAX5 to activate genes required for B cell differentiation.
  • the supplementary networks also contain other differentially expressed genes with higher expression in t(11q23)/MLL positive AML.
  • FES a tyrosine kinase oncogene
  • MNDA myeloid cell nuclear differentiation antigen
  • CITED4 a CBP/p300-interacting transcriptional transactivator
  • SOCS suppressors of cytokine signaling
  • This SVM model was used to predict samples in the test cohort.
  • MLL-AF10 or MLL-AF6 samples are classified as MLL-AF9 samples.
  • the matrix shows the predicted MLL fusion partner gene. Misclassified samples are given by bold letters.
  • the training set included 2/3 of patients, the test set 1/3, respectively.
  • the test set for each of the 100 runs included 20 samples which were randomly chosen from the total patient cohort to include 1 MLL-AF10, 2 MLL-AF6, 8 MLL-AF9, 1 MLL-ELL, 7 MLL-AF4, and 1 MLL-ENL sample, respectively.
  • the classification algorithm Table 12
  • 7 MLL-AF4 samples have been predicted by the algorithm 700 times (each sample 100 times).
  • MLL-AF4 has been given correctly 659 times, i.e. on average 6.59 per run.
  • a MLL-AF4 sample has been predicted as MLL-AF9, in 1 prediction as MLL-ELL, and in 31 predictions as MLL-ENL, respectively.
  • the matrix shows the predicted MLL fusion partner gene as determined after 100 runs of SVM-based classifications. Misclassified samples are given by bold letters. Average numbers of predictions per run are given.
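The "average numbers of predictions per run" reported in such a matrix can be reproduced with simple pooling, as sketched below. Only the counts stated in the text are used (659 correct MLL-AF4 calls, 31 MLL-ENL calls, and 1 MLL-ELL call out of 700); the remaining predictions are not enumerated in this excerpt and are deliberately left out rather than guessed. The function name `average_confusion` is hypothetical.

```python
from collections import Counter


def average_confusion(pooled_predictions, n_runs):
    """Average count per run for each (true partner, predicted partner) pair.

    pooled_predictions: iterable of (true, predicted) pairs from all runs
    """
    counts = Counter(pooled_predictions)
    return {pair: total / n_runs for pair, total in counts.items()}


# Counts for true MLL-AF4 samples as stated in the text (100 runs).
pooled = (
    [("MLL-AF4", "MLL-AF4")] * 659
    + [("MLL-AF4", "MLL-ENL")] * 31
    + [("MLL-AF4", "MLL-ELL")] * 1
)
averages = average_confusion(pooled, n_runs=100)
print(averages[("MLL-AF4", "MLL-AF4")])  # 6.59
```

With 7 MLL-AF4 samples in every test set, an average of 6.59 correct calls per run corresponds to 659 of 700 predictions.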
  • t(11q23)/MLL leukemia-associated genes are related to each other in a novel constellation.
  • HIP1, FRAT1, TAF1B and ZFHX1B. RUNX2 normally plays a key role in osteogenesis, but a direct oncogenic role has also been proposed.
  • HIP1 encodes an endocytic protein with transforming properties that is involved in a cancer-causing translocation and which is overexpressed in a variety of human cancers.
  • Proto-oncogene FRAT1 represents the human homologue to mouse proto-oncogene Frat1, which promotes carcinogenesis through activation of the Wnt/beta-catenin/TCF signaling pathway.
  • TAF1B has been identified to play a role in the tumorigenesis of colorectal carcinomas with microsatellite instability.
  • ZFHX1B, encoding Smad-interacting protein 1 (SIP1), directly represses E-cadherin gene transcription and activates cancer invasion via the upregulation of the matrix metalloproteinase gene family. 42
  • Consistently downregulated genes in t(11q23)/MLL leukemias included TNF-receptor superfamily members required in TRAIL-mediated apoptosis, TNFRSF10A and TNFRSF10D, 43 or MADH1 (SMAD1), functioning downstream of TGF-beta receptor serine/threonine kinases. 44 However, it can only be speculated whether the dysregulated expression of these genes confers any resistance to apoptotic stimuli.
  • the t(11q23)/MLL leukemias are generally associated with a high risk of treatment failure and therefore novel therapeutic strategies are needed to improve outcome in patients with 11q23 abnormalities.
  • Small molecule inhibitors of FLT3, a receptor tyrosine kinase, may prove to be beneficial. 45 It can be speculated that besides the known mutations affecting the juxtamembrane region and receptor activation loop, a constitutive FLT3 signaling caused by high level expression also contributes to the development and maintenance of MLL. In recent studies high levels of FLT3 expression in patients with MLL rearrangements have been identified and FLT3 has been successfully validated as a therapeutic target. 28, 46 One can also observe an overexpression of FLT3 in both t(11q23)/MLL leukemias compared to other acute leukemia classes (see, e.g., Figure 6).
  • the cases with MLL gene translocations did not cluster as a unique subgroup, but instead clustered according to their lineage of origin. Therefore, it is proposed that MLL aberrations lead to specific expression signatures but that there is a clear identification of lymphoblastic lineage commitment for ALL with t(11q23)/MLL. This seems to be in contrast with the previously reported finding that MLL positive leukemias are unique and should be constituted as a distinct disease.
  • PAX5 restricts the developmental options of lymphoid progenitors to the B cell lineage by repressing the transcription of lineage-inappropriate genes and simultaneously activating the expression of B-lymphoid signaling molecules. Its influence can also be followed further downstream when the analysis focused on PAX5 target genes, also included in the list of top ranked differential genes. It is known that e.g. BLK or CD19 are controlled by PAX5.
  • GM-CSFR granulocyte/macrophage colony-stimulating factor receptor
  • G-CSFR granulocyte colony-stimulating factor receptor
  • FES tyrosine kinase oncogene
  • FES may be a key component of the granulocyte differentiation machinery and contributes to lineage determination at the level of multi-lineage hematopoietic progenitors as well as the more committed granulo-monocytic progenitors.
  • MNDA myeloid cell nuclear differentiation antigen
  • MNDA expression further elucidates the myeloid lineage specificity in t(11q23)/MLL positive AML.
  • CITED4, a CBP/p300-interacting transcriptional transactivator, is significantly higher expressed in AML with t(11q23)/MLL.
  • It may function as a co-activator for transcription factor AP-2 and possible roles for CITED4 in regulation of gene expression during development and differentiation of blood cells have been implied.
  • an exploration of the biological networks identified in this analysis may provide new insights into the altered biology of these leukemias and may lead to useful target genes for follow-up experiments.
  • a major goal of this study was to directly assess the influence of the different MLL translocation partners on the transcriptional program in MLL leukemias.
  • a supervised pairwise comparison of MLL-AF9 positive samples against MLL-AF9 negative samples in AML was performed. No statistically significant differences in their gene expression signatures were found.
  • Using SAM plots to visualize the degree of difference in their gene expression patterns, it was observed that within AML the MLL-AF9 positive samples were very similar to the MLL-AF9 negative samples.
  • no clear subclustering of t(9;11)/MLL-AF9 positive samples was observed.
  • fusion partner e.g. AF4 in lymphoblastic leukemias, or AF9 in myeloid leukemias.
  • the fusion partner might be able to contribute to cell-fate decisions.
  • the different MLL fusion proteins would dictate the respective differentiation pathway by facilitating the establishment of lineage-specific gene expression programs. In the gene expression patterns described here a strong association of lymphoid commitment in ALL with t(11q23)/MLL was observed.
  • MLL-ENL samples were separated.
  • the t(11;19)(q23;p13.1) chromosomal translocation fuses the gene encoding transcriptional elongation factor ELL to the MLL gene.
  • Recent data indicates that neoplastic transformation by the MLL-ELL fusion protein is likely to result from aberrant transcriptional activation of MLL target genes.
  • the clustering described here would further support a hypothesis of tumor tropism where the MLL-ENL fusion protein can no longer influence the differentiation pathway. As a consequence, these data may explain why not the translocation partner gene but rather the cellular lineage is influencing the observed major changes in expression signatures in t(11q23)/MLL leukemias.
  • MLL gene associated leukemias are their frequency as chemotherapy-related leukemias. 62 This was not in the focus of the presented analyses. However, in a recent study, a significant difference in outcome was demonstrated in AML with t(11q23)/MLL rearrangement between de novo and therapy-related cases. 3 Therefore, future studies may also be directed to study gene expression profiles in these patient cohorts.
  • microarray analyses might help to further understand the biology in these leukemias that develop after a relatively short latent period after treatment of a primary malignancy and often follow the use of drugs that inhibit the activity of DNA-topoisomerase II.
  • Microarray technology demonstrated that based on a cohort of thoroughly characterized leukemia samples, expression signatures lead to a better understanding of biological features of these specific acute leukemia subtypes. Novel networks of candidate genes were depicted and may inspire follow-up studies to elucidate the events leading to these types of prognostically unfavorable acute leukemias and may be exploited to identify new therapeutic targets.
  • EXAMPLE 2: GENERAL MATERIALS, METHODS AND DEFINITIONS OF FUNCTIONAL ANNOTATIONS. The methods section contains both information on the statistical analyses used for identification of differentially expressed genes and detailed annotation data for the identified microarray probe sets.
  • sequence data are omitted due to their large size, and because they do not change, whereas the annotation data are updated periodically, for example new information on chromosomal location and functional annotation of the respective gene products. Sequence data are available to download in the NetAffx Download Center on the world wide web at affymetrix.com.
  • Microarray probe sets for example, found to be differentially expressed between different types of leukemia samples are further described by additional information.
  • the fields are of the following types:
  • HG-U133 ProbeSet_ID describes the probe set identifier. Examples are: 200007_at, 200011_s_at, 200012_x_at.
  • Sequence Type indicates whether the sequence is an Exemplar, Consensus or Control sequence. An Exemplar is a single nucleotide sequence taken directly from a public database. This sequence could be an mRNA or an expressed sequence tag (EST).
  • EST expressed sequence tag
  • a Consensus sequence is a nucleotide sequence assembled by Affymetrix based on one or more sequences taken from a public database.
  • Transcript ID The cluster identification number with a sub-cluster identifier appended.
  • Sequence Derived From The accession number of the single sequence, or representative sequence on which the probe set is based. Refer to the "Sequence Source” field to determine the database used.
  • Consensus sequences Affymetrix identification number or public accession number.
  • Sequence Source The database from which the sequence used to design this probe set was taken.
  • GenBank®, RefSeq, UniGene, TIGR (annotations from The Institute for Genomic Research).
  • Gene Symbol and Title A gene symbol and a short title, when one is available. Such symbols are assigned by different organizations for different species. Affymetrix annotational data comes from the UniGene record. There is no indication which species-specific databank was used, but some of the possibilities include for example HUGO: The
  • MapLocation The map location describes the chromosomal location when one is available.
  • Unigene Accession UniGene accession number and cluster type. Cluster type can be "full length" or
  • LocusLink This information represents the LocusLink accession number. Full Length Ref. Sequences Indicates the references to multiple sequences in RefSeq. The field contains the ID and description for each entry, and there can be multiple entries per probe set.
  • Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 2001;98:5116-5121.

Abstract

The present invention relates to rapid and reliable approaches, based on gene expression, to detecting and genotyping leukemia. In some embodiments, for example, methods are provided for genotyping acute leukemia cells with t(11q23)/MLL. In addition to methods, the invention also provides related kits and systems.

Description

GENE EXPRESSION PROFILING OF LEUKEMIAS WITH MLL GENE REARRANGEMENTS
FIELD OF THE INVENTION
The present invention relates to the detection and classification of leukemia and accordingly, provides diagnostic and/or prognostic information in certain embodiments.
BACKGROUND OF THE INVENTION
Leukemias are generally classified into four different groups or types: acute myeloid (AML), acute lymphatic (ALL), chronic myeloid (CML) and chronic lymphatic leukemia (CLL). Within these groups, several subcategories or subtypes can be identified using various approaches. These different subcategories of leukemia are associated with varying clinical outcomes and therefore can serve as guides to the selection of different treatment strategies. The importance of highly specific classification may be illustrated for AML as a very heterogeneous group of diseases. Effort has been aimed at identifying biological entities and to distinguish and classify subgroups of AML that are associated with, e.g., favorable, intermediate or unfavorable prognoses. In 1976, for example, the FAB classification was proposed by the French-American-British co-operative group that utilizes cytomorphology and cytochemistry to separate AML subgroups according to the morphological appearance of blasts in the blood and bone marrow.
In addition, genetic abnormalities occurring in leukemic blasts were recognized as having a major impact on the morphological picture and on prognosis. As a consequence, the karyotype of leukemic blasts is commonly used as an independent prognostic factor regarding response to therapy as well as survival. A combination of methods is typically used to obtain the diagnostic information in leukemia. To illustrate, the analysis of the morphology and cytochemistry of bone marrow blasts and peripheral blood cells is commonly used to establish a diagnosis. In some cases, for example, immunophenotyping is also utilized to separate an undifferentiated AML from acute lymphoblastic leukemia and from CLL. In certain instances, leukemia subtypes can be diagnosed by cytomorphology alone, but this typically requires that an expert review sample smears. However, genetic analysis based on, e.g., chromosome analysis, fluorescence in situ hybridization (FISH), or reverse transcription PCR (RT-PCR) and immunophenotyping is also generally used to accurately assign cases to the correct category. An aim of these techniques, aside from diagnosis, is to determine the prognosis of the leukemia under consideration. One disadvantage of these methods, however, is that viable cells are generally necessary, as the cells used for genetic analysis need to divide in vitro in order to obtain metaphases for the analysis. Another exemplary problem is the long lag period (e.g., 72 hours) that typically occurs between the receipt of the materials to be analyzed in the laboratory and the generation of results. Furthermore, great experience in preparing chromosomes and analyzing karyotypes is generally needed to obtain correct results in most cases. Using these techniques in combination, hematological malignancies can be separated into CML, CLL, ALL, and AML. Within the latter three disease entities, several prognostically relevant subtypes have been identified. 
This further sub-classification commonly relies on genetic abnormalities of leukemic blasts and is associated with different prognoses.
The sub-classification of leukemias is used increasingly as a guide to the selection of appropriate therapies. The development of new, specific drugs and treatment approaches often includes the identification of specific subtypes that may benefit from a distinct therapeutic protocol and thus improve the outcomes of distinct subsets of leukemia. For example, the therapeutic drug STI571 inhibits the CML-specific chimeric tyrosine kinase BCR-ABL generated from the genetic defect observed in CML, the BCR-ABL rearrangement due to the translocation between chromosomes 9 and 22 (t(9;22)(q34;q11)). In patients treated with this new drug, the therapy response is dramatically higher as compared to other drugs that have previously been used. Another example is a subtype of acute myeloid leukemia, AML M3 and its variant M3v, which both include the karyotype t(15;17)(q22;q11-12). The introduction of all-trans retinoic acid (ATRA) has improved the outcome in this subgroup of patients from about 50% to 85% long-term survivors. Accordingly, the rapid and accurate identification of distinct leukemia subtypes is of consequence to further drug development in addition to diagnostics and prognostics. According to Golub et al. (Science, 1999, 286, 531-7, which is incorporated by reference), gene expression profiles can be used for class prediction and discriminating AML from ALL samples. However, for the analysis of acute leukemias the selection of the two different subgroups was performed using exclusively morphologic-phenotypical criteria. This was only descriptive and did not provide deeper insights into the pathogenesis or the underlying biology of the leukemia. The approach reproduces only very basic knowledge of cytomorphology and intends to differentiate classes. However, the data generated via such an approach is generally not sufficient to predict prognostically relevant cytogenetic aberrations.
SUMMARY OF THE INVENTION
The present invention relates to rapid, cost effective, and reliable approaches to detecting and genotyping leukemia. In certain embodiments, for example, methods are provided for genotyping acute leukemia cells with t(11q23)/MLL. To further illustrate, the invention also provides methods for distinguishing acute myeloid leukemia (AML) cells with t(11q23)/MLL from acute lymphoblastic leukemia (ALL) cells with t(11q23)/MLL in some embodiments. Aside from providing diagnostic information to patients, these distinctions can also assist in selecting appropriate therapies and in prognostication. In some embodiments, these methods include profiling the expression of selected populations of genes using real-time PCR analysis, oligonucleotide arrays, or the like. In addition to methods of genotyping leukemia, the invention also provides related kits and systems.
In one aspect, the invention provides a method of genotyping a leukemia cell. The method includes detecting an expression level of at least one set of genes in or derived from at least one target human leukemia cell. The target human leukemia cell is generally obtained from a subject. Typically, the set of genes is selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14. In some embodiments, for example, the set of genes in or derived from the target human leukemia cell comprises at least about 10, 100, 1000, 10000, or more members. In addition, the method also includes correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell having t(11q23)/MLL; correlating a detected substantially identical expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell having t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; or correlating a detected substantially identical expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell having t(11q23)/MLL with the target human leukemia cell having t(11q23)/MLL, thereby genotyping the leukemia cell.
In some embodiments, the reference human leukemia cell lacking t(11q23)/MLL comprises a precursor B-ALL cell with t(9;22), a precursor B-ALL cell with t(8;14), a precursor T-ALL cell, an AML cell with t(8;21), an AML cell with t(15;17), an AML cell with inv(16), or an AML cell with a complex aberrant karyotype. In certain embodiments, the detected differential expression of the genes comprises at least about a 5% difference, whereas the detected substantially identical expression of the genes comprises less than about a 5% difference.
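To illustrate the approximately 5% cutoff just described, the following is a minimal sketch rather than a definitive implementation. It assumes that the difference is measured relative to the reference expression value and that expression values are positive numbers on a linear scale; the function names and the example values are hypothetical.

```python
def percent_difference(target: float, reference: float) -> float:
    """Difference of target vs. reference expression, in percent of the reference.

    Assumption: the difference is expressed relative to the reference value.
    """
    if reference == 0:
        raise ValueError("reference expression must be non-zero")
    return abs(target - reference) * 100.0 / abs(reference)


def call_expression(target: float, reference: float,
                    threshold_pct: float = 5.0) -> str:
    """Label a gene as differentially or substantially identically expressed."""
    if percent_difference(target, reference) >= threshold_pct:
        return "differential"
    return "substantially identical"


if __name__ == "__main__":
    # Hypothetical expression values for one gene (target cell, reference cell).
    print(call_expression(110.0, 100.0))
    print(call_expression(102.0, 100.0))
```

The threshold is a parameter so that other cutoffs can be tried without changing the logic.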
In certain embodiments, the method includes correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an AML cell; or correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an AML cell. In some embodiments, the method includes correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell; or correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell. In addition, the markers described herein are also optionally used for cross-lineage comparisons, such as ALL with t(11q23)/MLL compared to AML without t(11q23)/MLL, AML with t(11q23)/MLL compared to ALL without t(11q23)/MLL, and the like.
Expression levels are detected using essentially any gene expression profiling technique. In some embodiments, for example, the expression level is detected using an array, a robotics system, and/or a microfluidic device. In certain embodiments, the expression level of the set of genes is detected by amplifying nucleic acid sequences associated with the genes to produce amplicons and detecting the amplicons. In these embodiments, the amplicons are generally detected using a process that comprises one or more of: hybridizing the amplicons to an oligonucleotide array, digesting the amplicons with a restriction enzyme, or real-time polymerase chain reaction (PCR) analysis. In certain embodiments, the expression level of the set of genes is detected by, e.g., measuring quantities of transcribed polynucleotides (e.g., mRNAs, cDNAs, etc.) or portions thereof expressed or derived from the genes. In some embodiments, the expression level is detected by, e.g., contacting polynucleotides or polypeptides expressed from the genes with compounds (e.g., aptamers, antibodies or fragments thereof, etc.) that specifically bind the polynucleotides or polypeptides.
In another aspect, the invention provides a method of producing a reference data bank for genotyping leukemia cells. The method includes (a) compiling a gene expression profile of a patient sample by detecting the expression level of one or more genes of at least one human leukemia cell, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14, and (b) classifying the gene expression profile using a machine learning algorithm.
In still another aspect, the invention provides a kit that includes one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14. In some embodiments, at least one solid support comprises the probes. Optionally, the kit also includes one or more additional reagents to perform real-time PCR analyses. The kit also includes instructions for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
In another aspect, the invention provides a system that includes one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14. In some embodiments, at least one solid support comprises the probes. In certain embodiments, the system includes one or more additional reagents and/or components to perform real-time PCR analyses. The system also includes at least one reference data bank for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL. The reference data bank is generally produced by, e.g., (a) compiling a gene expression profile of a patient sample by detecting the expression level of at least one of the genes, and (b) classifying the gene expression profile using a machine learning algorithm. The machine learning algorithm is generally selected from, e.g., a weighted voting algorithm, a K-nearest neighbors algorithm, a decision tree induction algorithm, a support vector machine, a feed-forward neural network, etc.
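As an illustration of step (b), the sketch below classifies an expression profile against a small reference data bank using a K-nearest neighbors vote, one of the algorithms named above. It is a toy example under stated assumptions: the three-gene profiles, their values, the class labels, the Euclidean distance metric, and k=3 are all invented for illustration and are not taken from the tables of this specification.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(profile, reference_bank, k=3):
    """Return the majority label among the k reference profiles nearest to `profile`.

    reference_bank is a list of (expression_profile, class_label) pairs.
    """
    nearest = sorted(reference_bank,
                     key=lambda entry: euclidean(profile, entry[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical reference data bank: invented values for three marker genes.
REFERENCE_BANK = [
    ([8.1, 2.0, 5.5], "t(11q23)/MLL"),
    ([7.9, 2.3, 5.1], "t(11q23)/MLL"),
    ([8.4, 1.8, 5.9], "t(11q23)/MLL"),
    ([3.0, 6.5, 2.2], "other"),
    ([2.7, 6.9, 2.5], "other"),
    ([3.3, 6.1, 1.9], "other"),
]

if __name__ == "__main__":
    print(knn_classify([8.0, 2.1, 5.4], REFERENCE_BANK))
```

In practice a profile would contain many more genes and the data bank many more samples; the voting logic stays the same.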
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic that provides a biological network node shape description.
Figure 2 is a schematic that provides biological network edge labels.
Figure 3 is a schematic that shows biological network edge types.
Figure 4 shows focus and non-focus genes and statistical scores for 4 networks generated based on n=402 focus genes that were identified to discriminate t(11q23)/MLL samples from other acute leukemia subclasses with statistical significance. Focus genes are given in bold letters. Genes that are marked by asterisks were represented by multiple Affymetrix probe set identifiers in the input file.
Figure 5 is a graphical display of biological network 1 referred to in Figure 4. In particular, the network is graphically displayed with genes/gene products as nodes and the biological relationships between the nodes as edges. The intensity of the node color indicates the degree of differential gene expression. Green intensities correspond to a lower expression (downregulated fold change) in t(11q23)/MLL cases compared to AML subtypes (inv(16), t(8;21), t(15;17), complex karyotypes) or ALL subtypes (t(9;22), t(8;14), T-ALL), respectively. Red intensities correspond to a higher expression (upregulated fold change) in t(11q23)/MLL cases. Nodes are displayed using various shapes that represent the functional class of the gene product. Edges are displayed with various labels that describe the nature of the relationship between the nodes (e.g., B for binding, T for transcription). The length of an edge reflects the evidence supporting that node-to-node relationship, in that edges supported by more articles from the literature are shorter. Note: Focus genes were included in the original text format file derived from the list of differentially expressed genes. Non-focus genes were derived from queries for interactions between focus genes and all other gene objects stored in the Ingenuity knowledge data base. Gene expression raw data is provided in Tables 2-6.
Figure 6 is a graphical display of biological network 2 referred to in Figure 4.
Figure 7 is a graphical display of biological network 3 referred to in Figure 4.
Figure 8 is a graphical display of biological network 4 referred to in Figure 4.
Figure 9 shows focus and non-focus genes and statistical scores for eight networks generated based on n=416 focus genes that were identified to be differentially expressed between ALL with t(11q23)/MLL and AML with t(11q23)/MLL samples. Focus genes are given in bold letters. Genes that are marked by asterisks were represented by multiple Affymetrix probe set identifiers in the input file. Gene expression data is provided in Table 1. The raw data refers to all lineage networks (see, Figures 10-17).
Figure 10 graphically shows differentially expressed genes between ALL with t(11q23)/MLL and AML with t(11q23)/MLL, and corresponds to network 1 referred to in Figure 9. In particular, a biological network is displayed graphically. Additional details for the legend are provided in Figures 1-3. Green intensities correspond to a lower expression in ALL with t(11q23)/MLL cases compared to AML with t(11q23)/MLL samples (downregulated fold change). Red intensities correspond to a higher expression in ALL with t(11q23)/MLL cases compared to AML with t(11q23)/MLL samples (upregulated fold change).
Figure 11 is a graphically displayed biological network corresponding to network 2 referred to in Figure 9.
Figure 12 is a graphically displayed biological network corresponding to network 3 referred to in Figure 9.
Figure 13 is a graphically displayed biological network corresponding to network 4 referred to in Figure 9.
Figure 14 is a graphically displayed biological network corresponding to network 5 referred to in Figure 9.
Figure 15 is a graphically displayed biological network corresponding to network 6 referred to in Figure 9.
Figure 16 is a graphically displayed biological network corresponding to network 7 referred to in Figure 9.
Figure 17 is a graphically displayed biological network corresponding to network 8 referred to in Figure 9.
Figures 18 A and B are principal component analyses including various acute leukemia subtypes. The leukemia samples are plotted in a three-dimensional space using the three components capturing most of the variance in the original data set. Each patient sample is represented by a single color-coded sphere. The labels and coloring of the classes were added after the analysis for better visualization. (Panel A) Adult ALL samples of the four subcategories, precursor B-ALL comprising t(11q23)/MLL (n=25), t(9;22) (n=47), or t(8;14) (n=16), and precursor T-ALL (n=47), are accurately separated based on 262 differentially expressed genes (Table 13). (Panel B) Adult AML samples including t(11q23)/MLL (n=48), t(8;21) (n=38), t(15;17) (n=42), inv(16) (n=49), and complex aberrant karyotypes (n=75) are accurately separated based on 416 differentially expressed genes (Table 14).
Figure 19 shows a hierarchical cluster analysis of n=378 adult ALL and AML samples (columns). The normalized expression value for each gene (given in rows) is coded by color (standard deviation from mean). Red cells indicate high expression and green cells indicate low expression. The coloring of the groups is identical to Figures 18 A and B. The t(11q23)/MLL leukemias are highlighted by arrows.
Figures 20 A and B show an unsupervised analysis of adult ALL and AML t(11q23)/MLL samples. The unsupervised analysis uses a selection of 5,000 genes that showed the largest variance across all samples. (Panel A) In the three-dimensional PCA plot, data points with similar characteristics will cluster together. Here, each patient's expression pattern is represented by a single color-coded sphere. ALL with t(11q23)/MLL samples are labeled mauve and AML with t(11q23)/MLL samples are labeled turquoise. The labels and coloring of the classes were added after the analysis for better visualization. (Panel B) Enlarged dendrogram of ALL and AML t(11q23)/MLL samples when analyzed by unsupervised hierarchical clustering (cluster algorithm: Ward; selected coefficient: Euclidean distance). For each sample the respective immunophenotype, FAB subtype and MLL fusion partner gene as confirmed by FISH and/or PCR-based molecular analyses is given. MLL-X indicates samples with unknown partner genes. Two of the 48 MLL gene rearranged AML samples are contained in the ALL branch of the dendrogram.
Figure 21 graphically shows the supervised identification of differentially expressed genes. The left plot shows a supervised analysis of AML samples comparing a group of t(9;11) positive cases (n=23) against non-t(9;11) positive samples (n=25). Here, no statistically significant differentially expressed genes were identified. The right plot shows a supervised comparison of ALL with t(11q23)/MLL versus AML with t(11q23)/MLL. Red dots indicate genes with higher expression in AML with t(11q23)/MLL and green dots indicate higher expressed genes in ALL with t(11q23)/MLL.
Figure 22 shows an unsupervised analysis of adult ALL and AML t(11q23)/MLL samples. The unsupervised analysis is based on 5,000 genes that showed the largest variance across all samples. For better visualization the labels and coloring of the classes were added after the analysis. In the three-dimensional PCA plot, data points with similar characteristics will cluster together. Here, each patient's expression pattern is represented by a single color-coded sphere. For each sample the MLL fusion partner gene as confirmed by FISH and/or PCR-based molecular analyses is given. MLL-X indicates samples with unknown partner genes.
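The unsupervised analyses described for the figures start from the genes showing the largest variance across all samples. The following minimal sketch shows one way such a pre-selection could be done before PCA or hierarchical clustering. It assumes the expression data are held as a mapping from gene identifier to a list of per-sample values and uses the population variance; the source does not state which variance estimator was used, and the gene names and values here are hypothetical.

```python
from statistics import pvariance


def top_variance_genes(expression, n_genes):
    """Return the n_genes identifiers with the largest variance across samples.

    expression maps a gene identifier to its expression values, one per sample.
    """
    ranked = sorted(expression,
                    key=lambda gene: pvariance(expression[gene]),
                    reverse=True)
    return ranked[:n_genes]


# Hypothetical toy data: three genes measured in four samples.
TOY_EXPRESSION = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],   # no variation across samples
    "GENE_B": [1.0, 5.0, 1.0, 5.0],   # strongly varying
    "GENE_C": [2.0, 3.0, 2.0, 3.0],   # mildly varying
}

if __name__ == "__main__":
    print(top_variance_genes(TOY_EXPRESSION, 2))
```

With real data, n_genes would be set to 5,000 as in the figure descriptions.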
DETAILED DESCRIPTION
DEFINITIONS
Before describing the present invention in detail, it is to be understood that this invention is not limited to particular embodiments. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. Units, prefixes, and symbols are denoted in the forms suggested by the International System of Units (SI), unless specified otherwise. Numeric ranges are inclusive of the numbers defining the range. As used in this specification and the appended claims, the singular forms "a", "an" and "the" also include plural referents unless the context clearly dictates otherwise. To illustrate, reference to "a cell" includes two or more cells. Further, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. The terms defined below, and grammatical variants thereof, are more fully defined by reference to the specification in its entirety.
"11q23/MLL" refers to an 11q23 rearrangement of the human MLL gene.
An "antibody" refers to a polypeptide substantially encoded by at least one immunoglobulin gene or fragments of at least one immunoglobulin gene, which can participate in specific binding with a ligand. The term "antibody" includes polyclonal and monoclonal antibodies and biologically active fragments thereof including among other possibilities "univalent" antibodies (Glennie et al. (1982) Nature 295:712); Fab proteins including Fab' and F(ab')2 fragments whether covalently or non-covalently aggregated; light or heavy chains alone, typically variable heavy and light chain regions (VH and VL regions), and more typically including the hypervariable regions (otherwise known as the complementarity determining regions (CDRs) of the VH and VL regions); Fc proteins; "hybrid" antibodies capable of binding more than one antigen; constant-variable region chimeras; "composite" immunoglobulins with heavy and light chains of different origins; "altered" antibodies with improved specificity and other characteristics as prepared by standard recombinant techniques, by mutagenic techniques, or other directed evolutionary techniques known in the art. Derivatives of antibodies include scFvs, chimeric and humanized antibodies. See, e.g., Harlow and Lane, Antibodies, a laboratory manual, CSH Press (1988), which is incorporated by reference. For the detection of polypeptides using antibodies or fragments thereof, there are a variety of methods known to a person skilled in the art, which are optionally utilized. Examples include immunoprecipitations, Western blottings, enzyme-linked immunosorbent assays (ELISA), radioimmunoassays (RIA), dissociation-enhanced lanthanide fluoroimmunoassays (DELFIA), and scintillation proximity assays (SPA). To facilitate detection, an antibody is typically labeled by one or more of the labels described herein or otherwise known to persons skilled in the art.
In general, an "array" or "microarray" refers to a linear or two- or three-dimensional arrangement of preferably discrete nucleic acid or polypeptide probes which comprises an intentionally created collection of nucleic acid or polypeptide probes of any length spotted onto a substrate/solid support. The person skilled in the art also knows a collection of nucleic acids or polypeptides spotted onto a substrate/solid support under the term "array". As also known to the person skilled in the art, a microarray usually refers to a miniaturized array arrangement, with the probes being attached at a density of at least about 10, 20, 50, or 100 nucleic acid molecules referring to different or the same genes per cm². Furthermore, where appropriate an array can be referred to as a "gene chip". The array itself can have different formats, e.g., libraries of soluble probes or libraries of probes tethered to resin beads, silica chips, or other solid supports.
"Complementary" and "complementarity", respectively, can be described by the percentage, i.e., proportion, of nucleotides that can form base pairs between two polynucleotide strands or within a specific region or domain of the two strands. Generally, complementary nucleotides are, according to the base pairing rules, adenine and thymine (or adenine and uracil), and cytosine and guanine.
Complementarity may be partial, in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be a complete or total complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has effects on the efficiency and strength of hybridization between nucleic acid strands.
Two nucleic acid strands are considered to be 100% complementary to each other over a defined length if in a defined region all adenines of a first strand can pair with a thymine (or a uracil) of a second strand, all guanines of a first strand can pair with a cytosine of a second strand, all thymines (or uracils) of a first strand can pair with an adenine of a second strand, and all cytosines of a first strand can pair with a guanine of a second strand, and vice versa. According to the present invention, the degree of complementarity is determined over a stretch of about 20 or 25 nucleotides, i.e., a 60% complementarity means that within a region of 20 nucleotides of two nucleic acid strands 12 nucleotides of the first strand can base pair with 12 nucleotides of the second strand according to the above base pairing rules, either as a stretch of 12 contiguous nucleotides or interspersed by non-pairing nucleotides, when the two strands are attached to each other over the region of 20 nucleotides. The degree of complementarity can range from at least about 50% to full, i.e., 100% complementarity. Two single nucleic acid strands are said to be "substantially complementary" when they are at least about 80% complementary, and more typically about 90% complementary or higher. For carrying out the methods of the present invention, substantial complementarity is generally utilized.
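The percentage calculation described above can be sketched as follows. This is an illustrative example only: it assumes both strands are written 5' to 3', are of equal length, and are aligned antiparallel without gaps over the region of interest, so the second strand is reversed before positions are compared. The function names are hypothetical.

```python
# Watson-Crick pairs as given by the base pairing rules (T for DNA, U for RNA).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("G", "C"), ("C", "G")}


def percent_complementarity(strand1: str, strand2: str) -> float:
    """Percentage of positions that base pair when the strands align antiparallel.

    Both strands are given 5'->3'; strand2 is reversed before comparison.
    """
    if not strand1 or len(strand1) != len(strand2):
        raise ValueError("strands must be non-empty and of equal length")
    s1 = strand1.upper()
    s2 = strand2.upper()[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(s1, s2))
    return 100.0 * paired / len(s1)


def is_substantially_complementary(strand1: str, strand2: str,
                                   cutoff_pct: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion over the compared region."""
    return percent_complementarity(strand1, strand2) >= cutoff_pct


if __name__ == "__main__":
    # 12 of 20 positions pair, matching the 60% example in the text.
    print(percent_complementarity("A" * 20, "T" * 12 + "A" * 8))
```

Because the count is position by position, the 12 pairing nucleotides may be contiguous or interspersed with non-pairing nucleotides, as the definition allows.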
Two nucleic acids "correspond" when they have substantially identical or complementary sequences, when one nucleic acid is a subsequence of the other, or when one sequence is derived naturally or artificially from the other.
The term "differential gene expression" refers to a gene or set of genes whose expression is activated to a higher or lower level in a subject suffering from a disease (e.g., cancer) relative to its expression in a normal or control subject. Differential gene expression can also occur between different types or subtypes of diseased cells. The term also includes genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between, e.g., normal subjects and subjects suffering from a disease, various stages of the same disease, different types or subtypes of diseased cells, etc. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages. In certain embodiments, "differential gene expression" is considered to be present when there is at least an about two-fold, typically at least about four-fold, more typically at least about six-fold, most typically at least about ten-fold difference between, e.g., the expression of a given gene in normal and diseased subjects, in various stages of disease development in a diseased subject, different types or subtypes of diseased cells, etc.
The term "expression" refers to the process by which mRNA or a polypeptide is produced based on the nucleic acid sequence of a gene, i.e., "expression" also includes the formation of mRNA in the process of transcription. The term "determining the expression level" refers to the determination of the level of expression of one or more markers.
The term "genotype" refers to a description of the alleles of a gene or genes contained in an individual or a sample. As used herein, no distinction is made between the genotype of an individual and the genotype of a sample originating from the individual. Although, typically, a genotype is determined from samples of diploid cells, a genotype can be determined from a sample of haploid cells, such as a sperm cell.
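The fold-difference thresholds given in the definition of "differential gene expression" (about two-fold, four-fold, six-fold, or ten-fold) can be sketched as a simple check. This is a hedged illustration: it assumes positive expression values on a linear (not log-transformed) scale and treats up- and down-regulation symmetrically; the function names and example values are hypothetical.

```python
def fold_difference(expr_a: float, expr_b: float) -> float:
    """Magnitude of the fold difference between two positive expression values."""
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    return max(expr_a, expr_b) / min(expr_a, expr_b)


def is_differentially_expressed(expr_a: float, expr_b: float,
                                min_fold: float = 2.0) -> bool:
    """True when the fold difference reaches the chosen threshold."""
    return fold_difference(expr_a, expr_b) >= min_fold


if __name__ == "__main__":
    # Hypothetical signal intensities for one gene in two sample groups.
    print(is_differentially_expressed(400.0, 100.0))        # four-fold higher
    print(is_differentially_expressed(150.0, 100.0))        # below two-fold
    print(is_differentially_expressed(100.0, 1000.0, 10.0)) # ten-fold lower
```

Setting min_fold to 4.0, 6.0, or 10.0 reproduces the progressively stricter thresholds mentioned in the definition.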
The term "gene" refers to a nucleic acid sequence encoding a gene product. The gene optionally comprises sequence information required for expression of the gene (e.g., promoters, enhancers, etc.).
The term "gene expression data" refers to one or more sets of data that contain information regarding different aspects of gene expression. The data set optionally includes information regarding: the presence of target transcripts in cell or cell-derived samples; the relative and absolute abundance levels of target transcripts; the ability of various treatments to induce expression of specific genes; and the ability of various treatments to change expression of specific genes to different levels.
Nucleic acids "hybridize" when they associate, typically in solution. Nucleic acids hybridize due to a variety of well-characterized physico-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. In certain embodiments, hybridization occurs under conventional hybridization conditions, such as under stringent conditions as described, for example, in Sambrook et al., in "Molecular Cloning: A Laboratory Manual" (1989), Eds. J. Sambrook, E. F. Fritsch and T. Maniatis, Cold Spring Harbour Laboratory Press, Cold Spring Harbour, NY, which is incorporated by reference. Such conditions are, for example, hybridization in 6x SSC, pH 7.0 / 0.1% SDS at about 45°C for 18-23 hours, followed by a washing step with 2x SSC/1% SDS at 50°C. In order to select the stringency, the salt concentration in the washing step can, for example, be chosen between 2x SSC/0.1% SDS at room temperature for low stringency and 0.2x SSC/0.1% SDS at 50°C for high stringency. In addition, the temperature of the washing step can be varied between room temperature (ca. 22°C), for low stringency, and 65°C to 70°C for high stringency. Also contemplated are polynucleotides that hybridize at lower stringency hybridization conditions. Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation of, e.g., formamide concentration (lower percentages of formamide result in lowered stringency), salt conditions, or temperature. For example, lower stringency conditions include an overnight incubation at 37°C in a solution comprising 6X SSPE (20X SSPE = 3M NaCl; 0.2M NaH2PO4; 0.02M EDTA, pH 7.4), 0.5% SDS, 30% formamide, 100 mg/mL salmon sperm blocking DNA, followed by washes at 50°C with 1X SSPE, 0.1% SDS. In addition, to achieve even lower stringency, washes performed following stringent hybridization can be done at higher salt concentrations (e.g., 5x SSC).
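As a worked illustration of the buffer strengths mentioned above, the following sketch converts an "X" strength of SSPE into component molarities using the 20X composition stated in the text (3 M NaCl; 0.2 M NaH2PO4; 0.02 M EDTA), and computes how much 20X stock is needed for a given working volume. It assumes simple linear dilution; the function names are hypothetical.

```python
# Composition of 20X SSPE as stated in the text, in mol/L.
SSPE_20X_MOLAR = {"NaCl": 3.0, "NaH2PO4": 0.2, "EDTA": 0.02}


def sspe_molarity(strength_x: float) -> dict:
    """Component molarities of SSPE at the given 'X' strength (linear dilution)."""
    return {name: molar * strength_x / 20.0
            for name, molar in SSPE_20X_MOLAR.items()}


def stock_volume_ml(target_x: float, final_volume_ml: float,
                    stock_x: float = 20.0) -> float:
    """Volume of stock needed to prepare final_volume_ml at target_x strength."""
    if target_x > stock_x:
        raise ValueError("target strength cannot exceed stock strength")
    return final_volume_ml * target_x / stock_x


if __name__ == "__main__":
    print(sspe_molarity(6))          # hybridization solution, 6X SSPE
    print(sspe_molarity(1))          # wash solution, 1X SSPE
    print(stock_volume_ml(6, 100))   # mL of 20X stock for 100 mL of 6X
```

Under these assumptions, 6X SSPE corresponds to about 0.9 M NaCl and 1X SSPE to about 0.15 M NaCl, which makes the difference in salt concentration between the hybridization and wash steps explicit.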
Variations in the above conditions may be accomplished through the inclusion and/or substitution of alternate blocking reagents used to suppress background in hybridization experiments. The inclusion of specific blocking reagents may require modification of the hybridization conditions described herein, due to problems with compatibility. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," (Elsevier, New York), as well as in Ausubel (Ed.) Current Protocols in Molecular Biology, Volumes I, II, and III, (1997), which are each incorporated by reference. Hames and Higgins (1995) Gene Probes 1, IRL Press at Oxford University Press, Oxford, England (Hames and Higgins 1) and Hames and Higgins (1995) Gene Probes 2, IRL Press at Oxford University Press, Oxford, England (Hames and Higgins 2) provide details on the synthesis, labeling, detection and quantification of DNA and RNA, including oligonucleotides. Both Hames and Higgins 1 and 2 are incorporated by reference.
"inv(16)" refers to AML with inversion 16 according to the WHO classification of haematological malignancies.
A "label" refers to a moiety attached (covalently or non-covalently), or capable of being attached, to a molecule (e.g., a polynucleotide, a polypeptide, etc.), which moiety provides or is capable of providing information about the molecule (e.g., descriptive, identifying, etc. information about the molecule) or another molecule with which the labeled molecule interacts (e.g., hybridizes, etc.). Exemplary labels include fluorescent labels (including, e.g., quenchers or absorbers), non-fluorescent labels, colorimetric labels, chemiluminescent labels, bioluminescent labels, radioactive labels (such as 3H, 35S, 32P, 125I, 57Co or 14C), mass-modifying groups, antibodies, antigens, biotin, haptens, digoxigenin, enzymes (including, e.g., peroxidase, phosphatase, etc.), and the like. To further illustrate, fluorescent labels may include dyes that are negatively charged, such as dyes of the fluorescein family, or dyes that are neutral in charge, such as dyes of the rhodamine family, or dyes that are positively charged, such as dyes of the cyanine family. Dyes of the fluorescein family include, e.g., FAM, HEX, TET, JOE, NAN and ZOE. Dyes of the rhodamine family include, e.g., Texas Red, ROX, R110, R6G, and TAMRA. FAM, HEX, TET, JOE, NAN, ZOE, ROX, R110, R6G, and TAMRA are commercially available from, e.g., Perkin-Elmer, Inc. (Wellesley, MA, USA), and Texas Red is commercially available from, e.g., Molecular Probes, Inc. (Eugene, OR, USA). Dyes of the cyanine family include, e.g., Cy2, Cy3, Cy3.5, Cy5, Cy5.5, and Cy7, and are commercially available from, e.g., Amersham Biosciences Corp. (Piscataway, NJ, USA). Suitable labeling methods include the direct labeling (incorporation) method, an amino-modified (amino-allyl) nucleotide method (available, e.g., from Ambion, Inc. (Austin, TX, USA)), and the primer tagging method (DNA dendrimer labeling, available as a kit, e.g., from Genisphere, Inc. (Hatfield, PA, USA)). In some embodiments, biotin or biotinylated nucleotides are used for labeling, with the latter generally being directly incorporated into, e.g., the cRNA polynucleotide by in vitro transcription.
The term "lower expression" refers to an expression level of one or more markers from a target that is less than a corresponding expression level of the markers in a reference. In certain embodiments, "lower expression" is assigned to all polynucleotides, definable by number and Affymetrix ID, whose t-values and fold change (fc) values are negative. Similarly, the term "higher expression" refers to an expression level of one or more markers from a target that is more than a corresponding expression level of the markers in a reference. In some embodiments, "higher expression" is assigned to all polynucleotides, definable by number and Affymetrix ID, whose t-values and fold change (fc) values are positive.
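The sign convention for "lower expression" and "higher expression" can be sketched in Python as follows. This is a minimal illustration, assuming a signed fold change (negative values meaning lower expression in the target); the function name `expression_direction`, the "undetermined" category for mixed signs, and the probe identifiers are hypothetical and do not appear in the tables.

```python
def expression_direction(t_value, fold_change):
    """Assign an expression direction from the signs of the t-value and
    the signed fold change (fc).

    Both positive -> "higher"; both negative -> "lower". Mixed or zero
    signs are reported as "undetermined", a category assumed here because
    the text only defines the two concordant cases.
    """
    if t_value > 0 and fold_change > 0:
        return "higher"
    if t_value < 0 and fold_change < 0:
        return "lower"
    return "undetermined"


# Hypothetical probe statistics, for illustration only.
example_stats = {
    "probe_A": (12.4, 3.1),
    "probe_B": (-9.8, -2.7),
    "probe_C": (0.5, -1.1),
}

directions = {
    probe: expression_direction(t, fc) for probe, (t, fc) in example_stats.items()
}
print(directions)
```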
A "machine learning algorithm" refers to a computational-based prediction methodology, also known to persons skilled in the art as a "classifier", employed for characterizing a gene expression profile. The signals corresponding to certain expression levels, which are obtained by, e.g., microarray-based hybridization assays, are typically subjected to the algorithm in order to classify the expression profile. Supervised learning generally involves "training" a classifier to recognize the distinctions among classes and then "testing" the accuracy of the classifier on an independent test set. For new, unknown samples the classifier can be used to predict the class in which the samples belong.
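The train-then-test workflow described above can be sketched with a deliberately simple classifier. The nearest-centroid rule below is chosen only for brevity and is not stated in this section to be the method used; all function names, class labels and expression values are hypothetical.

```python
from statistics import mean


def train_centroids(profiles, labels):
    """'Training': compute the mean expression profile (centroid) per class."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_class.items()
    }


def predict(centroids, profile):
    """Assign a profile to the class with the closest centroid
    (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], profile))


def accuracy(centroids, profiles, labels):
    """'Testing': fraction of independent test profiles classified correctly."""
    hits = sum(predict(centroids, p) == y for p, y in zip(profiles, labels))
    return hits / len(labels)


# Hypothetical expression signals for three genes per sample.
train_x = [[8.1, 2.0, 5.5], [7.9, 2.4, 5.1], [3.0, 6.8, 5.3], [2.6, 7.1, 5.0]]
train_y = ["class_A", "class_A", "class_B", "class_B"]
test_x = [[7.5, 2.9, 5.2], [3.3, 6.5, 5.4]]
test_y = ["class_A", "class_B"]

model = train_centroids(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
new_sample_class = predict(model, [2.9, 7.0, 4.8])  # a new, unknown sample
print(test_accuracy, new_sample_class)
```

Keeping the test set separate from the training set is what makes the reported accuracy an estimate of performance on new samples.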
The term "marker" refers to a genetically controlled difference that can be used in the genetic analysis of a test or target versus a control or reference sample for the purpose of assigning the sample to a defined genotype or phenotype. In certain embodiments, for example, "markers" refer to genes, polynucleotides, polypeptides, or fragments or portions thereof that are differentially expressed in, e.g., different leukemia types and/or subtypes. The markers can be defined by their gene symbol name, their encoded protein name, their transcript identification number (cluster identification number), the database accession number, public accession number and/or GenBank identifier. Markers can also be defined by their Affymetrix identification number, chromosomal location, UniGene accession number and cluster type, and/or LocusLink accession number. The Affymetrix identification number (affy id) is accessible to anyone, including the person skilled in the art, via the "gene expression omnibus" internet page of the National Center for Biotechnology Information (NCBI) on the world wide web at ncbi.nlm.nih.gov/geo/ as of 11/4/2004. In particular, the affy ids of the polynucleotides used for certain embodiments of the methods described herein are derived from the so-called human genome U133 chip (Affymetrix, Inc., Santa Clara, CA, USA). The sequence data of each identification number can be viewed on the world wide web at, e.g., ncbi.nlm.nih.gov/projects/geo/ as of 11/4/2004 using the accession number GPL96 for U133A annotational data and accession number GPL97 for U133B annotational data. In some embodiments, the expression level of a marker is determined by determining the expression of its corresponding polynucleotide.
The term "normal karyotype" refers to a state of those cells lacking any visible karyotype abnormality detectable with chromosome banding analysis.
The term "nucleic acid" refers to a polymer of monomers that corresponds to a ribose nucleic acid (RNA) or deoxyribose nucleic acid (DNA) polymer, or an analog thereof. This includes polymers of nucleotides such as RNA and DNA, as well as modified forms thereof, peptide nucleic acids (PNAs), locked nucleic acids (LNA™s), and the like. In certain applications, the nucleic acid can be a polymer that includes multiple monomer types, e.g., both RNA and DNA subunits. A nucleic acid can be or include, e.g., a chromosome or chromosomal segment, a vector (e.g., an expression vector), an expression cassette, a naked DNA or RNA polymer, the product of a polymerase chain reaction (PCR) or other nucleic acid amplification reaction, an oligonucleotide, a probe, a primer, etc. A nucleic acid can be, e.g., single-stranded or double-stranded. Unless otherwise indicated, a particular nucleic acid sequence optionally comprises or encodes complementary sequences, in addition to any sequence explicitly indicated.
Oligonucleotides (e.g., probes, primers, etc.) of a defined sequence may be produced by techniques known to those of ordinary skill in the art, such as by chemical or biochemical synthesis, and by in vitro or in vivo expression from recombinant nucleic acid molecules, e.g., bacterial or retroviral vectors.
Oligonucleotides which are primer and/or probe sequences, as described below, may comprise DNA, RNA or nucleic acid analogs such as uncharged nucleic acid analogs including but not limited to peptide nucleic acids (PNAs) which are disclosed in International Patent Application WO 92/20702 or morpholino analogs which are described in U.S. Pat. Nos. 5,185,444, 5,034,506, and 5,142,047, all of which are incorporated by reference. Such sequences can routinely be synthesized using a variety of techniques currently available. For example, a sequence of DNA can be synthesized using conventional nucleotide phosphoramidite chemistry and the instruments available from Applied Biosystems, Inc. (Foster City, CA, USA); DuPont (Wilmington, DE, USA); or Milligen (Bedford, MA, USA). Similarly, and when desirable, the sequences can be labeled using methodologies well known in the art such as described in U.S. patent application numbers 5,464,746; 5,424,414; and 4,948,882, all of which are incorporated by reference. A nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil) and/or bases other than the five biologically occurring bases. These bases may serve a number of purposes, e.g., to stabilize or destabilize hybridization; to promote or inhibit probe degradation; or as attachment points for detectable moieties or quencher moieties. For example, a polynucleotide of the invention can contain one or more modified, non-standard, or derivatized base moieties, including, but not limited to, N6-methyl-adenine, N6-tert-butyl-benzyl-adenine, imidazole, substituted imidazoles, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methyl ester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine, and 5-propynyl pyrimidine.
Other examples of modified, non-standard, or derivatized base moieties may be found in U.S. Patent Nos. 6,001,611, 5,955,589, 5,844,106, 5,789,562, 5,750,343, 5,728,525, and 5,679,785, each of which is incorporated by reference.
Furthermore, a nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise one or more modified sugar moieties including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and hexose. A nucleic acid, nucleotide, polynucleotide or oligonucleotide can comprise phosphodiester linkages or modified linkages including, but not limited to phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages.
The term "polynucleotide" refers to a DNA, in particular cDNA, or RNA, in particular a cRNA, or a portion thereof. In the case of RNA (or cDNA), the polynucleotide is formed upon transcription of a nucleotide sequence that is capable of expression. "Polynucleotide fragments" refer to fragments of between at least 8, such as 10, 12, 15 or 18 nucleotides and at least 50, such as 60, 80, 100, 200 or 300 nucleotides in length, or a complementary sequence thereto, e.g., representing a consecutive stretch of nucleotides of a gene, cDNA or mRNA. In some embodiments, polynucleotides also include any fragment (or complementary sequence thereto) of a sequence corresponding to or derived from any of the markers defined herein.
The term "primer" refers to an oligonucleotide having a hybridization specificity sufficient for the initiation of an enzymatic polymerization under predetermined conditions, for example in an amplification technique such as polymerase chain reaction (PCR), in a process of sequencing, in a method of reverse transcription and the like. The term "probe" refers to an oligonucleotide having a hybridization specificity sufficient for binding to a defined target sequence under predetermined conditions, for example in an amplification technique such as a 5'-nuclease reaction, in a hybridization-dependent detection method, such as a Southern or Northern blot, and the like. In certain embodiments, probes correspond at least in part to selected markers. Primers and probes may be used in a variety of ways and may be defined by the specific use. For example, a probe can be immobilized on a solid support by any appropriate means, including, but not limited to: by covalent bonding, by adsorption, by hydrophobic and/or electrostatic interaction, or by direct synthesis on a solid support (see in particular patent application WO 92/10092). A probe may be labeled by means of a label chosen, for example, from radioactive isotopes, enzymes, in particular enzymes capable of acting on a chromogenic, fluorescent or luminescent substrate (in particular a peroxidase or an alkaline phosphatase), chromophoric chemical compounds, chromogenic, fluorigenic or luminescent compounds, analogues of nucleotide bases, and ligands such as biotin. Illustrative fluorescent compounds include, for example, fluorescein, carboxyfluorescein, tetrachlorofluorescein, hexachlorofluorescein, Cy3, tetramethylrhodamine, Cy3.5, carboxy-x-rhodamine, Texas Red, Cy5, and Cy5.5. Illustrative luminescent compounds include, for example, luciferin and 2,3-dihydrophthalazinediones, such as luminol. Other suitable labels are described herein or are otherwise known to those of skill in the art.
Oligonucleotides (e.g., primers, probes, etc.), whether hybridization assay probes, amplification primers, or helper oligonucleotides, may be modified with chemical groups to enhance their performance or to facilitate the characterization of amplification products. For example, backbone-modified oligonucleotides such as those having phosphorothioate or methylphosphonate groups, which render the oligonucleotides resistant to the nucleolytic activity of certain polymerases or to nuclease enzymes, may allow the use of such enzymes in an amplification or other reaction. Another example of modification involves using non-nucleotide linkers (e.g., Arnold, et al., "Non-Nucleotide Linking Reagents for Nucleotide Probes", EP 0 313 219, which is incorporated by reference) incorporated between nucleotides in the nucleic acid chain which do not interfere with hybridization or the elongation of the primer. Amplification oligonucleotides may also contain mixtures of the desired modified and natural nucleotides.
A "reference" in the context of gene expression profiling refers to a cell and/or genes in or derived from the cell (or data derived therefrom) relative to which a target is compared. In some embodiments, for example, the expression of one or more genes from a target cell is compared to a corresponding expression of the genes in or derived from a reference cell.
A "sample" refers to any biological material containing genetic information in the form of nucleic acids or proteins obtainable or obtained from one or more subjects or individuals. In some embodiments, samples are derived from subjects having leukemia, e.g., AML. Exemplary samples include tissue samples, cell samples, bone marrow, and/or bodily fluids such as blood, saliva, semen, urine, and the like. Methods of obtaining samples and of isolating nucleic acids and proteins from samples are generally known to persons of skill in the art.
A "set" refers to a collection of one or more things. For example, a set may include 1, 2, 3, 4, 5, 10, 20, 50, 100, 1,000 or another number of genes or other types of molecules.
A "solid support" refers to a solid material that can be derivatized with, or otherwise attached to, a chemical moiety, such as an oligonucleotide probe or the like. Exemplary solid supports include plates (e.g., multi-well plates, etc.), beads, microbeads, tubes, fibers, whiskers, combs, hybridization chips (including microarray substrates, such as those used in GeneChip® probe arrays (Affymetrix, Inc., Santa Clara, CA, USA) and the like), membranes, single crystals, ceramic layers, self-assembling monolayers, and the like.
"Specifically binding" means that a compound is capable of discriminating between two or more polynucleotides or polypeptides. For example, the compound binds to the desired polynucleotide or polypeptide, but essentially does not bind to a non-target polynucleotide or polypeptide. The compound can be an antibody, or a fragment thereof, an enzyme, a so-called small molecule compound, or a protein-scaffold (e.g., an anticalin).
A "subject" refers to an organism. Typically, the organism is a mammalian organism, particularly a human organism.
The term "substantially identical" in the context of gene expression refers to levels of expression of two or more genes that are approximately equal to one another. In some embodiments, for example, the expression levels of two or more genes are substantially identical to one another when they differ by less than about 5% (e.g., about 4%, about 3%, about 2%, about 1%, etc.).
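The "less than about 5%" criterion can be expressed as a small check. This is a sketch under stated assumptions: the text does not specify the denominator for the percentage, so the difference is taken here relative to the larger of the two levels, and the function name `substantially_identical` and its `tolerance` parameter are introduced only for illustration.

```python
def substantially_identical(level_a, level_b, tolerance=0.05):
    """Return True when two expression levels differ by less than
    `tolerance` (default 5%), relative to the larger level.

    The choice of the larger level as the denominator is an assumption;
    the definition in the text only gives the percentage threshold.
    """
    if level_a < 0 or level_b < 0:
        raise ValueError("expression levels must be non-negative")
    larger = max(level_a, level_b)
    if larger == 0:
        return True  # both levels are zero
    return abs(level_a - level_b) / larger < tolerance


print(substantially_identical(100.0, 97.0))  # 3% apart
print(substantially_identical(100.0, 90.0))  # 10% apart
```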
"t(15;17)" refers to AML with translocation (15; 17) according to the WHO classification of haematological malignancies. "t(8;21)" refers to AML with translocation (8;21) according to the WHO classification of haematological malignancies.
The term "target" refers to an object that is the subject of analysis. In some embodiments, for example, targets are specific nucleic acid sequences (e.g., mRNAs of expressed genes, etc.), the presence, absence or abundance of which are to be determined. In certain embodiments, targets include polypeptides (e.g., proteins, etc.) of expressed genes. Typically, the sequences subjected to analysis are in or derived from "target cells", such as a particular type of leukemia cell.
INTRODUCTION
The present invention provides methods, reagents, systems, and kits for detecting and genotyping leukemia. In some embodiments, for example, methods are provided for genotyping acute leukemia cells with t(11q23)/MLL. To illustrate, certain methods described herein include detecting an expression level of a set of genes in or derived from a target human acute leukemia cell, e.g., obtained from a subject. The set of genes is generally selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14. In addition, these methods also include:
(a) correlating a detected differential expression of one or more genes of the target human acute leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human acute leukemia cell lacking t(11q23)/MLL with the target human acute leukemia cell having t(11q23)/MLL;
(b) correlating a detected substantially identical expression of one or more genes of the target human acute leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human acute leukemia cell lacking t(11q23)/MLL with the target human acute leukemia cell lacking t(11q23)/MLL;
(c) correlating a detected differential expression of one or more genes of the target human acute leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human acute leukemia cell having t(11q23)/MLL with the target human acute leukemia cell lacking t(11q23)/MLL; or
(d) correlating a detected substantially identical expression of one or more genes of the target human acute leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human acute leukemia cell having t(11q23)/MLL with the target human acute leukemia cell having t(11q23)/MLL.
In some embodiments, the reference human acute leukemia cell lacking t(11q23)/MLL is a precursor B-ALL cell with t(9;22), a precursor B-ALL cell with t(8;14), a precursor T-ALL cell (Table 13), an AML cell with t(8;21), an AML cell with t(15;17), an AML cell with inv(16), or an AML cell with a complex aberrant karyotype (Table 14). Other aspects of the invention include methods for distinguishing acute myeloid leukemia (AML) cells with t(11q23)/MLL from acute lymphoblastic leukemia (ALL) cells with t(11q23)/MLL.
Tables 15-20 are versions of the probe lists provided in Tables 8-10 that support the statistical data provided therein with annotations. More specifically, Table 15 annotates the top 50 downregulated or lower expressed genes in ALL with 11q23 that are listed in Table 8, whereas Table 16 annotates the top 50 upregulated or higher expressed genes in ALL with 11q23 that are provided in Table 8. Table 17 annotates the lower expressed genes in AML with 11q23 that are listed in Table 9, while Table 18 annotates the higher expressed genes in AML with 11q23 that are provided in Table 9. Further, Table 19 annotates the lower expressed genes in 11q23 leukemias that are listed in Table 10, while Table 20 annotates the higher expressed genes in 11q23 leukemias that are provided in Table 10.
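The four correlations (a) to (d) reduce to one rule: a target whose expression is substantially identical to the reference shares the reference's t(11q23)/MLL status, and a target with differential expression has the opposite status. The Python sketch below encodes that rule; the function name `infer_mll_status` and the string labels for the comparison outcome are hypothetical, and the binary treatment is a simplification for illustration.

```python
def infer_mll_status(reference_has_mll, comparison):
    """Infer whether the target cell carries t(11q23)/MLL.

    reference_has_mll: True if the reference cell has t(11q23)/MLL.
    comparison: "differential" or "substantially_identical", describing
        target expression relative to the reference.
    Returns True if the target is inferred to have t(11q23)/MLL.
    """
    if comparison not in ("differential", "substantially_identical"):
        raise ValueError("unknown comparison outcome: %r" % (comparison,))
    if comparison == "substantially_identical":
        return reference_has_mll      # cases (b) and (d)
    return not reference_has_mll      # cases (a) and (c)


# Enumerate the four cases in the order (a), (b), (c), (d).
cases = {
    "a": infer_mll_status(False, "differential"),
    "b": infer_mll_status(False, "substantially_identical"),
    "c": infer_mll_status(True, "differential"),
    "d": infer_mll_status(True, "substantially_identical"),
}
print(cases)
```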
The use of one or more of the markers described herein, e.g., utilizing a microarray technology or other gene expression profiling techniques, provides various advantages, including: (1) rapid and accurate diagnoses, (2) ease of use in laboratories without specialized knowledge, and (3) elimination of the need to analyze viable cells for chromosome analysis, thereby eliminating cell sample transport issues. Aspects of the present invention are further illustrated in the examples provided below.
In practicing the present invention, many conventional techniques in hematology, molecular biology and recombinant DNA are optionally used. These techniques are well known and are explained in, for example, Current Protocols in Molecular Biology, Volumes I, II, and III, 1997 (F. M. Ausubel ed.); Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001; Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); DNA Cloning: A Practical Approach, Volumes I and II, 1985 (D. N. Glover ed.); Oligonucleotide Synthesis, 1984 (M. L. Gait ed.); Nucleic Acid Hybridization, 1985 (Hames and Higgins); Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to Molecular Cloning; the series, Methods in Enzymology (Academic Press, Inc.); Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor Laboratory); Greer et al. (Eds.), Wintrobe's Clinical Hematology, 11th Ed., Lippincott Williams & Wilkins (2003); Shirlyn et al., Clinical Laboratory Hematology, Prentice Hall (2002); Lichtman et al., Williams Manual of Hematology, 6th Ed., McGraw-Hill Professional (2002); and Methods in Enzymology Vol. 154 and Vol. 155 (Wu and Grossman, and Wu, eds., respectively), all of which are incorporated by reference.
In addition to the methods of detecting and genotyping leukemia, the related kits and systems are also described further below.
SAMPLE COLLECTION AND PREPARATION
Samples are collected and prepared for analysis using essentially any technique known to those of skill in the art. In certain embodiments, for example, blood samples are obtained from subjects via venipuncture. Whole blood specimens are optionally collected in EDTA, Heparin or ACD vacutainer tubes. In other embodiments, the samples utilized for analysis comprise bone marrow aspirates, which are optionally processed, e.g., by erythrocyte lysis techniques, Ficoll density gradient centrifugations, or the like. Samples are typically either analyzed immediately following acquisition or stored frozen at, e.g., -80°C until being subjected to analysis. Sample collection and handling are also described in, e.g., Garland et al., Handbook of Phlebotomy and Patient Service Techniques, Lippincott Williams & Wilkins (1998), and Slockbower et al. (Eds.), Collection and Handling of Laboratory Specimens: A Practical Guide, Lippincott Williams & Wilkins (1983), which are both incorporated by reference.
Treatment of Cells
The cell lines or sources containing the target nucleic acids and/or expression products thereof are optionally subjected to one or more specific treatments that induce changes in gene expression, e.g., as part of processes to identify candidate modulators of gene expression. For example, a cell or cell line can be treated with or exposed to one or more chemical or biochemical constituents, e.g., pharmaceuticals, pollutants, DNA damaging agents, oxidative stress-inducing agents, pH-altering agents, membrane-disrupting agents, metabolic blocking agents, chemical inhibitors, cell surface receptor ligands, antibodies, transcription promoters/enhancers/inhibitors, translation promoters/enhancers/inhibitors, protein-stabilizing or destabilizing agents, various toxins, carcinogens or teratogens, characterized or uncharacterized chemical libraries, proteins, lipids, or nucleic acids. Optionally, the treatment comprises an environmental stress, such as a change in one or more environmental parameters including, but not limited to, temperature (e.g., heat shock or cold shock), humidity, oxygen concentration (e.g., hypoxia), radiation exposure, culture medium composition, or growth saturation. Responses to these treatments may be followed temporally, and the treatment can be imposed for various times and at various concentrations. Target sequences can also be derived from cells exposed to multiple specific treatments as described above, either concurrently or in tandem (e.g., a cancerous cell or tissue sample may be further exposed to a DNA damaging agent while grown in an altered medium composition).
RNA Isolation
In some embodiments, total RNA is isolated from samples for use as target sequences. Cellular samples are lysed once culture with or without the treatment is complete by, for example, removing growth medium and adding a guanidinium-based lysis buffer containing several components to stabilize the RNA. In certain embodiments, the lysis buffer also contains purified RNAs as controls to monitor recovery and stability of RNA from cell cultures. Examples of such purified RNA templates include the Kanamycin Positive Control RNA from Promega (Madison, WI, USA), and 7.5 kb Poly(A)-Tailed RNA from Life Technologies (Rockville, MD, USA). Lysates may be used immediately or stored frozen at, e.g., -80°C.
Optionally, total RNA is purified from cell lysates (or other types of samples) using silica-based isolation in an automation-compatible, 96-well format, such as the RNeasy® purification platform (Qiagen, Inc. (Valencia, CA, USA)). Alternatively, RNA is isolated by solid-phase oligo-dT capture using oligo-dT bound to microbeads or cellulose columns. This method has the added advantage of isolating mRNA from genomic DNA and total RNA, and allowing transfer of the mRNA-capture medium directly into the reverse transcriptase reaction. Other RNA isolation methods are contemplated, such as extraction with silica-coated beads or guanidinium. Further methods for RNA isolation and preparation can be devised by one skilled in the art.
Alternatively, the methods of the present invention are performed using crude cell lysates, eliminating the need to isolate RNA. RNAse inhibitors are optionally added to the crude samples. When using crude cellular lysates, genomic DNA could contribute one or more copies of target sequence, depending on the sample. In situations in which the target sequence is derived from one or more highly expressed genes, the signal arising from genomic DNA may not be significant. But for genes expressed at very low levels, the background can be eliminated by treating the samples with DNAse, or by using primers that target splice junctions. One skilled in the art can design a variety of specialized priming applications that would facilitate use of crude extracts as samples for the purposes of this invention.
GENE EXPRESSION PROFILING
The determination of gene expression levels may be effected at the transcriptional and/or translational level, i.e., at the level of mRNA or at the protein level. Essentially any method of gene expression profiling can be used or adapted for use in performing the methods described herein including, e.g., methods based on hybridization analysis of polynucleotides, and methods based on sequencing of polynucleotides. To illustrate, commonly used methods for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999)), RNAse protection assays (Hod, Biotechniques 13:852-854 (1992)), and reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992)). Alternatively, antibodies may be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene expression analysis by massively parallel signature sequencing (MPSS). Optionally, molecular species, such as antibodies, aptamers, etc. that can specifically bind to proteins or fragments thereof are used for analysis (see, e.g., Beilharz et al., Brief Funct Genomic Proteomic 3(2):103-111 (2004)). Some of these techniques, with a certain degree of overlap in some cases, are described further below.
In certain embodiments, for example, the methods described herein include determining the expression levels of transcribed polynucleotides. In some of these embodiments, the transcribed polynucleotide is an mRNA, a cDNA and/or a cRNA. Transcribed polynucleotides are typically isolated from a sample, reverse transcribed and/or amplified, and labeled by techniques referred to above or otherwise known to persons skilled in the art. In order to determine the expression level of transcribed polynucleotides, the methods of the invention generally include hybridizing transcribed polynucleotides to a complementary polynucleotide, or a portion thereof, under a selected hybridization condition (e.g., a stringent hybridization condition), as described herein. In some embodiments, the detection and quantification of amounts of polynucleotides to determine the level of expression of a marker are performed according to methods described by, e.g., Sambrook et al., supra, or real time methods known in the art as 5'-nuclease methods disclosed in, e.g., WO 92/02638, U.S. Pat. No. 5,210,015, U.S. Pat. No. 5,804,375, and U.S. Pat. No. 5,487,972, which are each incorporated by reference. In some embodiments, for example, 5'-nuclease methods utilize the exonuclease activity of certain polymerases to generate signals. In these approaches, target nucleic acids are detected in processes that include contacting a sample with an oligonucleotide containing a sequence complementary to a region of the target nucleic acid component and a labeled oligonucleotide containing a sequence complementary to a second region of the same target nucleic acid component sequence strand, but not including the nucleic acid sequence defined by the first oligonucleotide, to create a mixture of duplexes during hybridization conditions, wherein the duplexes comprise the target nucleic acid annealed to the first oligonucleotide and to the labeled oligonucleotide such that the 3'-end of the first oligonucleotide is adjacent to the 5'-end of the labeled oligonucleotide. Then this mixture is treated with a template-dependent nucleic acid polymerase having a 5' to 3' nuclease activity under conditions sufficient to permit the 5' to 3' nuclease activity of the polymerase to cleave the annealed, labeled oligonucleotide and release labeled fragments. The signal generated by the hydrolysis of the labeled oligonucleotide is detected and/or measured. 5'-nuclease technology eliminates the need for a solid phase bound reaction complex to be formed and made detectable. Other exemplary methods include, e.g., fluorescence resonance energy transfer between two adjacently hybridized probes as used in the LightCycler® format described in, e.g., U.S. Pat. No. 6,174,670, which is incorporated by reference.
In one protocol, the marker, i.e., the polynucleotide, is in the form of a transcribed polynucleotide, where total RNA is isolated, cDNA and, subsequently, cRNA are synthesized, and biotin is incorporated during the transcription reaction. The purified cRNA is applied to commercially available arrays that can be obtained from, e.g., Affymetrix, Inc. (Santa Clara, CA USA). The hybridized cRNA is optionally detected according to the methods described in the examples provided below. The arrays are produced by photolithography or other methods known to persons skilled in the art. Some of these techniques are also described in, e.g., U.S. Pat. No. 5,445,934, U.S. Pat. No. 5,744,305, U.S. Pat. No. 5,700,637, U.S. Pat. No. 5,945,334, EP 0 619 321, and EP 0 373 203, which are each incorporated by reference.
In another embodiment, the polynucleotide or at least one of the polynucleotides is in the form of a polypeptide (e.g., expressed from the corresponding polynucleotide). The expression level of the polynucleotides or polypeptides is optionally detected using a compound that specifically binds to target polynucleotides or target polypeptides.
These and other exemplary gene expression profiling techniques are described further below.
Blotting Techniques
Some of the earliest expression profiling methods are based on the detection of a label in RNA hybrids or protection of RNA from enzymatic degradation (see, e.g., Ausubel et al., supra). Methods based on detecting hybrids include northern blots and slot/dot blots. These two techniques differ in that the components of the sample being analyzed are resolved by size in a northern blot prior to detection, which enables identification of more than one species simultaneously. Slot blots are generally carried out using unresolved mixtures or sequences, but can be easily performed in serial dilution, enabling a more quantitative analysis.
In Situ Hybridization
In situ hybridization is a technique that monitors transcription by directly visualizing RNA hybrids in the context of a whole cell. This method provides information regarding subcellular localization of transcripts (see, e.g., Suzuki et al., Pigment Cell Res. 17(1):10-4 (2004)).
Assays Based on Protection from Enzymatic Degradation
Techniques to monitor RNA that make use of protection from enzymatic degradation include S1 analysis and RNAse protection assays (RPAs). Both of these assays employ a labeled nucleic acid probe, which is hybridized to the RNA species being analyzed, followed by enzymatic degradation of single-stranded regions of the probe. Analysis of the amount and length of probe protected from degradation is used to determine the quantity and endpoints of the transcripts being analyzed.
Reverse Transcriptase PCR (RT-PCR) and Real-Time Detection
RT-PCR can be used to compare, e.g., mRNA levels in different sample populations, in normal and tumor tissues, with or without drug treatment, to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure. These assays are derivatives of PCR in which amplification is preceded by reverse transcription of mRNA into cDNA. Accordingly, an initial step in these processes is generally the isolation of mRNA from a target sample (e.g., leukemia cells). The starting material is typically total RNA isolated from cancerous tissues or cells (e.g., bone marrow, peripheral blood aliquots, etc.), and in certain embodiments, from corresponding normal tissues or cells. General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., supra. Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995), which are each incorporated by reference. In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy® mini-columns (referred to above). Other commercially available RNA isolation kits include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE™, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.
Since RNA generally cannot serve as a template for PCR, the process of gene expression profiling by RT-PCR typically includes the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. Two commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the particular circumstances of expression profiling analysis. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, CA, USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.
Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity. Thus, TaqMan® PCR typically utilizes the 5'-nuclease activity of Taq or Tth polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used. Pairs of primers are generally used to generate amplicons in PCR reactions. A third oligonucleotide, or probe, is designed to bind to a nucleotide sequence located between the two primers of a PCR primer pair. Probes are generally non-extendible by Taq DNA polymerase enzyme, and are typically labeled with, e.g., a reporter fluorescent dye and a quencher fluorescent dye. Laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together, such as in an intact probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is typically liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.
TaqMan® RT-PCR can be performed using commercially available equipment, such as, for example, a LightCycler® system (Roche Molecular Biochemicals, Mannheim, Germany) or an ABI PRISM 7700™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, CA, USA). To minimize errors and the effect of sample-to-sample variation, RT-PCR is typically performed using an internal standard. An ideal internal standard is expressed at a relatively constant level among different cells or tissues, and is unaffected by the experimental treatment. Exemplary RNAs frequently used to normalize patterns of gene expression are mRNAs transcribed from the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin.
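To illustrate how an internal standard can be used for normalization, the following minimal sketch computes a relative expression level by the comparative threshold-cycle (2^-ΔΔCt) calculation. This particular calculation is an assumption chosen for illustration and is not prescribed by the text above; it further assumes that the amount of product roughly doubles in every cycle. The function name and the threshold-cycle values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript in a sample versus a control,
    normalized to an internal standard (e.g., GAPDH or beta-actin).

    Assumption: amplification efficiency is close to 100%, i.e., the
    amount of product doubles with each cycle.
    """
    # Normalize the target to the internal standard within each specimen.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between sample and control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical threshold cycles, not measured data.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)
```

Because the internal standard is subtracted within each specimen before the comparison, differences in the amount of input RNA between specimens cancel out of the result, provided the standard itself is constant.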
Other exemplary methods for targeted mRNA analysis include differential display reverse transcriptase PCR (DDRT-PCR) and RNA arbitrarily primed PCR (RAP-PCR) (see, e.g., U.S. Patent No. 5,599,672; Liang and Pardee (1992) Science 257:967-971; Welsh et al. (1992) Nucleic Acids Res. 20:4965-4970, which are each incorporated by reference). Both methods use random priming to generate RT-PCR fingerprint profiles of transcripts in an unfractionated RNA preparation. The signal generated in these types of analyses is a pattern of bands separated on a sequencing gel. Differentially expressed genes appear as changes in the fingerprint profiles between two samples, which can be loaded in separate wells of the same gel. This type of readout allows identification of both up- and down-regulation of genes in the same reaction, appearing as either an increase or decrease in intensity of a band from one sample to another.
Molecular beacons are oligonucleotides designed for real time detection and quantification of target nucleic acids. The 5' and 3' termini of molecular beacons collectively comprise a pair of moieties, which confers the detectable properties of the molecular beacon. One of the termini is attached to a fluorophore and the other is attached to a quencher molecule capable of quenching a fluorescent emission of the fluorophore. To illustrate, one example fluorophore-quencher pair can use a fluorophore, such as EDANS or fluorescein, e.g., on the 5'-end and a quencher, such as Dabcyl, e.g., on the 3'-end. When the molecular beacon is present free in solution, i.e., not hybridized to a second nucleic acid, the stem of the molecular beacon is stabilized by complementary base pairing. This self-complementary pairing results in a "hairpin loop" structure for the molecular beacon in which the fluorophore and the quenching moieties are proximal to one another. In this conformation, the fluorescent moiety is quenched by the quenching moiety.
The loop of the molecular beacon typically comprises the oligonucleotide probe and is accordingly complementary to a sequence to be detected in the target microbial nucleic acid, such that hybridization of the loop to its complementary sequence in the target forces disassociation of the stem, thereby distancing the fluorophore and quencher from each other. This results in unquenching of the fluorophore, causing an increase in fluorescence of the molecular beacon.
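To illustrate the stem and loop requirements just described, the following minimal sketch checks whether the two arms of a candidate beacon sequence are self-complementary and whether its loop is complementary to a stretch of a target strand. The function names and the example sequences are hypothetical, only exact Watson-Crick pairing of unmodified DNA bases is considered, and thermodynamic stability is not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_stem(beacon, stem_len):
    """True if the 5' arm can base-pair with the 3' arm to form the stem."""
    if stem_len <= 0 or len(beacon) < 2 * stem_len:
        return False
    five_arm = beacon[:stem_len].upper()
    three_arm = beacon[-stem_len:].upper()
    return five_arm == reverse_complement(three_arm)


def loop_binds(beacon, stem_len, target):
    """True if the loop is complementary to some stretch of the target."""
    loop = beacon[stem_len:len(beacon) - stem_len]
    return reverse_complement(loop) in target.upper()


# Hypothetical beacon: 6-base arms flanking a 15-base loop.
beacon = "GCGAGC" + "TTAGGCATCCGATTA" + "GCTCGC"
target = "CC" + "TAATCGGATGCCTAA" + "GG"
print(has_stem(beacon, 6), loop_binds(beacon, 6, target))
```

A candidate that passes both checks would still need its stem and loop-target melting behavior evaluated before use.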
Details regarding standard methods of making and using molecular beacons are well established in the literature and molecular beacons are available from a number of commercial reagent sources. Further details regarding methods of molecular beacon manufacture and use are found, e.g., in Leone et al. (1995) "Molecular beacon probes combined with amplification by NASBA enable homogenous real-time detection of RNA," Nucleic Acids Res. 26:2150-2155; Kostrikis et al. (1998) "Molecular beacons: spectral genotyping of human alleles" Science 279:1228-1229; Fang et al. (1999) "Designing a novel molecular beacon for surface-immobilized DNA hybridization studies" J. Am. Chem. Soc. 121:2921-2922; and Marras et al. (1999) "Multiplex detection of single-nucleotide variation using molecular beacons" Genet. Anal. Biomol. Eng. 14:151-156, all of which are incorporated by reference. A variety of commercial suppliers produce standard and custom molecular beacons, including Oswel Research Products Ltd. (UK), Research Genetics (a division of Invitrogen, Huntsville, AL, USA), the Midland Certified Reagent Company (Midland, TX, USA), and Gorilla Genomics, LLC (Alameda, CA, USA). A variety of kits which utilize molecular beacons are also commercially available, such as the Sentinel™ Molecular Beacon Allelic Discrimination Kits from Stratagene (La Jolla, CA, USA) and various kits from Eurogentec SA (Belgium) and Isogen Bioscience BV (Netherlands).
Nucleic Acid Array-Based Analysis
Differential gene expression can also be identified or confirmed using arrayed oligonucleotides (e.g., microarrays), which have the benefit of assaying for sample hybridization to a large number of probes in a highly parallel fashion. In these approaches, polynucleotide sequences of interest (e.g., probes, such as cDNAs, mRNAs, oligonucleotides, etc.) are plated, synthesized, or otherwise disposed on a microchip substrate or other type of solid support (see, e.g., U.S. Patent Nos. 5,143,854 and 5,807,522; Fodor et al. (1991) Science 251:767-773; and Schena et al. (1995) Science 270:467-470, which are each incorporated by reference). Sequences of interest can be obtained, e.g., by creating a cDNA library from an mRNA source or by using publicly available databases, such as GenBank, to annotate the sequence information of custom cDNA libraries or to identify cDNA clones from previously prepared libraries. The arrayed sequences are then hybridized with target nucleic acids from cells or tissues of interest. As in the RT-PCR assays referred to above, the source of mRNA typically is total RNA isolated from a sample.
In certain embodiments, high-density oligonucleotide arrays are produced using a light-directed chemical synthesis process (i.e., photolithography). Unlike common cDNA arrays, oligonucleotide arrays (according, e.g., to the Affymetrix technology) typically use a single-dye technology. Given the sequence information of the probes or markers, the sequences are typically synthesized directly onto the array, thus bypassing the need for physical intermediates, such as PCR products, commonly utilized in making cDNA arrays. For this purpose, selected markers, or partial sequences thereof, can be represented by, e.g., between about 14 and 20 features, typically by less than 14 features, more typically less than about 10 features, even more typically by about 6 features or less, with each feature generally being a short sequence of nucleotides (oligonucleotide), which is typically a perfect match (PM) to a segment of the respective gene. The PM oligonucleotides are paired with mismatch (MM) oligonucleotides, which have a single mismatch at the central base of the oligonucleotide and are used as "controls". The chip exposure sites are typically defined by masks and are de-protected by the use of light, followed by a chemical coupling step resulting in the synthesis of one nucleotide. The masking, light deprotection, and coupling process can then be repeated to synthesize the next nucleotide, until the nucleotide chain is of the specified length.
To illustrate other embodiments of microarray-based assays, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array. In some embodiments, for example, at least 10,000 different cDNA probe sequences are applied to a given solid support. Fluorescently labeled cDNA targets may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from the samples of interest. Labeled cDNA targets applied to the chip hybridize with corresponding probes on the array. After washing (e.g., under stringent conditions) to remove non-specifically bound probes, the chip is typically scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, for example, separately labeled cDNA probes generated from two sources of RNA can be hybridized concurrently to the arrayed probes. The relative abundance of the transcripts from the two sources corresponding to each specified gene can thus be determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996), which is incorporated by reference). Other microarray-based assay formats are also optionally utilized. Microarray analysis can be performed using commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip® technology, or Agilent's microarray technology.
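To illustrate how the relative abundance of transcripts from two RNA sources might be derived from dual color data, the following minimal sketch computes a per-gene log2 intensity ratio and lists genes that differ by at least approximately two-fold. The function names, the intensity floor, and the intensity values are hypothetical; background correction and between-channel normalization are assumed to have been performed beforehand.

```python
import math


def log2_ratios(channel_a, channel_b, floor=1.0):
    """Per-gene log2(A/B) from two dictionaries of spot intensities.

    Intensities below `floor` are raised to `floor` so that the ratio
    is always defined (an illustrative choice, not a standard).
    """
    ratios = {}
    for gene in channel_a.keys() & channel_b.keys():
        a = max(channel_a[gene], floor)
        b = max(channel_b[gene], floor)
        ratios[gene] = math.log2(a / b)
    return ratios


def at_least_two_fold(ratios, threshold=1.0):
    """Genes whose absolute log2 ratio meets the threshold (1.0 = two-fold)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)


# Hypothetical intensities, not measured data.
source_1 = {"gene_1": 800.0, "gene_2": 150.0, "gene_3": 100.0}
source_2 = {"gene_1": 200.0, "gene_2": 140.0, "gene_3": 400.0}
ratios = log2_ratios(source_1, source_2)
print(at_least_two_fold(ratios))
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, as +1 and -1 respectively.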
If the polynucleotide being detected is mRNA, cDNA may be prepared into which a detectable label, as exemplified herein, is incorporated. For example, labeled cDNA, in single-stranded form, may then be hybridized (e.g., under stringent or highly stringent conditions) to a panel of single-stranded oligonucleotides representing different genes and affixed to a solid support, such as a chip. Upon applying appropriate washing steps, those cDNAs that have a counterpart in the oligonucleotide panel or array will be detected (e.g., quantitatively detected). Various advantageous embodiments of this general method are feasible. For example, mRNA or cDNA may be amplified, e.g., by a polymerase chain reaction or another nucleic acid amplification technique. In some embodiments, where quantitative assessments are sought, it is generally desirable that the number of amplified copies corresponds to the number of mRNAs originally present in the cell. Optionally, cDNAs are transcribed into cRNAs prior to hybridization steps in a given assay. In these embodiments, labels can be attached to or incorporated into cRNAs during or after the transcription step.
To further illustrate, one exemplary embodiment of the methods of the invention includes the following steps: (1) obtaining a sample, e.g., bone marrow or peripheral blood aliquots, from a patient; (2) extracting RNA, e.g., mRNA, from the sample; (3) reverse transcribing the RNA into cDNA; (4) in vitro transcribing the cDNA into cRNA; (5) fragmenting the cRNA; (6) hybridizing the fragmented cRNA on selected microarrays (e.g., the HG-U133 microarray set available from Affymetrix, Inc. (Santa Clara, CA USA)); and (7) detecting hybridization.
Serial Analysis of Gene Expression (SAGE)
Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need for providing an individual hybridization probe for each transcript. Initially, a short sequence tag (e.g., about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many transcripts are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. SAGE-based assays are also described in, e.g., Velculescu et al., Science 270:484-487 (1995) and Velculescu et al., Cell 88:243-51 (1997), which are both incorporated by reference.
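To illustrate the tag-counting step, the following minimal sketch splits sequenced serial molecules into tags, counts the abundance of each tag, and assigns each tag to a gene. It is a deliberately simplified model: it assumes that consecutive tags are punctuated by a fixed anchoring sequence (here CATG), that all tags have the same length, and that no tag contains the anchoring sequence. The function names, the sequences, and the tag-to-gene table are hypothetical.

```python
from collections import Counter


def split_tags(concatemer, anchor="CATG", tag_len=10):
    """Split one sequenced serial molecule into tags of the expected length."""
    pieces = concatemer.upper().split(anchor)
    # Discard empty ends and fragments of unexpected length.
    return [piece for piece in pieces if len(piece) == tag_len]


def tag_abundance(concatemers, tag_to_gene, anchor="CATG", tag_len=10):
    """Count how often each gene is represented among the observed tags."""
    counts = Counter()
    for concatemer in concatemers:
        for tag in split_tags(concatemer, anchor, tag_len):
            counts[tag_to_gene.get(tag, "unassigned")] += 1
    return counts


# Hypothetical tags and annotation, not real transcript data.
tag_to_gene = {"AAACCCGGGT": "gene_A", "TTTGGGCCCA": "gene_B"}
reads = ["CATGAAACCCGGGTCATGTTTGGGCCCACATGAAACCCGGGTCATG"]
print(tag_abundance(reads, tag_to_gene))
```

The resulting counts are proportional to transcript abundance only to the extent that each tag identifies a single transcript, which is the condition stated above.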
Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS)
These sequencing approaches generally combine non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. Typically, a microbead library of DNA templates is constructed by in vitro cloning. This is generally followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3 × 10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method can be used to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from cDNA libraries. MPSS is also described in, e.g., Brenner et al., (2000) Nature Biotechnology 18:630-634, which is incorporated by reference.
Immunoassays and proteomics
Essentially any available technique for the detection of proteins is optionally utilized in the methods of the invention. Exemplary protein analysis technologies include, e.g., one- and two-dimensional SDS-PAGE-based separation and detection, immunoassays (e.g., western blotting, etc.), aptamer-based detection, mass spectrometric detection, and the like. These and other techniques are generally well-known in the art.
To illustrate, immunohistochemical methods are optionally used for detecting the expression levels of the targets described herein. Thus, antibodies or antisera (e.g., polyclonal antisera) and in certain embodiments, monoclonal antibodies specific for particular targets are used to detect expression. In some of these embodiments, antibodies are directly labeled, e.g., with radioactive labels, fluorescent labels, haptens, chemiluminescent dyes, enzyme substrates or co-factors, enzyme inhibitors, free radicals, enzymes (e.g., horseradish peroxidase or alkaline phosphatase), or the like. Such labeled reagents may be used in a variety of well known assays, such as radioimmunoassays, enzyme immunoassays, e.g., ELISA, fluorescent immunoassays, and the like. See, e.g., U.S. Pat. Nos. 3,766,162; 3,791,932; 3,817,837; and 4,233,402, which are each incorporated by reference. Additional labels are described further herein. Alternatively, unlabeled primary antibodies are used in conjunction with labeled secondary antibodies, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.
To further illustrate, proteins from a cell or tissue under investigation may be contacted with a panel or array of aptamers or of antibodies or fragments or derivatives thereof. These biomolecules may be affixed to a solid support, such as a chip. The binding of proteins indicative of a given leukemia type or subtype is optionally verified by binding to a detectably labeled secondary antibody or aptamer. The labeling of antibodies is also described in, e.g., Harlow and Lane, Antibodies, a laboratory manual, CSH Press (1988), which is incorporated by reference. To further illustrate, a minimum set of proteins necessary for detecting various leukemia types or subtypes may be selected for the creation of a protein array for use in making diagnoses with, e.g., protein lysates of bone marrow samples directly. Protein array systems for the detection of specific protein expression profiles are commercially available from various suppliers, including the Bio-Plex™ platform available from BIO-RAD Laboratories (Munich, Germany).
In some embodiments of the invention, antibodies against the target proteins are produced and immobilized on a solid support, e.g., a glass slide or a well of a microtiter plate. The immobilized antibodies can be labeled with a reactant that is specific for the target proteins. These reactants can include, e.g., enzyme substrates, DNA, receptors, antigens or antibodies to create, for example, a capture sandwich immunoassay. Target proteins can also be detected using aptamers including photoaptamers.
Aptamers generally are single-stranded oligonucleotides (e.g., typically DNA for diagnostic applications) that assume a specific, sequence-dependent shape and bind to target proteins based on a "lock-and-key" fit between the two molecules. Aptamers can be identified using the SELEX process (Gold (1996) "The SELEX process: a surprising source of therapeutic and diagnostic compounds," Harvey Lect. 91:47-57, which is incorporated by reference). Aptamer arrays are commercially available from various suppliers including, e.g., SomaLogic, Inc. (Boulder, CO, USA).
The detection of proteins via mass spectrometry includes various formats that can be adapted for use in the methods of the invention. Exemplary formats include matrix assisted laser desorption/ionization- (MALDI) and surface enhanced laser desorption/ionization-based (SELDI) detection. MALDI- and SELDI-based detection are also described in, e.g., Weinberger et al. (2000) "Recent trends in protein biochip technology," Pharmacogenomics 1(4):395-416, Forde et al. (2002) "Characterization of transcription factors by mass spectrometry and the role of SELDI-MS," Mass Spectrom. Rev. 21(6):419-439, and Leushner (2001) "MALDI TOF mass spectrometry: an emerging platform for genomics and diagnostics," Expert Rev. Mol. Diagn. 1(1):11-18, which are each incorporated by reference. Protein chips and related instrumentation are available from commercial suppliers, such as Ciphergen Biosystems, Inc. (Fremont, CA, USA).
OLIGONUCLEOTIDE PREPARATION
Various approaches can be utilized by one of skill in the art to design oligonucleotides for use as probes and/or primers. To illustrate, the DNAstar software package available from DNASTAR, Inc. (Madison, WI) can be used for sequence alignments. For example, target nucleic acid sequences and non-target nucleic acid sequences can be uploaded into the DNAstar EditSeq program as individual files, e.g., as part of a process to identify regions in these sequences that have low sequence similarity. To further illustrate, pairs of sequence files can be opened in the DNAstar MegAlign sequence alignment program and the Clustal W method of alignment can be applied. The parameters used for Clustal W alignments are optionally the default settings in the software. MegAlign typically does not provide a summary of the percent identity between two sequences. This is generally calculated manually. From the alignments, regions having, e.g., less than 85% identity with one another are typically identified and oligonucleotide sequences in these regions can be selected. Many other sequence alignment algorithms and software packages are also optionally utilized. Sequence alignment algorithms are also described in, e.g., Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press (2001), and Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press (1998), which are both incorporated by reference. To further illustrate, optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman (1981) Adv. Appl. Math. 2:482, by the homology alignment algorithm of Needleman & Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity method of Pearson & Lipman (1988) Proc. Nat'l. Acad. Sci. USA 85:2444, which are each incorporated by reference, by computerized implementations of these algorithms (e.g., GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (Madison, WI)), or even by visual inspection.
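To illustrate the percent identity calculation and the selection of regions having less than 85% identity, the following minimal sketch operates on two sequences that have already been aligned (e.g., exported from an alignment program with gaps written as "-"). The function names, the window length, and the treatment of gaps are assumptions made for illustration and do not reproduce the behavior of any particular software package.

```python
def percent_identity(aln_a, aln_b):
    """Percent of identical positions between two aligned sequences.

    Columns in which both sequences carry a gap are ignored; a gap
    opposite a base counts as a non-identical position.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def low_identity_windows(aln_a, aln_b, window=20, cutoff=85.0):
    """(start, identity) for each alignment window below the cutoff."""
    hits = []
    for start in range(0, len(aln_a) - window + 1):
        pid = percent_identity(aln_a[start:start + window],
                               aln_b[start:start + window])
        if pid < cutoff:
            hits.append((start, round(pid, 1)))
    return hits


# Hypothetical aligned target and non-target sequences.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(low_identity_windows("AAAAAAAAAA", "AAAAATTTTT", window=5))
```

Windows reported by the second function are candidate regions from which oligonucleotide sequences that distinguish target from non-target sequences could be selected.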
Another example algorithm that is suitable for determining percent sequence identity is the BLAST algorithm, which is described in, e.g., Altschul et al. (1990) J. Mol. Biol. 215:403-410, which is incorporated by reference. Software for performing versions of BLAST analyses is publicly available through the National Center for Biotechnology Information on the world wide web at ncbi.nlm.nih.gov/ as of 11/4/2004.
An additional example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle (1987) J. Mol. Evol. 35:351-360, which is incorporated by reference.
Oligonucleotide probes and primers are optionally prepared using essentially any technique known in the art. In certain embodiments, for example, the oligonucleotide probes and primers are synthesized chemically using essentially any nucleic acid synthesis method, including, e.g., according to the solid phase phosphoramidite method described by Beaucage and Caruthers (1981) Tetrahedron Letts. 22(20):1859-1862, which is incorporated by reference. To further illustrate, oligonucleotides can also be synthesized using a triester method (see, e.g., Capaldi et al. (2000) "Highly efficient solid phase synthesis of oligonucleotide analogs containing phosphorodithioate linkages" Nucleic Acids Res. 28(9):e40 and Eldrup et al. (1994) "Preparation of oligodeoxyribonucleoside phosphorodithioates by a triester method" Nucleic Acids Res. 22(10):1797-1804, which are both incorporated by reference). Other synthesis techniques known in the art can also be utilized, including, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res. 12:6159-6168, which is incorporated by reference. A wide variety of equipment is commercially available for automated oligonucleotide synthesis. Multi-nucleotide synthesis approaches (e.g., tri-nucleotide synthesis, etc.) are also optionally utilized. Moreover, the primer nucleic acids optionally include various modifications. In certain embodiments, for example, primers include restriction site linkers, e.g., to facilitate subsequent amplicon cloning or the like. To further illustrate, primers are also optionally modified to improve the specificity of amplification reactions as described in, e.g., U.S. Pat. No. 6,001,611, entitled "MODIFIED NUCLEIC ACID AMPLIFICATION PRIMERS," issued December 14, 1999 to Will, which is incorporated by reference. Primers and probes can also be synthesized with various other modifications as described herein or as otherwise known in the art.
Probes and/or primers utilized in the methods and other aspects of the invention are typically labeled to permit detection of probe-target hybridization duplexes. In general, a label can be any moiety that can be attached to a nucleic acid and provide a detectable signal (e.g., a quantifiable signal). Labels may be attached to oligonucleotides directly or indirectly by a variety of techniques known in the art. To illustrate, depending on the type of label used, the label can be attached to a terminal (5' or 3' end of an oligonucleotide primer and/or probe) or a non-terminal nucleotide, and can be attached indirectly through linkers or spacer arms of various sizes and compositions. Using commercially available phosphoramidite reagents, one can produce oligonucleotides containing functional groups (e.g., thiols or primary amines) at either the 5' or 3' terminus via an appropriately protected phosphoramidite, and can label such oligonucleotides using protocols described in, e.g., Innis et al. (Eds.) PCR Protocols: A Guide to Methods and Applications, Elsevier Science & Technology Books (1990) (Innis), which is incorporated by reference.
Essentially any labeling moiety is optionally utilized to label a probe and/or primer by techniques well known in the art. In some embodiments, for example, labels comprise a fluorescent dye (e.g., a rhodamine dye (e.g., R6G, R110, TAMRA, ROX, etc.), a fluorescein dye (e.g., JOE, VIC, TET, HEX, FAM, etc.), a halofluorescein dye, a cyanine dye (e.g., CY3, CY3.5, CY5, CY5.5, etc.), a BODIPY® dye (e.g., FL, 530/550, TR, TMR, etc.), an ALEXA FLUOR® dye (e.g., 488, 532, 546, 568, 594, 555, 653, 647, 660, 680, etc.), a dichlororhodamine dye, an energy transfer dye (e.g., BIGDYE™ v 1 dyes, BIGDYE™ v 2 dyes, BIGDYE™ v 3 dyes, etc.), Lucifer dyes (e.g., Lucifer yellow, etc.), CASCADE BLUE®, Oregon Green, and the like). Additional examples of fluorescent dyes are provided in, e.g., Haugland, Molecular Probes Handbook of Fluorescent Probes and Research Products, Ninth Ed. (2003) and the updates thereto, which are each incorporated by reference. Fluorescent dyes are generally readily available from various commercial suppliers including, e.g., Molecular Probes, Inc. (Eugene, OR), Amersham Biosciences Corp. (Piscataway, NJ), Applied Biosystems (Foster City, CA), etc. Other labels include, e.g., biotin, weakly fluorescent labels (Yin et al. (2003) Appl Environ Microbiol. 69(7):3938, Babendure et al. (2003) Anal. Biochem. 317(1):1, and Jankowiak et al. (2003) Chem Res Toxicol. 16(3):304), non-fluorescent labels, colorimetric labels, chemiluminescent labels (Wilson et al. (2003) Analyst. 128(5):480 and Roda et al. (2003) Luminescence 18(2):72), Raman labels, electrochemical labels, bioluminescent labels (Kitayama et al. (2003) Photochem Photobiol. 77(3):333, Arakawa et al. (2003) Anal. Biochem. 314(2):206, and Maeda (2003) J. Pharm. Biomed. Anal. 30(6):1725), and an alpha-methyl-PEG labeling reagent as described in, e.g., U.S. Provisional Patent Application No. 60/428,484, filed on Nov. 22, 2002, which references are each incorporated by reference. Nucleic acid labeling is also described further below. In some embodiments, labeling is achieved using synthetic nucleotides (e.g., synthetic ribonucleotides, etc.) and/or recombinant phycoerythrin (PE).
In addition, whether a fluorescent dye is a label or a quencher is generally defined by its excitation and emission spectra, and the fluorescent dye with which it is paired. Fluorescent molecules commonly used as quencher moieties in probes and primers include, e.g., fluorescein, FAM, JOE, rhodamine, R6G, TAMRA, ROX, DABCYL, and EDANS. Many of these compounds are available from the commercial suppliers referred to above. Exemplary non-fluorescent or dark quenchers that dissipate energy absorbed from a fluorescent dye include the Black Hole Quenchers™ or BHQ™, which are commercially available from Biosearch Technologies, Inc. (Novato, CA, USA).
To further illustrate, essentially any nucleic acid (and virtually any labeled nucleic acid, whether standard or non-standard) can be custom or standard ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company, The Great American Gene Company, ExpressGen Inc., Operon Technologies Inc., Proligo LLC, and many others.
In certain embodiments, modified nucleotides are included in probes and primers. To illustrate, the introduction of modified nucleotide substitutions into oligonucleotide sequences can, e.g., increase the melting temperature of the oligonucleotides. In some embodiments, this can yield greater sensitivity relative to corresponding unmodified oligonucleotides even in the presence of one or more mismatches in sequence between the target nucleic acid and the particular oligonucleotide. Exemplary modified nucleotides that can be substituted or added in oligonucleotides include, e.g., C5-ethyl-dC, C5-methyl-dU, C5-ethyl-dU, 2,6-diaminopurines, C5-propynyl-dC, C7-propynyl-dA, C7-propynyl-dG, C5-propargylamino-dC, C5-propargylamino-dU, C7-propargylamino-dA, C7-propargylamino-dG, 7-deaza-2-deoxyxanthosine, pyrazolopyrimidine analogs, pseudo-dU, nitro pyrrole, nitro indole, 2'-O-methyl Ribo-U, 2'-O-methyl Ribo-C, an 8-aza-dA, an 8-aza-dG, a 7-deaza-dA, a 7-deaza-dG, N4-ethyl-dC, N6-methyl-dA, etc. To further illustrate, other examples of modified oligonucleotides include those having one or more LNA™ monomers. Nucleotide analogs such as these are also described in, e.g., U.S. Pat. No. 6,639,059, entitled "SYNTHESIS OF [2.2.1]BICYCLO NUCLEOSIDES," issued October 28, 2003 to Kochkine et al., U.S. Pat. No. 6,303,315, entitled "ONE STEP SAMPLE PREPARATION AND DETECTION OF NUCLEIC ACIDS IN COMPLEX BIOLOGICAL SAMPLES," issued October 16, 2001 to Skouv, and U.S. Pat. Application Pub. No. 2003/0092905, entitled "SYNTHESIS OF [2.2.1]BICYCLO NUCLEOSIDES," by Kochkine et al. that published May 15, 2003, which are each incorporated by reference. Oligonucleotides comprising LNA™ monomers are commercially available through, e.g., Exiqon A/S (Vedbæk, DK). Additional oligonucleotide modifications are referred to herein, including in the definitions provided above.
ARRAY FORMATS
In certain embodiments, oligonucleotide probes designed to hybridize with target nucleic acids are covalently or noncovalently attached to solid supports. In these embodiments, labeled amplicons derived from patient samples are typically contacted with these solid support-bound probes to effect hybridization and detection. In other embodiments, amplicons are attached to solid supports and contacted with labeled probes. Optionally, antibodies, aptamers, or other probe biomolecules utilized in a given assay are similarly attached to solid supports.
Essentially any substrate material can be adapted for use as a solid support. In certain embodiments, for example, substrates are fabricated from silicon, glass, or polymeric materials (e.g., glass or polymeric microscope slides, silicon wafers, wells of microwell plates, etc.). Suitable glass or polymeric substrates, including microscope slides, are available from various commercial suppliers, such as Fisher Scientific (Pittsburgh, PA, USA) or the like. In some embodiments, solid supports utilized in the invention are membranes. Suitable membrane materials are optionally selected from, e.g., polyaramide membranes, polycarbonate membranes, porous plastic matrix membranes (e.g., POREX® Porous Plastic, etc.), nylon membranes, ceramic membranes, polyester membranes, polytetrafluoroethylene (TEFLON®) membranes, nitrocellulose membranes, or the like. Many of these membranous materials are widely available from various commercial suppliers, such as P.J. Cobert Associates, Inc. (St. Louis, MO, USA), Millipore Corporation (Bedford, MA, USA), or the like. Other exemplary solid supports that are optionally utilized include, e.g., ceramics, metals, resins, gels, plates, beads (e.g., magnetic microbeads, etc.), whiskers, fibers, combs, single crystals, self-assembling monolayers, and the like.
Nucleic acids are directly or indirectly (e.g., via linkers, such as bovine serum albumin (BSA) or the like) attached to the supports, e.g., by any available chemical or physical method. A wide variety of linking chemistries are available for linking molecules to a wide variety of solid supports. More specifically, nucleic acids may be attached to the solid support by covalent binding, such as by conjugation with a coupling agent, or by non-covalent binding, such as electrostatic interactions, hydrogen bonds or antibody-antigen coupling, or by combinations thereof. Typical coupling agents include biotin/avidin, biotin/streptavidin, Staphylococcus aureus protein A/IgG antibody Fc fragment, and streptavidin/protein A chimeras (Sano et al. (1991) Bio/Technology 9:1378, which is incorporated by reference), or derivatives or combinations of these agents. Nucleic acids may be attached to the solid support by a photocleavable bond, an electrostatic bond, a disulfide bond, a peptide bond, a diester bond or a combination of these bonds. Nucleic acids are also optionally attached to solid supports by a selectively releasable bond such as 4,4'-dimethoxytrityl or its derivative.
Cleavable attachments can be created by attaching cleavable chemical moieties between the probes and the solid support including, e.g., an oligopeptide, oligonucleotide, oligopolyamide, oligoacrylamide, oligoethylene glycerol, alkyl chains of between about 6 to 20 carbon atoms, and combinations thereof. These moieties may be cleaved with, e.g., added chemical agents, electromagnetic radiation, or enzymes. Exemplary attachments cleavable by enzymes include peptide bonds, which can be cleaved by proteases, and phosphodiester bonds which can be cleaved by nucleases. Chemical agents such as β-mercaptoethanol, dithiothreitol (DTT) and other reducing agents cleave disulfide bonds. Other agents which may be useful include oxidizing agents, hydrating agents and other selectively active compounds. Electromagnetic radiation such as ultraviolet, infrared and visible light cleave photocleavable bonds. Attachments may also be reversible, e.g., using heat or enzymatic treatment, or reversible chemical or magnetic attachments. Release and reattachment can be performed using, e.g., magnetic or electrical fields.
A number of array systems have been described and can be adapted for use in the detection of target microbial nucleic acids. Aspects of array construction and use are also described in, e.g., Sapolsky et al. (1999) "High-throughput polymorphism screening and genotyping with high-density oligonucleotide arrays" Genetic Analysis: Biomolecular Engineering 14:187-192, Lockhart (1998) "Mutant yeast on drugs" Nature Medicine 4:1235-1236, Fodor (1997) "Genes, Chips and the Human Genome" FASEB Journal 11:A879, Fodor (1997) "Massively Parallel Genomics" Science 277:393-395, and Chee et al. (1996) "Accessing Genetic Information with High-Density DNA Arrays" Science 274:610-614, all of which are incorporated by reference.
NUCLEIC ACID HYBRIDIZATION
The length of the complementary region or sequence between primers or probes and their binding partners (e.g., target nucleic acids) should generally be sufficient to allow selective or specific hybridization of the primers or probes to the targets at the selected annealing temperatures used for a particular nucleic acid amplification protocol, expression profiling assay, etc. Although other lengths are optionally utilized, complementary regions of, for example, between about 10 and about 50 nucleotides (e.g., about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more nucleotides) are typically used in a given application. "Stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments, such as Southern and northern hybridizations, are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993), supra, and in Hames and Higgins 1 and Hames and Higgins 2, supra. For purposes of the present invention, generally, "highly stringent" hybridization and wash conditions are selected to be about 5° C or less below the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH (as noted below, highly stringent conditions can also be referred to in comparative terms). The Tm is the temperature (under defined ionic strength and pH) at which 50% of the test sequence hybridizes to a perfectly matched primer or probe. Very stringent conditions are selected to be equal to the Tm for a particular primer or probe.
The Tm of a nucleic acid duplex is the temperature at which the duplex is 50% denatured under the given conditions, and it represents a direct measure of the stability of the nucleic acid hybrid. Thus, the Tm corresponds to the midpoint of the transition from helix to random coil; for long stretches of nucleotides, it depends on length, nucleotide composition, and ionic strength.
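To make the dependence of Tm on length, composition, and ionic strength concrete, the following is a minimal sketch using two widely cited textbook approximations. Neither formula comes from this document; the function names, the default sodium concentration, and the example sequence are illustrative assumptions. Empirical determination, as described in this section, remains the reference.

```python
import math


def tm_wallace(seq: str) -> float:
    """Wallace rule of thumb for short oligonucleotides:
    2 degrees C per A/T and 4 degrees C per G/C (an approximation only)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def tm_salt_adjusted(seq: str, na_molar: float = 0.05) -> float:
    """Commonly cited salt-adjusted approximation:
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N.
    The default [Na+] of 0.05 M is an assumed example value."""
    s = seq.upper()
    n = len(s)
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


def highly_stringent_floor(tm: float, offset: float = 5.0) -> float:
    """Lower end of the 'about 5 degrees C or less below Tm' window."""
    return tm - offset


probe = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% GC
print(tm_wallace(probe), round(tm_salt_adjusted(probe), 1))
```

Both estimates rise with GC content and length, and the salt-adjusted estimate rises with ionic strength, which matches the qualitative statements above.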
After hybridization, unhybridized nucleic acid material can be removed by a series of washes, the stringency of which can be adjusted depending upon the desired results. Low stringency washing conditions (e.g., using higher salt and lower temperature) increase sensitivity, but can produce nonspecific hybridization signals and high background signals. Higher stringency conditions (e.g., using lower salt and higher temperature that is closer to the hybridization temperature) lower the background signal, typically with only the specific signal remaining. See, e.g., Rapley et al. (Eds.), Molecular Biomethods Handbook (Humana Press, Inc. 1998), which is incorporated by reference.
Thus, one measure of stringent hybridization is the ability of the primer or probe to hybridize to one or more of the target nucleic acids (or complementary polynucleotide sequences thereof) under highly stringent conditions. Stringent hybridization and wash conditions can easily be determined empirically for any test nucleic acid.
For example, in determining highly stringent hybridization and wash conditions, the stringency of the hybridization and wash conditions is gradually increased (e.g., by increasing temperature, decreasing salt concentration, increasing detergent concentration and/or increasing the concentration of organic solvents, such as formalin, in the hybridization or wash), until a selected set of criteria is met. For example, the stringency of the hybridization and wash conditions is gradually increased until a target nucleic acid, and complementary polynucleotide sequences thereof, binds to a perfectly matched complementary nucleic acid. A target nucleic acid is said to specifically hybridize to a primer or probe nucleic acid when it hybridizes at least ½ as well to the primer or probe as to a perfectly matched complementary target, i.e., with a signal to noise ratio at least ½ as high as hybridization of the primer or probe to the target under conditions in which the perfectly matched primer or probe binds to the perfectly matched complementary target with a signal to noise ratio that is at least about 2.5x-10x, typically 5x-10x as high as that observed for hybridization to any of the unmatched target nucleic acids.
NUCLEIC ACID AMPLIFICATION
In some embodiments, RNA is converted to cDNA in a reverse-transcription (RT) reaction using, e.g., a target-specific primer complementary to the RNA for each gene target being monitored. Methods of reverse transcribing RNA into cDNA are well known, and described in Sambrook, supra. Alternative methods for reverse transcription utilize thermostable DNA polymerases, as described in the art. As an exemplary embodiment, avian myeloblastosis virus reverse transcriptase (AMV-RT), or Moloney murine leukemia virus reverse transcriptase (MoMLV-RT) is used, although other enzymes are also optionally utilized. An advantage of using target-specific primers in the RT reaction is that only the desired sequences are converted into a PCR template. Superfluous primers or cDNA products are generally not carried into subsequent PCR amplifications. In another embodiment, RNA targets are reverse transcribed using non-specific primers, such as an anchored oligo-dT primer, or random sequence primers. An advantage of this embodiment is that the "unfractionated" quality of the mRNA sample is maintained because the sites of priming are non-specific, i.e., the products of this RT reaction will serve as template for any desired target in the subsequent PCR amplification. This allows samples to be archived in the form of DNA, which is more stable than RNA. In other embodiments, transcription-based amplification systems (TAS) are used, such as that first described by Kwoh et al. (Proc. Natl. Acad. Sci. (1989) 86(4):1173-7), or isothermal transcription-based systems such as 3SR (Self-Sustained Sequence Replication; Guatelli et al. (1990) Proc. Natl. Acad. Sci. 87:1874-1878) or NASBA (nucleic acid sequence based amplification; Kievits et al. (1991) J Virol Methods. 35(3):273-86), which are each incorporated by reference. In these methods, the mRNA target of interest is copied into cDNA by a reverse transcriptase.
The primer for cDNA synthesis includes the promoter sequence of a designated DNA-dependent RNA polymerase 5' to the primer's region of homology with the template. The resulting cDNA products can then serve as templates for multiple rounds of transcription by the appropriate RNA polymerase. Transcription of the cDNA template rapidly amplifies the signal from the original target mRNA. The isothermal reactions bypass the need for denaturing cDNA strands from their RNA templates by including RNAse H to degrade RNA hybridized to DNA.
In other exemplary embodiments, amplification is accomplished by use of the ligase chain reaction (LCR), disclosed in European Patent Application No. 320,308 (Backman and Wang), or by the ligase detection reaction (LDR), disclosed in U.S. Patent No. 4,883,750 (Whiteley et al.), which are each incorporated by reference. In LCR, two probe pairs are typically prepared, which are complementary to each other and to adjacent sequences on both strands of the target. Each pair will bind to opposite strands of the target such that they abut. Each of the two probe pairs can then be linked to form a single unit, using a thermostable ligase. By temperature cycling, as in PCR, bound ligated units dissociate from the target, then both molecules can serve as "target sequences" for ligation of excess probe pairs, providing for an exponential amplification. The LDR is very similar to LCR. In this variation, oligonucleotides complementary to only one strand of the target are used, resulting in a linear amplification of ligation products, since only the original target DNA can serve as a hybridization template. It is used following a PCR amplification of the target in order to increase signal.
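The difference between exponential (LCR) and linear (LDR) accumulation described above can be illustrated with an idealized model. This is only a sketch: the function names and the per-cycle `efficiency` parameter are assumptions made for illustration, and real reactions plateau as probes are consumed.

```python
def lcr_templates(initial_targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized LCR: ligated units serve as templates in later cycles,
    so the template pool (original targets plus ligated units)
    multiplies by (1 + efficiency) every cycle."""
    return initial_targets * (1.0 + efficiency) ** cycles


def ldr_products(initial_targets: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized LDR: only the original target acts as template,
    so ligation products accumulate linearly with cycle number."""
    return initial_targets * efficiency * cycles


# Hypothetical example: 100 target molecules, 10 thermal cycles.
print(lcr_templates(100, 10))  # exponential growth
print(ldr_products(100, 10))   # linear growth
```

Under these assumptions the gap between the two methods widens with every cycle, which is consistent with LDR being used after a PCR amplification rather than as a stand-alone amplification.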
In further embodiments, several methods generally known in the art would be suitable methods of amplification. Some additional examples include, but are not limited to, strand displacement amplification (Walker et al. (1992) Nucleic Acids Res. 20:1691-1696), repair chain reaction (REF), cyclic probe reaction (REF), solid-phase amplification, including bridge amplification (Mehta and Singh (1999) BioTechniques 26(6):1082-1086), rolling circle amplification (Kool, U.S. Patent No. 5,714,320), rapid amplification of cDNA ends (Frohman (1988) Proc. Natl. Acad. Sci. 85:8998-9002), and the "invader assay" (Griffin et al. (1999) Proc. Natl. Acad. Sci. 96:6301-6306), which are each incorporated by reference. Amplicons are optionally recovered and purified from other reaction components by any of a number of methods well known in the art, including electrophoresis, chromatography, precipitation, dialysis, filtration, and/or centrifugation. Aspects of nucleic acid purification are described in, e.g., Douglas et al., DNA Chromatography, Wiley, John & Sons, Inc. (2002), and Schott, Affinity Chromatography: Template Chromatography of Nucleic Acids and Proteins, Chromatographic Science Series, #27, Marcel Dekker (1984), both of which are incorporated by reference. In certain embodiments, amplicons are not purified prior to detection, such as when amplicons are detected simultaneously with amplification.
DATA COLLECTION
The number of species that can be detected within a mixture depends primarily on the resolution capabilities of the separation platform used, and the detection methodology employed. In some embodiments, separation steps are based upon size-based separation technologies. Once separated, individual species are detected and quantitated by either inherent physical characteristics of the molecules themselves, or detection of an associated label.
Embodiments employing other separation methods are also described. For example, certain types of labels allow resolution of two species of the same mass through deconvolution of the data. Non-size based differentiation methods (such as deconvolution of data from overlapping signals generated by two different fluorophores) allow pooling of a plurality of multiplexed reactions to further increase throughput.
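As an illustration of deconvolving overlapping signals from two fluorophores, the following minimal sketch solves a two-channel linear mixing model. The model, the function name, and the response values are assumptions introduced here for illustration; actual instruments typically calibrate such a response matrix from single-dye standards.

```python
def unmix_two_dyes(ch1: float, ch2: float, response):
    """Recover the amounts of two dyes from two detection channels.

    response = ((a11, a12), (a21, a22)), where a_ij is the signal that one
    unit of dye j contributes to channel i (assumed known from calibration).
    Solves the 2x2 linear system by Cramer's rule.
    """
    (a11, a12), (a21, a22) = response
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("dyes cannot be distinguished with these channels")
    dye1 = (ch1 * a22 - a12 * ch2) / det
    dye2 = (a11 * ch2 - a21 * ch1) / det
    return dye1, dye2


# Hypothetical calibration: dye 1 bleeds 10% into channel 2,
# dye 2 bleeds 20% into channel 1.
response = ((0.9, 0.2), (0.1, 0.8))
print(unmix_two_dyes(10.0, 5.0, response))
```

With more than two dyes or channels the same idea generalizes to a least-squares fit, which is one reason spectrally distinct labels allow separately run reactions to be pooled.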
Separation Methods
Certain embodiments of the invention incorporate a step of separating the products of a reaction based on their size differences. The PCR products generated during an amplification reaction typically range from about 50 to about 500 bases in length, which can be resolved from one another by size. Any one of several devices may be used for size separation, including mass spectrometry, any of several electrophoretic devices, including capillary, polyacrylamide gel, or agarose gel electrophoresis, or any of several chromatographic devices, including column chromatography, HPLC, or FPLC.
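When designing a multiplex reaction for size-based separation, one practical check is whether the expected product sizes are far enough apart for the chosen platform. The sketch below assumes a single, fixed minimum resolvable gap in bases; that parameter and the example lengths are hypothetical, and real platforms have size-dependent resolution.

```python
def unresolved_neighbors(lengths, min_gap: int):
    """Return neighboring pairs (after sorting by size) whose lengths differ
    by fewer than min_gap bases and so may co-migrate."""
    ordered = sorted(lengths)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]


# Hypothetical amplicon lengths within the 50 to 500 base range noted above.
print(unresolved_neighbors([120, 250, 122, 400], min_gap=5))
```

An empty result means every product is separated from its nearest neighbor by at least the assumed gap.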
In some embodiments, sample analysis includes the use of mass spectrometry. Several modes of separation that determine mass are possible, including Time-of-Flight (TOF), Fourier Transform Mass Spectrometry (FTMS), and quadrupole mass spectrometry. Possible methods of ionization include Matrix-Assisted Laser Desorption and Ionization (MALDI) or Electrospray Ionization (ESI). A preferred embodiment for the uses described in this invention is MALDI-TOF (Wu, et al. (1993) Rapid Communications in Mass Spectrometry 7:142-146, which is incorporated by reference). This method may be used to provide unfragmented mass spectra of mixed-base oligonucleotides containing between about 1 and about 1000 bases. In preparing the sample for analysis, the analyte is mixed into a matrix of molecules that resonantly absorb light at a specified wavelength. Pulsed laser light is then used to desorb oligonucleotide molecules out of the absorbing solid matrix, creating free, charged oligomers and minimizing fragmentation. An exemplary solid matrix material for this purpose is 3-hydroxypicolinic acid (Wu, supra), although others are also optionally used.
In another embodiment, a microcapillary is used for analysis of nucleic acids obtained from the sample. Microcapillary electrophoresis generally involves the use of a thin capillary or channel, which may optionally be filled with a particular medium to improve separation, and employs an electric field to separate components of the mixture as the sample travels through the capillary. Samples composed of linear polymers of a fixed charge-to-mass ratio, such as DNA or RNA, will separate based on size. The high surface to volume ratio of these capillaries allows application of very high electric fields across the capillary without substantial thermal variation, consequently allowing very rapid separations. When combined with confocal imaging methods, these methods provide sensitivity in the range of attomoles, comparable to the sensitivity of radioactive sequencing methods. The use of microcapillary electrophoresis in size separation of nucleic acids has been reported in Woolley and Mathies (Proc. Natl. Acad. Sci. USA (1994) 91:11348-11352), which is incorporated by reference. Capillaries are optionally fabricated from fused silica, or etched, machined, or molded into planar substrates. In many microcapillary electrophoresis methods, the capillaries are filled with an appropriate separation/sieving matrix. Several sieving matrices are known in the art that may be used for this application, including, e.g., hydroxyethyl cellulose, polyacrylamide, agarose, and the like. Generally, the specific gel matrix, running buffers and running conditions are selected to obtain the separation required for a particular application. Factors that are considered include, e.g., sizes of the nucleic acid fragments, level of resolution, or the presence of undenatured nucleic acid molecules. For example, running buffers may include agents such as urea to denature double-stranded nucleic acids in a sample.
Microfluidic systems for separating molecules such as DNA and RNA are commercially available and are optionally employed in the methods of the present invention. For example, the "Personal Laboratory System" and the "High Throughput System" have been developed by Caliper Lifesciences Corp. (Mountain View, CA). The Agilent 2100, which uses Caliper Lifesciences' LabChip™ microfluidic systems, is available from Agilent Technologies (Palo Alto, CA, USA). Currently, specialized microfluidic devices, which provide for rapid separation and analysis of both DNA and RNA, are available from Caliper Lifesciences for the Agilent 2100.
Other embodiments are generally known in the art for separating PCR amplification products by electrophoresis through gel matrices. Examples include polyacrylamide, agarose-acrylamide, or agarose gel electrophoresis, using standard methods (Sambrook, supra).
Alternatively, chromatographic techniques may be employed for resolving amplification products. Many types of physical or chemical characteristics may be used to effect chromatographic separation in the present invention, including adsorption, partitioning (such as reverse phase), ion-exchange, and size exclusion. Many specialized techniques have been developed for their application, including methods utilizing liquid chromatography or HPLC (Katz and Dong (1990) BioTechniques 8(5):546-55; Gaus et al. (1993) J. Immunol. Methods 158:229-236). In yet another embodiment, cDNA products are captured by their affinity for certain substrates, or other incorporated binding properties. For example, cDNA products labeled with, e.g., biotin or an antigen can be captured with beads bearing avidin or antibody, respectively. Affinity capture is utilized on a solid support to enable physical separation. Many types of solid supports are known in the art that would be applicable to the present invention. Examples include beads (e.g., solid, porous, magnetic), surfaces (e.g., plates, dishes, wells, flasks, dipsticks, membranes), or chromatographic materials (e.g., fibers, gels, screens).
Certain separation embodiments entail the use of microfluidic techniques. Technologies include separation on a microcapillary platform, such as designed by ACLARA BioSciences Inc. (Mountain View, CA), or the LabChip™ microfluidic devices made by Caliper Lifesciences Corp. Another technology developed by Nanogen, Inc. (San Diego, CA), utilizes microelectronics to move and concentrate biological molecules on a semiconductor microchip. The microfluidics platforms developed at Orchid Biosciences, Inc. (Princeton, NJ), including the Chemtel™ Chip, which provides for parallel processing of hundreds of reactions, can also be used in certain embodiments. These microfluidic platforms require only nanoliter sample volumes, in contrast to the microliter volumes required by other conventional separation technologies. Some of the processes usually involved in genetic analysis have been miniaturized using microfluidic devices. For example, PCT publication WO 94/05414 reports an integrated micro-PCR apparatus for collection and amplification of nucleic acids from a specimen. U.S. Patent Nos. 5,304,487 (Wilding et al.) and 5,296,375 (Kricka et al.) discuss devices for collection and analysis of cell-containing samples. U.S. Patent No. 5,856,174 (Lipshutz et al.) describes an apparatus that combines the various processing and analytical operations involved in nucleic acid analysis. Each of these references is incorporated by reference.
Additional technologies are also contemplated. For example, Kasianowicz et al. (Proc. Natl. Acad. Sci. USA (1996) 93:13770-13773, which is incorporated by reference) describes the use of ion channel pores in a lipid bilayer membrane for determining the length of polynucleotides. In this system, an electric field is generated by the passage of ions through the pores. Polynucleotide lengths are measured as a transient decrease of ionic current due to blockage of ions passing through the pores by the nucleic acid. The duration of the current decrease was shown to be proportional to polymer length. Such a system can be applied as a size separation platform in certain embodiments of the present invention.
Primers are useful both as reagents for hybridization in solution, such as priming PCR amplification, as well as for embodiments employing a solid phase, such as microarrays. With microarrays, sample nucleic acids such as mRNA or DNA are fixed on a selected matrix or surface.
PCR products may be attached to the solid surface via one of the amplification primers, then denatured to provide single-stranded DNA. This spatially-partitioned, single-stranded nucleic acid is then subject to hybridization with selected probes under conditions that allow a quantitative determination of target abundance. In this embodiment, amplification products from each individual reaction are not physically separated, but are differentiated by hybridizing with a set of probes that are differentially labeled. Alternatively, unextended amplification primers may be physically immobilized at discrete positions on the solid support, then hybridized with the products of a nucleic acid amplification for quantitation of distinct species within the sample. In this embodiment, amplification products are separated by way of hybridization with probes that are spatially separated on the solid support. Separation platforms may optionally be coupled to utilize two different separation methodologies, thereby increasing the multiplexing capacity of reactions beyond that which can be obtained by separation in a single dimension. For example, some of the RT-PCR primers of a multiplex reaction may be coupled with a moiety that allows affinity capture, while other primers remain unmodified. Samples are then passed through an affinity chromatography column to separate PCR products arising from these two classes of primers. Flow-through fractions are collected and the bound fraction eluted. Each fraction may then be further separated based on other criteria, such as size, to identify individual components.
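The two-dimensional separation just described (affinity capture followed by size separation) can be sketched as a simple partitioning step. The data layout, function name, and product names below are hypothetical and serve only to show why coupling two criteria increases multiplexing capacity.

```python
from collections import defaultdict


def two_dimensional_separation(products):
    """Partition products first by affinity tag, then order each fraction by size.

    products: iterable of (name, has_affinity_tag, length_in_bases).
    Returns a dict mapping 'bound' / 'flow-through' to a list of
    (length, name) tuples sorted by length.
    """
    fractions = defaultdict(list)
    for name, tagged, length in products:
        key = "bound" if tagged else "flow-through"
        fractions[key].append((length, name))
    return {key: sorted(items) for key, items in fractions.items()}


# Two hypothetical products of identical size are still distinguishable
# because only one of them was amplified with an affinity-tagged primer.
mix = [("geneA", True, 200), ("geneB", False, 200), ("geneC", True, 310)]
print(two_dimensional_separation(mix))
```

In this toy example, size alone could not separate geneA from geneB, but the affinity step places them in different fractions before sizing.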
Detection Methods
Following separation of the different products of a multiplex amplification, one or more of the amplicons are detected and/or quantitated. Some embodiments of the methods of the present invention enable direct detection of products. Other embodiments detect reaction products via a label associated with one or more of the amplification primers. Many types of labels suitable for use in the present invention are known in the art, including chemiluminescent, isotopic, fluorescent, electrochemical, infrared, or mass labels, or enzyme tags. In further embodiments, separation and detection may be a multi-step process in which samples are fractionated according to more than one property of the products, and detected at one or more stages during the separation process. An exemplary embodiment of the invention that does not use labeling or modification of the molecules being analyzed is detection of the mass-to-charge ratio of the molecule itself. This detection technique is optionally used when the separation platform is a mass spectrometer. One embodiment for increasing resolution and throughput with mass detection involves mass-modifying the amplification products. Nucleic acids can be mass-modified through either the amplification primer or the chain-elongating nucleoside triphosphates. Alternatively, the product mass can be shifted without modification of the individual nucleic acid components, by instead varying the number of bases in the primers. Several types of moieties have been shown to be compatible with analysis by mass spectrometry, including polyethylene glycol, halogens, alkyl, aryl, or aralkyl moieties, and peptides (described in, for example, U.S. Patent No. 5,691,141, which is incorporated by reference). Isotopic variants of specified atoms, such as radioisotopes or stable, higher mass isotopes, are also used to vary the mass of the amplification product.
Radioisotopes can be detected based on the energy released when they decay, and numerous applications of their use are generally known in the art. Stable (non-decaying) heavy isotopes can be detected based on the resulting shift in mass, and are useful for distinguishing between two amplification products that would otherwise have similar or equal masses. Other embodiments of detection that make use of inherent properties of the molecule being analyzed include ultraviolet light absorption (UV) or electrochemical detection. Electrochemical detection is based on oxidation or reduction of a chemical compound to which a voltage has been applied. Electrons are either donated (oxidation) or accepted (reduction), which can be monitored as current. For both UV absorption and electrochemical detection, sensitivity for each individual nucleotide varies depending on the component base, but with molecules of sufficient length this bias is insignificant, and detection levels can be taken as a direct reflection of overall nucleic acid content.
Some embodiments of the invention include identifying molecules indirectly by detection of an associated label. A number of labels may be employed that provide a fluorescent signal for detection. If a sufficient quantity of a given species is generated in a reaction, and the mode of detection has sufficient sensitivity, then some fluorescent molecules may be incorporated into one or more of the primers used for amplification, generating a signal strength proportional to the concentration of DNA molecules. Several fluorescent moieties, including Alexa 350, Alexa 430, AMCA, BODIPY 630/650, BODIPY 650/665, BODIPY-FL, BODIPY-R6G, BODIPY-TMR, BODIPY-TRX, carboxyfluorescein, Cascade Blue, Cy3, Cy5, 6-FAM, Fluorescein, HEX, 6-JOE, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green, Rhodamine Red, ROX, TAMRA, TET, Tetramethylrhodamine, and Texas Red, are generally known in the art and routinely used for identification of discrete nucleic acid species, such as in sequencing reactions. Many of these dyes have emission spectra distinct from one another, enabling deconvolution of data from incompletely resolved samples into individual signals. This allows pooling of separate reactions that are each labeled with a different dye, increasing the throughput during analysis, as described in more detail below. Additional examples of suitable labels are described herein.
The signal strength obtained from fluorescent dyes can be enhanced through use of related compounds called energy transfer (ET) fluorescent dyes. After absorbing light, ET dyes have emission spectra that allow them to serve as "donors" to a secondary "acceptor" dye that will absorb the emitted light and emit a lower energy fluorescent signal. Use of these coupled-dye systems can significantly amplify fluorescent signal. Examples of ET dyes include the ABI PRISM BigDye terminators, recently commercialized by Perkin-Elmer Corporation (Foster City, CA, USA) for applications in nucleic acid analysis. These chromophores incorporate the donor and acceptor dyes into a single molecule and an energy transfer linker couples a donor fluorescein to a dichlororhodamine acceptor dye, and the complex is attached, e.g., to a primer. Fluorescent signals can also be generated by non-covalent intercalation of fluorescent dyes into nucleic acids after their synthesis and prior to separation.
This type of signal will vary in intensity as a function of the length of the species being detected, and thus signal intensities must be normalized based on size. Several applicable dyes are known in the art, including, but not limited to, ethidium bromide and Vistra Green. Some intercalating dyes, such as YOYO or TOTO, bind so strongly that separate DNA molecules can each be bound with a different dye and then pooled, and the dyes will not exchange between DNA species. This enables mixing separately generated reactions in order to increase multiplexing during analysis.
Alternatively, technologies such as the use of nanocrystals as a fluorescent DNA label (Alivisatos, et al. (1996) Nature 382:609-11, which is incorporated by reference) can be employed in the methods of the present invention. Another method, described by Mazumder, et al. (Nucleic Acids Res. (1998) 26:1996-2000, which is incorporated by reference), involves hybridization of a labeled oligonucleotide probe to its target without physical separation from unhybridized probe. In this method, the probe is labeled with a chemiluminescent molecule that in the unbound form is destroyed by sodium sulfite treatment, but is protected in probes that have hybridized to target sequence.
In other embodiments, both electrochemical and infrared methods of detection can be amplified over the levels inherent to nucleic acid molecules through attachment of EC or IR labels. Their characteristics and use as labels are described in, for example, PCT publication WO 97/27327, which is incorporated by reference.
Some preferred compounds that can serve as an IR label include an aromatic nitrile, aromatic alkynes, or aromatic azides. Numerous compounds can serve as an EC label; many are listed in PCT publication WO 97/27327. Enzyme-linked reactions are also employed in the detecting step of the methods of the present invention. Enzyme-linked reactions theoretically yield an infinite signal, due to amplification of the signal by enzymatic activity. In this embodiment, an enzyme is linked to a secondary group that has a strong binding affinity to the molecule of interest. Following separation of the nucleic acid products, enzyme is bound via this affinity interaction. Nucleic acids are then detected by a chemical reaction catalyzed by the associated enzyme. Various coupling strategies are possible utilizing well-characterized interactions generally known in the art, such as those between biotin and avidin, an antibody and antigen, or a sugar and lectin. Various types of enzymes can be employed, generating colorimetric, fluorescent, chemiluminescent, phosphorescent, or other types of signals. As an illustration, a primer may be synthesized containing a biotin molecule. After amplification, amplicons are separated by size, and those made with the biotinylated primer are detected by binding with streptavidin that is covalently coupled to an enzyme, such as alkaline phosphatase. A subsequent chemical reaction is conducted, detecting bound enzyme by monitoring the reaction product. The secondary affinity group may also be coupled to an enzymatic substrate, which is detected by incubation with unbound enzyme. One of skill in the art can conceive of many possible variations on the different embodiments of detection methods described above.
In some embodiments, it may be desirable prior to detection to separate a subset of amplification products from other components in the reaction, including other products. Exploitation of known high-affinity biological interactions can provide a mechanism for physical capture. Some examples of high-affinity interactions include those between a hormone and its receptor, a sugar and a lectin, avidin and biotin, or an antigen and its antibody. After affinity capture, molecules are retrieved by cleavage, denaturation, or elution with a competitor for binding, and then detected as usual by monitoring an associated label. In some embodiments, the binding interaction providing for capture may also serve as the mechanism of detection.
Furthermore, the size of an amplification product or products is optionally changed, or "shifted," in order to better resolve the amplification products from other products prior to detection. For example, chemically cleavable primers can be used in the amplification reaction. In this embodiment, one or more of the primers used in amplification contains a chemical linkage that can be broken, generating two separate fragments from the primer. Cleavage is performed after the amplification reaction, removing a fixed number of nucleotides from the 5' end of products made from that primer. Design and use of such primers are described in detail in, for example, PCT publication WO 96/37630, which is incorporated by reference.
DATA ANALYSIS
For reliably distinguishing leukemias with t(11q23)/MLL from other leukemias it is generally desirable to determine the expression of more than one of the markers described herein. As an exemplary criterion for the choice of markers, the statistical significance of markers as expressed in q or p values based on the concept of the false discovery rate is optionally determined. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate (see, e.g., Storey et al. (2003) Proc. Natl. Acad. Sci. 100:9440-5, which is incorporated by reference). In some embodiments, the markers described herein have q-values of less than about 3E-06, typically less than about 1.5E-09, more typically less than about 1.5E-11, even more typically less than about 1.5E-20, and still more typically less than about 1.5E-30.
Of the markers described or referred to herein, the expression level of at least about two, typically of at least about ten, more typically of at least about 25, and even more typically of at least about 50 of these markers is determined as described herein or by another technique known to those of skill in the art. In some embodiments, for example, expression levels of one or more genes selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14 are determined in a given sample. In certain embodiments, expression levels of each of these genes in a sample are determined and compared with expression levels detected in one or more reference leukemia cells. Furthermore, International Publication No. WO 03/039443, which is incorporated by reference, discloses certain marker genes the expression levels of which are characteristic for certain leukemias. Certain of the markers and/or methods disclosed therein are optionally utilized in performing the methods described herein.
The level of the expression of a marker is indicative of the genotype of the target cell. The level of expression of a marker or group of markers is measured and is generally compared with the level of expression of the same marker or the same group of markers from other cells or samples. The comparison may be effected in an actual experiment or in silico. There is a meaningful difference in these levels of expression, e.g., when these expression levels (also referred to as expression pattern, expression signature, or expression profile) are measurably different. In some embodiments, the difference is typically at least about 5%, 10% or 20%, more typically at least about 50% or may even be as high as 75% or 100%. To further illustrate, the difference in the level of expression is optionally at least about 200%, i.e., two fold, at least about 500%, i.e., five fold, or at least about 1000%, i.e., 10 fold in some embodiments.
In certain embodiments, for example, the expression level of markers expressed lower in a first subtype than in at least one second subtype, which differs from the first subtype, is at least about 5%, 10% or 20%, more typically at least about 50% or may even be about 75% or about 100%, more typically at least about 10-fold, even more typically at least about 50-fold, and still more typically at least about 100-fold lower in the first subtype. On the other hand, the expression level of markers expressed higher in a first subtype than in at least one second subtype, which differs from the first subtype, is generally at least about 5%, 10% or 20%, more generally at least about 50% or may even be about 75% or about 100%, more generally at least about 10-fold, still more generally at least about 50-fold, and even more generally at least about 100-fold higher in the first subtype.
The classification accuracy of a given gene list for a set of microarray experiments is preferably estimated using Support Vector Machines (SVM), because there is evidence that SVM-based prediction slightly outperforms other classification techniques, such as k-Nearest Neighbors (k-NN). The LIBSVM software package version 2.36, for example, is optionally used (SVM-type: SVC, linear kernel (http://www.csie.ntu.edu.tw/~cjlin/libsvm/)). Machine learning algorithms are also described in, e.g., Brown et al. (2000) Proc. Natl. Acad. Sci. 97:262-267, Furey et al. (2000) Bioinformatics, 16:906-914, and Vapnik, Statistical Learning Theory, Wiley (1998), which are each incorporated by reference.
To further illustrate, the classification accuracy of a given gene list for a set of microarray experiments can be estimated using Support Vector Machines (SVM) as supervised learning techniques. Generally, SVMs are trained using differentially expressed genes, which were identified on a subset of the data and then this trained model is employed to assign new samples to those trained groups from a second and different data set. Differentially expressed genes are optionally identified, e.g., applying analysis of variance (ANOVA) and t-test-statistics (Welch t-test). Based on identified distinct gene expression signatures, respective training sets consisting of, e.g., 2/3 of cases and test sets with 1/3 of cases to assess classification accuracies can be designated. Assignment of cases to training and test sets is optionally randomized and balanced by diagnosis. Based on the training set, a Support Vector Machine (SVM) model can be built using this approach.
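The randomized, diagnosis-balanced assignment of cases to a training set (about 2/3) and a test set (about 1/3) described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stratified_split`, the seed, and the example diagnosis labels are assumptions made for the illustration and are not taken from the methods described herein.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and test sets, balanced by diagnosis.

    Hypothetical helper for illustration only. Each diagnosis is shuffled
    separately so that both sets keep roughly the same class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # randomized assignment within one diagnosis
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cohort: 9 cases of one diagnosis and 30 of another.
labels = ["MLL"] * 9 + ["other"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training cases,", len(test_idx), "test cases")
```

Differentially expressed genes would then be selected, and the SVM model built, on the training indices only, so that the test set stays independent.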
The apparent accuracy of prediction, i.e., the overall rate of correct predictions of the complete data set can be estimated by, e.g., 10-fold cross validation. This process typically includes dividing the data set into 10 approximately equally sized subsets, training an SVM-model for 9 subsets, and generating predictions for the remaining subset. This training and prediction process can be repeated 10 times to include predictions for each subset. Subsequently the data set can be split into a training set, consisting of two thirds of the samples, and a test set with the remaining one third. Apparent accuracy for the training set can also be estimated by 10-fold cross validation (analogous to apparent accuracy for complete set). An SVM-model of the training set is optionally built to predict diagnosis in the independent test set, thereby estimating true accuracy of the prediction model. This prediction approach can be applied both for overall classification (multi-class) and binary classification (diagnosis X => yes or no). For the latter, sensitivity and specificity are optionally calculated, as follows:
Sensitivity = (number of positive samples correctly predicted)/(number of true positives)
Specificity = (number of negative samples correctly predicted)/(number of true negatives).
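To make the cross-validation bookkeeping and the two ratios above concrete, the following Python sketch assigns samples to 10 folds balanced by diagnosis and computes sensitivity and specificity from a list of predictions. It is a minimal sketch under stated assumptions: the function names, the round-robin fold assignment, and the example labels and predictions are hypothetical, and no classifier is trained here (the predictions would come from the SVM models described above).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, balanced by diagnosis (illustrative)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:  # deal samples out round-robin across folds
            folds[position % k].append(idx)
            position += 1
    return folds


def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Compute sensitivity and specificity for one diagnosis (yes or no).

    Assumes at least one positive and one negative sample are present.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predictions pooled over all folds.
truth = ["MLL"] * 4 + ["other"] * 6
predicted = ["MLL", "MLL", "MLL", "other"] + ["other"] * 5 + ["MLL"]
sens, spec = sensitivity_specificity(truth, predicted, positive="MLL")
print("sensitivity:", sens, "specificity:", spec)
```

In each of the 10 rounds, one fold would serve as the prediction subset and the other nine as the training subsets; pooling the predictions over all rounds gives the apparent accuracy.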
SYSTEMS FOR GENE EXPRESSION ANALYSIS
The present invention also provides systems for analyzing gene expression. The system includes one or more probes that correspond to at least portions of genes or expression products thereof. The genes are generally selected from the markers listed in, e.g., Table 8, Table 9, Table 10, Table 13, and/or Table 14. In some embodiments, for example, the probes are nucleic acids (e.g., oligonucleotides, cDNAs, cRNAs, etc.), whereas in other embodiments, the probes are biomolecules (e.g., antibodies, aptamers, etc.) designed to detect expression products of the genes (e.g., proteins or fragments thereof). In certain embodiments, the probes are arrayed on a solid support, whereas in others, they are provided in one or more containers, e.g., for assays performed in solution. The system also includes at least one reference data bank or database for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL. In some embodiments, the reference data bank is backed up on a computational data memory chip or other computer readable medium, which can be inserted in as well as removed from the system of the present invention, e.g., like an interchangeable module, in order to use another data memory chip containing a different reference data bank. In certain embodiments, the systems also include detectors (e.g., spectrometers, etc.) that detect binding between the probes and targets. Other detectors are described further below. In addition, the systems also generally include at least one controller operably connected to the reference data bank and/or to the detector. In some embodiments, for example, the controller is integral with the reference data bank.
The systems of the present invention that include a desired reference data bank can be used in a way such that an unknown sample is, first, subjected to gene expression profiling, e.g., by microarray analysis in a manner as described herein or otherwise known to a person skilled in the art, and the expression level data obtained by the analysis are, second, fed into the system and compared with the data of the reference data bank obtainable by the above method. For this purpose, the apparatus suitably contains a device for entering the expression level data, for example, a control panel such as a keyboard. The results, i.e., whether and how the data of the unknown sample fit into the reference data bank, can be made visible on a monitor or display screen and, if desired, printed out on an incorporated or connected printer. Computer components are described further below. In some embodiments, a system optionally further includes a thermal modulator operably connected to containers to modulate temperature in the containers (e.g., to effect thermocycling when target nucleic acids are amplified in the containers), and/or fluid transfer components (e.g., automated pipettors, etc.) that transfer fluid to and/or from the containers. Optionally, these systems also include robotic components for translocating solid supports, containers, and the like, and/or separation components (e.g., microfluidic devices, chromatography columns, etc.) for separating the products of amplification reactions from one another.
The invention further provides a computer or computer readable medium that includes a data set that comprises a plurality of character strings that correspond to a plurality of sequences (or subsequences thereof) that correspond to genes selected from, e.g., the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14. Typically, the computer or computer readable medium further includes an automatic synthesizer coupled to an output of the computer or computer readable medium. The automatic synthesizer accepts instructions from the computer or computer readable medium, which instructions direct synthesis of, e.g., one or more probe nucleic acids that correspond to one or more character strings in the data set.
Detectors are structured to detect detectable signals produced, e.g., in or proximal to another component of the system (e.g., in a container, on a solid support, etc.). Suitable signal detectors that are optionally utilized, or adapted for use, in these systems detect, e.g., fluorescence, phosphorescence, radioactivity, absorbance, refractive index, luminescence, or the like. Detectors optionally monitor one or a plurality of signals from upstream and/or downstream of the performance of, e.g., a given assay step. For example, the detector optionally monitors a plurality of optical signals, which correspond in position to "real time" results. Example detectors or sensors include photomultiplier tubes, CCD arrays, optical sensors, temperature sensors, pressure sensors, pH sensors, conductivity sensors, scanning detectors, or the like. Each of these as well as other types of sensors is optionally readily incorporated into the systems described herein. Optionally, the systems of the present invention include multiple detectors.
More specific exemplary detectors that are optionally utilized in these systems include, e.g., a resonance light scattering detector, an emission spectroscope, a fluorescence spectroscope, a phosphorescence spectroscope, a luminescence spectroscope, a spectrophotometer, a photometer, and the like. Various synthetic components are also utilized, or adapted for use, in the systems of the invention including, e.g., automated nucleic acid synthesizers, e.g., for synthesizing the oligonucleotide probes described herein. Detectors and synthetic components that are optionally included in the systems of the invention are described further in, e.g., Skoog et al., Principles of Instrumental Analysis, 5th Ed., Harcourt Brace College Publishers (1998) and Currell, Analytical Instrumentation: Performance Characteristics and Quality, John Wiley & Sons, Inc. (2000), both of which are incorporated by reference.
The systems of the invention also typically include controllers that are operably connected to one or more components (e.g., detectors, synthetic components, thermal modulator, fluid transfer components, etc.) of the system to control operation of the components. More specifically, controllers are generally included either as separate or integral system components that are utilized, e.g., to receive data from detectors, to effect and/or regulate temperature in the containers, to effect and/or regulate fluid flow to or from selected containers, or the like. Controllers and/or other system components are optionally coupled to an appropriately programmed processor, computer, digital device, or other information appliance (e.g., including an analog to digital or digital to analog converter as needed), which functions to instruct the operation of these instruments in accordance with preprogrammed or user input instructions, receive data and information from these instruments, and interpret, manipulate and report this information to the user. Suitable controllers are generally known in the art and are available from various commercial sources.
Any controller or computer optionally includes a monitor, which is often a cathode ray tube ("CRT") display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display, etc.), or others. Computer circuitry is often placed in a box, which includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others. The box also optionally includes a hard disk drive, a floppy disk drive, a high capacity removable drive such as a writeable CD-ROM, and other common peripheral elements. Inputting devices such as a keyboard or mouse optionally provide for input from a user. These components are illustrated further below. The computer typically includes appropriate software for receiving user instructions, either in the form of user input into a set of parameter fields, e.g., in a GUI, or in the form of preprogrammed instructions, e.g., preprogrammed for a variety of different specific operations. The software then converts these instructions to appropriate language for instructing the operation of one or more controllers to carry out the desired operation. The computer then receives the data from, e.g., sensors/detectors included within the system, interprets the data, and either provides it in a user-understood format or uses that data to initiate further controller instructions, in accordance with the programming, e.g., such as controlling fluid flow regulators in response to fluid weight data received from weight scales or the like.
The computer can be, e.g., a PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™, WINDOWS95™, WINDOWS98™, WINDOWS2000™, WINDOWS XP™, LINUX-based machine, a MACINTOSH™, Power PC, or a UNIX-based (e.g., SUN™ work station) machine) or other common commercially available computer which is known to one of skill. Standard desktop applications such as word processing software (e.g., Microsoft Word™ or Corel WordPerfect™) and database software (e.g., spreadsheet software such as Microsoft Excel™, Corel Quattro Pro™, or database programs such as Microsoft Access™ or Paradox™) can be adapted to the present invention. Software for, e.g., controlling temperature modulators and fluid flow regulators is optionally constructed by one of skill using a standard programming language such as Visual Basic, Fortran, Basic, Java, or the like.
Reference data banks can be produced by, e.g., (a) compiling a gene expression profile of a patient sample by determining the expression level of at least one marker selected from those listed in, e.g., Table 8, Table 9, Table 10, Table 13, and/or Table 14, and (b) classifying the gene expression profile using a machine learning algorithm. Exemplary machine learning algorithms are optionally selected from, e.g., Weighted Voting, K-Nearest Neighbors, Decision Tree Induction, Support Vector Machines (SVM), and Feed-Forward Neural Networks. In some embodiments, for example, the machine learning algorithm is an SVM, such as polynomial kernel, linear kernel, and Gaussian Radial Basis Function-kernel SVM models.
KITS
The present invention also provides kits that include at least one probe as described herein for genotyping leukemia cells. The kits also include instructions for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL. Typically, the kit includes suitable auxiliaries, such as buffers, enzymes, labeling compounds, and/or the like. In some embodiments, probes are attached to solid supports, e.g., the wells of microtiter plates, nitrocellulose membrane surfaces, glass surfaces, to particles in solution, etc. As another option, probes are provided free in solution in containers, e.g., for performing the methods of the invention in a solution phase. In certain embodiments, kits also contain at least one reference for a leukemia that, e.g., lacks or comprises t(11q23)/MLL. For example, the reference can be a sample, a database, or the like. In some embodiments, the kit includes primers and other reagents for amplifying target nucleic acids. Typically, kits also include at least one container for packaging the probes, the set of instructions, and any other included components.
EXAMPLES
It is understood that the examples and embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the claimed invention. It is also understood that various modifications or changes in light of the examples and embodiments described herein will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.
EXAMPLE 1: GENE EXPRESSION ANALYSIS OF MLL GENE REARRANGED ACUTE LEUKEMIAS
INTRODUCTION
The MLL gene (also termed ALL-1, HRX, and TRX1) located at chromosome band 11q23 is a recurrent target of chromosomal translocations in acute leukemias, particularly prevalent in infant leukemias and treatment-related secondary leukemias, and associated with dismal prognosis. Reciprocal translocations associated with the MLL gene result in in-frame fusion transcripts with various partner genes from at least 50 distinct gene loci. In addition, a partial tandem duplication of the MLL gene has been reported.5
The class of oncogenic MLL fusion proteins consists of the N-terminal portion of the MLL protein fused to C-terminal portions of a fusion partner. Experimental systems in which MLL fusion proteins were generated to induce leukemia in mice demonstrated that this fusion to a C-terminal partner is necessary for immortalization. Two critical regions within MLL were identified: a region with three AT hook DNA-binding motifs and the DNA methyltransferase homology region.6 The MLL fusion partners act via dominant gain of function and seem to play a role in two main functional categories, namely signaling molecules that normally localize to the cytoplasm/cell junctions or nuclear factors implicated in regulatory processes of transcription.7
With respect to the oncogenic activation of MLL in leukemia, So et al. proposed two mechanisms. One subset of fusion partners already displays the transcriptional activation potential required for leukemogenesis. The other subset acts via homodimerization or oligomerization domains and can therefore lead, in a dimerization-dependent pathway, to deregulated transcription.8
Interestingly, distinct MLL fusion partners suggest a possible role in the tropism of the leukemia. Certain partner proteins not only convert MLL to an oncogenic fusion protein but also direct the lineage susceptibility for transformation. MLL-AF4 expressing leukemias are mainly diagnosed as pro B ALL, whereas, e.g., fusion partners AF9, AF6, or AF10 are common in myelomonocytic or monoblastic AML subtypes.9 High-density DNA-oligonucleotide microarrays simultaneously assess the abundance of thousands of messenger RNA transcripts.10 During the past few years powerful algorithms have been developed and adapted to mine microarray data.11 More recently, applications to interpret gene expression signatures in terms of pathways and networks have also evolved. In this example, from a series of 363 acute leukemia patient samples hybridized to a set of high-density microarrays representing a near complete human genome, analyses were performed to (i) identify t(11q23)/MLL gene signatures compared to numerous specific subtypes of acute leukemias, (ii) discriminate t(11q23)/MLL positive AML from t(11q23)/MLL ALL samples, (iii) investigate signatures correlated to MLL-AF9 and other MLL partner genes, and (iv) decipher common biological networks. More specifically, the analysis addressed how the differing MLL partner genes influence the global gene expression signature and whether pathways could be identified to explain the molecular determination of MLL leukemias occurring in both the myeloid and lymphoid lineages.
MATERIALS AND METHODS
Patient samples
This study included bone marrow samples from 363 adult acute leukemia patients at diagnosis representing distinct precursor B-ALL subtypes with t(11q23)/MLL, t(8;14), or t(9;22), and precursor T-ALL, as well as AML subtypes with t(11q23)/MLL, t(8;21), t(15;17), inv(16), or complex aberrant karyotype (Tables 7-10). See also, Table 13, which provides ALL genes for classification, and Table 14, which lists AML genes for classification. All samples were received between December 1998 and February 2004 for reference diagnostics and were registered in a leukemia database.12 The samples were received either from local hospitals or by overnight mail. Prior to therapy all patients gave their informed consent for participation in the current evaluation after having been advised about the purpose and investigational nature of the study as well as of potential risks. The study design adhered to the declaration of Helsinki. The diagnosis was performed by an individual combination of cytomorphology, cytogenetics, fluorescence in situ hybridization (FISH), multiparameter-immunophenotyping and molecular genetics. In particular, a thorough characterization of the t(11q23)/MLL samples was warranted. Cytogenetic characterization, FISH on interphase nuclei and/or metaphases, and PCR detection of MLL fusion transcripts were performed as previously described.
Gene expression profiling and data transformation
Microarray analyses were performed as previously described utilizing the GeneChip® System (Affymetrix, Inc., Santa Clara, CA, USA) and the HG-U133 microarray set.13-16 This two-array set provides comprehensive coverage of well-substantiated genes in the human genome. It can be used to analyse the expression level of 39,000 transcripts and variants, including greater than 33,000 genes. The two arrays comprise more than 44,000 probe sets and 1,000,000 distinct oligonucleotide features. For gene expression profiling cell lysates of the leukemia samples were thawed, homogenized (QIAshredder, Qiagen), and total RNA was extracted (RNeasy Mini Kit, Qiagen). The subsequent target preparation steps as well as hybridization, washing and staining of the probe arrays were performed according to recommended protocols (Affymetrix Technical Manual). The Affymetrix software package (Microarray Suite 5.0) extracted fluorescence intensities from each element on the microarrays as detected by confocal laser scanning.17 Detection calls (present, marginal, or absent) were determined by default parameters. Signal intensity values were calculated by scaling the raw data intensities to a common target intensity (U133 mask file; TGT value: 5000).
Each human GeneChip expression array features 100 human maintenance genes that serve as a tool to normalise and scale the data before performing data comparisons. As recommended by the manufacturer, these 100 probe sets were used for normalization (on the world wide web at affymetrix.com/support/technical/mask_files.affx as of October 27, 2004). The minimal quality control parameters for inclusion of an expression profile in this analysis took into account more than 30% present calls (U133A microarray) and a low 3'/5' ratio of represented glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH) probe sets.
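The scaling of raw intensities to a common target intensity (TGT value: 5000) using a set of reference probe sets can be illustrated with the following Python sketch. It is not the Microarray Suite 5.0 implementation: the use of a 2% trimmed mean, the function names, and the example probe set identifiers and intensities are assumptions made only for illustration.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping the lowest and highest trim_fraction of values.

    The 2% default is an assumption for this sketch.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return sum(kept) / len(kept)


def scale_to_target(signals, target=5000.0, reference_ids=None, trim_fraction=0.02):
    """Scale all signals so the reference probe sets average to the target.

    signals: dict mapping probe set id to raw intensity.
    reference_ids: probe sets used to derive the scale factor (for example,
    the maintenance genes of a mask file); defaults to all probe sets.
    """
    ids = reference_ids if reference_ids is not None else list(signals)
    factor = target / trimmed_mean([signals[i] for i in ids], trim_fraction)
    scaled = {probe: value * factor for probe, value in signals.items()}
    return scaled, factor


# Hypothetical raw intensities for three probe sets.
raw = {"probe_a": 100.0, "probe_b": 200.0, "probe_c": 300.0}
scaled, factor = scale_to_target(raw)
print("scale factor:", factor)
```

Because every array is multiplied by its own single factor, the relative intensities within an array are unchanged, while arrays become comparable with one another.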
Statistical methods
For supervised statistical analyses samples were grouped accordingly, and for each disease entity differential genes were calculated by means of the t-test statistic (two-sample t-test, unequal variances).19 The software package R version 1.7.1 (http://www.r-project.org/) was applied. To address the multiple testing problem, false discovery rates (FDR) of genes were calculated according to Storey et al. (additional description is provided below).20 The class prediction was performed using support vector machines (SVM),21 because there is evidence that SVM-based prediction slightly outperforms other classification techniques.22,23 SVM models were built with libsvm (on the world wide web at csie.ntu.edu.tw/~cjlin/libsvm/ as of October 27, 2004). Briefly, the complete data set was randomly split into respective training and independent test cohorts. Then differentially expressed genes were identified in the training set, and a learning model was built including the top differentially expressed genes. Using this approach, the algorithm learns to discriminate between the respective subtypes based on gene expression data in the given training patient cohort. Having learned the expression features of the classes, the algorithm could recognize and predict new samples as class members based on their expression patterns in the test cohort. The prediction accuracy was estimated by 10-fold cross-validation and assessed for robustness in a resampling approach (additional description is provided below). As an additional method to extract differentially expressed genes the SAM software program (MS Excel application) was used.24 Microarray signal intensities were transformed as described above and subsequently input into the software. A stringent cutoff for significance (tuning parameter delta) corresponding to <1 false-positive gene was chosen.
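The two-sample t-test with unequal variances (Welch t-test) used above to rank differential genes can be written out as a short Python sketch. The analysis described here was run in R; the Python functions below, their names, and the example expression values are illustrative assumptions only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom.

    a and b are expression values of one gene in two groups; each group
    needs at least two values and the pooled variance must be non-zero.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_genes(group_a, group_b):
    """Rank genes by absolute Welch t statistic, largest first."""
    scores = {gene: welch_t(group_a[gene], group_b[gene])[0] for gene in group_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical expression values for two genes in two groups of samples.
group_a = {"geneA": [2.0, 4.0, 6.0], "geneB": [5.0, 5.1, 4.9]}
group_b = {"geneA": [1.0, 2.0, 3.0], "geneB": [1.0, 1.2, 0.8]}
print(rank_genes(group_a, group_b))
```

A p value for each gene would be obtained from the t distribution with the returned degrees of freedom, and the top-ranked genes would be passed on to the classifier.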
The resulting gene expression data was visualized using hierarchical cluster analysis and principal component analysis (GeneMaths XT, Applied Maths, Belgium). For visualization of unsupervised data analyses a variation filter was applied. In order to remove probe sets that demonstrated minimal variation, the complete data matrix was filtered by variance, and the probe sets demonstrating the largest variance were selected for analysis.
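A variation filter of this kind can be sketched as follows. The number of probe sets retained is not stated in the text, so `n_keep` is a free parameter here, and the function name and data layout are illustrative assumptions.

```python
from statistics import variance

def top_variance_probes(matrix, n_keep):
    """Return the n_keep probe sets with the largest variance across samples.

    matrix: dict mapping a probe set identifier to a list of signals
            (one value per sample, at least two samples).
    """
    ranked = sorted(matrix, key=lambda p: variance(matrix[p]), reverse=True)
    return ranked[:n_keep]

# Hypothetical toy matrix with three probe sets and three samples.
toy_matrix = {"a": [1.0, 1.0, 1.1], "b": [1.0, 5.0, 9.0], "c": [2.0, 3.0, 4.0]}
selected = top_variance_probes(toy_matrix, n_keep=2)
```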
Additional information on the false discovery rate
The false discovery rate is an accepted methodology to calculate statistical significance in microarray studies.64,65 A measure of statistical significance called the q value is associated with each tested feature, automatically taking into account the fact that thousands of genes are being tested simultaneously. The q value of a particular feature in a microarray data set is the expected proportion of false positives incurred when calling that feature significant.
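As a rough illustration of how q values relate to p-values, the sketch below uses a simplified estimate in which the proportion of true null hypotheses, `pi0`, is supplied by the caller. With the default `pi0=1.0` the result coincides with Benjamini-Hochberg adjusted p-values; the method of Storey et al. additionally estimates `pi0` from the data, which is not reproduced here. The function name is an assumption.

```python
def q_values(p_values, pi0=1.0):
    """Simplified q-value estimate for a list of p-values.

    pi0 is the assumed proportion of true null hypotheses; pi0=1.0 is the
    conservative choice (equivalent to Benjamini-Hochberg adjustment).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q never
    # decreases as p increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * p_values[i] * m / rank)
        q[i] = running_min
    return q

example_q = q_values([0.01, 0.04, 0.03, 0.5])
```

Calling all features with a q value at or below a chosen threshold significant then corresponds to accepting roughly that proportion of false positives among them.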
Additional information on SVM-based classification
A Support Vector Machine (SVM) is a supervised learning algorithm developed over the past decade by Vapnik et al.66 and has also recently been used for gene expression data analysis.67-70 The SVM algorithm operates by mapping the given training set of samples into a possibly high-dimensional feature space and attempting to locate in that space a plane that separates the positive from the negative examples. Having found such a plane, the SVM can then predict the classification of an unlabeled example by mapping it into the feature space and asking on which side of the separating plane the example lies.
In this example, multi-class SVM classifiers were built with linear kernels based on class-specific genes using library LIBSVM version 2.36 (on the world wide web at csie.ntu.edu.tw/~cjlin/libsvm/ as of October 27, 2004). Apparent accuracy of the complete data set was estimated by 10-fold cross-validation. This means that the data set was divided into 10 equally sized subsets, balanced by diagnosis; an SVM model was trained on 9 subsets, and predictions were generated for the remaining subset. This training/prediction process was repeated 10 times to include predictions for each subset. Apparent accuracy is the overall rate of correct predictions. Sensitivity and specificity were calculated as follows:
- Sensitivity = (number of positive samples correctly predicted)/(number of truly positive samples)
- Specificity = (number of negative samples correctly predicted)/(number of truly negative samples)
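The fold assignment and the two ratios can be sketched as below. This is an illustrative reading rather than the LIBSVM workflow itself: the classifier is omitted, fold assignment is deterministic (no shuffling), and the function names and toy labels are assumptions.

```python
def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, balanced by class label."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for members in by_class.values():
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds

def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Sensitivity and specificity for one class treated as 'positive'.

    Assumes at least one positive and one negative sample are present.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predictions collected over all folds.
truth = ["AF4", "AF4", "AF9", "AF9", "ENL"]
predicted = ["AF4", "ENL", "AF9", "AF9", "AF4"]
sens, spec = sensitivity_specificity(truth, predicted, positive="AF4")
```

In a multi-class setting each class is treated in turn as the positive class, with all remaining classes pooled as negatives.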
A resampling approach was applied to assess robustness of class prediction: The data set was randomly, but balanced by the respective subtypes, split into a training set, consisting of two thirds of samples, and an independent test set with the remaining one third. Differentially expressed genes were identified in the training set in a one-versus-all (OVA) approach (t-test-statistic), an SVM-model was built from the training set and predictions were made in the test set. This complete process was repeated 100 times. By this means 95% confidence intervals were estimated for accuracy, sensitivity and specificity.
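The resampling scheme can be sketched as follows. The text does not state how the 95% confidence intervals were derived from the 100 runs; the percentile interval used here is one common choice and is an assumption, as are the function names and the fixed random seed.

```python
import random

def stratified_split(labels, train_fraction=2 / 3, rng=None):
    """Split sample indices into training and test sets, balanced by class."""
    rng = rng or random.Random(0)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # do not modify the caller's grouping
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train += members[:cut]
        test += members[cut:]
    return sorted(train), sorted(test)

def percentile_interval(values, level=0.95):
    """Empirical interval covering the central `level` share of the values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * n)]
    upper = ordered[min(n - 1, int((1 + level) / 2 * n))]
    return lower, upper

# Hypothetical labels: 9 samples of one class and 6 of another.
toy_labels = ["A"] * 9 + ["B"] * 6
train_idx, test_idx = stratified_split(toy_labels, rng=random.Random(1))
```

Repeating the split, the gene selection and the model fit 100 times yields 100 accuracy values, from which an interval can be read off with `percentile_interval`.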
Biological networks analysis
Biological networks were generated through the use of Ingenuity Pathways Analysis (January 2004 release version), a web-delivered application that generates networks using differentially expressed genes from expression array data analyses. Networks were generated addressing two different questions: (i) discrimination of t(11q23)/MLL from other genetically defined acute leukemia subtypes, and (ii) discrimination of ALL with t(11q23)/MLL from AML with t(11q23)/MLL samples.
The networks are displayed graphically as nodes (genes/gene products) and edges (the biological relationships between the nodes). The intensity of the node color indicates the degree of up- (green) or down- (red) regulation. As described in the legends below, nodes are displayed using various shapes that represent the functional class of the gene product. Edges are displayed with various labels that describe the nature of the relationship between the nodes (e.g., B for binding, T for transcription). The length of an edge reflects the evidence supporting that node-to-node relationship, in that edges supported by more articles from the literature are shorter. Details relating to Ingenuity Pathways Analysis are also available on the world wide web at ingenuity.com as of 11/4/2004. In addition, Figure 1 is a schematic that provides a biological network node shape description, Figure 2 is a schematic that provides biological network edge labels, and Figure 3 is a schematic that shows biological network edge types.
A) Discrimination of t(11q23)/MLL from other acute leukemia subgroups
First, biological networks were generated that were based on genes discriminating t(11q23)/MLL samples from other distinct acute leukemia subclasses. Here t(11q23)/MLL samples from both myeloid and lymphoblastic leukemias were combined into one single group. Differentially expressed genes were identified between t(11q23)/MLL and all other classes, i.e. AML with t(8;21), inv(16), t(15;17), or complex chromosomal aberrations and distinct precursor B-ALL subtypes with t(8;14), t(9;22), or precursor T-ALL, in a supervised analysis approach (OVA, one-versus-all). Statistically significant probe sets were identified and further filtered for probe sets demonstrating a 1.5-fold cut-off (both up- and downregulated). In doing so, a total number of n=193 upregulated probe sets (i.e. higher expression in t(11q23)/MLL samples) and n=1,194 downregulated probe sets (i.e. lower expression in t(11q23)/MLL samples) was prepared for upload into the pathway application. A data set containing the n=1,387 gene identifiers in probe set format and their corresponding fold change characteristic was uploaded as a tab-delimited text file into the Ingenuity Pathways Knowledge Base. Then each probe set was automatically mapped to its corresponding database gene object to designate focus genes. Focus genes are genes from the analysis input data file that meet both of the following criteria. First, these genes have been designated as being of interest, i.e. as discriminating t(11q23)/MLL samples with statistical significance from other acute leukemia subclasses. Second, they directly interact with other genes (non-focus genes) in the Ingenuity global molecular network, which consists of direct physical, enzymatic, and transcriptional interactions between mammalian orthologs from the published, peer-reviewed content in Ingenuity's Pathways Knowledge Base (IPKB). A total number of n=402 focus genes were used as the starting point for generating biological networks.
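The fold-change filtering and the preparation of a tab-delimited upload file can be sketched as below. The text does not specify how fold changes were encoded; this sketch assumes a ratio (group of interest divided by the comparison group), so that downregulation by the cut-off corresponds to the reciprocal. The probe set identifiers, column names and function names are hypothetical.

```python
def filter_by_fold_change(fold_changes, cutoff=1.5):
    """Split probe sets into up- and downregulated sets at a fold cut-off.

    fold_changes: dict mapping a probe set identifier to a fold change
                  expressed as a ratio (assumption, see lead-in).
    """
    up = {p: fc for p, fc in fold_changes.items() if fc >= cutoff}
    down = {p: fc for p, fc in fold_changes.items() if fc <= 1 / cutoff}
    return up, down

def to_tab_delimited(up, down):
    """Render the retained probe sets as tab-delimited text."""
    lines = ["probe_set\tfold_change"]
    for probe, fc in sorted({**up, **down}.items()):
        lines.append(f"{probe}\t{fc:.3f}")
    return "\n".join(lines)

# Hypothetical probe sets; p2 and p4 fall inside the cut-off and are dropped.
toy = {"p1": 2.0, "p2": 1.2, "p3": 0.5, "p4": 0.7}
up_sets, down_sets = filter_by_fold_change(toy, cutoff=1.5)
upload_text = to_tab_delimited(up_sets, down_sets)
```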
To start building the networks, the application queries the Ingenuity Pathways Knowledge Base for interactions between focus genes and all other gene objects stored in the knowledge base, and generates a set of networks with a network size of 35 genes/gene products. The application then computes a score for each network according to the fit of the user's set of significant genes. The score is derived from a p-value and indicates the likelihood of the focus genes in a network being found together due to random chance. A score of 2 indicates that there is a 1 in 100 chance that the focus genes are together in a network due to random chance. Therefore, scores of 2 or higher have at least a 99% confidence of not being generated by random chance alone. Biological functions are then calculated and assigned to each network. Four networks were evaluated more closely (see Figure 4). The networks are graphically presented in Figures 5-8, respectively. Additionally, information on differentially expressed genes as well as gene expression signal intensities for all genes included in the four networks is given in Table 10.
B) Discrimination of ALL with t(11q23)/MLL from AML with t(11q23)/MLL
Biological networks were generated that are based on genes discriminating ALL with t(11q23)/MLL samples from AML with t(11q23)/MLL. In this analysis, ALL with t(11q23)/MLL samples were compared against AML with t(11q23)/MLL samples using a supervised approach and differentially expressed genes were identified. Statistically significant probe sets were exported and further filtered for probe sets demonstrating a 2.0-fold cut-off (both up- and downregulated). In doing so, a total number of n=430 upregulated probe sets (i.e. higher expression in ALL with t(11q23)/MLL samples) and n=1,038 downregulated probe sets (i.e. lower expression in ALL with t(11q23)/MLL samples) was prepared for upload into the pathway application. A data set containing the n=1,468 gene identifiers in probe set format and their corresponding fold change characteristic was uploaded as a tab-delimited text file into the Ingenuity Pathways Knowledge Base. Then each probe set was automatically mapped to its corresponding database gene object to designate focus genes. Focus genes are genes from the analysis input data file that meet both of the following criteria. First, these genes have been designated as being of interest, i.e. as discriminating ALL with t(11q23)/MLL samples with statistical significance from AML with t(11q23)/MLL. Second, they directly interact with other genes (non-focus genes) in the Ingenuity global molecular network, which consists of direct physical, enzymatic, and transcriptional interactions between mammalian orthologs from the published, peer-reviewed content in Ingenuity's Pathways Knowledge Base (IPKB). A total number of n=416 focus genes were used as the starting point for generating biological networks. Eight networks were evaluated more closely (see Figure 9). The networks are graphically presented in Figures 10-17, respectively. In addition, gene expression signal intensities for all genes included in the 8 networks are given in Table 1.
Supporting data sets
Data set 1:
This data set contains the data provided in Tables 7-10. The differentially expressed genes depicted in the tables are listed according to the corresponding Affymetrix probe set identifier, fold change, q-value, and t-test statistic, respectively.
Table 7: Detailed information on the t(11q23)/MLL patient samples (age, sex, MLL translocation partner, immunophenotype, karyotype)
Table 8: Top 50 lower/higher expressed genes in ALL with t(11q23)/MLL compared to precursor B-ALL cases with t(9;22), t(8;14), and precursor T-ALL
Table 9: Top 50 lower/higher expressed genes in AML with t(11q23)/MLL compared to AML with t(8;21), t(15;17), inv(16), and samples with complex aberrant karyotypes
Table 10: Top 50 lower/higher expressed genes in t(11q23)/MLL leukemias (ALL and AML) compared to precursor B-ALL cases with t(9;22), t(8;14), and precursor T-ALL as well as AML with t(8;21), t(15;17), inv(16), and samples with complex aberrant karyotypes.
Data set 2:
This data set supports the networks visualizing genes distinguishing t(11q23)/MLL leukemias from other acute leukemia subtypes. It contains gene expression information on all genes depicted in one of the four t(11q23)/MLL specific networks (see Figure 4). Values in the columns reflect signal intensities and a call of "Present", "Absent", or "Marginal" for each probe set. This corresponds to Tables 2-6, which provide the raw expression intensities of the genes contained in the networks (termed MLL targets).
Data set 3:
This data set supports the networks visualizing differentially expressed genes between ALL with t(11q23)/MLL and AML with t(11q23)/MLL. It contains gene expression information on all genes depicted in one of the eight t(11q23)/MLL specific networks (see Figure 9). Values in the columns reflect signal intensities and a call of "Present", "Absent", or "Marginal" for each probe set. This corresponds to Table 1.
RESULTS
Distinct gene expression signatures in t(11q23)/MLL leukemias
The expression profiles of all 73 adult t(11q23)/MLL positive samples (n=25 ALL and n=48 AML with t(11q23)/MLL) were compared against 204 adult myeloid and 85 lymphoblastic leukemia samples with other defined genetic aberrations. In a supervised data analysis approach a robust set of differentially expressed genes was identified which accurately stratified the samples according to their underlying cytogenetic and immunophenotypic characteristics, i.e. myeloid subclasses, precursor B-lineage or precursor T-lineage ALL. More specifically, for lymphoblastic leukemias, t(11q23)/MLL samples (n=25) were accurately separated from precursor B-ALL cases with t(9;22) (n=42), t(8;14) (n=12), and precursor T-ALL (n=32). Figure 18A displays a principal component analysis of 111 ALL samples based on the differential expression of 262 genes (Table 13). When projected into the expression space of these informative genes, the four distinct ALL subclasses accurately cluster together. The top 50 genes with higher expression or lower expression, respectively, in ALL with t(11q23)/MLL are given in Table 8.
Likewise, by use of the differential expression of 416 genes, the 252 AML samples could accurately be stratified (Table 14). Specific patterns in gene expression were correlated with t(11q23)/MLL (n=48), t(8;21) (n=38), t(15;17) (n=42), inv(16) (n=49), and AML samples with complex aberrant karyotypes (n=75). This finding is also visualized by a principal component analysis (Figure 18B). The top 50 genes with higher expression or lower expression, respectively, in AML with t(11q23)/MLL are given in Table 9.
Thus, in both types of acute leukemias, t(11q23)/MLL positive samples are clearly distinct from other subtypes of the same cell lineage, i.e. myeloid or lymphoblastic. They have a characteristic underlying expression signature compared to other distinct acute leukemia subclasses. Subsequently, all samples were included in one comprehensive analysis. A supervised data analysis algorithm was applied to identify genes that separate each of the nine subtypes from the remaining classes. As shown in Figure 19, the nine distinct acute leukemia subtypes can accordingly be separated. The hierarchical clustering algorithm identified common expression signatures and ordered the patient samples accurately by similarities. Interestingly, t(11q23)/MLL positive samples are not found to cluster together but rather according to the lineage they are derived from, i.e. a lymphoblastic t(11q23)/MLL cluster and a myeloid t(11q23)/MLL cluster can be observed. In the top dendrogram, ALL samples with t(11q23)/MLL are grouped next to ALL with t(9;22) and t(8;14), and AML with t(11q23)/MLL are grouped next to AML with t(15;17) or AML with t(8;21) cases.
Common MLL target genes
In order to identify common MLL target genes, both types of t(11q23)/MLL leukemias were grouped together and were compared to the various types of precursor B- and T-lineage ALLs as well as to other cytogenetically defined AML subtypes. In doing so, a set of differentially expressed genes specifically associated with t(11q23)/MLL leukemias was specified. When this set of genes was input into network analysis software, a number of significant biological networks was calculated. As shown in Figure 5, HOXA9 as well as MEIS1 show up as genes with higher expression in both t(11q23)/MLL leukemias. Other genes with higher expression in this network included NICAL and chromatin remodeling factor RUNX2. Downregulated genes included, for example, TNF-receptor superfamily members TNFRSF10A and TNFRSF10D, or MADH1, functioning downstream of TGF-beta receptor serine/threonine kinases. Three additional networks are available in Figures 6-8. They visualize networks containing other genes with known relationship with t(11q23)/MLL leukemias, e.g. HOXA cluster genes (HOXA5, HOXA10), as well as the Hox coregulator PBX3, or the tyrosine kinase FLT3. Other target genes with higher expression in t(11q23)/MLL leukemias included HIP1, so far associated with prostate cancer progression, proto-oncogene FRAT1, TAF1B, playing a role in the tumorigenesis of colorectal carcinomas, and ZFHX1B, a transcriptional corepressor. The top 50 genes with higher expression or lower expression, respectively, in both leukemias with t(11q23)/MLL combined are given in Table 10.
Unsupervised hierarchical cluster analysis of MLL translocation positive acute leukemias
The analysis next addressed the question whether an unsupervised analysis including exclusively MLL gene rearranged leukemias was also able to distinguish between the different lineages they are derived from. Both a principal component analysis and a two-dimensional hierarchical cluster analysis of the 25 ALL and 48 AML with MLL gene translocation were performed. As demonstrated in Figure 20, panel A, although both types of acute leukemias are characterized by an MLL gene rearrangement, an unsupervised data analysis approach clearly separates the samples according to their hematopoietic lineage, i.e. myeloid or lymphoblastic origin. Moreover, given the dendrogram from the unsupervised hierarchical cluster analysis, no clear subclustering of cases with identical MLL partner genes can be observed (Figure 20, panel B). In ALL with t(11q23)/MLL the MLL-ENL cases intercalate with the MLL-AF4 samples. In AML with t(11q23)/MLL no obvious structure, neither according to FAB criteria nor to the MLL partner genes, can be observed. The MLL-AF6, MLL-AF10, MLL-ELL, as well as rare cases (MLL-p300, MLL-AF17, MLL-SMAP1, MLL-X) are intercalated between the MLL-AF9 samples. Thus, two independent unsupervised algorithms consistently separate MLL gene rearranged leukemias into ALL and AML subgroups but not with respect to the partner genes.
Supervised analysis to discriminate t(11q23)/MLL translocation positive leukemias
The analysis next directly compared expression signatures of ALL with t(11q23)/MLL against AML with t(11q23)/MLL in a supervised algorithm. Among the differentially expressed genes, upregulated candidates in lymphoblastic t(11q23)/MLL leukemias demonstrated a dominant pattern according to B-lineage commitment. PAX5, the B-cell lineage specific activator, was designated as one of the top-ranked differentially expressed genes. In line with this finding, PAX5 target genes BLK and CD19 could also be confirmed as upregulated in ALL with t(11q23)/MLL by microarray analysis. An upregulated expression of IGHM (encoding the IgM heavy chain), VPREB1 (surrogate light chain, important for forming the pre-B cell receptor) and CD22 or CD79A further elucidates the B-lineage commitment of ALL with t(11q23)/MLL.
In addition, the list of differentially expressed genes was also input into a pathway analysis application. Various networks of functionally related genes were obtained (see the overview in Figure 9). In Figure 10, a biological network is represented. In this network, LEF1, a transcriptional regulator, is connected to PAX5 and its target CD79A, which is included in the B-cell antigen receptor. These genes, as well as the transcriptional regulators MEF2A and TCF3, demonstrated a higher expression in ALL with t(11q23)/MLL profiles compared to AML with t(11q23)/MLL cases. Conversely, genes with higher expression in t(11q23)/MLL positive AML included the transcriptional activator CEBPB, protein tyrosine kinase KIT, MADH2, a transcription factor binding protein, and MITF, a transcriptional regulator.
Seven additional networks are provided in Figures 11-17. They visualize networks containing genes that further separate t(11q23)/MLL leukemias. A myeloid commitment through higher expression in AML with t(11q23)/MLL could be demonstrated by differential expression of CEBPA (CCAAT/enhancer binding protein-alpha), a transcription factor required for differentiation of myeloid progenitors, as well as SPI1 (PU.1), a critical player in myeloid development, or GM-CSFR and G-CSFR genes.
Further interesting differentially expressed candidate genes with higher expression in t(11q23)/MLL positive ALL include BCL11A, also involved in lymphoid malignancies, transcription regulator ETS2, chromatin binding proteins CBX2 and CBX4, and early B cell factor EBF, which can restrict lymphopoiesis to the B cell lineage and works in concert with PAX5 to activate genes required for B cell differentiation. The supplementary networks also contain other differentially expressed genes with higher expression in t(11q23)/MLL positive AML. For example, FES, a tyrosine kinase oncogene, MNDA, encoding the myeloid cell nuclear differentiation antigen, and CITED4, a CBP/p300-interacting transcriptional transactivator, are significantly higher expressed. Also, a different repertoire of expression of suppressors of cytokine signaling (SOCS) family members as well as members of the tumor necrosis factor superfamily could be observed.
Influence of MLL translocation partners on the gene expression signatures
In the cohort of AML with t(11q23)/MLL samples, the group of t(9;11) positive cases (n=23) was compared to non-t(9;11) positive samples (n=25). Neither supervised nor unsupervised analyses revealed a specific expression signature associated with the MLL translocation partner AF9. In Figure 21, SAM plots demonstrate that, compared to the previous analysis of ALL with t(11q23)/MLL vs. AML with t(11q23)/MLL, no significantly differentially expressed genes clearly correlate to the MLL-AF9 translocation (left plot). The q-values of the top differentially expressed genes ranged between 0.75 and 0.82, i.e. calling this set of genes significant would result in a false discovery rate (FDR) of 75% or more. For comparison, a very high number of differentially expressed genes can be identified when comparing ALL with t(11q23)/MLL versus AML with t(11q23)/MLL (right plot).
Furthermore, as demonstrated in Figure 22, the unsupervised data analysis approach including all t(11q23)/MLL samples also did not reveal any specific patterns associated with the distinct MLL partner genes. It is interesting to note that MLL-ENL samples, included both in the AML and ALL patient cohorts, are separated. Four ALL cases with MLL-ENL intercalate with the MLL-AF4 samples, while two AML with MLL-ENL samples are distributed between the various cases in the AML cluster.
A more detailed supervised analysis then aimed at mining the data for differential gene expression between the various MLL partner genes, and the robustness of the gene expression patterns was assessed with a classification algorithm. Here, six groups of MLL patient samples were included: AML cases with t(9;11)/MLL-AF9 (n=23), t(6;11)/MLL-AF6 (n=7), t(10;11)/MLL-AF10 (n=4) and t(11;19)/MLL-ELL cases (n=3), as well as ALL samples with t(4;11)/MLL-AF4 (n=21) and t(11;19)/MLL-ENL (n=4). In this data set no statistically significant expression signatures were found to be specifically correlated to one of the distinct partner genes. Predicting the respective partner gene based on differential gene expression signatures was approached using Support Vector Machines (SVM). The complete data set was randomly, but balanced for the six different subgroups, split into a training cohort and an independent test cohort. Then differentially expressed genes were identified in the training set, calculated by means of a t-test statistic, and an SVM model was built based on the top 100 genes that demonstrate differential expression between the respective subclasses in the training set. This SVM model was used to predict samples in the test cohort. Table 11 represents a confusion matrix of MLL subgroup predictions based on their gene expression signature using a 10-fold cross-validation approach (9/10 for training and 1/10 for testing; 10 iterations so that each sample is classified once). It can be observed that the classifier is good at predicting the MLL partner genes AF9 and AF4, the two major groups in the AML and ALL patient cohorts, respectively. Other partner genes are not accurately identified. The misclassifications mainly occur in the corresponding myeloid or lymphoblastic compartment. For example, of n=21 MLL-AF4 samples, twenty are accurately identified and one sample is classified as MLL-ENL. Likewise, MLL-AF10 or MLL-AF6 samples are classified as MLL-AF9 samples. Thus, there is only a strong correlation with the lineage the MLL leukemias are derived from. The gene expression profile does not support the hypothesis of a clear distinct signature associated with one of the various partner genes that can interact with the MLL gene.
TABLE 11. MLL PARTNER GENE CONFUSION MATRIX DETERMINED BY 10-FOLD CROSS VALIDATION.
Figure imgf000083_0001
Note, the matrix shows the predicted MLL fusion partner gene. Misclassified samples are given by bold letters.
In order to assess the robustness of partner gene prediction a resampling approach was applied, i.e. the complete SVM classification procedure was repeated 100 times. The training set included 2/3 of patients, the test set 1/3, respectively. Here, the test set for each of the 100 runs included 20 samples which were randomly chosen from the total patient cohort to include 1 MLL-AF10, 2 MLL-AF6, 8 MLL-AF9, 1 MLL-ELL, 7 MLL-AF4, and 1 MLL-ENL sample, respectively. Given the differential gene expression, mainly the MLL partner genes AF9 and AF4, dominating the patient cohort, are given correct class labels by the classification algorithm (Table 12). For example, 7 MLL-AF4 samples per run yield 700 predictions over the 100 runs. Of the 700 predictions the class label MLL-AF4 has been given correctly 659 times, i.e. on average 6.59 per run. In 9 individual predictions, an MLL-AF4 sample has been predicted as MLL-AF9, in 1 prediction as MLL-ELL, and in 31 predictions as MLL-ENL, respectively.
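The arithmetic behind the per-run averages can be checked with a short sketch. The counts are those quoted in the paragraph above for the MLL-AF4 row; the function name and the dictionary layout are illustrative assumptions.

```python
def average_per_run(total_predictions, runs=100):
    """Convert prediction counts summed over all runs into per-run averages."""
    return {label: count / runs for label, count in total_predictions.items()}

# Predicted labels for the 7 MLL-AF4 test samples, summed over 100 runs.
af4_row = {"MLL-AF4": 659, "MLL-AF9": 9, "MLL-ELL": 1, "MLL-ENL": 31}

total = sum(af4_row.values())        # 7 samples x 100 runs
af4_average = average_per_run(af4_row)
```

The four counts add up to the 700 predictions made for this subgroup, and dividing each by the number of runs gives the averages reported per run.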
TABLE 12. MLL PARTNER GENE CONFUSION MATRIX DETERMINED BY RESAMPLING.
Figure imgf000084_0001
Note, the matrix shows the predicted MLL fusion partner gene as determined after 100 runs of SVM-based classifications. Misclassified samples are given by bold letters. Average numbers of predictions per run are given.
DISCUSSION
Recent studies established the use of microarray technology to classify known hematological malignancies, as well as to discover novel subtypes and to identify genetic differences associated with distinct prognostic subgroups.25-27 In pediatric and adult acute leukemias distinct gene expression signatures were correlated to t(11q23)/MLL positive cases.14,15,28-33 Using both a larger cohort of patients and an up-to-date microarray design, this analysis confirmed data that AML subtypes carrying the specific balanced chromosomal aberrations t(8;21), t(15;17), and inv(16) demonstrate highly characteristic gene expression signatures. Furthermore, this analysis was extended to include AML cases with MLL gene rearrangements. The AML with t(11q23)/MLL, representing an entity that conforms to the current WHO classification scheme and is distinct from the prognostically favourable AML subtypes, can also be associated with a distinct expression signature. More recently, similar t(11q23)/MLL signatures have also been confirmed by cDNA microarrays, an alternative microarray platform.29 Four differing adult ALL subtypes were also analyzed. Precursor B-ALL with t(11q23)/MLL, t(9;22), or t(8;14) and precursor T-ALL all form distinct clusters in various data analysis approaches which reflect their highly differing underlying gene expression profiles. This is in line with previous reports showing that pediatric and adult ALL with t(11q23)/MLL, t(9;22), or precursor T-ALL samples, respectively, can be separated and also predicted with high accuracies using microarray technology.14'3 ' 3 This analysis demonstrated that in a comprehensive analysis including numerous classes of defined acute leukemia subtypes t(11q23)/MLL patient samples were distinct. As such the analysis added important evidence to the finding that MLL gene rearranged leukemias can accurately be characterized by gene expression profiling and microarrays would further allow the identification of MLL target genes and associations with distinct translocation partner genes.
This study further aimed at identifying common targets of MLL chimeric fusion genes. In order to designate common target genes both types of acute leukemias with MLL translocations were combined and were compared to various types of other precursor B- and T-lineage ALLs as well as to other cytogenetically defined AML subtypes. This supervised analysis of the global expression data resulted in a list of statistically significant differentially expressed genes irrespective of lineage. A closer examination of these genes showed that a significantly overexpressed "Hox code" was detectable, i.e. overexpression of HOX-A cluster members.34 Other genes with higher expression in t(11q23)/MLL leukemias have also been previously reported to be implicated in MLL gene related leukemogenesis, i.e. MEIS1, and PAO.35,36
However, here it could further be demonstrated how the t(11q23)/MLL leukemia-associated genes are related to each other in a novel constellation. As given in the respective networks, consistently upregulated candidates with oncogenic potential included for example RUNX2, HIP1, FRAT1, TAF1B and ZFHX1B. RUNX2 normally plays a key role in osteogenesis but also a direct oncogenic role had been proposed.37,38 HIP1 encodes an endocytic protein with transforming properties that is involved in a cancer-causing translocation and which is overexpressed in a variety of human cancers.39 Proto-oncogene FRAT1 represents the human homologue to mouse proto-oncogene Frat1, which promotes carcinogenesis through activation of the Wnt/beta-catenin/TCF signaling pathway.40 TAF1B has been identified to play a role in the tumorigenesis of colorectal carcinomas with microsatellite instability.41 ZFHX1B, encoding Smad-interacting protein 1 (SIP1), directly represses E-cadherin gene transcription and activates cancer invasion via the upregulation of the matrix metalloproteinase gene family.42
Consistently downregulated genes in t(11q23)/MLL leukemias included TNF-receptor superfamily members required in TRAIL-mediated apoptosis, TNFRSF10A and TNFRSF10D,43 or MADH1 (SMAD1), functioning downstream of TGF-beta receptor serine/threonine kinases.44 However, it can only be speculated whether the dysregulated expression of these genes confers any resistance to apoptotic stimuli.
The t(11q23)/MLL leukemias are generally associated with a high risk of treatment failure and therefore novel therapeutic strategies are needed to improve outcome in patients with 11q23 abnormalities. Small molecule inhibitors of FLT3, a receptor tyrosine kinase, may prove to be beneficial.45 It can be speculated that, besides the known mutations affecting the juxtamembrane region and receptor activation loop, a constitutive FLT3 signaling caused by high level expression also contributes to the development and maintenance of MLL. In recent studies high levels of FLT3 expression in patients with MLL rearrangements have been identified and FLT3 has successfully been validated as a therapeutic target.28,46 One can also observe an overexpression of FLT3 in both t(11q23)/MLL leukemias compared to other acute leukemia classes (see, e.g., Figure 6).
The analysis further demonstrated that ALL and AML cases with t(11q23) segregate according to the lineage they are derived from, i.e. myeloid or lymphoblastic, respectively. In unsupervised data analyses the cases with MLL gene translocations did not cluster as a unique subgroup, but instead clustered according to their lineage of origin. Therefore, it is proposed that MLL aberrations lead to specific expression signatures but that there is a clear identification of lymphoblastic lineage commitment for ALL with t(11q23)/MLL. This seems to be in contrast with the previously reported finding that MLL positive leukemias are unique and should be constituted as a distinct disease. In contrast, it could now be demonstrated that this cellular differentiation can be explained by a transcriptional program, and this was further elucidated through the use of biological network analysis. PAX5 was among the top-ranked differentially expressed genes discriminating ALL and AML cases with t(11q23). PAX5 restricts the developmental options of lymphoid progenitors to the B cell lineage by repressing the transcription of lineage-inappropriate genes and simultaneously activating the expression of B-lymphoid signaling molecules. Its influence can also be followed further downstream when the analysis focused on PAX5 target genes, also included in the list of top-ranked differential genes. It is known that e.g. BLK or CD19 are controlled by PAX5. As visualized in the respective biological networks, these and other B-lineage characteristic candidates (CD79A, VPREB1, CD22) were grouped together, all with higher expression in MLL gene rearranged ALL compared to AML samples. Interestingly, not only PAX5 but also EBF, a second essential regulator of early B cell development, was higher expressed in ALL with t(11q23)/MLL.
Specific activities of these proteins include roles in chromatin remodeling and recruitment of partner proteins.48 Taken together, a multitude of genes indicated a strong B-lineage commitment in lymphoblastic t(11q23)/MLL leukemias.
With respect to AML with t(11q23)/MLL, a transcriptional pattern for myeloid commitment was represented in another network through the higher expression of key players in myeloid development, CEBPA and SPI1. The finding that C/EBPalpha binds and activates the endogenous PU.1 gene in myeloid cells further contributes to the specification of myeloid progenitors.49 Genes encoding the receptors for granulocyte/macrophage colony-stimulating factor (GM-CSFR) and granulocyte colony-stimulating factor (G-CSFR) also clearly underline a completely different transcriptional program, since it has been suggested that G-CSFR signals may play a role in directing the commitment of primitive hematopoietic progenitors to the common myeloid lineage.50 In addition, the down-regulation of GM-CSFR represents a critical event in producing cells with a lymphoid-restricted lineage potential.51 Other differentially expressed genes with higher expression in t(11q23)/MLL positive AML included, for example, FES, a tyrosine kinase oncogene implicated in signaling downstream from hematopoietic cytokines.52 FES may be a key component of the granulocyte differentiation machinery and contributes to lineage determination at the level of multi-lineage hematopoietic progenitors as well as the more committed granulo-monocytic progenitors.53 Another gene that may be involved in myeloid differentiation is MNDA, encoding the myeloid cell nuclear differentiation antigen.54 It is expressed exclusively in maturing myeloid cells and cell lines, and is not expressed in lymphoid cells. Recent data suggest that there is a strong correlation between MNDA expression and myeloid differentiation.55 Here, MNDA expression further elucidates the myeloid lineage specificity in t(11q23)/MLL positive AML.
Lastly, CITED4, a CBP/p300-interacting transcriptional transactivator, is expressed at significantly higher levels in AML with t(11q23)/MLL.56 It may function as a co-activator for transcription factor AP-2, and possible roles for CITED4 in the regulation of gene expression during development and differentiation of blood cells have been implied.57 Moreover, an exploration of the biological networks identified in this analysis may provide new insights into the altered biology of these leukemias and may lead to useful target genes for follow-up experiments. Interesting candidates with higher expression in ALL with t(11q23)/MLL for subsequent experimentation include CBX2 (the homologue of the murine Polycomb-like gene M33) and CBX4 (novel human Pc homolog, hPc2), both components of the chromatin-associated polycomb complex (PcG). Polycomb group (PcG) proteins assemble to form large multiprotein complexes that are thought to repress their targets by modifying chromatin structure.58 It has been suggested that interference with CBX4 function can lead to derepression of proto-oncogene transcription and subsequently to cellular transformation.59
A major goal of this study was to directly assess the influence of the different MLL translocation partners on the transcriptional program in MLL leukemias. First, a supervised pairwise comparison of MLL-AF9 positive samples against MLL-AF9 negative samples in AML was performed. No statistically significant differences in their gene expression signatures were found. When SAM plots were used to visualize the degree of difference in gene expression patterns, the MLL-AF9 positive samples within AML were found to be very similar to the MLL-AF9 negative samples. Furthermore, as demonstrated by an unsupervised data analysis, no clear subclustering of t(9;11)/MLL-AF9 positive samples was observed. Instead of being distinct from other AML with differing MLL gene rearrangements, the global gene expression patterns of t(9;11)/MLL-AF9 cases intercalated with those of other AML cases with t(11q23)/MLL. This transcriptional concordance is an unexpected result. However, it would correlate with the observation of comparable clinical outcome in this subset of AML patients.3
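The SAM comparison described above ranks genes by a relative difference statistic (Tusher et al., reference 24). As a rough illustration only, the following Python sketch computes that statistic for a few genes. The gene names, expression values and the constant s0 are hypothetical placeholders and are not taken from this study, and the permutation step that SAM uses to estimate false discovery rates is omitted.

```python
import math


def relative_difference(group_a, group_b, s0=0.1):
    """SAM-style relative difference d = (mean_a - mean_b) / (s + s0).

    s is the pooled standard error of the two groups; s0 is a small positive
    constant that damps genes with very low variance (0.1 is an arbitrary
    assumption, not a value from the source).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    squared_dev = (sum((x - mean_a) ** 2 for x in group_a)
                   + sum((x - mean_b) ** 2 for x in group_b))
    s = math.sqrt((1 / n_a + 1 / n_b) * squared_dev / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


def rank_genes(expression):
    """Return gene names ordered by decreasing absolute relative difference."""
    scores = {gene: relative_difference(a, b)
              for gene, (a, b) in expression.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical log-scale signals: (MLL-AF9 positive, MLL-AF9 negative).
toy_expression = {
    "gene_1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),  # clearly different groups
    "gene_2": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # identical groups
}
ranking = rank_genes(toy_expression)
```

A gene whose two groups are indistinguishable scores zero, which corresponds to the situation reported in the text for MLL-AF9 positive versus MLL-AF9 negative AML samples.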
When the algorithm was used to plot signatures of ALL with t(11q23)/MLL versus AML with t(11q23)/MLL, their completely different underlying transcriptional profiles became visible. This again reflects the earlier finding from the unsupervised two-dimensional hierarchical clustering, where t(11q23)/MLL samples segregated according to their lineage of origin.
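The lineage-driven segregation seen in hierarchical clustering can be illustrated with a small sketch. The following Python example clusters samples only (the analysis in the text is two-dimensional), and it assumes average linkage with a Pearson correlation distance; neither choice is stated in this section, and the sample names and values are invented for illustration.

```python
import math
from itertools import combinations


def correlation_distance(x, y):
    """Return 1 - Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def cluster_samples(profiles, n_clusters):
    """Agglomerative clustering with average linkage until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            distances = [correlation_distance(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
            average = sum(distances) / len(distances)
            if best is None or average < best[0]:
                best = (average, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]                  # j > i, so index i stays valid
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical expression profiles over four genes (two lymphoid-like,
# two myeloid-like); the values are illustrative only.
toy_profiles = {
    "ALL_1": [9.0, 8.5, 2.0, 1.5],
    "ALL_2": [8.8, 9.1, 1.8, 2.2],
    "AML_1": [2.1, 1.7, 9.2, 8.7],
    "AML_2": [1.9, 2.3, 8.9, 9.4],
}
clusters = cluster_samples(toy_profiles, n_clusters=2)
```

With these toy values the two resulting clusters follow the lineage labels, mirroring the segregation by lineage of origin described in the text.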
The analysis failed to identify clearly differentially expressed genes when six different MLL partner genes, i.e. MLL-AF9, MLL-AF6, MLL-AF10, and MLL-ELL in AML and MLL-AF4 as well as MLL-ELL in ALL, respectively, were examined. At this step no statistically significant expression signatures were found to be specifically correlated with one of the distinct partner genes. This also explains the failure to predict the respective partner gene from differential gene expression signatures using Support Vector Machines (SVM) as the classification algorithm. The classifier is good at predicting the MLL partners AF9 and AF4. However, these samples form the two major groups in the AML and ALL patient cohorts, respectively, which might bias the result. None of the other groups is accurately identified. Misclassifications, however, occur only within the corresponding myeloid or lymphoblastic compartment, respectively. Given the presented data, the global gene expression profile analysis does not reveal a clear distinct pattern associated with any one of the various partner genes in t(11q23)/MLL leukemias.
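One way to inspect the two observations made above, that accuracy is concentrated in the largest partner-gene groups and that errors stay within a lineage, is to compute per-class recall and a lineage check from the classifier output. The Python sketch below uses hypothetical labels and predictions (not the results of this study) and does not train an SVM; writing labels as (lineage, partner) pairs is an assumption of this example.

```python
from collections import defaultdict


def per_class_recall(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def errors_stay_within_lineage(true_labels, predicted_labels):
    """True if every misclassified sample keeps its lineage (first element)."""
    return all(truth[0] == predicted[0]
               for truth, predicted in zip(true_labels, predicted_labels)
               if truth != predicted)


# Hypothetical cohort: one large and one small partner group per lineage.
true_labels = ([("AML", "AF9")] * 4 + [("AML", "AF6")] * 2
               + [("ALL", "AF4")] * 4 + [("ALL", "ELL")] * 2)
# Hypothetical predictions: every sample is assigned to the majority
# partner group of its own lineage.
predicted_labels = [("AML", "AF9")] * 6 + [("ALL", "AF4")] * 6

recall = per_class_recall(true_labels, predicted_labels)
within_lineage = errors_stay_within_lineage(true_labels, predicted_labels)
```

In this toy setting the majority groups reach a recall of 1.0 while the minority groups reach 0.0, which shows why good performance on the two largest groups alone does not demonstrate a partner-specific signature.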
Although it has been shown that the gene expression profile of t(11q23)/MLL leukemias is dictated by the specific MLL molecular lesion, further experiments are required to investigate why most of the MLL partner genes are strictly correlated with a specific leukemia subtype. Gene expression is determined not only by the available combination of transcription factors, but also by the structure of the local chromatin, which is the physiological substrate for all nuclear processes including transcription and recombination.47 Therefore, it can be speculated that at the time point of the chromosomal aberration the hematopoietic progenitor target cell is already committed to myeloid or lymphoid lineage development. Given the differing chromatin structure and its accessibility to regulatory factors, only certain genes would thus be suitable as fusion partners, e.g. AF4 in lymphoblastic leukemias or AF9 in myeloid leukemias. On the other hand, if the progenitor target cell is not committed to a particular lineage, the fusion partner might be able to contribute to cell-fate decisions. The different MLL fusion proteins would then dictate the respective differentiation pathway by facilitating the establishment of lineage-specific gene expression programs. In the gene expression patterns described here, a strong association with lymphoid commitment was observed in ALL with t(11q23)/MLL. The coexpression of PAX5, the critical B-lineage commitment factor that restricts the developmental options of early progenitors to the B cell pathway, and early B cell factor EBF in these samples suggests that the leukemogenic hit occurred in the earliest phase of B-lymphopoiesis. In contrast, AML with t(11q23)/MLL samples expressed key players for myeloid development. Interestingly, in the cohort, myeloid and lymphoblastic gene expression profiles of
MLL-ENL samples were separated. The t(11;19)(q23;p13.1) chromosomal translocation fuses the gene encoding transcriptional elongation factor ELL to the MLL gene.60 Recent data indicate that neoplastic transformation by the MLL-ELL fusion protein is likely to result from aberrant transcriptional activation of MLL target genes.61 The clustering described here would further support a hypothesis of tumor tropism where the MLL-ENL fusion protein can no longer influence the differentiation pathway. As a consequence, these data may explain that it is not the translocation partner gene but rather the cellular lineage that influences the observed major changes in expression signatures in t(11q23)/MLL leukemias.
Another hallmark of MLL gene associated leukemias is their frequency as chemotherapy-related leukemias.62 This was not the focus of the presented analyses. However, in a recent study, a significant difference in outcome was demonstrated in AML with t(11q23)/MLL rearrangement between de novo and therapy-related cases.3 Therefore, future studies may also be directed at gene expression profiles in these patient cohorts. Here, microarray analyses might help to further understand the biology of these leukemias, which develop after a relatively short latent period following treatment of a primary malignancy and often follow the use of drugs that inhibit the activity of DNA-topoisomerase II. Differing transcriptomes between de novo and therapy-related cases may explain in part the even more unfavorable outcome of this AML subgroup.
In conclusion, the results of this analysis underline, for example, that AML with t(11q23)/MLL and ALL with t(11q23)/MLL are distinct entities as proposed in the current WHO classification of hematological malignancies.63 Both subtypes share a distinct gene expression signature with upregulation of HOX genes but, on the other hand, vary substantially in the expression of genes determining the lymphoid or myeloid lineage.
While a clear gene expression pattern with respect to the lineage was identified, a specific signature associated with the different MLL partner genes was not observed. Microarray technology demonstrated that based on a cohort of thoroughly characterized leukemia samples, expression signatures lead to a better understanding of biological features of these specific acute leukemia subtypes. Novel networks of candidate genes were depicted and may inspire follow-up studies to elucidate the events leading to these types of prognostically unfavorable acute leukemias and may be exploited to identify new therapeutic targets.
EXAMPLE 2: GENERAL MATERIALS, METHODS AND DEFINITIONS OF FUNCTIONAL ANNOTATIONS
The methods section contains both information on the statistical analyses used for identification of differentially expressed genes and detailed annotation data of the identified microarray probe sets.
AFFYMETRIX PROBESET ANNOTATION
All annotation data of GeneChip® arrays are extracted from the NetAffx™ Analysis Center (available on the world wide web at affymetrix.com as of October 27, 2004). Files for U133 set arrays, including U133A and U133B microarrays, are derived from the June 2003 release. The original publication is: Liu et al. (2003) "NetAffx: Affymetrix probe sets and annotations," Nucleic Acids Res. 31(1):82-6, which is incorporated by reference.
The sequence data are omitted because of their large size and because they do not change, whereas the annotation data are updated periodically, for example with new information on chromosomal location and functional annotation of the respective gene products. Sequence data are available for download from the NetAffx Download Center on the world wide web at affymetrix.com.
DATA FIELDS
In the following section, the content of each field of the data files is described.
Microarray probe sets found to be differentially expressed, for example between different types of leukemia samples, are further described by additional information. The fields are of the following types:
1. GeneChip Array Information
2. Probe Design Information
3. Public Domain and Genomic References
1. GeneChip Array Information
HG-U133 ProbeSet_ID: Describes the probe set identifier. Examples are: 200007_at, 200011_s_at, 200012_x_at.
Sequence Type: Indicates whether the sequence is an Exemplar, Consensus or Control sequence. An Exemplar is a single nucleotide sequence taken directly from a public database. This sequence could be an mRNA or an expressed sequence tag (EST). A Consensus sequence is a nucleotide sequence assembled by Affymetrix, based on one or more sequences taken from a public database.
Transcript ID: The cluster identification number with a sub-cluster identifier appended.
Sequence Derived From: The accession number of the single sequence, or representative sequence, on which the probe set is based. Refer to the "Sequence Source" field to determine the database used.
Sequence ID: For Exemplar sequences: public accession number or GenBank identifier. For Consensus sequences: Affymetrix identification number or public accession number.
Sequence Source: The database from which the sequence used to design this probe set was taken. Examples are: GenBank®, RefSeq, UniGene, TIGR (annotations from The Institute for Genomic Research).
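As a small illustration of the probe set identifier format, the following Python sketch splits identifiers of the form shown in the examples above into a numeric stem and a suffix. The regular expression is an assumption inferred only from those three examples; other identifier forms (for instance control probe sets) are rejected rather than guessed at, and the function name is hypothetical.

```python
import re

# Assumed pattern: digits, an optional single-letter tag, then "_at".
PROBE_SET_ID = re.compile(r"^(?P<stem>\d+)(?P<suffix>(?:_[a-z])?_at)$")


def parse_probe_set_id(identifier):
    """Split a probe set identifier into (numeric stem, suffix)."""
    match = PROBE_SET_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized probe set identifier: {identifier!r}")
    return match.group("stem"), match.group("suffix")


# The three example identifiers given in the field description.
parsed = [parse_probe_set_id(identifier)
          for identifier in ("200007_at", "200011_s_at", "200012_x_at")]
```

Keeping the suffix as an opaque string avoids attaching a meaning to it that this section does not define.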
2. Public Domain and Genomic References
Most of the data in this section are from the LocusLink and UniGene databases, and are annotations of the reference sequence on which the probe set is modeled.
Gene Symbol and Title: A gene symbol and a short title, when one is available. Such symbols are assigned by different organizations for different species. Affymetrix annotational data come from the UniGene record. There is no indication which species-specific databank was used, but the possibilities include, for example, HUGO: The Human Genome Organization.
MapLocation: The map location describes the chromosomal location when one is available.
Unigene Accession: UniGene accession number and cluster type. Cluster type can be "full length" or "est", or "—" if unknown.
LocusLink: This information represents the LocusLink accession number.
Full Length Ref. Sequences: Indicates the references to multiple sequences in RefSeq. The field contains the ID and description for each entry, and there can be multiple entries per probe set.
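The annotation fields described above could be held in a simple record type. The Python sketch below is one possible layout and not the NetAffx file format itself: the class and attribute names are hypothetical, the optional attributes reflect the "when one is available" wording, and the example values other than the probe set identifier are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEQUENCE_TYPES = {"Exemplar", "Consensus", "Control"}
CLUSTER_TYPES = {"full length", "est"}


@dataclass
class ProbeSetAnnotation:
    # GeneChip array information
    probe_set_id: str
    sequence_type: str
    transcript_id: str
    sequence_derived_from: str
    sequence_id: str
    sequence_source: str
    # Public domain and genomic references (may be unavailable)
    gene_symbol: Optional[str] = None
    gene_title: Optional[str] = None
    map_location: Optional[str] = None
    unigene_accession: Optional[str] = None
    unigene_cluster_type: Optional[str] = None  # None stands for "unknown"
    locuslink: Optional[str] = None
    full_length_ref_sequences: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the vocabularies named in the field descriptions.
        if self.sequence_type not in SEQUENCE_TYPES:
            raise ValueError(f"unknown sequence type: {self.sequence_type!r}")
        if (self.unigene_cluster_type is not None
                and self.unigene_cluster_type not in CLUSTER_TYPES):
            raise ValueError(
                f"unknown cluster type: {self.unigene_cluster_type!r}")


# Placeholder record; only the probe set identifier comes from the text.
record = ProbeSetAnnotation(
    probe_set_id="200007_at",
    sequence_type="Exemplar",
    transcript_id="PLACEHOLDER_TRANSCRIPT",
    sequence_derived_from="PLACEHOLDER_ACCESSION",
    sequence_id="PLACEHOLDER_ID",
    sequence_source="GenBank",
)
```

Storing the RefSeq entries as a list matches the statement that there can be multiple entries per probe set.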
REFERENCES
1. Biondi A, Cimino G, Pieters R, Pui CH. Biological and therapeutic aspects of infant leukemia. Blood. 2000;96:24-33.
2. Pui CH, Relling MV, Downing JR. Acute lymphoblastic leukemia. N.Engl.J.Med. 2004;350:1535-1548.
3. Schoch C, Schnittger S, Klaus M et al. AML with 11q23/MLL abnormalities as defined by the WHO classification: incidence, partner chromosomes, FAB subtype, age distribution, and prognostic impact in an unselected series of 1897 cytogenetically analyzed AML cases. Blood. 2003;102:2395-2402.
4. Huret JL, Dessen P, Bernheim A. An atlas of chromosomes in hematological malignancies. Example: 11q23 and MLL partners. Leukemia. 2001;15:987-989.
5. Schnittger S, Kinkelin U, Schoch C et al. Screening for MLL tandem duplication in 387 unselected patients with AML identify a prognostically unfavorable subset of AML. Leukemia. 2000;14:796-804.
6. Ernst P, Wang J, Korsmeyer SJ. The role of MLL in hematopoiesis and leukemia. Curr.Opin.Hematol. 2002;9:282-287.
7. Ayton PM, Cleary ML. Molecular mechanisms of leukemogenesis mediated by MLL fusion proteins. Oncogene. 2001;20:5695-5707.
8. So CW, Cleary ML. Dimerization: A versatile switch for oncogenesis. Blood. 2004
9. Collins EC, Rabbitts TH. The promiscuous MLL gene links chromosomal translocations to cellular differentiation and tumour tropism. Trends Mol.Med. 2002;8:436-442.
10. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat.Genet. 1999;21:20-24.
11. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat.Genet. 2002;32 Suppl:502-508.
12. Dugas M, Schoch C, Schnittger S et al. A comprehensive leukemia database: integration of cytogenetics, molecular genetics and microarray data with clinical information, cytomorphology and immunophenotyping. Leukemia. 2001;15:1805-1810.
13. Kern W, Kohlmann A, Wuchter C et al. Correlation of protein expression and gene expression in acute leukemia. Cytometry. 2003;55B:29-36.
14. Kohlmann A, Schoch C, Schnittger S et al. Molecular characterization of acute leukemias by use of microarray technology. Genes Chromosomes Cancer. 2003;37:396-405.
15. Kohlmann A, Schoch C, Schnittger S et al. Pediatric acute lymphoblastic leukemia (ALL) gene expression signatures classify an independent cohort of adult ALL patients. Leukemia. 2004;18:63-71.
16. Schoch C, Kohlmann A, Schnittger S et al. Acute myeloid leukemias with reciprocal rearrangements can be distinguished by specific gene expression profiles. Proc.Natl.Acad.Sci.U.S.A. 2002;99:10008-10013.
17. Hubbell E, Liu WM, Mei R. Robust estimators for expression analysis. Bioinformatics. 2002;18:1585-1592.
18. Liu WM, Mei R, Di X et al. Analysis of high density expression microarrays with signed-rank call algorithms. Bioinformatics. 2002;18:1593-1599.
19. Yeang CH, Ramaswamy S, Tamayo P et al. Molecular classification of multiple tumor types. Bioinformatics. 2001;17 Suppl 1:S316-S322.
20. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc.Natl.Acad.Sci.U.S.A. 2003;100:9440-9445.
21. Vapnik V. Statistical Learning Theory. Wiley. 1998;New York.
22. Furey TS, Cristianini N, Duffy N et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics. 2000;16:906-914.
23. Brown MP, Grundy WN, Lin D et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc.Natl.Acad.Sci.U.S.A. 2000;97:262-267.
24. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc.Natl.Acad.Sci.U.S.A. 2001;98:5116-5121.
25. Ebert BL, Golub TR. Genomic Approaches to Hematologic Malignancies. Blood. 2004;DOI 10.1182/blood-2004-01-0274
26. Grimwade D, Haferlach T. Gene-expression profiling in acute myeloid leukemia. N.Engl.J.Med. 2004;350:1676-1678.
27. Haferlach T, Kohlmann A, Kern W et al. Gene expression profiling as a tool for the diagnosis of acute leukemias. Semin.Hematol. 2003;40:281-295.
28. Armstrong SA, Staunton JE, Silverman LB et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat.Genet. 2002;30:41-47.
29. Bullinger L, Dohner K, Bair E et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N.Engl.J.Med. 2004;350:1605-1616.
30. Ross ME, Zhou X, Song G et al. Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood. 2003;102:2951-2959.
31. Rozovskaia T, Ravid-Amir O, Tillib S et al. Expression profiles of acute lymphoblastic and myeloblastic leukemias with ALL-1 rearrangements. Proc.Natl.Acad.Sci.U.S.A. 2003;100:7853-7858.
32. Valk PJ, Verhaak RG, Beijen MA et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N.Engl.J.Med. 2004;350:1617-1628.
33. Yeoh EJ, Ross ME, Shurtleff SA et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1:133-143.
34. Kumar AR, Hudson WA, Chen W et al. Hoxa9 influences the phenotype but not the incidence of Mll-AF9 fusion gene leukemia. Blood. 2004;103:1823-1828.
35. Rozovskaia T, Feinstein E, Mor O et al. Upregulation of Meis1 and HoxA9 in acute lymphocytic leukemias with the t(4 : 11) abnormality. Oncogene. 2001;20:874-878.
36. Thorsteinsdottir U, Kroon E, Jerome L, Blasi F, Sauvageau G. Defining roles for HOX and MEIS1 genes in induction of acute myeloid leukemia. Mol.Cell Biol. 2001;21:224-234.
37. Ito Y. Oncogenic potential of the RUNX gene family: 'overview'. Oncogene. 2004;23:4198-4208.
38. Stewart M, Terry A, Hu M et al. Proviral insertions induce the expression of bone-specific isoforms of PEBP2alphaA (CBFA1): evidence for a new myc collaborating oncogene. Proc.Natl.Acad.Sci.U.S.A. 1997;94:8646-8651.
39. Hyun TS, Ross TS. HIP1: trafficking roles and regulation of tumorigenesis. Trends Mol.Med. 2004;10:194-199.
40. Saitoh T, Mine T, Katoh M. Molecular cloning and expression of proto-oncogene FRAT1 in human cancer. Int.J.Oncol. 2002;20:785-789.
41. Kim NG, Rhee H, Li LS et al. Identification of MARCKS, FLJ11383 and TAF1B as putative novel target genes in colorectal carcinomas with microsatellite instability. Oncogene. 2002;21:5081-5087.
42. Miyoshi A, Kitajima Y, Sumi K et al. Snail and SIP1 increase cancer invasion by upregulating MMP family in hepatocellular carcinoma cells. Br.J.Cancer. 2004;90:1265-1273.
43. Almasan A, Ashkenazi A. Apo2L/TRAIL: apoptosis signaling, biology, and potential for cancer therapy. Cytokine Growth Factor Rev. 2003;14:337-348.
44. ten Dijke P, Goumans MJ, Itoh F, Itoh S. Regulation of cell proliferation by Smad proteins. J.Cell Physiol. 2002;191:1-16.
45. Gilliland DG, Griffin JD. The roles of FLT3 in hematopoiesis and leukemia. Blood. 2002;100:1532-1542.
46. Armstrong SA, Kung AL, Mabon ME et al. Inhibition of FLT3 in MLL. Validation of a therapeutic target identified by gene expression based classification. Cancer Cell. 2003;3:173-183.
47. Busslinger M. Transcriptional control of early B cell development. Annu.Rev.Immunol. 2004;22:55-79.
48. Maier H, Hagman J. Roles of EBF and Pax-5 in B lineage commitment and development. Semin.Immunol. 2002;14:415-422.
49. Kummalue T, Friedman AD. Cross-talk between regulators of myeloid development: C/EBPalpha binds and activates the promoter of the PU.1 gene. J.Leukoc.Biol. 2003;74:464-470.
50. Richards MK, Liu F, Iwasaki H, Akashi K, Link DC. Pivotal role of granulocyte colony-stimulating factor in the development of progenitors in the common myeloid pathway. Blood. 2003;102:3562-3568.
51. Iwasaki-Arai J, Iwasaki H, Miyamoto T, Watanabe S, Akashi K. Enforced granulocyte/macrophage colony-stimulating factor signals do not support lymphopoiesis, but instruct lymphoid to myelomonocytic lineage conversion. J.Exp.Med. 2003;197:1311-1322.
52. Sangrar W, Gao Y, Zirngibl RA, Scott ML, Greer PA. The fps/fes proto-oncogene regulates hematopoietic lineage output. Exp.Hematol. 2003;31:1259-1267.
53. Kim J, Ogata Y, Feldman RA. Fes tyrosine kinase promotes survival and terminal granulocyte differentiation of factor-dependent myeloid progenitors (32D) and activates lineage-specific transcription factors. J.Biol.Chem. 2003;278:14978-14984.
54. Cousar JB, Briggs RC. Expression of human myeloid cell nuclear differentiation antigen (MNDA) in acute leukemias. Leuk.Res. 1990;14:915-920.
55. Asefa B, Klarmann KD, Copeland NG et al. The interferon-inducible p200 family of proteins: a perspective on their roles in cell cycle regulation and differentiation. Blood Cells Mol.Dis. 2004;32:155-167.
56. Braganca J, Swingler T, Marques FI et al. Human CREB-binding protein/p300-interacting transactivator with ED-rich tail (CITED) 4, a new member of the CITED family, functions as a co-activator for transcription factor AP-2. J.Biol.Chem. 2002;277:8559-8565.
57. Yahata T, Takedatsu H, Dunwoodie SL et al. Cloning of mouse Cited4, a member of the CITED family p300/CBP-binding transcriptional coactivators: induced expression in mammary epithelial cells. Genomics. 2002;80:601-613.
58. Pirrotta V. Polycombing the genome: PcG, trxG, and chromatin silencing. Cell. 1998;93:333-336.
59. Satijn DP, Olson DJ, van d, V et al. Interference with the expression of a novel human polycomb protein, hPc2, results in cellular transformation and apoptosis. Mol.Cell Biol. 1997;17:6076-6086.
60. Rubnitz JE, Morrissey J, Savage PA, Cleary ML. ENL, the gene fused with HRX in t(11;19) leukemias, encodes a nuclear protein with transcriptional activation potential in lymphoid and myeloid cells. Blood. 1994;84:1747-1752.
61. DiMartino JF, Miller T, Ayton PM et al. A carboxy-terminal domain of ELL is required and sufficient for immortalization of myeloid progenitors by MLL-ELL. Blood. 2000;96:3887-3893.
62. Super HJ, McCabe NR, Thirman MJ et al. Rearrangements of the MLL gene in therapy-related acute myeloid leukemia in patients previously treated with agents targeting DNA-topoisomerase II. Blood. 1993;82:3705-3711.
63. Jaffe ES, Harris NL, Stein H, Vardiman JW. World Health Organization Classification of Tumours. Pathology and Genetics of Tumours of Haematopoietic and Lymphoid Tissues. IARC Press. 2001;Lyon.
64. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc.Natl.Acad.Sci.U.S.A. 2001;98:5116-5121.
65. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc.Natl.Acad.Sci.U.S.A. 2003;100:9440-9445.
66. Vapnik V. Statistical Learning Theory. 1998. New York, Wiley.
67. Furey TS, Cristianini N, Duffy N et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics. 2000;16:906-914.
68. Brown MP, Grundy WN, Lin D et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc.Natl.Acad.Sci.U.S.A. 2000;97:262-267.
69. Yeoh EJ, Ross ME, Shurtleff SA et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1:133-143.
70. Ross ME, Zhou X, Song G et al. Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood. 2003;102:2951-2959.
71. Kohlmann A, Schoch C, Schnittger S et al. Pediatric acute lymphoblastic leukemia (ALL) gene expression signatures classify an independent cohort of adult ALL patients. Leukemia. 2004;18:63-71.
While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques and apparatus described above can be used in various combinations. All publications, patents, patent applications, and/or other documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, and/or other document were individually indicated to be incorporated by reference for all purposes.
Table 1 MLL-tineage networks
Figure imgf000101_0001
Table 1 MLL-lineagβ networks
Figure imgf000102_0001
Table 1 MLl-liπeage networks
Figure imgf000103_0001
Table 1 MLL-lmeagβ networks
Figure imgf000104_0001
Table 1 MLL-lmeage networks
Figure imgf000105_0001
Table 1 MLL-hneage networks
Figure imgf000106_0001
Table 1 MLL-lineage networks
Figure imgf000107_0001
Table 1 MLL-lineage networks
Figure imgf000108_0001
Table 1 MLL-linβagβ networks
Figure imgf000109_0001
Table 1 MLL-lineage networks
Figure imgf000110_0001
Table 1 MLL-lineage networks
Figure imgf000111_0001
Table 1 MLL-hneagβ networks
Figure imgf000112_0001
12
Table 1 MLL-hneage networks
Figure imgf000113_0001
Table 1 MLL-Ii neage networks
Figure imgf000114_0001
Table 1 MlL-lmeage networks
Figure imgf000115_0001
Table 1 MLL-lineage networks
Figure imgf000116_0001
Table 1 MLL-hneage networks
Figure imgf000117_0001
Table 1 MLL-lineagβ networks
Figure imgf000118_0001
18
Table 1 MLL-hneage networks
Figure imgf000119_0001
Table 1 MLHmeage networks
Figure imgf000120_0001
20
Table 1 MLL-hneagβ networks
Figure imgf000121_0001
Table 1 MLL-lineage networks
Figure imgf000122_0001
22
Table 1 MLL-lineage networks
Figure imgf000123_0001
23
Table 1 MLL-liπeagβ networks
Figure imgf000124_0001
Table 1 MLL-hπeage networks
Figure imgf000125_0001
Table 1 MLL-lineage networks
Figure imgf000126_0001
Table 1 MLL -lineage networks
Figure imgf000127_0001
27
Table 1 MLL-hneage networks
Figure imgf000128_0001
28
Table 1 MLL-hneage networks
Figure imgf000129_0001
29
Table 1 MLL-hπeage networks
Figure imgf000130_0001
30
Table 1 MLL-lineagθ networks
Figure imgf000131_0001
Table 1 MLL-lmβage networks
Figure imgf000132_0001
Table 1 MLL-hneage networks
Figure imgf000133_0001
33
Table 1 MLL lineage networks
Figure imgf000134_0001
Table 1 MLL-lineage networks
Figure imgf000135_0001
35
Table 1 MLL-lineagβ networks
Figure imgf000136_0001
Table 1 MLL-lineagβ networks
Figure imgf000137_0001
37
Table 1 MLL-lineagβ networks
Figure imgf000138_0001
Table 1 MLL-hneage networks
Figure imgf000139_0001
Table 1 MLL-hneagθ networks
Figure imgf000140_0001
Table 1 MLL-lmβaga networks
Figure imgf000141_0001
Table 1 MLL-lineage networks
Figure imgf000142_0001
42
Table 1 MLL-lmeage networks
Figure imgf000143_0001
43
Table 1 MLL-lineage networks
Figure imgf000144_0001
Table 1 MLL-lineagθ networks
Figure imgf000145_0001
Table 1 MLL-lmeage networks
Figure imgf000146_0001
46
Table 1 MLL-hneagβ networks
Figure imgf000147_0001
Table 1 MLL-lineage networks
Figure imgf000148_0001
48
Table 1 MLL lineage networks
Figure imgf000149_0001
Table 1 MLL-hnβage networks
Figure imgf000150_0001
50
Table 1 MLL-lineage networks
Figure imgf000151_0001
Table 1 MLL-lineage networks
Figure imgf000152_0001
52
Table 1 MLL-lineage networks
Figure imgf000153_0001
53
Table 1 MLL-hneagβ networks
Figure imgf000154_0001
Table 1 MLL-lineage networks
Figure imgf000155_0001
55
Table 1 MLL-lineage networks
Figure imgf000156_0001
Table 1 MLL-lineage networks
Figure imgf000157_0001
Table 1 MLL-lmeagβ networks
Figure imgf000158_0001
Tablβ 1 MLL -lineage networks
Figure imgf000159_0001
Table 1 MLL-hneagβ networks
Figure imgf000160_0001
Table 1 MLL-Imeage networks
Figure imgf000161_0001
61
Table 1 MLL-hneage networks
Figure imgf000162_0001
Table 1 MLL-lmeage networks
Figure imgf000163_0001
Table 1 MLL-hπeagβ networks
Figure imgf000164_0001
64
Table 1 MLL-lmeage networks
Figure imgf000165_0001
Table 1 MLL-linβage networks
Figure imgf000166_0001
Table 1 MLL-lineage networks
Figure imgf000167_0001
Table 1 MLL-lineaga networks
Figure imgf000168_0001
Table 1 MLL-hπeage networks
Figure imgf000169_0001
Table 1 MLL-lineage networks
Figure imgf000170_0001
Table 1 MLL-lmeage networks
Figure imgf000171_0001
Table 1 MLL-lineage networks
Figure imgf000172_0001
72
Table 1 MLL-hneage networks
Figure imgf000173_0001
Table 1 MLL-lineage networks
Figure imgf000174_0001
Tablθ 1 MLL-lmeage networks
Figure imgf000175_0001
Table 1 MLL-hπeage networks
Figure imgf000176_0001
Table 1 MLL-lineage networks
Figure imgf000177_0001
Table 1 MLL-liπeage networks
Figure imgf000178_0001
Table 1 MLL-tineage networks
Figure imgf000179_0001
79
Table 1 MLL-linβagθ networks
Figure imgf000180_0001
Table 1 MLL-lmeage networks
Figure imgf000181_0001
81
Table 1 MLL lineage networks
Figure imgf000182_0001
82
Table 1 MLL-liπeagβ networks
Figure imgf000183_0001
83
Table 1 MLL-lmeage networks
Figure imgf000184_0001
Table 1 MLL-lmeagβ networks
Figure imgf000185_0001
Table 1 MLL-linβagβ networks
Figure imgf000186_0001
Table 1 MLL-lineage networks
Figure imgf000187_0001
Table 1 MLL lineage networks
Figure imgf000188_0001
Table 1 MLL-hneage networks
Figure imgf000189_0001
Table 1 MLL-hneage networks
Figure imgf000190_0001
90
Table 1 MLL-lmeage networks
Figure imgf000191_0001
91
Table 1 MLL-lineage networks
Figure imgf000192_0001
Table 1 MLL-hneage networks
Figure imgf000193_0001
93
Table 1 MLL-hneage networks
Figure imgf000194_0001
Table 1 MLL-lmβagβ networks
Figure imgf000195_0001
Table 1 MLL-lineage networks
Figure imgf000196_0001
Table 1 MLL-lineage networks
Figure imgf000197_0001
Table 1 MLL-lineage networks
Figure imgf000198_0001
Table 1 MLL-lineage networks
Figure imgf000199_0001
Table 1 MLL-lineage networks
Figure imgf000200_0001
Table 1 MLL-lineage networks
Figure imgf000201_0001
Table 1 MLL-lineage networks
Figure imgf000202_0001
Table 1 MLL-lineage networks
Figure imgf000203_0001
Table 1 MLL-lineage networks
Figure imgf000204_0001
Table 1 MLL-lineage networks
Figure imgf000205_0001
Table 1 MLL-lineage networks
Figure imgf000206_0001
Table 1 MLL-lineage networks
Figure imgf000207_0001
Table 1 MLL-lineage networks
Figure imgf000208_0001
Table 1 MLL-lineage networks
Figure imgf000209_0001
Table 1 MLL-lineage networks
Figure imgf000210_0001
Table 1 MLL-lineage networks
Figure imgf000211_0001
Table 1 MLL-lineage networks
Figure imgf000212_0001
Table 1 MLL-lineage networks
Figure imgf000213_0001
Table 1 MLL-lineage networks
Figure imgf000214_0001
Table 1 MLL-lineage networks
Figure imgf000215_0001
Table 1 MLL-lineage networks
Figure imgf000216_0001
Table 1 MLL-lineage networks
Figure imgf000217_0001
Table 1 MLL-lineage networks
Figure imgf000218_0001
Table 1 MLL-lineage networks
Figure imgf000219_0001
Table 1 MLL-lineage networks
Figure imgf000220_0001
Table 1 MLL-lineage networks
Figure imgf000221_0001
Table 1 MLL-lineage networks
Figure imgf000222_0001
Table 1 MLL-lineage networks
Figure imgf000223_0001
Table 1 MLL-lineage networks
Figure imgf000224_0001
Table 1 MLL-lineage networks
Figure imgf000225_0001
Table 1 MLL-lineage networks
Figure imgf000226_0001
Table 1 MLL-lineage networks
Figure imgf000227_0001
Table 1 MLL-lineage networks
Figure imgf000228_0001
Table 1 MLL-lineage networks
Figure imgf000229_0001
Table 1 MLL-lineage networks
Figure imgf000230_0001
Table 1 MLL-lineage networks
Figure imgf000231_0001
Table 1 MLL-lineage networks
Figure imgf000232_0001
Table 1 MLL-lineage networks
Figure imgf000233_0001
Table 1 MLL-lineage networks
Figure imgf000234_0001
Table 1 MLL-lineage networks
Figure imgf000235_0001
Table 1 MLL-lineage networks
Figure imgf000236_0001
Table 1 MLL-lineage networks
Figure imgf000237_0001
Table 1 MLL-lineage networks
Figure imgf000238_0001
Table 1 MLL-lineage networks
Figure imgf000239_0001
Table 1 MLL-lineage networks
Figure imgf000240_0001
Table 1 MLL-lineage networks
Figure imgf000241_0001
Table 1 MLL-lineage networks
Figure imgf000242_0001
Table 1 MLL-lineage networks
Figure imgf000243_0001
Table 1 MLL-lineage networks
Figure imgf000244_0001
Table 1 MLL-lineage networks
Figure imgf000245_0001
Table 1 MLL-lineage networks
Figure imgf000246_0001
Table 1 MLL-lineage networks
Figure imgf000247_0001
Table 1 MLL-lineage networks
Figure imgf000248_0001
Table 1 MLL-lineage networks
Figure imgf000249_0001
Table 1 MLL-lineage networks
Figure imgf000250_0001
Table 1 MLL-lineage networks
Figure imgf000251_0001
Table 1 MLL-lineage networks
Figure imgf000252_0001
Table 1 MLL-lineage networks
Figure imgf000253_0001
Table 1 MLL-lineage networks
Figure imgf000254_0001
Table 1 MLL-lineage networks
Figure imgf000255_0001
Table 1 MLL-lineage networks
Figure imgf000256_0001
Table 1 MLL-lineage networks
Figure imgf000257_0001
Table 1 MLL-lineage networks
Figure imgf000258_0001
Table 1 MLL-lineage networks
Figure imgf000259_0001
Table 1 MLL-lineage networks
Figure imgf000260_0001
Table 1 MLL-lineage networks
Figure imgf000261_0001
Table 1 MLL-lineage networks
Figure imgf000262_0001
Table 1 MLL-lineage networks
Figure imgf000263_0001
Table 1 MLL-lineage networks
Figure imgf000264_0001
Table 1 MLL-lineage networks
Figure imgf000265_0001
Table 1 MLL-lineage networks
Figure imgf000266_0001
Table 1 MLL-lineage networks
Figure imgf000267_0001
Table 1 MLL-lineage networks
Figure imgf000268_0001
Table 1 MLL-lineage networks
Figure imgf000269_0001
Table 1 MLL-lineage networks
Figure imgf000270_0001
Table 1 MLL-lineage networks
Figure imgf000271_0001
Table 1 MLL-lineage networks
Figure imgf000272_0001
Table 1 MLL-lineage networks
Figure imgf000273_0001
Table 1 MLL-lineage networks
Figure imgf000274_0001
Table 1 MLL-lineage networks
Figure imgf000275_0001
Table 1 MLL-lineage networks
Figure imgf000276_0001
Table 1 MLL-lineage networks
Figure imgf000277_0001
Table 1 MLL-lineage networks
Figure imgf000278_0001
Table 1 MLL-lineage networks
Figure imgf000279_0001
Table 1 MLL-lineage networks
Figure imgf000280_0001
Table 1 MLL-lineage networks
Figure imgf000281_0001
Table 1 MLL-lineage networks
Figure imgf000282_0001
Table 1 MLL-lineage networks
Figure imgf000283_0001
Table 1 MLL-lineage networks
Figure imgf000284_0001
Table 1 MLL-lineage networks
Figure imgf000285_0001
Table 1 MLL-lineage networks
Figure imgf000286_0001
Table 1 MLL-lineage networks
Figure imgf000287_0001
Table 1 MLL-lineage networks
Figure imgf000288_0001
Table 1 MLL-lineage networks
Figure imgf000289_0001
Table 1 MLL-lineage networks
Figure imgf000290_0001
Table 1 MLL-lineage networks
Figure imgf000291_0001
Table 1 MLL-lineage networks
Figure imgf000292_0001
Table 1 MLL-lineage networks
Figure imgf000293_0001
Table 1 MLL-lineage networks
Figure imgf000294_0001
Table 1 MLL-lineage networks
Figure imgf000295_0001
Table 1 MLL-lineage networks
Figure imgf000296_0001
Table 1 MLL-lineage networks
Figure imgf000297_0001
Table 1 MLL-lineage networks
Figure imgf000298_0001
Table 1 MLL-lineage networks
Figure imgf000299_0001
Table 1 MLL-lineage networks
Figure imgf000300_0001
Table 1 MLL-lineage networks
Figure imgf000301_0001
Table 1 MLL-lineage networks
Figure imgf000302_0001
Table 1 MLL-lineage networks
Figure imgf000303_0001
Table 1 MLL-lineage networks
Figure imgf000304_0001
Table 1 MLL-lineage networks
Figure imgf000305_0001
Table 1 MLL-lineage networks
Figure imgf000306_0001
Table 1 MLL-lineage networks
Figure imgf000307_0001
Table 1 MLL-lineage networks
Figure imgf000308_0001
Table 1 MLL-lineage networks
Figure imgf000309_0001
Table 1 MLL-lineage networks
Figure imgf000310_0001
Table 1 MLL-lineage networks
Figure imgf000311_0001
Table 1 MLL-lineage networks
Figure imgf000312_0001
Table 1 MLL-lineage networks
Figure imgf000313_0001
Table 1 MLL-lineage networks
Figure imgf000314_0001
Table 1 MLL-lineage networks
Figure imgf000315_0001
Table 1 MLL-lineage networks
Figure imgf000316_0001
Table 1 MLL-lineage networks
Figure imgf000317_0001
Table 1 MLL-lineage networks
Figure imgf000318_0001
Table 1 MLL-lineage networks
Figure imgf000319_0001
Table 1 MLL-lineage networks
Figure imgf000320_0001
Table 1 MLL-lineage networks
Figure imgf000321_0001
Table 1 MLL-lineage networks
Figure imgf000322_0001
Table 1 MLL-lineage networks
Figure imgf000323_0001
Table 1 MLL-lineage networks
Figure imgf000324_0001
Table 1 MLL-lineage networks
Figure imgf000325_0001
Table 1 MLL-lineage networks
Figure imgf000326_0001
Table 1 MLL-lineage networks
Figure imgf000327_0001
Table 1 MLL-lineage networks
Figure imgf000328_0001
Table 1 MLL-lineage networks
Figure imgf000329_0001
Table 1 MLL-lineage networks
Figure imgf000330_0001
Table 1 MLL-lineage networks
Figure imgf000331_0001
Table 1 MLL-lineage networks
Figure imgf000332_0001
Table 1 MLL-lineage networks
Figure imgf000333_0001
Table 1 MLL-lineage networks
Figure imgf000334_0001
Table 1 MLL-lineage networks
Figure imgf000335_0001
Table 1 MLL-lineage networks
Figure imgf000336_0001
Table 1 MLL-lineage networks
Figure imgf000337_0001
Table 1 MLL-lineage networks
Figure imgf000338_0001
Table 1 MLL-lineage networks
Figure imgf000339_0001
Table 1 MLL-lineage networks
Figure imgf000340_0001
Table 1 MLL-lineage networks
Figure imgf000341_0001
Table 1 MLL-lineage networks
Figure imgf000342_0001
Table 1 MLL-lineage networks
Figure imgf000343_0001
Table 1 MLL-lineage networks
Figure imgf000344_0001
Table 1 MLL-lineage networks
Figure imgf000345_0001
Table 1 MLL-lineage networks
Figure imgf000346_0001
Table 1 MLL-lineage networks
Figure imgf000347_0001
Table 1 MLL-lineage networks
Figure imgf000348_0001
Table 1 MLL-lineage networks
Figure imgf000349_0001
Table 1 MLL-lineage networks
Figure imgf000350_0001
Table 1 MLL-lineage networks
Figure imgf000351_0001
Table 1 MLL-lineage networks
Figure imgf000352_0001
Table 1 MLL-lineage networks
Figure imgf000353_0001
Table 1 MLL-lineage networks
Figure imgf000354_0001
Table 1 MLL-lineage networks
Figure imgf000355_0001
Table 1 MLL-lineage networks
Figure imgf000356_0001
Table 1 MLL-lineage networks
Figure imgf000357_0001
Table 1 MLL-lineage networks
Figure imgf000358_0001
Table 1 MLL-lineage networks
Figure imgf000359_0001
Table 1 MLL-lineage networks
Figure imgf000360_0001
Table 1 MLL-lineage networks
Figure imgf000361_0001
Table 1 MLL-lineage networks
Figure imgf000362_0001
Table 1 MLL-lineage networks
Figure imgf000363_0001
Table 1 MLL-lineage networks
Figure imgf000364_0001
Table 1 MLL-lineage networks
Figure imgf000365_0001
Table 1 MLL-lineage networks
Figure imgf000366_0001
Table 1 MLL-lineage networks
Figure imgf000367_0001
Table 1 MLL-lineage networks
Figure imgf000368_0001
Table 1 MLL-lineage networks
Figure imgf000369_0001
Table 1 MLL-lineage networks
Figure imgf000370_0001
Table 1 MLL-lineage networks
Figure imgf000371_0001
Table 1 MLL-lineage networks
Figure imgf000372_0001
Table 1 MLL-lineage networks
Figure imgf000373_0001
Table 1 MLL-lineage networks
Figure imgf000374_0001
Table 1 MLL-lineage networks
Figure imgf000375_0001
Table 1 MLL-lineage networks
Figure imgf000376_0001
Table 1 MLL-lineage networks
Figure imgf000377_0001
Table 1 MLL-lineage networks
Figure imgf000378_0001
Table 1 MLL-lineage networks
Figure imgf000379_0001
Table 1 MLL-lineage networks
Figure imgf000380_0001
Table 1 MLL-lineage networks
Figure imgf000381_0001
Table 1 MLL-lineage networks
Figure imgf000382_0001
Table 1 MLL-lineage networks
Figure imgf000383_0001
Table 1 MLL-lineage networks
Figure imgf000384_0001
Table 1 MLL-lineage networks
Figure imgf000385_0001
Table 1 MLL-lineage networks
Figure imgf000386_0001
Table 1 MLL-lineage networks
Figure imgf000387_0001
Table 1 MLL-lineage networks
Figure imgf000388_0001
Table 1 MLL-lineage networks
Figure imgf000389_0001
Table 2 t(11q23) samples
Figure imgf000390_0001
Table 2 t(11q23) samples
Figure imgf000391_0001
Table 2 t(11q23) samples
Figure imgf000392_0001
Table 2 t(11q23) samples
Figure imgf000393_0001
Table 2 t(11q23) samples
Figure imgf000394_0001
Table 2 t(11q23) samples
Figure imgf000395_0001
Table 2 t(11q23) samples
Figure imgf000396_0001
Table 2 t(11q23) samples
Figure imgf000397_0001
Table 2 t(11q23) samples
Figure imgf000398_0001
Table 2 t(11q23) samples
Figure imgf000399_0001
Table 2 t(11q23) samples
Figure imgf000400_0001
Table 2 t(11q23) samples
Figure imgf000401_0001
Table 2 t(11q23) samples
Figure imgf000402_0001
Table 2 t(11q23) samples
Figure imgf000403_0001
Table 2 t(11q23) samples
Figure imgf000404_0001
Table 2 t(11q23) samples
Figure imgf000405_0001
Table 2 t(11q23) samples
Figure imgf000406_0001
Table 2 t(11q23) samples
Figure imgf000407_0001
Table 2 t(11q23) samples
Figure imgf000408_0001
Table 2 t(11q23) samples
Figure imgf000409_0001
Table 2 t(11q23) samples
Figure imgf000410_0001
Table 2 t(11q23) samples
Figure imgf000411_0001
Figure imgf000412_0001
Table 2 t(11q23) samples
Figure imgf000413_0001
Table 2 t(11q23) samples
Figure imgf000414_0001
Figure imgf000415_0001
Figure imgf000416_0001
Figure imgf000417_0001
Figure imgf000418_0001
Table 2 t(11q23) samples
Figure imgf000419_0001
Table 2 t(11q23) samples
Figure imgf000420_0001
Table 2 t(11q23) samples
Figure imgf000421_0001
Table 2 t(11q23) samples
Figure imgf000422_0001
Table 2 t(11q23) samples
Figure imgf000423_0001
Table 2 t(11q23) samples
Figure imgf000424_0001
Table 2 t(11q23) samples
Figure imgf000425_0001
Figure imgf000426_0001
Table 2 t(11q23) samples
Figure imgf000427_0001
Table 2 t(11q23) samples
Figure imgf000428_0001
Table 2 t(11q23) samples
Figure imgf000429_0001
Table 2 t(11q23) samples
Figure imgf000430_0001
Table 2 t(11q23) samples
Figure imgf000431_0001
Table 2 t(11q23) samples
Figure imgf000432_0001
Figure imgf000433_0001
Table 2 t(11q23) samples
Figure imgf000434_0001
Table 2 t(11q23) samples
Figure imgf000435_0001
Table 2 t(11q23) samples
Figure imgf000436_0001
Figure imgf000437_0001
Table 2 t(11q23) samples
Figure imgf000438_0001
Table 2 t(11q23) samples
Figure imgf000439_0001
Table 2 t(11q23) samples
Figure imgf000440_0001
Table 2 t(11q23) samples
Figure imgf000441_0001
Table 2 t(11q23) samples
Figure imgf000442_0001
Figure imgf000443_0001
Table 2 t(11q23) samples
Figure imgf000444_0001
Table 2 t(11q23) samples
Figure imgf000445_0001
Table 2 t(11q23) samples
Figure imgf000446_0001
Figure imgf000447_0001
Table 2 t(11q23) samples
Figure imgf000448_0001
Table 2 t(11q23) samples
Figure imgf000449_0001
Table 2 t(11q23) samples
Figure imgf000450_0001
Table 2 t(11q23) samples
Figure imgf000451_0001
Table 2 t(11q23) samples
Figure imgf000452_0001
Table 2 t(11q23) samples
Figure imgf000453_0001
Table 2 t(11q23) samples
Figure imgf000454_0001
Table 2 t(11q23) samples
Figure imgf000455_0001
Table 2 t(11q23) samples
Figure imgf000456_0001
Figure imgf000457_0001
Table 2 t(11q23) samples
Figure imgf000458_0001
Table 2 t(11q23) samples
Figure imgf000459_0001
Figure imgf000460_0001
Table 2 t(11q23) samples
Figure imgf000461_0001
Table 2 t(11q23) samples
Figure imgf000462_0001
Table 2 t(11q23) samples
Figure imgf000463_0001
Table 2 t(11q23) samples
Figure imgf000464_0001
Table 2 t(11q23) samples
Figure imgf000465_0001
Table 2 t(11q23) samples
Figure imgf000466_0001
Table 2 t(11q23) samples
Figure imgf000467_0001
Table 2 t(11q23) samples
Figure imgf000468_0001
Table 2 t(11q23) samples
Figure imgf000469_0001
Table 2 t(11q23) samples
Figure imgf000470_0001
Table 2 t(11q23) samples
Figure imgf000471_0001
Figure imgf000472_0001
Table 2 t(11q23) samples
Figure imgf000473_0001
Table 2 t(11q23) samples
Figure imgf000474_0001
Table 2 t(11q23) samples
Figure imgf000475_0001
Table 2 t(11q23) samples
Figure imgf000476_0001
Table 2 t(11q23) samples
Figure imgf000477_0001
Table 2 t(11q23) samples
Figure imgf000478_0001
Table 2 t(11q23) samples
Figure imgf000479_0001
Table 2 t(11q23) samples
Figure imgf000480_0001
Table 2 t(11q23) samples
Figure imgf000481_0001
Table 2 t(11q23) samples
Figure imgf000482_0001
Table 2 t(11q23) samples
Figure imgf000483_0001
Table 2 t(11q23) samples
Figure imgf000484_0001
Figure imgf000485_0001
Table 2 t(11q23) samples
Figure imgf000486_0001
Table 2 t(11q23) samples
Figure imgf000487_0001
Table 2 t(11q23) samples
Figure imgf000488_0001
Table 2 t(11q23) samples
Figure imgf000489_0001
Table 2 t(11q23) samples
Figure imgf000490_0001
Figure imgf000491_0001
Table 2 t(11q23) samples
Figure imgf000492_0001
Table 2 t(11q23) samples
Figure imgf000493_0001
Table 2 t(11q23) samples
Figure imgf000494_0001
Table 2 t(11q23) samples
Figure imgf000495_0001
Table 2 t(11q23) samples
Figure imgf000496_0001
Table 2 t(11q23) samples
Figure imgf000497_0001
Figure imgf000498_0001
Table 2 t(11q23) samples
Figure imgf000499_0001
Table 2 t(11q23) samples
Figure imgf000500_0001
Table 2 t(11q23) samples
Figure imgf000501_0001
Table 2 t(11q23) samples
Figure imgf000502_0001
Table 2 t(11q23) samples
Figure imgf000503_0001
Table 2 t(11q23) samples
Figure imgf000504_0001
Table 2 t(11q23) samples
Figure imgf000505_0001
Table 2 t(11q23) samples
Figure imgf000506_0001
Table 2 t(11q23) samples
Figure imgf000507_0001
Table 2 t(11q23) samples
Figure imgf000508_0001
Table 2 t(11q23) samples
Figure imgf000509_0001
Table 2 t(11q23) samples
Figure imgf000510_0001
Figure imgf000511_0001
Table 2 t(11q23) samples
Figure imgf000512_0001
Table 2 t(11q23) samples
Figure imgf000513_0001
Table 2 t(11q23) samples
Figure imgf000514_0001
Table 2 t(11q23) samples
Figure imgf000515_0001
Table 2 t(11q23) samples
Figure imgf000516_0001
Table 2 t(11q23) samples
Figure imgf000517_0001
Table 2 t(11q23) samples
Figure imgf000518_0001
Table 2 t(11q23) samples
Figure imgf000519_0001
Table 3 ALL subtypes
Figure imgf000520_0001
Table 3 ALL subtypes
Figure imgf000521_0001
Table 3 ALL subtypes
Figure imgf000522_0001
Table 3 ALL subtypes
Figure imgf000523_0001
Table 3 ALL subtypes
Figure imgf000524_0001
Table 3 ALL subtypes
Figure imgf000525_0001
Table 3 ALL subtypes
Figure imgf000526_0001
Table 3 ALL subtypes
Figure imgf000527_0001
Table 3 ALL subtypes
Figure imgf000528_0001
Table 3 ALL subtypes
Figure imgf000529_0001
Table 3 ALL subtypes
Figure imgf000530_0001
Table 3 ALL subtypes
Figure imgf000531_0001
Table 3 ALL subtypes
Figure imgf000532_0001
Table 3 ALL subtypes
Figure imgf000533_0001
Table 3 ALL subtypes
Figure imgf000534_0001
Table 3 ALL subtypes
Figure imgf000535_0001
Table 3 ALL subtypes
Figure imgf000536_0001
Table 3 ALL subtypes
Figure imgf000537_0001
Table 3 ALL subtypes
Figure imgf000538_0001
Table 3 ALL subtypes
Figure imgf000539_0001
Table 3 ALL subtypes
Figure imgf000540_0001
Table 3 ALL subtypes
Figure imgf000541_0001
Table 3 ALL subtypes
Figure imgf000542_0001
Table 3 ALL subtypes
Figure imgf000543_0001
Table 3 ALL subtypes
Figure imgf000544_0001
Table 3 ALL subtypes
Figure imgf000545_0001
Figure imgf000546_0001
Table 3 ALL subtypes
Figure imgf000547_0001
Table 3 ALL subtypes
Figure imgf000548_0001
Table 3 ALL subtypes
Figure imgf000549_0001
Table 3 ALL subtypes
Table 3 ALL subtypes
Figure imgf000551_0001
Table 3 ALL subtypes
Figure imgf000552_0001
Table 3 ALL subtypes
Figure imgf000553_0001
Table 3 ALL subtypes
Figure imgf000554_0001
Table 3 ALL subtypes
Figure imgf000555_0001
Table 3 ALL subtypes
Figure imgf000556_0001
Table 3 ALL subtypes
Figure imgf000557_0001
Table 3 ALL subtypes
Figure imgf000558_0001
Figure imgf000559_0001
Table 3 ALL subtypes
Figure imgf000560_0001
Table 3 ALL subtypes
Figure imgf000561_0001
Table 3 ALL subtypes
Figure imgf000562_0001
Table 3 ALL subtypes
Figure imgf000563_0001
Table 3 ALL subtypes
Figure imgf000564_0001
Table 3 ALL subtypes
Figure imgf000565_0001
Table 3 ALL subtypes
Figure imgf000566_0001
Table 3 ALL subtypes
Figure imgf000567_0001
Table 3 ALL subtypes
Figure imgf000568_0001
Table 3 ALL subtypes
Figure imgf000569_0001
Table 3 ALL subtypes
Figure imgf000570_0001
Table 3 ALL subtypes
Figure imgf000571_0001
Table 3 ALL subtypes
Figure imgf000572_0001
Table 3 ALL subtypes
Figure imgf000573_0001
Table 3 ALL subtypes
Figure imgf000574_0001
Table 3 ALL subtypes
Figure imgf000575_0001
Table 3 ALL subtypes
Figure imgf000576_0001
Table 3 ALL subtypes
Figure imgf000577_0001
Table 3 ALL subtypes
Figure imgf000578_0001
Table 3 ALL subtypes
Figure imgf000579_0001
Table 3 ALL subtypes
Figure imgf000580_0001
Table 3 ALL subtypes
Figure imgf000581_0001
Table 3 ALL subtypes
Figure imgf000582_0001
Table 3 ALL subtypes
Figure imgf000583_0001
Figure imgf000584_0001
Table 3 ALL subtypes
Figure imgf000585_0001
Table 3 ALL subtypes
Figure imgf000586_0001
Table 3 ALL subtypes
Figure imgf000587_0001
Table 3 ALL subtypes
Figure imgf000588_0001
Table 3 ALL subtypes
Figure imgf000589_0001
Table 3 ALL subtypes
Figure imgf000590_0001
Table 3 ALL subtypes
Figure imgf000591_0001
Table 3 ALL subtypes
Figure imgf000592_0001
Table 3 ALL subtypes
Figure imgf000593_0001
Table 3 ALL subtypes
Figure imgf000594_0001
Table 3 ALL subtypes
Figure imgf000595_0001
Table 3 ALL subtypes
Figure imgf000596_0001
Table 3 ALL subtypes
Figure imgf000597_0001
Table 3 ALL subtypes
Figure imgf000598_0001
Table 3 ALL subtypes
Figure imgf000599_0001
Table 3 ALL subtypes
Figure imgf000600_0001
Table 3 ALL subtypes
Figure imgf000601_0001
Figure imgf000602_0001
Table 3 ALL subtypes
Figure imgf000603_0001
Table 3 ALL subtypes
Figure imgf000604_0001
Table 3 ALL subtypes
Figure imgf000605_0001
Table 3 ALL subtypes
Figure imgf000606_0001
Table 3 ALL subtypes
Figure imgf000607_0001
Table 3 ALL subtypes
Figure imgf000608_0001
Table 3 ALL subtypes
Figure imgf000609_0001
Table 3 ALL subtypes
Figure imgf000610_0001
Table 3 ALL subtypes
Figure imgf000611_0001
Table 3 ALL subtypes
Figure imgf000612_0001
Table 3 ALL subtypes
Figure imgf000613_0001
Table 3 ALL subtypes
Figure imgf000614_0001
Table 3 ALL subtypes
Figure imgf000615_0001
Table 3 ALL subtypes
Figure imgf000616_0001
Table 3 ALL subtypes
Figure imgf000617_0001
Table 3 ALL subtypes
Figure imgf000618_0001
Figure imgf000619_0001
Table 3 ALL subtypes
Figure imgf000620_0001
Table 3 ALL subtypes
Figure imgf000621_0001
Table 3 ALL subtypes
Figure imgf000622_0001
Table 3 ALL subtypes
Figure imgf000623_0001
Table 3 ALL subtypes
Figure imgf000624_0001
Table 3 ALL subtypes
Figure imgf000625_0001
Table 3 ALL subtypes
Figure imgf000626_0001
Table 3 ALL subtypes
Figure imgf000627_0001
Table 3 ALL subtypes
Figure imgf000628_0001
Table 3 ALL subtypes
Figure imgf000629_0001
Table 3 ALL subtypes
Figure imgf000630_0001
Table 3 ALL subtypes
Figure imgf000631_0001
Table 3 ALL subtypes
Figure imgf000632_0001
Table 3 ALL subtypes
Figure imgf000633_0001
Table 3 ALL subtypes
Figure imgf000634_0001
Table 3 ALL subtypes
Figure imgf000635_0001
Figure imgf000636_0001
Table 3 ALL subtypes
Figure imgf000637_0001
Table 3 ALL subtypes
Figure imgf000638_0001
Table 3 ALL subtypes
Figure imgf000639_0001
Table 3 ALL subtypes
Figure imgf000640_0001
Figure imgf000641_0001
Table 3 ALL subtypes
Figure imgf000642_0001
Table 3 ALL subtypes
Figure imgf000643_0001
Table 3 ALL subtypes
Figure imgf000644_0001
Table 3 ALL subtypes
Figure imgf000645_0001
Table 3 ALL subtypes
Figure imgf000646_0001
Table 3 ALL subtypes
Figure imgf000647_0001
Figure imgf000648_0001
Table 3 ALL subtypes
Figure imgf000649_0001
Table 3 ALL subtypes
Figure imgf000650_0001
Table 3 ALL subtypes
Figure imgf000651_0001
Table 3 ALL subtypes
Figure imgf000652_0001
Table 3 ALL subtypes
Figure imgf000653_0001
Table 3 ALL subtypes
Figure imgf000654_0001
Table 3 ALL subtypes
Figure imgf000655_0001
Table 3 ALL subtypes
Figure imgf000656_0001
Table 3 ALL subtypes
Figure imgf000657_0001
Table 3 ALL subtypes
Figure imgf000658_0001
Table 3 ALL subtypes
Figure imgf000659_0001
Figure imgf000660_0001
Figure imgf000661_0001
Table 3 ALL subtypes
Figure imgf000662_0001
Table 3 ALL subtypes
Figure imgf000663_0001
Table 3 ALL subtypes
Figure imgf000664_0001
Table 3 ALL subtypes
Figure imgf000665_0001
Table 3 ALL subtypes
Figure imgf000666_0001
Table 3 ALL subtypes
Figure imgf000667_0001
Table 3 ALL subtypes
Figure imgf000668_0001
Table 3 ALL subtypes
Figure imgf000669_0001
Table 4 AML with t(15;17)
Figure imgf000670_0001
Table 4 AML with t(15;17)
Figure imgf000671_0001
Table 4 AML with t(15;17)
Figure imgf000672_0001
Table 4 AML with t(15;17)
Figure imgf000673_0001
Table 4 AML with t(15;17)
Figure imgf000674_0001
Table 4 AML with t(15;17)
Figure imgf000675_0001
Table 4 AML with t(15;17)
Figure imgf000676_0001
Table 4 AML with t(15;17)
Figure imgf000677_0001
Table 4 AML with t(15;17)
Figure imgf000678_0001
Table 4 AML with t(15;17)
Figure imgf000679_0001
Table 4 AML with t(15;17)
Figure imgf000680_0001
Table 4 AML with t(15;17)
Figure imgf000681_0001
Figure imgf000682_0001
Table 4 AML with t(15;17)
Figure imgf000683_0001
Table 4 AML with t(15;17)
Figure imgf000684_0001
Table 4 AML with t(15;17)
Figure imgf000685_0001
Table 4 AML with t(15;17)
Figure imgf000686_0001
Table 4 AML with t(15;17)
Figure imgf000687_0001
Table 4 AML with t(15;17)
Figure imgf000688_0001
Figure imgf000689_0001
Table 4 AML with t(15;17)
Figure imgf000690_0001
Table 4 AML with t(15;17)
Figure imgf000691_0001
Figure imgf000692_0001
Table 4 AML with t(15;17)
Figure imgf000693_0001
Table 4 AML with t(15;17)
Figure imgf000694_0001
Table 4 AML with t(15;17)
Figure imgf000695_0001
Figure imgf000696_0001
Table 4 AML with t(15;17)
Figure imgf000697_0001
Table 4 AML with t(15;17)
Figure imgf000698_0001
Table 4 AML with t(15;17)
Figure imgf000699_0001
Table 4 AML with t(15;17)
Figure imgf000700_0001
Table 4 AML with t(15;17)
Figure imgf000701_0001
Table 4 AML with t(15;17)
Figure imgf000702_0001
Table 4 AML with t(15;17)
Figure imgf000703_0001
Table 4 AML with t(15;17)
Figure imgf000704_0001
Figure imgf000705_0001
Table 4 AML with t(15;17)
Figure imgf000706_0001
Table 4 AML with t(15;17)
Figure imgf000707_0001
Table 4 AML with t(15;17)
Figure imgf000708_0001
Table 4 AML with t(15;17)
Figure imgf000709_0001
Figure imgf000710_0001
Table 4 AML with t(15;17)
Figure imgf000711_0001
Figure imgf000712_0001
Table 4 AML with t(15;17)
Figure imgf000713_0001
Table 4 AML with t(15;17)
Figure imgf000714_0001
Table 4 AML with t(15;17)
Figure imgf000715_0001
Table 4 AML with t(15;17)
Figure imgf000716_0001
Table 4 AML with t(15;17)
Figure imgf000717_0001
Table 4 AML with t(15;17)
Figure imgf000718_0001
Table 4 AML with t(15;17)
Figure imgf000719_0001
Figure imgf000720_0001
Table 4 AML with t(15;17)
Figure imgf000721_0001
Table 4 AML with t(15;17)
Figure imgf000722_0001
Table 4 AML with t(15;17)
Figure imgf000723_0001
Table 4 AML with t(15;17)
Figure imgf000724_0001
Table 4 AML with t(15;17)
Figure imgf000725_0001
Table 4 AML with t(15;17)
Figure imgf000726_0001
Figure imgf000727_0001
Table 4 AML with t(15;17)
Figure imgf000728_0001
Table 4 AML with t(15;17)
Figure imgf000729_0001
Table 4 AML with t(15;17)
Figure imgf000730_0001
Table 4 AML with t(15;17)
Figure imgf000731_0001
Table 4 AML with t(15;17)
Figure imgf000732_0001
Table 4 AML with t(15;17)
Figure imgf000733_0001
Table 4 AML with t(15;17)
Figure imgf000734_0001
Table 4 AML with t(15;17)
Figure imgf000735_0001
Figure imgf000736_0001
Table 4 AML with t(15;17)
Figure imgf000737_0001
Table 4 AML with t(15;17)
Figure imgf000738_0001
Table 4 AML with t(15;17)
Figure imgf000739_0001
Table 4 AML with t(15;17)
Figure imgf000740_0001
Table 4 AML with t(15;17)
Figure imgf000741_0001
Table 4 AML with t(15;17)
Figure imgf000742_0001
Table 4 AML with t(15;17)
Figure imgf000743_0001
Table 4 AML with t(15;17)
Figure imgf000744_0001
Table 4 AML with t(15;17)
Figure imgf000745_0001
Table 4 AML with t(15;17)
Figure imgf000746_0001
Table 4 AML with t(15;17)
Figure imgf000747_0001
Table 4 AML with t(15;17)
Figure imgf000748_0001
Table 4 AML with t(15;17)
Figure imgf000749_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000750_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000751_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000752_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000753_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000754_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000755_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000756_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000757_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000758_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000759_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000760_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000761_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000762_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000763_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000764_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000765_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000766_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000767_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000768_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000769_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000770_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000771_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000772_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000773_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000774_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000775_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000776_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000777_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000778_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000779_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000780_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000781_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000782_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000783_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000784_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000785_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000786_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000787_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000788_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000789_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000790_0001
Figure imgf000791_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000792_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000793_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000794_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000795_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000796_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000797_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000798_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000799_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000800_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000801_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000802_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000803_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000804_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000805_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000806_0001
Figure imgf000807_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000808_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000809_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000810_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000811_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000812_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000813_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000814_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000815_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000816_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000817_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000818_0001
Figure imgf000819_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000820_0001
Figure imgf000821_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000822_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000823_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000824_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000825_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000826_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000827_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000828_0001
Figure imgf000829_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000830_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000831_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000832_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000833_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000834_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000835_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000836_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000837_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000838_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000839_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000840_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000841_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000842_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000843_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000844_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000845_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000846_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000847_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000848_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000849_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000850_0001
Figure imgf000851_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000852_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000853_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000854_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000855_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000856_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000857_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000858_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000859_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000860_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000861_0001
Figure imgf000862_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000863_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000864_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000865_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000866_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000867_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000868_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000869_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000870_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000871_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000872_0001
Figure imgf000873_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000874_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000875_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000876_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000877_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000878_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000879_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000880_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000881_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000882_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000883_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000884_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000885_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000886_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000887_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000888_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000889_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000890_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000891_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000892_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000893_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000894_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000895_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000896_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000897_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000898_0001
Table 5 AML with t(8;21) inv(16)
Figure imgf000899_0001
Table 6 AML with complex aberrant kt
Figure imgf000900_0001
Table 6 AML with complex aberrant kt
Figure imgf000901_0001
Table 6 AML with complex aberrant kt
Figure imgf000902_0001
Table 6 AML with complex aberrant kt
Figure imgf000903_0001
Table 6 AML with complex aberrant kt
Figure imgf000904_0001
Table 6 AML with complex aberrant kt
Figure imgf000905_0001
Table 6 AML with complex aberrant kt
Figure imgf000906_0001
Table 6 AML with complex aberrant kt
Figure imgf000907_0001
Table 6 AML with complex aberrant kt
Figure imgf000908_0001
Table 6 AML with complex aberrant kt
Figure imgf000909_0001
Table 6 AML with complex aberrant kt
Figure imgf000910_0001
Table 6 AML with complex aberrant kt
Figure imgf000911_0001
Figure imgf000912_0001
Table 6 AML with complex aberrant kt
Figure imgf000913_0001
Figure imgf000914_0001
Figure imgf000915_0001
Figure imgf000916_0001
Table 6 AML with complex aberrant kt
Figure imgf000917_0001
Table 6 AML with complex aberrant kt
Figure imgf000918_0001
Table 6 AML with complex aberrant kt
Figure imgf000919_0001
Table 6 AML with complex aberrant kt
Figure imgf000920_0001
Table 6 AML with complex aberrant kt
Figure imgf000921_0001
Table 6 AML with complex aberrant kt
Figure imgf000922_0001
Table 6 AML with complex aberrant kt
Figure imgf000923_0001
Table 6 AML with complex aberrant kt
Figure imgf000924_0001
Table 6 AML with complex aberrant kt
Figure imgf000925_0001
Table 6 AML with complex aberrant kt
Figure imgf000926_0001
Table 6 AML with complex aberrant kt
Figure imgf000927_0001
Table 6 AML with complex aberrant kt
Figure imgf000928_0001
Table 6 AML with complex aberrant kt
Figure imgf000929_0001
Table 6 AML with complex aberrant kt
Figure imgf000930_0001
Table 6 AML with complex aberrant kt
Figure imgf000931_0001
Table 6 AML with complex aberrant kt
Figure imgf000932_0001
Figure imgf000933_0001
Table 6 AML with complex aberrant kt
Figure imgf000934_0001
Figure imgf000935_0001
Table 6 AML with complex aberrant kt
Figure imgf000936_0001
Table 6 AML with complex aberrant kt
Figure imgf000937_0001
Table 6 AML with complex aberrant kt
Figure imgf000938_0001
Table 6 AML with complex aberrant kt
Figure imgf000939_0001
Figure imgf000940_0001
Table 6 AML with complex aberrant kt
Figure imgf000941_0001
Table 6 AML with complex aberrant kt
Figure imgf000942_0001
Table 6 AML with complex aberrant kt
Figure imgf000943_0001
Table 6 AML with complex aberrant kt
Figure imgf000944_0001
Table 6 AML with complex aberrant kt
Figure imgf000945_0001
Table 6 AML with complex aberrant kt
Figure imgf000946_0001
Table 6 AML with complex aberrant kt
Figure imgf000947_0001
Table 6 AML with complex aberrant kt
Figure imgf000948_0001
Table 6 AML with complex aberrant kt
Figure imgf000949_0001
Table 6 AML with complex aberrant kt
Figure imgf000950_0001
Table 6 AML with complex aberrant kt
Figure imgf000951_0001
Table 6 AML with complex aberrant kt
Figure imgf000952_0001
Table 6 AML with complex aberrant kt
Figure imgf000953_0001
Table 6 AML with complex aberrant kt
Figure imgf000954_0001
Table 6 AML with complex aberrant kt
Figure imgf000955_0001
Table 6 AML with complex aberrant kt
Figure imgf000956_0001
Table 6 AML with complex aberrant kt
Figure imgf000957_0001
Figure imgf000958_0001
Table 6 AML with complex aberrant kt
Figure imgf000959_0001
Table 6 AML with complex aberrant kt
Figure imgf000960_0001
Table 6 AML with complex aberrant kt
Figure imgf000961_0001
Table 6 AML with complex aberrant kt
Figure imgf000962_0001
Figure imgf000963_0001
Table 6 AML with complex aberrant kt
Figure imgf000964_0001
Table 6 AML with complex aberrant kt
Figure imgf000965_0001
Table 6 AML with complex aberrant kt
Figure imgf000966_0001
Table 6 AML with complex aberrant kt
Figure imgf000967_0001
Table 6 AML with complex aberrant kt
Figure imgf000968_0001
Table 6 AML with complex aberrant kt
Figure imgf000969_0001
Table 6 AML with complex aberrant kt
Figure imgf000970_0001
Table 6 AML with complex aberrant kt
Figure imgf000971_0001
Table 6 AML with complex aberrant kt
Figure imgf000972_0001
Table 6 AML with complex aberrant kt
Figure imgf000973_0001
Table 6 AML with complex aberrant kt
Figure imgf000974_0001
Table 6 AML with complex aberrant kt
Figure imgf000975_0001
Table 6 AML with complex aberrant kt
Figure imgf000976_0001
Table 6 AML with complex aberrant kt
Figure imgf000977_0001
Table 6 AML with complex aberrant kt
Figure imgf000978_0001
Table 6 AML with complex aberrant kt
Figure imgf000979_0001
Table 6 AML with complex aberrant kt
Figure imgf000980_0001
Table 6 AML with complex aberrant kt
Figure imgf000981_0001
Table 6 AML with complex aberrant kt
Figure imgf000982_0001
Table 6 AML with complex aberrant kt
Figure imgf000983_0001
Table 6 AML with complex aberrant kt
Figure imgf000984_0001
Table 6 AML with complex aberrant kt
Figure imgf000985_0001
Table 6 AML with complex aberrant kt
Figure imgf000986_0001
Table 6 AML with complex aberrant kt
Figure imgf000987_0001
Table 6 AML with complex aberrant kt
Figure imgf000988_0001
Table 6 AML with complex aberrant kt
Figure imgf000989_0001
Table 6 AML with complex aberrant kt
Figure imgf000990_0001
Table 6 AML with complex aberrant kt
Figure imgf000991_0001
Figure imgf000992_0001
Table 6 AML with complex aberrant kt
Figure imgf000993_0001
Table 6 AML with complex aberrant kt
Figure imgf000994_0001
Table 6 AML with complex aberrant kt
Figure imgf000995_0001
Table 6 AML with complex aberrant kt
Figure imgf000996_0001
Figure imgf000997_0001
Table 6 AML with complex aberrant kt
Figure imgf000998_0001
Table 6 AML with complex aberrant kt
Figure imgf000999_0001
Table 6 AML with complex aberrant kt
Figure imgf001000_0001
Table 6 AML with complex aberrant kt
Figure imgf001001_0001
Table 6 AML with complex aberrant kt
Figure imgf001002_0001
Table 6 AML with complex aberrant kt
Figure imgf001003_0001
Table 6 AML with complex aberrant kt
Figure imgf001004_0001
Table 6 AML with complex aberrant kt
Figure imgf001005_0001
Table 6 AML with complex aberrant kt
Figure imgf001006_0001
Table 6 AML with complex aberrant kt
Figure imgf001007_0001
Table 6 AML with complex aberrant kt
Figure imgf001008_0001
Figure imgf001009_0001
Table 6 AML with complex aberrant kt
Figure imgf001010_0001
Table 6 AML with complex aberrant kt
Figure imgf001011_0001
Table 6 AML with complex aberrant kt
Figure imgf001012_0001
Table 6 AML with complex aberrant kt
Figure imgf001013_0001
Table 6 AML with complex aberrant kt
Figure imgf001014_0001
Table 6 AML with complex aberrant kt
Figure imgf001015_0001
Table 6 AML with complex aberrant kt
Figure imgf001016_0001
Table 6 AML with complex aberrant kt
Figure imgf001017_0001
Table 6 AML with complex aberrant kt
Figure imgf001018_0001
Table 6 AML with complex aberrant kt
Figure imgf001019_0001
Table 6 AML with complex aberrant kt
Figure imgf001020_0001
Table 6 AML with complex aberrant kt
Figure imgf001021_0001
Table 6 AML with complex aberrant kt
Figure imgf001022_0001
Table 6 AML with complex aberrant kt
Figure imgf001023_0001
Table 6 AML with complex aberrant kt
Figure imgf001024_0001
Table 6 AML with complex aberrant kt
Figure imgf001025_0001
Table 6 AML with complex aberrant kt
Figure imgf001026_0001
Figure imgf001027_0001
Figure imgf001028_0001
Table 6 AML with complex aberrant kt
Figure imgf001029_0001
Table 7
Figure imgf001030_0001
Table 7
Figure imgf001031_0001
Table 7
Figure imgf001032_0001
Table 8
Figure imgf001033_0001
Table 8
Figure imgf001034_0001
Table 8
Figure imgf001035_0001
Table 8
Figure imgf001036_0001
Table 8
Figure imgf001037_0001
Table 9
Figure imgf001038_0001
Table 9
Figure imgf001039_0001
Table 9
Figure imgf001040_0001
Table 9
Figure imgf001041_0001
Table 10
Figure imgf001042_0001
Table 10
Figure imgf001043_0001
Table 10
Figure imgf001044_0001
Table 10
Figure imgf001045_0001
Table 13
Figure imgf001046_0001
Table 13
Figure imgf001047_0001
Table 13
Figure imgf001048_0001
Table 13
Figure imgf001049_0001
Table 13
Figure imgf001050_0001
Table 13
Figure imgf001051_0001
Table 13
Figure imgf001052_0001
Table 13
Figure imgf001053_0001
Table 13
Figure imgf001054_0001
Table 13
Figure imgf001055_0001
Table 13
Figure imgf001056_0001
Table 13
Figure imgf001057_0001
Table 13
Figure imgf001058_0001
Table 13
Figure imgf001059_0001
Table 13
Figure imgf001060_0001
Table 13
Figure imgf001061_0001
Table 13
Figure imgf001062_0001
Table 13
Figure imgf001063_0001
Table 13
Figure imgf001064_0001
Table 13
Figure imgf001065_0001
Table 13
Figure imgf001066_0001
Table 13
Figure imgf001067_0001
Table 14
Figure imgf001068_0001
Table 14
Figure imgf001069_0001
Table 14
Figure imgf001070_0001
Table 14
Figure imgf001071_0001
Table 14
Figure imgf001072_0001
Table 14
Figure imgf001073_0001
Table 14
Figure imgf001074_0001
Table 14
Figure imgf001075_0001
Table 14
Figure imgf001076_0001
Table 14
Figure imgf001077_0001
Table 14
Figure imgf001078_0001
Table 14
Figure imgf001079_0001
Table 14
Figure imgf001080_0001
Table 14
Figure imgf001081_0001
Table 14
Figure imgf001082_0001
Table 14
Figure imgf001083_0001
Table 14
Figure imgf001084_0001
Table 14
Figure imgf001085_0001
Table 14
Figure imgf001086_0001
Table 14
Figure imgf001087_0001
Table 14
Figure imgf001088_0001
Table 14
Figure imgf001089_0001
Table 14
Figure imgf001090_0001
Table 14
Figure imgf001091_0001
Table 14
Figure imgf001092_0001
Table 14
Figure imgf001093_0001
Table 14
Figure imgf001094_0001
Table 14
Figure imgf001095_0001
Table 14
Figure imgf001096_0001
Table 14
Figure imgf001097_0001
Table 14
Figure imgf001098_0001
Table 14
Figure imgf001099_0001
Table 14
Figure imgf001100_0001
Table 14
Figure imgf001101_0001
Table 14
Figure imgf001102_0001
Table 14
Figure imgf001103_0001
Table 14
Figure imgf001104_0001
Table 14
Figure imgf001105_0001
Table 15
Figure imgf001106_0001
Table 15
Figure imgf001107_0001
Table 15
Figure imgf001108_0001
Table 15
Figure imgf001109_0001
Table 15
Figure imgf001110_0001
Table 15
Figure imgf001111_0001
Table 16
Figure imgf001112_0001
Table 16
Figure imgf001113_0001
Table 16
Figure imgf001114_0001
Table 16
Figure imgf001115_0001
Table 16
Figure imgf001116_0001
Table 16
Figure imgf001117_0001
Table 17
Figure imgf001118_0001
Table 17
Figure imgf001119_0001
Table 17
TABLE 9 annotation: lower expressed genes in AML with 11q23
# | affy id | HUGO name | Title | MapLocation | Sequence Type | Transcript ID | Sequence Derived From
50 | 218086_at | NPDC1 | neural proliferation, differentiation and control, 1 | 9q34.3 | Exemplarsequence | Hs.105547.0 | NM_015392.1
Table 17
Figure imgf001121_0001
Table 17
Figure imgf001122_0001
Table 17
Figure imgf001123_0001
Table 18
Figure imgf001124_0001
Table 18
Figure imgf001125_0001
Table 18
Figure imgf001126_0001
Table 18
Figure imgf001127_0001
Table 18
Figure imgf001128_0001
Table 18
Figure imgf001129_0001
Table 19
Figure imgf001130_0001
Table 19
Figure imgf001131_0001
Table 19
Figure imgf001132_0001
Table 19
TABLE 10 annotation: lower expressed genes in 11q23 leukemias
affy id | HUGO name | Title | MapLocation | Sequence Type | Transcript ID | Sequence Derived From | Sequence ID
226764_at | LOC152485 | hypothetical protein LOC152485 | 4q31.1 | Consensussequence | Hs 343480 | BG542955 | Hs 34348 0 S1
210993_s_at | | MAD, mothers against decapentaplegic homolog 1 (Drosophila) | 4q28 | Exemplarsequence | Hs 790671 | g1332713 |
Table 19
Figure imgf001134_0001
Table 19
Figure imgf001135_0001
Table 19
Figure imgf001136_0001
Table 19
Figure imgf001137_0001
Table 20
Figure imgf001138_0001
Table 20
Figure imgf001139_0001
Table 20
Figure imgf001140_0001
Table 20
Figure imgf001141_0001
Table 20
Figure imgf001142_0001
Table 20
Figure imgf001143_0001

Claims

PATENT CLAIMS
1. A method of genotyping a leukemia cell, the method comprising: detecting an expression level of at least one set of genes in or derived from at least one target human leukemia cell; and, correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell having t(11q23)/MLL; correlating a detected substantially identical expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell lacking t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; correlating a detected differential expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell having t(11q23)/MLL with the target human leukemia cell lacking t(11q23)/MLL; or, correlating a detected substantially identical expression of one or more genes of the target human leukemia cell relative to a corresponding expression of the genes in or derived from at least one reference human leukemia cell having t(11q23)/MLL with the target human leukemia cell having t(11q23)/MLL, thereby genotyping the leukemia cell.
2. The method of claim 1, wherein the target human leukemia cell and/or the reference human leukemia cell comprises an acute leukemia cell.
3. The method of claim 1, wherein the reference human leukemia cell lacking t(11q23)/MLL comprises a precursor B-ALL cell with t(9;22), a precursor B-ALL cell with t(8;14), a precursor T-ALL cell, an AML cell with t(8;21), an AML cell with t(15;17), an AML cell with inv(16), or an AML cell with a complex aberrant karyotype.
4. The method of claim 1, wherein the set of genes is selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14.
5. The method of claim 1, comprising: correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an AML cell; or, correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an AML cell.
6. The method of claim 1, comprising: correlating a detected differential expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference AML cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell; or, correlating a detected substantially identical expression of one or more genes of the target human leukemia cell having t(11q23)/MLL relative to a corresponding expression of the genes in or derived from at least one reference ALL cell having t(11q23)/MLL with the target human acute leukemia being an ALL cell.
7. The method of claim 1, wherein the set of genes in or derived from the target human leukemia cell comprises at least about 10, 100, 1000, 10000, or more members.
8. The method of claim 1, wherein the target human leukemia cell is obtained from a subject.
9. The method of claim 1, wherein the detected differential expression of the genes comprises at least about a 5% difference.
10. The method of claim 1, wherein the detected substantially identical expression of the genes comprises less than about a 5% difference.
11. The method of claim 1, wherein the expression level is detected using an array, a robotics system, and/or a microfluidic device.
12. The method of claim 1, wherein the expression level of the set of genes is detected by amplifying nucleic acid sequences associated with the genes to produce amplicons and detecting the amplicons.
13. The method of claim 12, wherein the amplicons are detected using a process that comprises one or more of: hybridizing the amplicons to an oligonucleotide array, digesting the amplicons with a restriction enzyme, or real-time polymerase chain reaction (PCR) analysis.
14. The method of claim 1, wherein detecting the expression level of the set of genes comprises measuring quantities of transcribed polynucleotides or portions thereof expressed or derived from the genes.
15. The method of claim 14, wherein the transcribed polynucleotides are mRNAs or cDNAs.
16. The method of claim 1, wherein detecting the expression level comprises contacting polynucleotides or polypeptides expressed from the genes with compounds that specifically bind the polynucleotides or polypeptides.
17. The method of claim 16, wherein the compounds comprise aptamers, antibodies or fragments thereof.
18. A method of producing a reference data bank for genotyping leukemia cells, the method comprising:
(a) compiling a gene expression profile of a patient sample by detecting the expression level of one or more genes of at least one human leukemia cell, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14, and;
(b) classifying the gene expression profile using a machine learning algorithm.
19. The reference data bank produced by the method of claim 18.
20. A kit, comprising: one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14; and, instructions for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
21. The kit of claim 20, wherein at least one solid support comprises the probes.
22. The kit of claim 20, comprising one or more additional reagents to perform real-time PCR analyses.
23. A system, comprising: one or more probes that correspond to at least portions of genes or expression products thereof, which genes are selected from the markers listed in Table 8, Table 9, Table 10, Table 13, and/or Table 14; and, at least one reference data bank for correlating detected expression levels of polynucleotides and/or polypeptides in at least one target leukemia cell from a human subject, which polynucleotides and/or polypeptides are targets of one or more of the probes, with the target leukemia cell comprising a t(11q23)/MLL.
24. The system of claim 23, wherein at least one solid support comprises the probes.
25. The system of claim 23, comprising one or more additional reagents and/or components to perform real-time PCR analyses.
26. The system of claim 23, wherein the reference data bank is produced by:
(a) compiling a gene expression profile of a patient sample by determining the expression level of at least one of the genes, and
(b) classifying the gene expression profile using a machine learning algorithm.
27. The system of claim 26, wherein the machine learning algorithm is selected from the group consisting of: a weighted voting algorithm, a K-nearest neighbors algorithm, a decision tree induction algorithm, a support vector machine, and a feed-forward neural network.
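The thresholds in claims 9 and 10 and the K-nearest neighbors option in claim 27 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the function names, the choice to measure the difference relative to the reference value, the Euclidean distance, the value of k, and the toy two-gene profiles are all assumptions made for illustration only.

```python
from collections import Counter
from math import dist


def relative_difference(target, reference):
    """Fractional difference of a target expression value from a reference value.

    Assumption: the "5% difference" of claims 9-10 is read here as a
    difference relative to the reference expression level.
    """
    return abs(target - reference) / abs(reference)


def call_expression(target, reference, threshold=0.05):
    """Label one gene as 'differential' or 'substantially identical'."""
    if relative_difference(target, reference) >= threshold:
        return "differential"
    return "substantially identical"


def knn_classify(profile, training, k=3):
    """Classify an expression profile by majority vote of its k nearest
    training profiles (Euclidean distance).

    `training` is a list of (profile_vector, label) pairs; it stands in
    for a hypothetical reference data bank.
    """
    nearest = sorted(training, key=lambda item: dist(profile, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up two-gene profiles; not data from the tables.
reference_bank = [
    ([9.0, 1.0], "t(11q23)/MLL"),
    ([8.5, 1.5], "t(11q23)/MLL"),
    ([2.0, 7.0], "other"),
    ([1.5, 8.0], "other"),
    ([2.5, 6.5], "other"),
]

print(call_expression(110.0, 100.0))               # 10% away from reference
print(call_expression(102.0, 100.0))               # 2% away from reference
print(knn_classify([8.8, 1.2], reference_bank))    # votes among 3 neighbours
```

A real reference data bank would hold profiles over the marker genes of Tables 8, 9, 10, 13, and/or 14, and any of the other algorithms named in claim 27 could replace the voting step.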
PCT/EP2005/011732 2004-11-04 2005-11-03 Gene expression profiling of leukemias with mll gene rearrangements WO2006048266A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62567304P 2004-11-04 2004-11-04
US60/625,673 2004-11-04

Publications (2)

Publication Number Publication Date
WO2006048266A2 (en) 2006-05-11
WO2006048266A3 WO2006048266A3 (en) 2006-08-24

Family

ID=36021847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/011732 WO2006048266A2 (en) 2004-11-04 2005-11-03 Gene expression profiling of leukemias with mll gene rearrangements

Country Status (1)

Country Link
WO (1) WO2006048266A2 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002002634A2 (en) * 2000-06-30 2002-01-10 Incyte Genomics, Inc. Human extracellular matrix and cell adhesion polypeptides
WO2002066954A2 (en) * 2001-02-16 2002-08-29 Arbor Vita Corporation Pdz domain interactions and lipid rafts
WO2003039443A2 (en) * 2001-11-05 2003-05-15 Deutsches Krebsforschungszentrum Novel genetic markers for leukemias
WO2005043163A2 (en) * 2003-11-04 2005-05-12 Roche Diagnostics Gmbh Method for distinguishing who classified aml subtypes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAFERLACH T ET AL: "The Diagnosis of 14 Specific Subtypes of Leukemia Is Possible Based on Gene Expression Profiles: A Study on 263 Patients with AML, ALL, CML, or CLL" BLOOD, W.B.SAUNDERS COMPANY, ORLANDO, FL, US, vol. 100, no. 11, 16 November 2002 (2002-11-16), page 139A, XP002263227 ISSN: 0006-4971 *
KOHLMANN A ET AL: "MOLECULAR CHARACTERIZATION OF ACUTE LEUKEMIAS BY USE OF MICROARRAY TECHNOLOGY" GENES, CHROMOSOMES & CANCER, vol. 37, no. 4, August 2003 (2003-08), pages 396-405, XP008025253 *
KOHLMANN A ET AL: "Pediatric acute lymphoblastic leukemia (ALL) gene expression signatures classify an independent cohort of adult ALL patients." LEUKEMIA : OFFICIAL JOURNAL OF THE LEUKEMIA SOCIETY OF AMERICA, LEUKEMIA RESEARCH FUND, U.K. JAN 2004, vol. 18, no. 1, January 2004 (2004-01), pages 63-71, XP002373717 ISSN: 0887-6924 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8383349B2 (en) 2007-03-16 2013-02-26 The Board Of Trustees Of The Leland Stanford Junior University Bone morphogenetic protein antagonist and uses thereof
US11690847B2 (en) 2016-11-30 2023-07-04 Case Western Reserve University Combinations of 15-PGDH inhibitors with corticosteroids and/or TNF inhibitors and uses thereof
US11718589B2 (en) 2017-02-06 2023-08-08 Case Western Reserve University Compositions and methods of modulating short-chain dehydrogenase
US10342786B2 (en) 2017-10-05 2019-07-09 Fulcrum Therapeutics, Inc. P38 kinase inhibitors reduce DUX4 and downstream gene expression for the treatment of FSHD
US10537560B2 (en) 2017-10-05 2020-01-21 Fulcrum Therapeutics, Inc. P38 kinase inhibitors reduce DUX4 and downstream gene expression for the treatment of FSHD
US11291659B2 (en) 2017-10-05 2022-04-05 Fulcrum Therapeutics, Inc. P38 kinase inhibitors reduce DUX4 and downstream gene expression for the treatment of FSHD
US11479770B2 (en) 2017-10-05 2022-10-25 Fulcrum Therapeutics, Inc. Use of p38 inhibitors to reduce expression of DUX4
CN109976760A (en) * 2017-12-27 2019-07-05 北京东土科技股份有限公司 A kind of the cross compile method and cross-compiler of graphic language

Also Published As

Publication number Publication date
WO2006048266A3 (en) 2006-08-24

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05810753

Country of ref document: EP

Kind code of ref document: A2