WO2007137366A1 - Diagnostic and prognostic indicators of cancer - Google Patents
- Publication number
- WO2007137366A1 (PCT/AU2007/000768)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- group
- expression
- outcome
- genes
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6834—Enzymatic or biochemical coupling of nucleic acids to a solid phase
- C12Q1/6837—Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/136—Screening for pharmacological compounds
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- the present invention relates generally to the field of cancer research.
- the present invention also relates to a method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, and a classifier for using the predictive relationship. More particularly, the present invention relates to expression profiling of cancer patients.
- ALL acute lymphoblastic leukaemia
- Acute lymphoblastic leukaemia is the single most common and at the same time the most successfully treated malignancy in children.
- resistant forms of childhood ALL constitute a leading cause of cancer-related morbidity and mortality.
- Long-term survival rates have reached 75% with contemporary treatment strategies tailored to specific subgroups based on biological and clinical risk features.
- Current National Cancer Institute (NCI) criteria for risk assignment utilise age and white blood cell (WBC) counts at diagnosis to stratify patients into standard risk (SR) (1-9.99 years of age and WBC <50,000/µl) and high risk (HR) (>10 years of age and/or WBC >50,000/µl).
- SR standard risk
- HR high risk
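The risk assignment rule quoted above can be sketched as a small function. This is only an illustration of the stated thresholds (age 1-9.99 years and WBC below 50,000 per microlitre for SR; age above 10 years and/or WBC above 50,000 per microlitre for HR). The function name and the "unspecified" return value for inputs the text does not cover (for example infants, or a WBC of exactly 50,000) are assumptions made here.

```python
def nci_risk_group(age_years, wbc_per_ul):
    """Return 'SR', 'HR' or 'unspecified' using the thresholds quoted in the text."""
    # High risk: older than 10 years and/or WBC above 50,000 per microlitre.
    if age_years > 10 or wbc_per_ul > 50000:
        return "HR"
    # Standard risk: 1-9.99 years of age and WBC below 50,000 per microlitre.
    if 1 <= age_years <= 9.99 and wbc_per_ul < 50000:
        return "SR"
    # Assumption: anything the quoted criteria do not cover is left unclassified.
    return "unspecified"


print(nci_risk_group(4.5, 12000))
print(nci_risk_group(12, 8000))
```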
- Treatment itself remains one of the strongest predictors of outcome.
- the invention provides a method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said method comprising: providing data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; analysing the data to find a group of one or more of the probe sets which have a predictive relationship to the outcome.
- the probes related to the group of probe sets reflect which genes have the predictive relationship between their expression level and the outcome.
- the method further comprises further analysing the group to find a refined group of one or more probe sets highly related to the outcome.
- the method further comprises refining the membership of the refined group to include probe sets which are predictive of the outcome when the data related to the probe sets are obtained by use of two or more different techniques to measure gene expression.
- the method further comprises refining the membership of the refined group to exclude probe sets which are not significantly predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
- the predictive relationship is indicative of one or more outcomes of a set of possible outcomes.
- the group is typically indicative of an outcome by being discriminatory of the outcome with a significant degree of certainty.
- the group is discriminatory of the outcome according to each expression level of the gene associated with each corresponding probe in the group .
- analysing the data to find the group of probe sets comprises conducting supervised analysis to rank the relevance of each probe set to the outcome.
- finding the refined group comprises conducting further supervised analysis of the group.
- the group of probe sets again undergoes analysis to rank probe sets that are members of the first group. From this ranking the refined group is identified. Using this technique the refined group will not necessarily have the same membership as the same number of the top probe sets in the first group.
- one testing technique is to measure gene expression using an oligonucleotide microarray.
- the technique is to measure gene expression using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) .
- the method further comprises verifying the relationship between gene expression levels and the outcome.
- the verification is typically conducted by using an unsupervised clustering technique.
- the method further comprises calculating a score of the accuracy in which the group is indicative of the outcome.
- finding the refined group comprises decreasing the membership of the group by removing members of the group which have a relatively low accuracy score. This may be recursively repeated until the size of the refined group is sufficiently small.
- the data undergoes a preliminary variance filter prior to analysis to identify outcome-discriminating probe sets. In another embodiment the data undergoes a preliminary variance filter prior to analysis to remove non outcome-discriminating probe sets.
- the supervised analysis is performed by a decision-tree based algorithm.
- Preferably the decision-tree based algorithm is a Random Forest algorithm.
- the unsupervised analysis is conducted by one or more of a decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) .
- PCA principal component analysis
- the group or refined group of probe sets is used to form a classifier of the outcome.
- the classifier receives data in the form of gene expression levels and predicts an outcome.
- genes to which probes bind that are related to the group of probe sets are used to diagnose the disease or disorder.
- genes to which probes bind that are related to the group of probe sets are used to predict a prognosis of a patient having the disease or disorder.
- genes to which probes bind that are related to the group of probe sets are used to identify biological functions related to the disease or disorder.
- the identified biological functions are used to affect the outcome in a subject suffering from the disease or disorder or to prevent the subject from developing the disease or disorder.
- the predictive relationship is between genes indicated by the probes to which the group of probe sets relate and an outcome of either regression or relapse of pre-B ALL.
- the genes are selected from the group consisting of SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
- the genes consist essentially of OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or a functional fragment thereof. In other embodiments, the genes consist essentially of OAZIN, GLUL, and IGJ, or functional fragments thereof. In further embodiments, the genes consist essentially of OAZIN and IGJ.
- the expression level of the gene related to the group of probes indicative of the outcome is used to produce a model which is predictive of the outcome or a course of treatment to achieve the outcome or a prophylactic course of treatment.
- a second aspect of the invention provides an apparatus for determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said apparatus comprising: a receiver of data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; an analyser for finding a group of one or more of the probe sets which have a predictive relationship to the outcome .
- the analyser is arranged to further analyse the group to find a refined group highly related to the outcome.
- the analyser is arranged to refine the membership of the refined group to include probe sets which are predictive of the outcome when the probe sets are collected by use of two or more different techniques to measure gene expression.
- the apparatus further comprises a verifier for verifying the relationship between the gene expression levels and the outcome.
- the verifier is typically configured to conduct an unsupervised clustering technique.
- the analyser is configured to conduct supervised analysis to rank the relevance of each probe set to the outcome .
- the analyser is configured to conduct further supervised analysis of the group to find the refined group.
- the apparatus comprises a storage means for storing the group and the refined group.
- a third aspect of the invention provides a classifier for classifying data comprising a plurality of levels of expression of predetermined genes of a biological sample, said classifier comprising: an analyser for predicting an outcome according to a predictive relationship between the data and the outcome, thereby classifying the data.
- the predetermined genes are determined according to the above method.
- the gene expression levels are determined according to the above method.
- the predetermined genes are SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
- the predetermined genes consist essentially of OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or functional fragments thereof. In other embodiments, the genes consist essentially of OAZIN, GLUL, and IGJ, or functional fragments thereof. In further embodiments, the genes consist essentially of OAZIN and IGJ.
- a fourth aspect of the invention provides a computer program comprising instructions to control a computer to conduct the method.
- a fifth aspect of the invention provides a computer program comprising instructions to control a computer to operate as the apparatus .
- a sixth aspect of the invention provides a computer program comprising instructions to control a computer to operate as the classifier.
- a seventh aspect of the invention provides a computer readable storage medium for storing any one of the computer programs.
- An eighth aspect of the invention provides a method of gene expression profiling comprising: obtaining gene expression level data reflecting expression levels of a large number of genes from a pool of biological samples; analysing the data to identify a plurality of gene expression levels predictive of an outcome in relation to a disease or disorder, thereby providing a gene expression profile.
- the method further comprises further analysing the group to find a refined group highly related to the outcome .
- the method further comprises refining the membership of the refined group to include probe sets which are predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
- In some embodiments the method further comprises refining the membership of the refined group to exclude probe sets which are not significantly predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
- the invention provides a method of expression profiling for pre-B ALL, comprising the steps of : (a) providing biological samples from individuals with or without pre-B ALL;
- step (d) performing a first supervised analysis on data obtained from step (c) , wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and (e) performing a second supervised analysis on data obtained from step (d) , wherein said second supervised analysis will classify said individuals further.
- the step of measuring the expression levels of nucleic acid molecules is conducted by hybridization of the nucleic acid molecule to a DNA microarray or by qRT-PCR.
- the first and second supervised analysis are performed by a decision-tree based algorithm.
- the algorithm is a Random Forest algorithm.
- the present invention provides a method of prognosis or diagnosis of pre-B ALL, comprising the steps of: (a) providing biological samples from individuals with or without pre-B ALL;
- step (d) performing a first supervised analysis on data obtained from step (c) , wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups;
- step (e) performing a second supervised analysis on data obtained from step (d) , wherein said second supervised analysis will classify said individuals further.
- the step of measuring the expression levels involves measuring the expression levels of a group of 18 genes within a biological sample.
- the 18 genes have gene ID numbers selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398,
- the genes have the gene ID numbers 51582, 2752, 23299, 3512 or 8398. Most preferably, the genes have the gene ID numbers of 51582, 2752, and 3512.
- the step of measuring the expression levels of said genes is examined at the nucleic acid level or protein level.
- the present invention provides a method for prognosis or diagnosis of pre-B ALL, comprising the step of measuring the expression levels of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof .
- the present invention provides a therapeutic or prophylactic composition for the treatment or prevention of pre-B ALL, comprising one or more of :
- the present invention provides a microarray for prognosis or diagnosis of pre-B ALL, comprising one or more genes, wherein said gene has a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
- the present invention provides a kit for prognosis or diagnosis of pre-B ALL, comprising a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
- the present invention provides a method for predicting the likelihood of relapse in an individual with pre-B ALL, comprising the step of measuring the expression level of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof .
- the present invention provides a method of selecting an agent for the prevention and/or treatment of pre-B ALL in an individual, comprising:
- the present invention provides a method of screening for an agent capable of modulating the expression of one or more genes having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 comprising the steps of:
- step (c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b) , wherein a change in the level of expression indicates that the agent is capable of modulating the expression of one or more of said genes .
- the present invention provides a method of screening for an agent capable of treating or preventing pre-B ALL, comprising the steps of:
- step (c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b) , wherein a change in the level of expression indicates that the agent is capable of treating or preventing pre-B ALL.
- the level of at least one gene or fragment is increased compared to a normal level. In further embodiments, the level of at least one gene or fragment may be decreased compared to a normal level.
- the present invention demonstrates gene expression profiling using, for example, DNA microarray and supervised analysis can be used to classify subgroups of pre-B ALL, identify genes with differential expression in subsets of pre-B ALL patients, and identify potential therapeutic targets for pre-B ALL. For example, in some embodiments there are provided methods of diagnosis and prognosis for pre-B ALL or subgroups of pre-B ALL based on the expression of a group of 18 genes. There is also provided a method of pre-B ALL diagnosis based on the expression levels of 18 genes.
- the present invention provides a method for predicting the prognosis or diagnosis of pre-B ALL comprising the steps of: (a) providing a biological sample from a patient with or without pre-B ALL; (b) obtaining a sample of nucleic acid isolated from the sample in step (a), wherein the nucleic acid is RNA or a cDNA copy of RNA; (c) determining the gene expression pattern of a panel of specific sequences comprising at least OAZIN and one further gene selected from the group consisting of SHARP, BICD2, NRBF2, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1 and GPD2 within each nucleic acid pool described in (c) have been predetermined to either increase or decrease in response to pre-B ALL, where the gene expression pattern comprises the relative level of mRNA or
- the panel of specific sequences comprises at least OAZIN and IGJ. In other embodiments, the panel of specific sequences comprises at least OAZIN and/or IGJ and/or GLUL.
- In another aspect of the present invention, there are provided methods of treatment for pre-B ALL. Such methods involve inhibiting or enhancing expression of genes that are found to be over-expressed or down-regulated respectively in individuals diagnosed with pre-B ALL as disclosed herein.
- the biological sample is cells, tissue, or fluid isolated from bone marrow, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, whole blood, blood cells, tumours, organs, and also includes samples of in vivo cell culture constituents, including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, putatively virally infected cells, recombinant cells, and cell components.
- the biological sample is bone marrow or blood.
- Figure 1 is a matrix representation of probe sets showing discrimination of outcomes according to an embodiment of the present invention.
- Figure 2 is a matrix representation of probe sets showing discrimination of outcomes according to an embodiment of the present invention.
- Figure 3 is a plot of principal component analysis results.
- Figures 4A-4D are plots of Kaplan-Meier curves.
- Figures 5A-5B are plots of outcome discrimination.
- Figure 5C is a plot of a Kaplan-Meier curve.
- Figures 6A-6D are plots of Pearson's correlations results.
- Figures 7A-7B are plots of principal component analysis results.
- Figures 8A-8B are plots of Kaplan-Meier curves.
- Figures 9A-9F are differential expression plots.
- invention is not entitled to antedate such disclosure by virtue of prior invention.
- nucleic acid molecule includes a plurality of such molecules .
- a method of analysing the level of expression of a nucleic acid molecule, such as a gene, of a subject, such as a human being, in a test cohort. Biological samples are obtained from each member of the test cohort.
- the test cohort comprises subjects exhibiting alternative outcomes of a disease or disorder, such as a proportion of the individuals in the cohort being in remission and a proportion having relapse of a disease, such as pre-B ALL.
- “outcome” means any one or more of a possible range of states including being healthy or being diagnosed as having a particular disease and a possible range of prognoses.
- disease as used herein is a general term which refers to any departure from health in which a subject suffers.
- a “disorder” refers to an abnormal functioning of a function or part of a body.
- the disease or disorder may be any disease or disorder associated with a different level of expression of a gene compared with that in a normal subject because altered levels of expression of a gene are known to be associated with a range of diseases and disorders, including but not limited to pre-B ALL.
- Pre-B ALL is a type of leukemia.
- Leukemia is a cancer involving the blood forming cells. Cancer means any malignant cell growth or tumour caused by abnormal and uncontrolled cell division.
- Data of gene expression levels is obtained from the test cohort. Determining the level of expression is typically conducted using a commercially available microarray comprising probes specific to a range of genes. Such a microarray will typically produce data for 33,000 genetic probes.
- the data from microarrays is generated by first hybridising the test material (mRNA or cRNA) to the array platform. The degree of hybridisation to the elements on the microarray is detected by a scanner. The signals for each element on the microarray are the raw data from the assay and are subsequently processed to calculate expression levels of the genes under test. The data is formatted to capture information on a plurality of probe sets. Each probe set corresponds to a particular gene.
- Each probe set comprises a set of data elements, where each data element contains an expression level (as measured by the amount of the test material that binds to the probe) for each one of a plurality of biological samples.
- each probe set is produced for each gene in the range testable by the microarray.
- Each probe set reflects a measure of the gene expression level of a particular gene for each of the biological samples of the test cohort.
- the gene expression levels in the probe sets are aligned so that the gene expression level of the genes of one of the biological samples is in the same position in each probe set .
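The aligned layout described above, with one probe set per gene and one expression level per sample in a fixed sample order, might be represented as follows. This is a minimal sketch: the class name, field names, identifiers and numeric values are all hypothetical and chosen only to show the alignment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProbeSet:
    probe_id: str        # hypothetical identifier of the probe set
    gene: str            # gene the probe set corresponds to
    levels: List[float]  # one expression level per sample, in cohort order


def sample_profile(probe_sets: List[ProbeSet], sample_index: int) -> Dict[str, float]:
    """Collect one sample's expression level from every probe set.

    This works because every probe set stores its samples in the same order.
    """
    return {ps.gene: ps.levels[sample_index] for ps in probe_sets}


samples = ["patient_1", "patient_2", "patient_3"]   # made-up sample labels
probe_sets = [
    ProbeSet("ps_001", "GENE_A", [7.1, 5.3, 6.8]),  # made-up values
    ProbeSet("ps_002", "GENE_B", [4.2, 9.0, 4.5]),
]
print(sample_profile(probe_sets, samples.index("patient_2")))
```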
- a variance filter is applied to the probe sets to identify genes which are expressed at outcome-discriminating levels.
- the variance filter eliminates all probe sets that fall below a set fold-change and above a predetermined p-value. For example, probe sets with a fold change of less than 1.15 and a p-value of greater than 0.1 (999 permutations) are eliminated.
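A filter of this kind could be sketched as below. The text does not say how the fold change or the permutation p-value is computed, so several choices here are assumptions: the fold change is the ratio of group means on a linear scale, the permutation statistic is the absolute difference in group means, and a probe set is kept only if it passes both thresholds. The function name and parameters are hypothetical.

```python
import numpy as np


def variance_filter(expr, labels, min_fold=1.15, max_p=0.1, n_perm=999, seed=0):
    """Flag probe sets that differ between two outcome groups.

    expr   : array of shape (probe sets, samples) with positive expression levels
    labels : boolean array, True for one outcome group and False for the other
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)

    mean_a = expr[:, labels].mean(axis=1)
    mean_b = expr[:, ~labels].mean(axis=1)
    # Assumption: fold change is the larger of the two ratios of group means.
    fold = np.maximum(mean_a / mean_b, mean_b / mean_a)

    # Permutation test on the absolute difference in group means.
    observed = np.abs(mean_a - mean_b)
    exceed = np.zeros(expr.shape[0], dtype=int)
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        diff = np.abs(expr[:, perm].mean(axis=1) - expr[:, ~perm].mean(axis=1))
        exceed += diff >= observed
    p_value = (exceed + 1) / (n_perm + 1)

    # Assumption: a probe set must pass both thresholds to be kept.
    keep = (fold >= min_fold) & (p_value <= max_p)
    return keep, fold, p_value


expr = np.array([
    [100.0] * 10 + [200.0] * 10,  # differs between the two groups
    [100.0, 101.0] * 10,          # same mean in both groups
])
labels = np.array([True] * 10 + [False] * 10)
keep, fold, p_value = variance_filter(expr, labels)
print(keep)
```

With 999 permutations the smallest attainable p-value is 1/1000, which is why the count of exceedances is incremented by one before dividing.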
- a supervised analysis is performed on the remaining probe sets to rank the predictability of the outcomes based on the probe sets. Highly ranked probe sets will reflect a strong (highly predictive) relationship between one or more of the outcomes and the genes to which the corresponding probes bind.
- a decision-tree based algorithm is preferably used for the supervised analysis.
- the Random Forest algorithm is particularly useful for this analysis.
- the Random Forest Algorithm of Breiman and Cutler is described at the URL www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and is used in, for example, Zhang H, Yu CY, Singer B, et al: Recursive partitioning for tumor classification with gene expression microarray data.
- Random Forest (RF) is a decision-tree based algorithm. RF analysis typically consists of 100,000 trees. For each tree, the intrinsic RF reiterative process randomly chooses a subset of samples and probe sets for initial analysis, and subsequently uses the remaining samples for testing back. All probe sets used for RF analysis are ranked according to their ability to discriminate between groups of interest. For each sample a classification accuracy is obtained, along with a measure of confidence.
- the Random Forest algorithm can rank the probe sets so that a group is identified by, for example, selecting the top 200 and cutting off the remainder. This reduces the number of potentially relevant probe sets from between about 1,000 and several thousand down to between about 20 and 200 (depending on the number of members in the group) to produce a group of thinned probe sets.
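The ranking and cut-off step might look like the following sketch, which uses scikit-learn's `RandomForestClassifier` as a stand-in for the Breiman and Cutler implementation named in the text. The function name, the use of `feature_importances_` as the ranking criterion, and the default of 500 trees (far fewer than the 100,000 mentioned above, to keep the example fast) are assumptions. scikit-learn expects samples in rows, so a probe set by sample matrix would need transposing first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_probe_sets(X, y, probe_ids, top_n=200, n_trees=500, seed=0):
    """Rank probe sets by Random Forest importance and keep the top_n.

    X is a (samples, probe sets) matrix; y holds the known outcome per sample.
    Returns the retained identifiers and the out-of-bag accuracy estimate.
    """
    forest = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    forest.fit(X, y)
    # Sort probe sets from most to least important.
    order = np.argsort(forest.feature_importances_)[::-1]
    ranked = [probe_ids[i] for i in order]
    return ranked[:top_n], forest.oob_score_


rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)     # synthetic outcome labels
X = rng.normal(size=(40, 30))         # synthetic expression matrix
X[:, 0] += 5 * y                      # make probe set 0 informative
ids = [f"ps_{i}" for i in range(30)]  # hypothetical identifiers
top, oob = rank_probe_sets(X, y, ids, top_n=5)
print(top, round(oob, 2))
```

The out-of-bag score plays the role of the "testing back" on samples left out of each tree, although it is an aggregate accuracy rather than the per-sample confidence described in the text.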
- An unsupervised analysis is conducted to determine grouping of the probe sets to confirm outcome predictability.
- a decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) can be used.
- Various clustering methods are in use and they are designed to find groups of specimens under test that show similar patterns of gene expression. They are referred to as “unsupervised” clustering methods if the analysis is solely using the expression data from the microarray, contrasting with “supervised” clustering methods where independent information on the specimens is incorporated, e.g. belonging to a particular outcome group (relapse or remission).
- the RF method is a supervised learning algorithm.
- Principal Component Analysis is an established technique for the unsupervised analysis of microarray data and has been extensively used as the algorithm of choice to visualize microarray data.
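As an illustration of the unsupervised check, the sketch below projects samples onto their leading principal components using a singular value decomposition. No outcome labels are used in the computation; they would only be overlaid afterwards to see whether the groups separate. The function name and the toy data are assumptions, not taken from the document.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components.

    Returns the sample scores and the fraction of variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)  # centre each probe set at zero
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy matrix: six samples by four probe sets, two groups of three samples.
X = np.array([
    [1.0, 1.2, 0.9, 5.0],
    [1.1, 0.8, 1.0, 5.1],
    [0.9, 1.0, 1.1, 4.9],
    [9.0, 9.1, 8.8, 5.0],
    [9.2, 8.9, 9.0, 5.1],
    [8.9, 9.0, 9.1, 4.9],
])
scores, explained = pca_scores(X)
print(scores[:, 0])
```

In this toy case the first component carries almost all of the variance and its sign splits the two groups, which is the kind of separation a plot of principal component results would be expected to show.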
- A classifier is then formed from the thinned probe sets.
- the classifier is used to predict the outcome based on expression levels of the genes related to the thinned probe sets in a given sample.
- the thinned probe sets can be further refined to form a refined group of probe sets prior to formation of the classifier. Again a supervised analysis is performed on the remaining probe sets to rank the predictability of the outcomes. A decision-tree based algorithm and in particular the Random Forest algorithm is used.
- the outcome prediction accuracy of the refined group of probe sets is compared to the prediction accuracy of the further refined group of probe sets. Where prediction accuracy is better the further refined group of probe sets can be used.
- the number of genes indicated in the further refined group of probe sets can be varied and the accuracy reassessed to determine the optimum number of probe sets that more accurately predict the outcome. Where different techniques are used to measure gene expression level, such as qRT-PCR, the optimum number of probe sets that more accurately predict the outcome for a desired technique can be determined.
- the result is a relatively low number of probe sets which show a high accuracy in predicting the outcome of the disease or disorder.
- This refined group of probe sets indicates which genes to measure expression levels of in order to predict the outcome. These genes and their expression levels which are indicative of a particular outcome are referred to as a gene expression signature (GES).
- a GES may be formed for each possible outcome.
- the expression level of each gene of the GES of subjects in the validation cohort can be determined by an appropriate measurement technique (e.g. qRT-PCR).
- the classifier integrates the expression levels in the GES(s) in order to predict the outcome of each subject in the validation cohort with a significant likelihood of success.
- Survival analysis is a form of regression that models subjects' time to an event (such as death or hospitalization). Some subjects will not experience the event of interest while under follow-up but are still included in the analysis and are said to be "censored" when their follow-up ends.
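- To illustrate how censored subjects still contribute to a survival analysis, the following sketch computes a Kaplan-Meier survival curve. The Kaplan-Meier estimator is not named in the text and is used here only as a familiar example; the function name and the data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which an event was observed.

    times  -- follow-up time for each subject
    events -- True if the event was observed, False if the subject was censored
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Censored subjects count as "at risk" until their follow-up ends.
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve


# Five subjects; the second and fifth are censored at times 3 and 8.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
```

The censored subjects never trigger a drop in the curve, but they enlarge the at-risk denominator for every event that occurs before their follow-up ends.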
- Cox proportional hazards regression is a semi-parametric type of survival analysis which assumes that there is a baseline risk function over time and that modelled factors such as sex or gene expression affect this baseline risk in a proportional manner (e.g. males may be consistently 1.3 times more likely to die than females).
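- In the usual formulation of the Cox model, the hazard for a subject with covariates x1, x2, ... is h(t) = h0(t) * exp(b1*x1 + b2*x2 + ...), where h0(t) is the baseline hazard. The short sketch below computes only the multiplicative factor; the function name is an assumption, and the coefficient is chosen to reproduce the 1.3-fold example given above rather than being fitted to data.

```python
import math


def relative_hazard(coefficients, covariates):
    """exp(sum(b_i * x_i)): the factor by which the baseline hazard is multiplied."""
    return math.exp(sum(b * x for b, x in zip(coefficients, covariates)))


# Sex coded as 1 for male and 0 for female; a coefficient of ln(1.3)
# corresponds to males having 1.3 times the hazard of females at every time.
beta_sex = math.log(1.3)
male_factor = relative_hazard([beta_sex], [1])
female_factor = relative_hazard([beta_sex], [0])
```

Because the factor does not depend on time, the ratio between the two groups' hazards is constant, which is the proportionality assumption.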
- the GES can be used in a classifier which receives the gene expression levels as data and outputs a predicted outcome or a list of predicted outcomes along with an indication of the likelihood of that outcome.
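- A classifier of this kind, which takes expression levels and returns outcomes with an indication of likelihood, might be sketched as a set of voting rules. This is an illustration only, not the trained classifier of the invention: the directions of change follow the genes named in this description, but the thresholds, the "poor"/"good" labels and the function name are invented for the example.

```python
from collections import Counter

# Hypothetical rules: (gene, threshold, vote if level > threshold, vote otherwise).
# Thresholds are invented; a real classifier would learn them from training data.
STUMPS = [
    ("IGJ", 1.0, "poor", "good"),
    ("PLA2G6", 1.0, "poor", "good"),
    ("GLUL", 1.0, "good", "poor"),
    ("BICD2", 1.0, "good", "poor"),
    ("OAZIN", 1.0, "good", "poor"),
]


def predict_outcomes(levels):
    """Return [(outcome, fraction of votes)] from most to least supported."""
    votes = Counter(
        above if levels[gene] > threshold else below
        for gene, threshold, above, below in STUMPS
    )
    return [(outcome, n / len(STUMPS)) for outcome, n in votes.most_common()]


sample = {"IGJ": 2.0, "PLA2G6": 1.5, "GLUL": 0.4, "BICD2": 0.8, "OAZIN": 1.2}
ranked = predict_outcomes(sample)  # [("poor", 0.8), ("good", 0.2)]
```

The vote fraction plays the role of the likelihood indication; in a Random Forest the analogous quantity is the proportion of trees voting for each outcome.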
- the identified genes can also be used to perform a gene ontology functional analysis to identify patterns in the function of the genes to understand the disease or disorder at a macro genetic level.
- the types of outcome that can be predicted include diagnosis of a disease or disorder and prognosis of a patient having a disease or disorder. However it is also possible to then apply the predicted outcome to determine a suitable treatment regime for a patient with a particular disease or disorder and/or preventative action to reduce the likelihood of a patient contracting a disease or disorder or advancement of a disease or disorder.
- the classifier may be implemented in the form of a computer running a computer program.
- the analysis for obtaining the GES may be implemented in the form of a computer running a computer program.
- the expression level of a nucleic acid molecule described herein means both whether the nucleic acid molecule is expressed or not, and the extent to which it is expressed.
- the expression level may be increased compared to the level in a normal subject.
- the expression levels of IGJ and PLA2G6 were found by the inventors to be increased in poor outcome patients.
- the expression level of a nucleic acid molecule may be decreased compared to the level in a "normal" subject.
- the level of expression of a nucleic acid molecule may be determined by any means known in the art and may be determined directly or indirectly. For example, the amount of RNA corresponding to a gene may be determined. Alternatively the amount of protein encoded by the gene may be determined.
- the determination of the level of expression of one or more nucleic acid molecules can be used in the prognosis, diagnosis, or treatment of a disease or disorder. For example, determining the likelihood that an individual will regress following treatment for pre-B ALL may be used to predict whether the individual will survive the disease and may also be used to effect the most appropriate treatment for the individual.
- prognosis means a prediction of the course and outcome of a disease or disorder or the likelihood of a subject developing a disease or disorder. For example, depending upon the level of expression of a gene of the invention, the subject can be identified as likely to develop pre-B ALL and/or classified as likely to suffer a recurrence of the disease.
- diagnosis means the process of identifying a disease or disorder, such as by its symptoms, laboratory tests (including genotypic tests), and physical findings. The identification of the level of expression of a nucleic acid molecule of the invention can be used in the diagnosis of a disease associated with the gene.
- a panel of genes or nucleic acid molecules which can be used in the prognosis, diagnosis, and/or treatment of pre-B ALL.
- Pre-B ALL is a type of leukaemia.
- Leukaemia is a cancer involving the blood forming cells. Cancer means any malignant cell growth or tumour caused by abnormal and uncontrolled cell division.
- Acute leukaemia is the most common malignancy in children and it is caused by the clonal proliferation of hematopoietic cells (usually white blood cells) .
- abnormal immature white cells increase greatly and invade other tissues and organs. These white cells are not able to function at their normal task of fighting disease and they impact on the function of normal cells which makes the leukemic child vulnerable to infection or haemorrhage.
- ALL: acute lymphoblastic leukaemia
- AML: acute myeloid leukaemia
- the term "diagnosis" means the process of identifying pre-B ALL by its symptoms, via laboratory tests (including immunophenotyping and genotypic tests) or through physical findings.
- the identification of the level of expression of a nucleic acid molecule of the invention can be used in the diagnosis of a disease associated with the gene.
- "prognosis" or "prognostic marker" shall be taken to mean an indicator of the likelihood of progression of pre-B ALL diagnosed in an individual or the likelihood of an individual developing pre-B ALL.
- an individual might be identified as likely to develop pre-B ALL and/or classified as likely to suffer a relapse of the disease.
- "predictive marker" shall be taken to mean an indicator of response to therapy; said response is preferably defined according to patient survival and/or relapse of disease. It is preferably used to define patients with high, low and intermediate length of survival after treatment that is the result of the inherent heterogeneity of the disease process.
- biological samples are obtained.
- the terms "obtaining or providing a biological sample" or "obtaining or providing a sample from an individual" shall not be taken to include the active retrieval of a sample from an individual (e.g., the performance of a biopsy).
- Said terms shall be taken to mean the obtainment of a sample previously isolated from an individual.
- Said samples may be isolated by any means standard in the art, including but not limited to biopsy, surgical removal, and body fluids isolated by means of aspiration.
- said samples may be provided by third parties including but not limited to clinicians, couriers, commercial sample providers and sample collections.
- biological sample includes any biological material of an individual.
- the biological sample is cells, tissue, or fluid isolated from bone marrow, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, whole blood, blood cells, tumours, organs, and also includes samples of in vivo cell culture constituents, including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, putatively virally infected cells, recombinant cells, and cell components.
- the biological sample is bone marrow or blood.
- the biological sample may be tested using the techniques described herein directly after isolation or alternatively further processed in order to increase the quality of the data produced.
- the selective expansion of cells is useful to induce proliferation and generate a "cell line" in which the frequency of the relevant cells is log scale greater than that of the same cells in a biological sample directly isolated from a subject.
- the literature has also shown that, if required, the cells can be further concentrated and purified by cloning the specific cells.
- an individual is screened in order to determine the expression level of a gene or nucleic acid molecule associated with pre-B ALL diagnosis or prognosis.
- expression level when used in reference to nucleic acid molecules or genes described herein shall be taken to mean the transcription and/or translation of a gene or nucleic acid molecule, as well as the genetic or the epigenetic modifications of the genomic DNA associated with the gene and/or regulatory or promoter regions thereof. Genetic modifications include SNPs, point mutations, deletions, insertions, repeat length, rearrangements and other polymorphisms.
- the expression level of a gene or nucleic acid molecule may be determined by the analysis of any factors associated with or indicative of the level of transcription and translation of a gene including, but not limited to, methylation analysis, loss of heterozygosity (hereinafter also referred to as LOH), RNA expression levels and protein expression levels.
- the term "expression level" also encompasses the absence of expression.
- the expression level may be increased compared to the level in an individual not having pre-B ALL.
- the individual not having pre-B ALL may have a disease or disorder other than pre-B ALL or may be an apparently healthy individual.
- the individual will have expression levels of at least one of the nucleic acid molecules of the invention within normal, typical and/or average levels.
- the expression levels of IGJ and PLA2G6 were found by the inventors to be increased in individuals that had a poor prognosis.
- the expression level of a nucleic acid molecule may be decreased compared to the level in an individual not having pre-B ALL.
- the inventors found that the expression levels of OAZIN, GLUL, and BICD2 were decreased in individuals that had a poor prognosis.
- nucleic acid molecule means a DNA or RNA molecule.
- a "gene" means a length of DNA which encodes a particular protein or RNA molecule. Nucleic acid molecules and genes disclosed herein may or may not include the 5' and 3' untranslated regions of the DNA.
- RNA may be any class of RNA, including messenger RNA (mRNA), transfer RNA (tRNA), or ribosomal RNA (rRNA), as the sequence of RNA corresponds to that of DNA.
- the nucleic acid molecules may be single-stranded or double-stranded, or antisense DNA or RNA molecules, and include functional fragment(s) of the nucleic acid molecule and antisense molecules thereto.
- a "double-stranded DNA or RNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, cytosine, or uridine) in a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary structure. Thus, this term includes double-stranded DNA and RNA found inter alia in linear DNA or RNA molecules (e.g. restriction fragments), viruses, plasmids, and chromosomes.
- sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the non-transcribed strand of DNA or RNA, e.g. the strand having a sequence homologous to the mRNA.
- Nucleotide sequences of preferred embodiments of the invention may include sequences that differ by one or more nucleotide substitutions, additions or deletions, such as allelic variants, and will also include sequences that differ due to the degeneracy of the genetic code, provided that the different sequence retains a function of the starting molecule, for example, is expressed in an individual having pre-B ALL at a different level to that in an individual not having pre-B ALL.
- fragment of a molecule means a portion of the entire molecule.
- the size of the fragment is limited only in that it must have an expression level in an individual having pre-B ALL that is different from the expression level in an individual not having pre-B ALL.
- the fragments may be functional fragments.
- a "functional fragment" of a molecule is one which retains a function of the full-length molecule, for example, is expressed in an individual having pre-B ALL at a different level to that in an individual not having pre-B ALL.
- nucleic acid molecule can encode a polypeptide.
- "encode" or "encoded" refer generally to a nucleic acid sequence being present in a translatable form.
- An "antisense" molecule is also considered to encode a polypeptide sequence, since the same informational content is present in a readily accessible form, especially when linked to a sequence which promotes expression of the sense strand.
- Nucleic acid molecules disclosed herein include those involved in cell communication, cell growth and/or maintenance, transcription, and metabolism, for example, RNA, DNA, phosphate, and protein metabolism.
- the expression level of a nucleic acid molecule of the preferred embodiments of invention may be determined by any means known in the art and may be determined directly or indirectly. For example, the amount of RNA corresponding to a gene may be determined. Alternatively the amount of protein encoded by the gene may be determined.
- where the expression level of a gene is determined by detecting the amount of mRNA corresponding to the gene, this may be achieved by techniques such as reverse transcription polymerase chain reaction (RT-PCR), Northern blot analysis, and an RNase protection assay.
- PCR: polymerase chain reaction
- the PCR method involves repeated cycles of primer extension synthesis in the presence of PCR reagents, using two oligonucleotide primers capable of hybridizing preferentially to a template nucleic acid.
- the primers used in the PCR method will be complementary to nucleotide sequences within the template at both ends of or flanking the nucleotide sequence to be amplified, although primers complementary to the nucleotide sequence to be amplified also may be used. See Wang, et al.
- PCR may also be used to determine whether a specific sequence is present, by using a primer that will specifically bind to the desired sequence, where the presence of an amplification product is indicative that a specific binding complex was formed. Detection of mRNA having the subject sequence is indicative of the level of expression of the gene.
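- The way two flanking primers define an amplification product can be illustrated with a small in-silico sketch. It is a simplification under stated assumptions: primers must match exactly, whereas real primers hybridise preferentially and tolerate some mismatch; the template is a made-up sequence; and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the product flanked by the two primers, or None if either fails to bind.

    template -- sense strand, written 5' to 3'
    forward  -- primer identical to the start of the target on the sense strand
    reverse  -- primer complementary to the end of the target, written 5' to 3'
    """
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binds the sense strand where its reverse complement occurs.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "TTGACCTGAAGCTTGGCATCCGTAAGG"
product = amplicon(template, "GACCTG", reverse_complement("CATCCG"))
```

A return value of None corresponds to the case in which no amplification product is formed, that is, the specific sequence is not detected in the template.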
- the amplified sample can be fractionated by electrophoresis, e.g. capillary or gel electrophoresis, transferred to a suitable support, e.g. nitrocellulose, and then probed with a fragment of the template sequence.
- Oligonucleotide primers are short-length, single- or double-stranded polydeoxynucleotides that are chemically synthesised by known methods (involving, for example, triester, phosphoramidite, or phosphonate chemistry), such as described by Engels, et al., Angew. Chem. Int. Ed.
- Oligonucleotide primers and probes of the invention are DNA molecules that are sufficiently complementary to regions of contiguous nucleic acid residues within the allergy-associated gene nucleic acid to hybridise thereto, preferably under high stringency conditions. Defining appropriate hybridisation conditions is within the skill of the art. See e.g., Maniatis et al., DNA Cloning, vols. I and II. Nucleic Acid Hybridisation. However, briefly,
- stringent conditions for hybridisation or annealing of nucleic acid molecules are those that (1) employ low ionic strength and high temperature for washing, for example, 0.015M NaCl/0.0015M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50°C, or (2) employ during hybridisation a denaturing agent such as formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50mM sodium phosphate buffer at pH 6.5 with 750mM NaCl, 75mM sodium citrate at 42°C.
- Exemplary primers and probes include oligonucleotides that are at least about 15 nucleic acid residues long and that are selected from any 15 or more contiguous residues of DNA.
- oligonucleotide primers and probes used in some embodiments of the invention are at least about 20 nucleic acid residues long.
- the invention also contemplates oligonucleotide primers and probes that are 150 nucleic acid residues long or longer.
- PCR reagents refers to the chemicals, apart from the template nucleic acid sequence, needed to perform the PCR process. These chemicals generally consist of five classes of components: (i) an aqueous buffer, (ii) a water soluble magnesium salt, (iii) at least four deoxyribonucleotide triphosphates (dNTPs), (iv) oligonucleotide primers, and (v) a polynucleotide polymerase, preferably a DNA polymerase, more preferably a thermostable DNA polymerase, i.e. a DNA polymerase which can tolerate temperatures between 90°C and 100°C for a total time of at least 10 minutes without losing more than about half its activity.
- the four conventional dNTPs are thymidine triphosphate (dTTP), deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), and deoxyguanosine triphosphate (dGTP).
- These conventional deoxyribonucleotide triphosphates may be supplemented or replaced by dNTPs containing base analogues which Watson-Crick base pair like the conventional four bases, e.g. deoxyuridine triphosphate (dUTP).
- a detectable label may be included in an amplification reaction.
- Biotin-labelled nucleotides can be incorporated into DNA or RNA by such techniques as nick translation, chemical and enzymatic means, and the like.
- the biotinylated primers and probes are detected after hybridisation, using indicating means such as avidin/streptavidin, fluorescent-labelling agents, enzymes, colloidal gold conjugates, and the like.
- Nucleic acids may also be labelled with other fluorescent compounds, with immunodetectable fluorescent derivatives, with biotin analogues, and the like.
- Nucleic acids may also be labelled by means of attachment to a protein. Nucleic acids cross-linked to radioactive or fluorescent histone single-stranded binding protein may also be used.
- there are other suitable methods for detecting oligonucleotide primers and probes, and other suitable detectable labels, that are available for use in the practice of the present invention.
- fluorescent residues can be incorporated into oligonucleotides during chemical synthesis.
- oligonucleotide primers and probes of the invention are labelled to render them readily detectable.
- Detectable labels may be any species or moiety that may be detected either visually or with the aid of an instrument.
- Suitable labels include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) and N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA).
- radioactive labels eg.
- Another group of fluorescent compounds are the naphthylamines, having an amino group in the alpha or beta position. Included among such naphthylamino compounds are 1-dimethylaminonaphthyl-5-sulfonate, 1-anilino-8-naphthalene sulfonate and 2-p-toluidinyl-6-naphthalene sulfonate.
- dyes include 3-phenyl-7-isocyanatocoumarin; acridines, such as 9-isothiocyanatoacridine and acridine orange; N-(p-(2-benzoxazolyl)phenyl)maleimide; benzoxadiazoles, stilbenes, pyrenes, and the like.
- the fluorescent compounds are selected from the group consisting of VIC, carboxyfluorescein (FAM), Lightcycler® 640, and Cy5.
- the label may be a two stage system, where the amplified DNA is conjugated to biotin, hapten, or the like having a high affinity binding partner, e.g. avidin, specific antibodies, etc, where the binding partner is conjugated to a detectable label.
- the label may be conjugated to one or both of the primers.
- the pool of nucleotides used in the amplification is labelled, so as to incorporate the label into the amplification product.
- RT-PCR is a form of PCR which can amplify a known mRNA sequence using a reverse transcriptase to convert the mRNA to cDNA prior to traditional PCR.
- aliquots are removed from the PCR every couple of cycles beginning at a point where product is undetectable (typically about cycle 20) and extending through the entire exponential phase. Products are then resolved electrophoretically and quantitated by densitometry, fluorescence or phosphorimaging.
- a fluorescent signal can be used to report formation of PCR product as each cycle of the amplification proceeds, coupled with an automated PCR/fluorescent detection system (Heid et al., 1996, Genome Res.; 6:986-994).
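- When fluorescence is recorded at every cycle, a common way to summarise the curve is the cycle at which the signal first crosses a fixed threshold. The sketch below shows one simple way to compute this by linear interpolation between readings; the threshold value, the readings and the function name are assumptions for illustration, and instrument software typically applies baseline correction that is omitted here.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which the signal first crosses `threshold`.

    fluorescence[i] is the reading taken after cycle i + 1. The first reading
    is assumed to lie below the threshold. Returns None if it is never crossed.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev was read after cycle i, curr after cycle i + 1.
            return i + (threshold - prev) / (curr - prev)
    return None


# Hypothetical readings that double each cycle once amplification is detectable.
readings = [0.1, 0.1, 0.2, 0.4, 0.8, 1.6]
ct = threshold_cycle(readings, 0.6)
```

A sample containing more starting template crosses the threshold at an earlier cycle, which is what allows relative expression levels to be compared.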
- Suitable detection systems for RT-PCR include SYBR Green (Molecular Beacons), Scorpions (Molecular Probes), and TaqMan® (Applied Biosystems).
- the invention utilises a combined PCR and hybridisation probing system so as to make the most of the closed tube or homogenous assay systems such as the use of FRET probes as disclosed in US patents (Nos 6,140,054; 6,174,670), the entirety of which are also incorporated herein by reference.
- the FRET or "fluorescent resonance energy transfer" approach employs two oligonucleotides which bind to adjacent sites on the same strand of the nucleic acid being amplified.
- One oligonucleotide is labelled with a donor fluorophore which absorbs light at a first wavelength and emits light in response, and the second is labelled with an acceptor fluorophore which is capable of fluorescence in response to the emitted light of the first donor (but not substantially by the light source exciting the first donor, and whose emission can be distinguished from that of the first fluorophore).
- the second or acceptor fluorophore shows a substantial increase in fluorescence when it is in close proximity to the first or donor fluorophore, such as occurs when the two oligonucleotides come in close proximity when they hybridise to adjacent sites on the nucleic acid being amplified (for example in the annealing phase of PCR) forming a fluorogenic complex.
- the method allows detection of the amount of product as it is being formed.
- one of the labelled oligonucleotides may also be a primer used for PCR.
- the labelled PCR primer is part of the DNA strand to which the second labelled oligonucleotide hybridises, as described by Neoh et al., 1999, J Clin Path; 52:766-769, and von Ahsen et al., 2000, Clin Chem; 46:156-161, the entirety of which are incorporated by reference.
- amplification and detection of amplification with hybridisation primers and probes can be conducted in two separate phases, for example by carrying out PCR amplification first, and then adding hybridisation probes under such conditions as to measure the amount of nucleic acid which has been amplified.
- one embodiment of the present invention utilises a combined PCR and hybridisation probing system so as to make the most of the closed tube or homogenous assay systems and is carried out on a Roche Lightcycler ® or other similarly specified or appropriately configured instrument.
- Such systems would also be adaptable to the detection methods described here.
- probes can be used for allele discrimination if appropriately designed for the detection of point-mutation(s), in addition to deletion(s) and insertion(s).
- the unlabelled PCR primers may be designed for allele discrimination by methods well known to those skilled in the art (Ausubel 1989-1999) .
- detection of amplification in homogenous and/or closed tubes can be carried out using numerous means in the art, for example using TaqMan® hybridisation probes in the PCR reaction and measurement of fluorescence specific for the target nucleic acids once sufficient amplification has taken place.
- nucleic acid amplification/detection systems such as those based on the TaqMan approach (US patent Nos 5,538,848 and 5,691,146), fluorescence polarisation assays (e.g. Gibson et al., 1997, Clin Chem., 43:1336-1341), and the Invader assay (e.g. Agarwal et al., 2000, Diagn Mol Pathol., 9(3):158-164; Ryan et al., 1999, Mol Diagn., 4(2):135-144).
- Such systems would also be adaptable for use in the invention described, enabling real-time monitoring of nucleic acid amplification.
- Northern blot analysis involves fractionating RNA species on the basis of size by denaturing gel electrophoresis followed by transfer of the RNA onto a membrane by capillary, vacuum or pressure blotting (Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
- the RNA may be bound to the membrane in an apparent non-covalent interaction via exposure to short wave ultraviolet light or by heating at 80°C in a vacuum oven.
- RNA sequences of interest are detected on the blot by hybridization to an oligonucleotide probe.
- Probes for Northern blot detection generally contain full or partial cDNA sequences and may be labelled by enzymatic incorporation of radiolabeled (usually 32P or 33P) nucleotides or with nucleotides conjugated to haptens such as biotin for subsequent chemiluminescent detection. After probe hybridization and washing to remove non-specific label, the hybridization signal is generally detected by exposing blots to X-ray film or phosphor storage plates, after prior incubation with chemiluminescent substrates if necessary. The resulting band identified by the probe indicates the size of the mRNA, and the intensity of the band corresponds to the relative abundance. Autoradiograph band intensities may be quantitated by densitometry, by direct measurement of hybridized radiolabeled probe via storage phosphor imaging or by scintillation counting of excised bands.
- the RNase protection assay (RPA) operates on the same principle as a Northern blot, as it involves hybridization of a labelled probe to a target mRNA. However, in the RPA, hybridization takes place in a solution containing both a labelled antisense RNA probe and the target mRNA without prior gel fractionation or blotting (Azrolan & Breslow, 1990, J. Lipid Res., 31:1141-1146; Sambrook et al., 1989, supra).
- RNA levels can be determined.
- a microarray is a tool for analysing gene expression and typically consists of a small membrane or glass slide onto which samples of many nucleic acid and/or protein molecules have been arranged in a regular pattern.
- Microarrays are particularly useful for directly or indirectly detecting the level of expression of a nucleic acid molecule of an individual.
- a microarray as disclosed in embodiments of the present invention may have DNA, RNA, and/or protein applied to a solid matrix in predetermined locations and in such a way that mutations can be detected in the one or more molecule(s).
- oligonucleotide probes for nucleic acid molecules disclosed herein can be deposited or synthesised at predetermined locations on a glass slide or other support.
- Messenger RNA or cRNA isolated from a biological sample obtained from an individual can be added to the probes under conditions which allow binding between the probes and mRNA sequences if present in the biological sample.
- mRNA from an individual may be applied to the slide before adding probes for nucleic acid molecules disclosed herein. Binding between the probes and DNA can be detected by any means known in the art and specific binding between the DNA and a probe indicates the DNA is expressed in the subject.
- hybridisation conditions include those provided by Sambrook et al., infra.
- stringent conditions for hybridisation or annealing of nucleic acid molecules are those that (1) employ low ionic strength and high temperature for washing, for example, 0.015M NaCl/0.0015M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50°C, or (2) employ during hybridisation a denaturing agent such as formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50mM sodium phosphate buffer at pH 6.5 with 750mM NaCl, 75mM sodium citrate at 42°C.
- protein isolated from a biological sample from an individual may be applied to a plastic slide.
- protein refers to peptides, proteins, and polypeptides.
- Labelled antibodies may be applied to the protein under conditions which allow binding between an antibody and the protein. Binding between the protein and antibody indicates that the nucleic acid molecule encoding the protein is expressed in the individual.
- where the expression level of a gene is determined by detecting the amount of protein encoded by the gene, this may be achieved by ELISA, immunohistochemical staining, or flow cytometry (such as fluorescent activated cell sorting).
- Enzyme-linked Immunosorbent Assays combine the specificity of antibodies with the sensitivity of simple enzyme assays, by using antibodies coupled to an easily-assayed enzyme.
- ELISAs can provide a useful measurement of antigen or antibody concentration and can be used to detect the presence of a protein encoded by a gene of the invention and recognized by an antibody.
- One of the most useful of the immunoassays is the two antibody "sandwich" ELISA. This assay is used to determine the concentration of a protein in a sample and can determine the absolute amount of protein in the sample.
- the sandwich ELISA requires two antibodies that bind to epitopes that do not overlap on the protein. This can be accomplished with either two monoclonal antibodies that recognize discrete sites or one batch of affinity-purified polyclonal antibodies.
- one antibody (the “capture” antibody) is purified and bound to a solid phase, typically attached to the bottom of a plate well.
- the sample is then added and protein in the sample is allowed to complex with the bound antibody. Unbound products are then removed with a wash, and a labelled second antibody (the "detection" antibody) is allowed to bind with the bound protein, thus completing the "sandwich".
- the assay is then quantitated by measuring the amount of labelled second antibody bound to the matrix, through the use of a colorimetric substrate.
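- Converting the colorimetric reading of a sandwich ELISA into a protein concentration is usually done against standards of known concentration. The sketch below uses straight-line interpolation between neighbouring standards, which is an assumption made for brevity (curve-fitting models are often used in practice); the standard values, units and function name are hypothetical.

```python
def concentration_from_absorbance(standards, absorbance):
    """Interpolate a concentration from (concentration, absorbance) standards.

    Assumes absorbance rises strictly with concentration across the standards.
    Raises ValueError if the reading falls outside the range of the standards.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= absorbance <= a1:
            return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)
    raise ValueError("absorbance outside the range of the standards")


# Hypothetical standards: (concentration in ng/mL, absorbance reading).
standards = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
estimate = concentration_from_absorbance(standards, 0.35)
```

Readings outside the standard range are rejected rather than extrapolated, since the relationship between signal and concentration cannot be assumed to hold there.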
- the expression level of a nucleic acid molecule as disclosed herein may be determined in a cell or tissue, i.e. an aggregate of cells, using techniques such as immunohistochemical staining techniques.
- in immunohistochemical staining techniques, a cell sample is prepared, typically by dehydration and fixation, followed by reaction with labelled antibodies specific for the gene product, where the labels are usually visually detectable, such as enzymatic labels, fluorescent labels, luminescent labels, and the like.
- a particularly sensitive staining technique suitable for use in the present invention is described by Hsu et al., 1980, Am. J. Clin. Path., 75:734-738.
- Antibodies useful for immunohistochemical staining may be either monoclonal or polyclonal. Conveniently, the antibodies may be prepared against a synthetic peptide based on the protein or peptide encoded by genes or nucleic acid molecules of embodiments of the invention.
- the terms "Fluorescence Activated Cell Sorting" (FACS) and "flow cytometry" are herein used interchangeably. FACS is a powerful method used to study cells. Individual cells held in a thin stream of fluid are passed through one or more laser beams, causing light to scatter and fluorescent dyes to emit light at various frequencies. Photomultiplier tubes (PMT) convert light to electrical signals and cell data is collected. Cell sub-populations are identified and defined at high purity (~100%).
- Fluorescent labelling allows investigation of cell structure and function, including the determination of whether a particular gene is being expressed by the cell.
- the expressed protein or mRNA is labelled with fluorescent dyes, typically using antibodies which specifically bind the protein or mRNA, and FACS collects the fluorescence signals in one to several channels corresponding to different laser excitation and fluorescence emission wavelength.
- Immunofluorescence the most widely used application, involves the staining of cells with antibodies conjugated to fluorescent dyes such as fluorescein and phycoerythrin. This method is often used to label molecules on the cell surface, but antibodies can be directed at targets in cytoplasm.
- an antibody to a molecule of the invention is directly conjugated to a fluorescent dye. Cells are stained in one step. In indirect immunofluorescence the primary antibody is not labelled. A second fluorescently-conjugated antibody is added which is specific for the first antibody.
- Microfluidic technology may also be used in the analysis of proteins (Figeys et al . , 1998, Anal Chem. ,
- microfluidics can be linked with a mass spectrometric analysis of proteins or peptides.
- peptides can be adsorbed onto hydrophobic membranes, desalted, and through the use of microfluidics eluted in a controlled manner to allow the direct mass spectrometric analysis of picomole amounts of peptides by electrospray ionisation mass spectrometry procedures (Lion et al., 2003, J. Chromatogr. A., 1003:11-19).
- Combinatorial peptidomics (Soloviev et al., 2003, J Nanobiotechnology, 1:4) may also be used with integrated microfluidic systems.
- Methods of the invention may further comprise the step of validating the results obtained by any one of the previous methods by RT-PCR, which is described above.
- the gene(s) and/or nucleic acid molecules associated with pre-B ALL can be identified.
- genes in one or more lists of genes associated with pre-B ALL can be measured, and those measurements are used, either alone or with other parameters, to assign the individual into a particular risk category.
- gene expression levels can be correlated with intrinsic disease biology and/or etiology to define intrinsically related groups.
- Gene or nucleic acid molecule expression levels can be displayed in a number of ways. A common method is to arrange a ratio matrix into a graphical dendrogram where columns indicate test samples and rows indicate genes.
- Analysis of the expression levels can also be conducted by comparing intensities generated in qPCR or microarrays. This is best done by generating a ratio matrix of the expression intensities of genes in a test sample versus those in a control sample. For instance, the gene expression intensities from a tissue from a pre-B ALL patient can be compared with the expression intensities generated from tissue obtained from an individual not having pre-B ALL. A ratio of these expression intensities indicates the fold-change in gene expression between the test and control samples.
- Microarray technology allows for the measurement of the steady-state mRNA level of thousands of genes simultaneously thereby presenting a powerful tool for identifying expression levels.
- Two microarray technologies are currently in wide use. The first are cDNA arrays and the second are oligonucleotide arrays. Although differences exist in the construction of these chips, essentially all downstream data analysis and output are the same.
- the product of these analyses are typically measurements of the intensity of the signal received from a labelled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the signal intensity is proportional to the cDNA quantity, and thus mRNA, expressed in the sample cells. A large number of such techniques are available and useful.
- biotinylated cRNA is prepared and hybridized to a microarray such as Affymetrix HG-U133A oligonucleotide microarrays (Affymetrix, Santa Clara, CA) by methods such as those described in Hoffmann et al., 2005, Mol Biotechnol., 29:31-8.
- Array images can then be reduced to intensity values for each probe (CEL files) using, for example, Affymetrix MAS 5.0 software.
- Expression measures are extracted using robust multi-array analysis (RMA), for example as described in Irizarry et al., 2003, Nucleic Acids Res., 31:e15 and Dallas et al., 2005, BMC Genomics., 6:59.
- RMA robust multi-array analysis
- a variance filter can be applied to eliminate all data with a fold-change <1.15 and a p-value >0.1 (999 permutations).
- Fold-change is measured by dividing expression levels for relapse sample by CCR samples, both being normalised.
- the Random Forest Algorithm of Breiman & Cutler is described at the URL www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and is used in, for example, Zhang et al., 2001, Proc Natl Acad Sci USA., 98:6730-5 and Beesley et al., 2005, Br J Haematol., 131:447-56.
- the Random Forest algorithm can, for example, reduce the number of potentially relevant probes from between about 1,000 and several thousand to between about 20 and 200.
- target genes e.g. top-ranked 200 genes
- secondary RF analyses various combinations of target genes (e.g. top-ranked 200 genes) can be selected and subjected to secondary RF analyses.
- An unsupervised analysis can also be conducted to determine grouping of the probes or genes to confirm outcome predictability.
- a decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) can be used.
- PCA principal component analysis
- a classifier can be formed from the reduced number of target genes.
- the classifier is used to predict the likely outcome of treatment and the like, e.g. prognosis, based on the expression levels of the genes in the reduced list of target genes.
- the reduced list of target genes can be further refined prior to formation of the classifier. Again a supervised analysis is performed on the list of target genes to rank the predictability of the outcomes based on the probes or the genes to which the probes relate. A decision-tree based algorithm, and in particular the Random Forest algorithm, is used.
- the outcome prediction accuracy of the further refined list of target genes can then be compared to the prediction accuracy of the original target gene list.
- the number of genes indicated in the further refined target gene list can be varied and the accuracy reassessed to determine the optimum number of probes. Where different techniques are used to measure gene expression level, such as qRT-PCR, the optimum number of probes that more accurately predict the outcome for a desired technique will need to be ascertained.
- the result is a relatively low number of gene targets, which show a high accuracy in predicting the outcome of pre-B ALL.
- This refined group of genes indicates which gene expression levels predict the outcome.
- This refined list of target genes and their expression levels is referred to as a gene expression signature (GES) .
- a GES may be formed for each possible outcome.
- the expression level of each gene of the GES of individuals in a validation cohort can be determined by an appropriate measurement technique (e.g. qRT-PCR).
- the classifier can compare the expression levels of the refined list of target genes to the expression levels in the GES(s) in order to predict the outcome of each subject in the validation cohort with a significant likelihood of success.
- results can be confirmed by, for example, using a multivariate Cox regression analysis (Cox, 1972, J. R. Stat. Soc. B., 34:187-220).
- a refined list of target genes or nucleic acid molecules identified as being associated with pre-B ALL outcome by the methods disclosed herein are selected from the group consisting of WWOX, EIF3S5, MYST2, EIF3S10, GABARAP, GLUL, DDX24, PSMF1, SNRP70, PRNP, ANXA4, JUNB, SUPT4H1, PPIF, PPIF, EIF4A1, DUSP3, CALD1, OAZIN, PON2, CPD, KIF5B, SHARP, SHARP, CHERP, HBXIP, APG5L, DPM1,
- nucleic acid molecules of the invention are OAZIN, GLUL, BICD2, IGJ, and PLA2G6.
- nucleic acid molecules of the invention are OAZIN, GLUL, and IGJ.
- OAZIN ornithine decarboxylase antizyme inhibitor regulates ornithine decarboxylase, the rate limiting enzyme in polyamine synthesis, and the enhanced expression of both OAZIN and ornithine decarboxylase has been detected in tumour tissues.
- GLUL glutamine synthetase
- IGJ immunoglobulin J chain
- the refined list of target genes or nucleic acid molecules of the invention or functional fragments thereof or antisense molecules thereto are located on a microarray.
- the microarray may have two or more of WWOX, EIF3S5, MYST2, EIF3S10, GABARAP, GLUL, DDX24, PSMF1, SNRP70, PRNP, ANXA4, JUNB, SUPT4H1, PPIF, PPIF, EIF4A1, DUSP3, CALD1, OAZIN, PON2, CPD, KIF5B, SHARP, SHARP, CHERP, HBXIP, APG5L, DPM1, DNAJB9, DNAJB9, PLAGL2, PDGFRA, TFAM, JARID2, PTPRM, DHX8, C1orf9, ZFHX1B, IGF1R, ZNF263, ACK1, NUBP1, TOPORS, FOXO3A, E
- the microarray has three or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has four or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has five or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has six or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has seven or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has eight or more DNA or RNA molecules and/or proteins, or fragments thereof.
- microarray has nine or more DNA or RNA molecules and/or proteins, or fragments thereof. In still other embodiments the microarray has ten or more DNA or RNA molecules and/or proteins, or fragments thereof.
- a microarray as disclosed in embodiments of the invention may be part of a kit.
- the kit may further comprise reagents required to detect two or more mutations in a biological sample on which the microarray is used. Typically the kit would also include instructions for use.
- the proteins or peptides encoded by genes or nucleic acid molecules such as SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, and GPD2 may be used as an immunogen to generate antibodies.
- Such antibodies, which specifically bind to the peptides are useful as standards in assays such as radioimmunoassay, enzyme-linked immunoassay, or competitive-type receptor binding assays, radioreceptor assay, as well as in affinity purification techniques.
- the determination of the expression level of one or more nucleic acid molecules disclosed herein can be used in the prognosis, diagnosis, or treatment of a disease or disorder. For example, determining the likelihood that an individual will relapse following treatment for pre-B ALL may be used to predict whether the individual will survive the disease and may also be used to effect the most appropriate treatment for the individual.
- Genes identified by the present invention that show significantly up-regulated or down-regulated expression in pre-B ALL are potential therapeutic targets for pre-B ALL.
- Over-expressed genes may be targets for small molecules or inhibitors that decrease their expression. Methods and materials that can be used to inhibit gene expression, e.g. small drug molecules, antisense oligonucleotides, or antibodies, would be readily apparent to a person having ordinary skill in this art.
- under-expressed genes can be replaced by gene therapy or induced by drugs.
- treatment means any treatment of a disorder or disease in a subject by administering a medicament to the subject following the identification of the expression level of a gene, e.g. using pharmacogenomics.
- Treatment includes: (a) inhibiting the disorder or disease, i.e., arresting its development; or (b) relieving or ameliorating the symptoms of the disorder or disease, i.e., causing regression of the symptoms of the disorder or disease. The effect may be therapeutic in terms of a partial or complete cure of the disorder or disease.
- Standard immunophenotype and cytogenetic analyses were performed at local institutions on pre-treatment BM or PB specimens.
- a test cohort comprised 55 patients, 39 of whom achieved CCR, and the follow-up time was 5 years or more. Their leukemia specimens were evaluated by gene expression profiling (GEP).
- GEP gene expression profiling
- Biotinylated cRNA was prepared from 2 µg of total RNA and hybridisation to Affymetrix HG-U133A oligonucleotide microarrays (Affymetrix, Santa Clara, CA) was performed, for example as described in Hoffmann et al., 2005, Mol Biotechnol., 29:31-8.
- Array images were reduced to intensity values for each probe (CEL files) using Affymetrix MAS 5.0 software.
- Expression measures were extracted using robust multi-array analysis (RMA), for example as described in Irizarry et al., 2003, Nucleic Acids Res., 31:e15 and Dallas et al., 2005, BMC Genomics., 6:59.
- RMA robust multi-array analysis
- Figure 1 shows a matrix 10 in which probe sets (and patients) have been sorted to show outcome discrimination in childhood pre-B ALL samples using hierarchical clustering of 200 probe sets.
- Each column 12 represents a patient labelled across the bottom with unshaded circles for good outcome (CCR) and shaded circles for poor outcome (relapse) .
- Each row represents a probe set.
- the colour scale 16 shows the relative gene expression changes normalised by the standard deviation and indicates levels of expression according to the shade.
- Figure 2 shows a similar matrix 30 with patients in columns 32 and probe sets in rows 34.
- the probe sets are identified by the respective probe label 38.
- the gene to which the probe binds is labelled 36. Those patients predicted to have CCR are grouped as 40 and those predicted to relapse are grouped as 42.
- PCA principal component analysis
- Figure 3 samples are projected into the space of the first two principal components.
- the circles represent patients predicted to relapse and triangles represent patients predicted to have CCR.
- Kaplan-Meier survival analysis was conducted on relapse-free survival data using log-rank test to compare outcome for patient groups according to the 18-GC RF prediction.
- Kaplan-Meier curves are shown for the duration of relapse-free survival in Figure 4D, which can be contrasted with Figures 4A-C showing conventional parameters, such as age (4A), WBC at diagnosis (4B) and gender (4C).
- the 18-GC was the only parameter significantly associated with outcome (p<0.000001).
- Figure 6 shows Pearson's correlations between gene expression levels determined by qRT-PCR and HG-U133A microarray for (A) BICD2 (p<0.000001), (B) IGJ (p<0.000001), (C) OAZIN (p<0.00001) and (D) GLUL (p<0.000001). All data are shown as log2.
- Figure 8 shows Kaplan-Meier survival curves for the test cohort: (8A) comparison of the duration of relapse-free survival for patient groups based on 3-GC with expression measures determined by microarray and (8B) with expression measures determined by qRT-PCR.
- the diagnostic multigene classifier for long-term outcome prediction calculates the probability of relapse (PR) for each patient.
- the probability was calculated using logistic regression. This probability was converted into a prediction of relapse/CCR using a standard cut-off of 50%.
- the effect of each gene was modelled separately and interactions were investigated by using the product of gene expression scores.
- Figure 5 shows outcome discrimination in the independent validation cohort using a defined diagnostic 3-GC:
- (C) Kaplan-Meier analysis for the validation cohort showing the duration of relapse-free survival for patient groups stratified according to 3-GC.
- the diagnostic multigene classifier for long-term outcome prediction calculates the probability of relapse (PR) for each patient.
- the probability was calculated using logistic regression. This probability was converted into a prediction of relapse/CCR using a standard cut-off of 50%.
- the effect of each gene was modelled separately and interactions were investigated by using the product of gene expression scores.
- GLUL The second outcome-discriminator gene identified in this study, GLUL, catalyses the conversion from glutamate to glutamine, which is critical for cell proliferation, and its overexpression has been demonstrated in liver tumors.
- glucocorticoids including the first-line anti-leukemic drugs dexamethasone and prednisolone, and induction is mediated by the glucocorticoid receptor (Olkku et al., 2004, Bone, 34:320-9; Harmon & Thompson, 1982, J Cell Physiol., 110:155-60). Since patients achieving CCR in our study were found to exhibit increased expression of GLUL, it could be used as a marker for early and successful response to therapy.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Genetics & Genomics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Chemical & Material Sciences (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Pathology (AREA)
- Immunology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Oncology (AREA)
- Microbiology (AREA)
- Hospice & Palliative Care (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates generally to the field of cancer research. The present invention also relates to a method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, and a classifier for using the predictive relationship. More particularly, the present invention relates to a method of expression profiling for pre-B ALL, comprising the steps of: (a) providing biological samples from individuals with or without pre-B ALL; (b) isolating nucleic acid molecules from said biological samples; (c) measuring the expression levels of said nucleic acid molecules; (d) performing a first supervised analysis on data obtained from step (c), wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and (e) performing a second supervised analysis on data obtained from step (d), wherein said second supervised analysis will classify said individuals further.
Description
DIAGNOSTIC AND PROGNOSTIC INDICATORS OF CANCER
This invention was made with government support under National Institutes of Health Grant No. CA95475. The government of the United States has certain rights in this invention.
FIELD
[0001] The present invention relates generally to the field of cancer research. The present invention also relates to a method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, and a classifier for using the predictive relationship. More particularly, the present invention relates to expression profiling of cancer patients.
BACKGROUND
[0002] It is currently very difficult, if not impossible, to predict whether an individual will develop a disease or disorder and, if the disease or disorder develops, how the individual will respond to treatment for the disease or disorder. In particular, it is almost impossible to predict whether an individual will develop acute lymphoblastic leukaemia (ALL) and, if the disease develops, how the individual will respond to treatment.
[0003] Acute lymphoblastic leukaemia (ALL) is the single most common and at the same time the most successfully treated malignancy in children. However, resistant forms of childhood ALL constitute a leading cause of cancer-related morbidity and mortality. Long-term survival rates have reached 75% with contemporary treatment strategies tailored to specific subgroups based on biological and clinical risk features. Current National Cancer Institute (NCI) criteria for risk assignment utilise age and white blood cell (WBC) counts at diagnosis to stratify patients into standard risk (SR) (1-9.99 years of age and WBC<50000/μl) and high risk (HR) (>10 years of age and/or WBC>50000/μl). However, in addition to several structural and numerical chromosomal abnormalities that are known as independent prognostic factors, treatment itself remains one of the strongest predictors of outcome.
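The age and WBC rule quoted above can be written down as a small function. This is only a sketch of the thresholds as stated in this paragraph; the function name is hypothetical, and the handling of boundary values (exactly 10 years, exactly 50000/μl) and of infants under 1 year is an assumption, since the text does not address them.

```python
def nci_risk_group(age_years: float, wbc_per_ul: float) -> str:
    """Return 'SR' or 'HR' using the age/WBC thresholds quoted in the text."""
    if age_years < 1.0:
        # Assumption: infants are outside the quoted criteria, so refuse to classify.
        raise ValueError("age below 1 year is not covered by the quoted criteria")
    # Standard risk: 1-9.99 years of age AND WBC below 50000/ul.
    if age_years < 10.0 and wbc_per_ul < 50000:
        return "SR"
    # Assumption: boundary values (age 10, WBC 50000) are treated as high risk.
    return "HR"


print(nci_risk_group(4.5, 12000))
print(nci_risk_group(12.0, 12000))
print(nci_risk_group(4.5, 80000))
```

A single high-risk feature is enough to assign HR, which reflects the "and/or" in the criteria.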
[0004] These findings have led to increased intensity multi-agent therapy protocols for patients assessed as having high risk of relapse. Despite this, up to 25% of children suffer a recurrence of the disease, mostly within the first 5 years from diagnosis, and the outcome remains dismal, with remission rates varying between 5% and 80%. Notably, many relapses occur in standard risk patients, who initially present with favourable prognostic biological and clinical features, such as being <10 years of age, low WBC and TEL-AML1 translocation. Thus, there is a clear need to improve the identification of patients at increased risk of treatment failure, and in particular those patients that are currently stratified as standard risk (SR), for whom more intensive up-front treatments are already available.
[0005] Accordingly, it would be useful to be able to predict whether an individual will develop a disease or disorder, such as B-lineage ALL (pre-B ALL), and if so, how the individual will respond to treatment for the disease or disorder. Furthermore, it would be useful to be able to predict likely outcomes of alternative treatment regimes and to tailor a treatment regime to an individual.
SUMMARY
[0006] In a first aspect, the invention provides a method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said method comprising: providing data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; analysing the data to find a group of one or more of the probe sets which have a predictive relationship to the outcome.
[0007] The probes related to the group of probe sets reflect which genes have the predictive relationship between their expression level and the outcome.
[0008] In some embodiments the method further comprises further analysing the group to find a refined group of one or more probe sets highly related to the outcome.
[0009] In some embodiments the method further comprises refining the membership of the refined group to include probe sets which are predictive of the outcome when the data related to the probe sets are obtained by use of two or more different techniques to measure gene expression.
[0010] In some embodiments the method further comprises refining the membership of the refined group to exclude probe sets which are not significantly predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
[0011] In some embodiments the predictive relationship is indicative of one or more outcomes of a set of possible outcomes. The group is typically indicative of an outcome by being discriminatory of the outcome with a significant degree of certainty. In one embodiment the group is discriminatory of the outcome according to each expression level of the gene associated with each corresponding probe in the group.
[0012] In some embodiments analysing the data to find the group of probe sets comprises conducting supervised analysis to rank the relevance of each probe set to the outcome. In one embodiment finding the refined group comprises conducting further supervised analysis of the group. In this embodiment the group of probe sets again undergoes analysis to rank probe sets that are members of the first group. From this ranking the refined group is identified. Using this technique the refined group will not necessarily have the same membership as the same number of the top probe sets in the first group.
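The ranking step described above can be illustrated with a minimal sketch. It assumes a simple signal-to-noise style score (difference of class means over the summed class standard deviations) as a stand-in for a decision-tree importance measure; it is not the Random Forest ranking itself. The probe names and expression values are hypothetical.

```python
from statistics import mean, stdev


def separation_score(values, labels):
    """Absolute difference of class means divided by the summed class spreads."""
    relapse = [v for v, y in zip(values, labels) if y == "relapse"]
    ccr = [v for v, y in zip(values, labels) if y == "CCR"]
    spread = stdev(relapse) + stdev(ccr)
    return abs(mean(relapse) - mean(ccr)) / spread if spread > 0 else 0.0


def rank_probe_sets(data, labels, top_n):
    """Return the top_n probe-set names, most outcome-related first."""
    scores = {name: separation_score(vals, labels) for name, vals in data.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: one list of expression values per probe set,
# one value per biological sample, in the same order as `labels`.
labels = ["relapse", "relapse", "relapse", "CCR", "CCR", "CCR"]
data = {
    "probe_A": [8.1, 7.9, 8.3, 5.0, 5.2, 4.9],
    "probe_B": [6.0, 6.4, 5.8, 6.1, 6.3, 5.9],
    "probe_C": [4.0, 4.6, 4.3, 5.5, 5.1, 5.8],
}
group = rank_probe_sets(data, labels, top_n=2)
print(group)
```

With a univariate score such as this one, re-ranking the selected group would reproduce the same order. A tree-ensemble importance depends on which other probe sets are present, which is why a second supervised pass can reorder the members.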
[0013] In some embodiments one testing technique is to measure gene expression using an oligonucleotide microarray. In another embodiment the technique is to measure gene expression using quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) .
[0014] In some embodiments the method further comprises verifying the relationship between gene expression levels and the outcome. The verification is typically conducted by using an unsupervised clustering technique.
[0015] In some embodiments the method further comprises calculating a score of the accuracy in which the group is indicative of the outcome.
[0016] In some embodiments finding the refined group comprises decreasing the membership of the group by removing members of the group which have a relatively low accuracy score. This may be recursively repeated until the size of the refined group is sufficiently small. Alternatively, removal of members is recursively repeated until just before the accuracy with which the refined group is indicative of the outcome is substantially reduced.
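The recursive removal described above might be sketched as follows. This is an illustration under stated assumptions: accuracy is estimated by leave-one-out testing of a nearest-centroid rule (a swapped-in, simple classifier rather than the Random Forest), the lowest-ranked member is taken as the one with the lowest score, and the data, names and ranking are hypothetical.

```python
def loo_accuracy(data, labels, members):
    """Leave-one-out accuracy of a nearest-centroid rule on the chosen members."""
    n = len(labels)
    correct = 0
    for i in range(n):
        best_label, best_dist = None, float("inf")
        for cls in sorted(set(labels)):
            # Samples of this class, excluding the held-out sample i.
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            dist = 0.0
            for m in members:
                centroid = sum(data[m][j] for j in idx) / len(idx)
                dist += (data[m][i] - centroid) ** 2
            if dist < best_dist:
                best_label, best_dist = cls, dist
        correct += best_label == labels[i]
    return correct / n


def refine_group(data, labels, ranked, min_size=1, tolerance=0.0):
    """Drop the lowest-ranked member until accuracy would fall by more than tolerance."""
    group = list(ranked)
    accuracy = loo_accuracy(data, labels, group)
    while len(group) > min_size:
        candidate = group[:-1]  # remove the lowest-ranked member
        cand_acc = loo_accuracy(data, labels, candidate)
        if cand_acc < accuracy - tolerance:
            break  # stop just before the accuracy is reduced
        group, accuracy = candidate, cand_acc
    return group, accuracy


labels = ["relapse", "relapse", "relapse", "CCR", "CCR", "CCR"]
data = {
    "probe_A": [8.1, 7.9, 8.3, 5.0, 5.2, 4.9],
    "probe_B": [6.0, 6.4, 5.8, 6.1, 6.3, 5.9],
    "probe_C": [4.0, 4.6, 4.3, 5.5, 5.1, 5.8],
}
ranked = ["probe_A", "probe_C", "probe_B"]  # hypothetical ranking, best first
refined, accuracy = refine_group(data, labels, ranked)
print(refined, accuracy)
```

The `min_size` argument corresponds to stopping when the group is sufficiently small, and `tolerance` to stopping just before accuracy is substantially reduced.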
[0017] In some embodiments the data undergoes a preliminary variance filter prior to analysis to identify outcome-discriminating probe sets. In another embodiment the data undergoes a preliminary variance filter prior to analysis to remove non outcome-discriminating probe sets.
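A preliminary filter of this kind could look like the sketch below. It uses the thresholds quoted elsewhere in this document for the filter (fold-change 1.15, p-value 0.1, 999 permutations). The symmetric definition of fold-change, the permutation test on the difference of group means, the linear-scale inputs and all names and values are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def variance_filter(data, labels, min_fold=1.15, max_p=0.1):
    """Keep probe sets unless fold-change < min_fold AND p-value > max_p."""
    kept = []
    for name, values in data.items():
        relapse = [v for v, y in zip(values, labels) if y == "relapse"]
        ccr = [v for v, y in zip(values, labels) if y == "CCR"]
        ratio = mean(relapse) / mean(ccr)  # relapse divided by CCR
        fold = max(ratio, 1 / ratio)       # assumption: direction-independent
        p_value = permutation_p_value(relapse, ccr)
        if fold < min_fold and p_value > max_p:
            continue                       # non-discriminating: drop
        kept.append(name)
    return kept


# Hypothetical linear-scale expression values, four samples per outcome.
labels = ["relapse"] * 4 + ["CCR"] * 4
data = {
    "probe_up": [200, 210, 190, 200, 100, 105, 95, 100],
    "probe_small": [110, 111, 109, 110, 100, 101, 99, 100],
    "probe_flat": [100, 102, 98, 101, 101, 98, 102, 100],
}
kept = variance_filter(data, labels)
print(kept)
```

Here `probe_small` survives despite its small fold-change because its groups are cleanly separated, while `probe_flat` fails both criteria and is removed.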
[0018] In some embodiments the supervised analysis is performed by a decision-tree based algorithm. Preferably this is a Random Forest algorithm.
[0019] In one embodiment the unsupervised analysis is conducted by one or more of a decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) .
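As an illustration of the unsupervised check, complete-linkage hierarchical clustering can be sketched in a few lines. The sample coordinates are hypothetical expression values for two genes, and Euclidean distance is an assumed choice of metric.

```python
from math import dist


def complete_linkage(points, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of sample indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical samples: the first three resemble one outcome, the last three the other.
samples = [
    (8.1, 4.0), (7.9, 4.6), (8.3, 4.3),
    (5.0, 5.5), (5.2, 5.1), (4.9, 5.8),
]
clusters = complete_linkage(samples, n_clusters=2)
print(clusters)
```

If the clusters found without using the outcome labels line up with the known outcomes, that supports the predictive value of the selected probe sets.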
[0020] In some embodiments the group or refined group of probe sets is used to form a classifier of the outcome. In one embodiment the classifier receives data in the form of gene expression levels and predicts an outcome.
[0021] In some embodiments genes to which probes bind that are related to the group of probe sets are used to diagnose the disease or disorder.
[0022] In some embodiments genes to which probes bind that are related to the group of probe sets are used to predict a prognosis of a patient having the disease or disorder.
[0023] In some embodiments genes to which probes bind that are related to the group of probe sets are used to identify biological functions related to the disease or disorder. In a further embodiment the identified biological functions are used to affect the outcome in a subject suffering from the disease or disorder or to prevent the subject from developing the disease or disorder.
[0024] In some embodiments the predictive relationship is between genes indicated by the probes to which the group of probe sets relates and an outcome of either regression or relapse of pre-B ALL.
[0025] In some embodiments the genes are selected from the group consisting of SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
[0026] In some embodiments the genes consist essentially of OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or a functional fragment thereof. In other embodiments, the genes consist essentially of OAZIN, GLUL, and IGJ, or functional fragments thereof. In further embodiments, the genes consist essentially of OAZIN and IGJ.
[0027] In some embodiments the expression level of the gene related to the group of probes indicative of the outcome is used to produce a model which is predictive of the outcome or a course of treatment to achieve the outcome or a prophylactic course of treatment.
[0028] In some embodiments the model predicts an outcome of relapse with a probability of PR = exp(g) / (1 + exp(g)), where g = -1.7643 - 0.2727 * log(GLUL) + 0.5874 * log(IGJ) + 1.4731 * log(OAZIN) + 0.6329 * log(GLUL) * log(OAZIN).
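The model above is a logistic function of a linear predictor g with one interaction term. A minimal sketch of the calculation follows. The function takes values that are already log-transformed, because the base of the logarithm and any normalisation are not stated in this paragraph; the function names are hypothetical, and the 50% cut-off is the standard cut-off mentioned elsewhere in this document.

```python
import math


def relapse_probability(log_glul: float, log_igj: float, log_oazin: float) -> float:
    """PR = exp(g) / (1 + exp(g)) with the coefficients given in the text."""
    g = (-1.7643
         - 0.2727 * log_glul
         + 0.5874 * log_igj
         + 1.4731 * log_oazin
         + 0.6329 * log_glul * log_oazin)  # GLUL x OAZIN interaction term
    # 1 / (1 + exp(-g)) is algebraically the same as exp(g) / (1 + exp(g)).
    return 1.0 / (1.0 + math.exp(-g))


def predict_outcome(log_glul: float, log_igj: float, log_oazin: float,
                    cutoff: float = 0.5) -> str:
    """Convert PR into a relapse/CCR call using a cut-off (assumed: PR > cutoff)."""
    pr = relapse_probability(log_glul, log_igj, log_oazin)
    return "relapse" if pr > cutoff else "CCR"


print(relapse_probability(0.0, 0.0, 0.0), predict_outcome(0.0, 0.0, 0.0))
print(relapse_probability(1.0, 1.0, 1.0), predict_outcome(1.0, 1.0, 1.0))
```

When all three log-expression values are zero, g reduces to the intercept, so the predicted probability is below 50% and the call is CCR. Positive IGJ and OAZIN terms push the probability towards relapse.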
[0029] A second aspect of the invention provides an apparatus for determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said apparatus comprising: a receiver of data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; an analyser for finding a group of one or more of the probe sets which have a predictive relationship to the outcome .
[0030] In some embodiments the analyser is arranged to further analyse the group to find a refined group highly related to the outcome.
[0031] In some embodiments the analyser is arranged to refine the membership of the refined group to include probe sets which are predictive of the outcome when the probe sets are collected by use of two or more different techniques to measure gene expression.
[0032] In some embodiments the apparatus further comprises a verifier for verifying the relationship between the gene expression levels and the outcome. The verifier is typically configured to conduct an unsupervised clustering technique.
[0033] In some embodiments the analyser is configured to conduct supervised analysis to rank the relevance of each probe set to the outcome. In one embodiment the analyser is configured to conduct further supervised analysis of the group to find the refined group.
[0034] In some embodiments the apparatus comprises a storage means for storing the group and the refined group.
[0035] A third aspect of the invention provides a classifier for classifying data comprising a plurality of levels of expression of predetermined genes of a biological sample, said classifier comprising: an analyser for predicting an outcome according to a predictive relationship between the data and the outcome, thereby classifying the data.
[0036] In some embodiments the predetermined genes are determined according to the above method.
[0037] In some embodiments the gene expression levels are determined according to the above method.
[0038] In some embodiments the predetermined genes are SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
[0039] In some embodiments the predetermined genes consist essentially of OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or functional fragments thereof. In other embodiments, the genes consist essentially of OAZIN, GLUL, and IGJ, or functional fragments thereof. In further embodiments, the genes consist essentially of OAZIN and IGJ.
[0040] A fourth aspect of the invention provides a computer program comprising instructions to control a computer to conduct the method.
[0041] A fifth aspect of the invention provides a computer program comprising instructions to control a computer to operate as the apparatus .
[0042] A sixth aspect of the invention provides a computer program comprising instructions to control a computer to operate as the classifier.
[0043] A seventh aspect of the invention provides a computer readable storage medium for storing any one of the computer programs.
[0044] An eighth aspect of the invention provides a method of gene expression profiling comprising: obtaining gene expression level data reflecting expression levels of a large number of genes from a pool of biological samples; analysing the data to identify a plurality of gene expression levels predictive of an outcome in relation to a disease or disorder, thereby providing a gene expression profile.
[0045] In some embodiments the method further comprises further analysing the group to find a refined group highly related to the outcome.
[0046] In some embodiments the method further comprises refining the membership of the refined group to include probe sets which are predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
[0047] In some embodiments the method further comprises refining the membership of the refined group to exclude probe sets which are not significantly predictive of the outcome when the probe sets are obtained by use of two or more different techniques to measure gene expression.
[0048] In a ninth aspect the invention provides a method of expression profiling for pre-B ALL, comprising the steps of:
(a) providing biological samples from individuals with or without pre-B ALL;
(b) isolating nucleic acid molecules from said biological samples;
(c) measuring the expression levels of said nucleic acid molecules;
(d) performing a first supervised analysis on data obtained from step (c), wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and
(e) performing a second supervised analysis on data obtained from step (d), wherein said second supervised analysis will classify said individuals further.
[0049] In some embodiments, the step of measuring the expression levels of nucleic acid molecules is conducted by hybridization of the nucleic acid molecule to a DNA microarray or by qRT-PCR.
[0050] In some embodiments, the first and second supervised analyses are performed by a decision-tree based algorithm. Preferably, the algorithm is a Random Forest algorithm.
[0051] In a tenth aspect, the present invention provides a method of prognosis or diagnosis of pre-B ALL, comprising the steps of:
(a) providing biological samples from individuals with or without pre-B ALL;
(b) isolating nucleic acid molecules from said biological samples;
(c) measuring the expression levels of said nucleic acid molecules;
(d) performing a first supervised analysis on data obtained from step (c), wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and
(e) performing a second supervised analysis on data obtained from step (d), wherein said second supervised analysis will classify said individuals further.
[0052] In some embodiments, the step of measuring the expression levels involves measuring the expression levels of a group of 18 genes within a biological sample. Preferably, the 18 genes have gene ID numbers selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820. More preferably the genes have the gene ID numbers 51582, 2752, 23299, 3512 or 8398. Most preferably, the genes have the gene ID numbers of 51582, 2752, and 3512.
[0053] In some embodiments, the expression levels of said genes are measured at the nucleic acid level or the protein level.
[0054] It will be appreciated by those skilled in the art that once the expression levels of genes have been measured it will be possible to identify which genes have expression levels different to individuals that do not have pre-B ALL.
[0055] Thus, in an eleventh aspect, the present invention provides a method for prognosis or diagnosis of pre-B ALL, comprising the step of measuring the expression levels of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof.
[0056] In a twelfth aspect, the present invention provides a therapeutic or prophylactic composition for the treatment or prevention of pre-B ALL, comprising one or more of:
(a) a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof;
(b) an isolated nucleic acid molecule which is the complement of a sequence of (a);
(c) an isolated nucleic acid molecule which hybridises under stringent conditions to a nucleic acid molecule of (a) or (b); and/or
(d) an isolated polypeptide encoded by a nucleic acid molecule of (a), (b), or (c), together with a pharmaceutically acceptable carrier.
[0057] In a thirteenth aspect, the present invention provides a microarray for prognosis or diagnosis of pre-B ALL, comprising one or more genes, wherein said gene has a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
[0058] In a fourteenth aspect, the present invention provides a kit for prognosis or diagnosis of pre-B ALL, comprising a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
[0059] In a fifteenth aspect, the present invention provides a method for predicting the likelihood of relapse in an individual with pre-B ALL, comprising the step of measuring the expression level of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof.
[0060] In a sixteenth aspect, the present invention provides a method of selecting an agent for the prevention and/or treatment of pre-B ALL in an individual, comprising:
(a) isolating a biological sample from an individual with pre-B ALL;
(b) measuring the expression level of a gene that has a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820;
(c) identifying a gene which has an expression level different to the expression level of the same gene in a subject not having pre-B ALL; and
(d) selecting an agent which modulates the expression of the gene of (c) or specifically binds to a polypeptide encoded by said gene.
[0061] In a seventeenth aspect, the present invention provides a method of screening for an agent capable of modulating the expression of one or more genes having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 comprising the steps of:
(a) providing one or more of said genes or functional fragments thereof, under conditions which allow expression of the gene(s);
(b) determining the level of expression of the gene(s); and
(c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b), wherein a change in the level of expression indicates that the agent is capable of modulating the expression of one or more of said genes.
[0062] In an eighteenth aspect, the present invention provides a method of screening for an agent capable of treating or preventing pre-B ALL, comprising the steps of:
(a) providing one or more genes having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or functional fragment(s) thereof, under conditions which allow expression of the gene(s);
(b) determining the level of expression of the gene(s); and
(c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b), wherein a change in the level of expression indicates that the agent is capable of treating or preventing pre-B ALL.
[0063] In some embodiments the level of at least one gene or fragment is increased compared to a normal level. In further embodiments, the level of at least one gene or fragment may be decreased compared to a normal level.
[0064] The present invention demonstrates that gene expression profiling using, for example, DNA microarrays and supervised analysis can be used to classify subgroups of pre-B ALL, identify genes with differential expression in subsets of pre-B ALL patients, and identify potential therapeutic targets for pre-B ALL. For example, in some embodiments there are provided methods of diagnosis and prognosis for pre-B ALL or subgroups of pre-B ALL based on the expression of a group of 18 genes. There is also provided a method of pre-B ALL diagnosis based on the expression levels of 18 genes.
[0065] In a nineteenth aspect, the present invention provides a method for predicting the prognosis or diagnosis of pre-B ALL comprising the steps of: (a) providing a biological sample from a patient with or without pre-B ALL; (b) obtaining a sample of nucleic acid isolated from the sample in step (a), wherein the nucleic acid is RNA or a cDNA copy of RNA; (c) determining the gene expression pattern of a panel of specific sequences comprising at least OAZIN and one further gene selected from the group consisting of SHARP, BICD2, NRBF2, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1 and GPD2, wherein the sequences within each nucleic acid pool described in (b) have been predetermined to either increase or decrease in response to pre-B ALL, and where the gene expression pattern comprises the relative level of mRNA or cDNA abundance for the panel of specific sequences; and (d) comparing the expression patterns in step (c) to a known reference pattern, wherein the difference in the levels of expression is prognostic or diagnostic of pre-B ALL.
[0066] In some embodiments, the panel of specific sequences comprises at least OAZIN and IGJ. In other embodiments, the panel of specific sequences comprises at least OAZIN and/or IGJ and/or GLUL.
[0067] In another aspect of the present invention, there are provided methods of treatment for pre-B ALL. Such methods involve inhibiting or enhancing expression of genes that are found to be over-expressed or down-regulated respectively in individuals diagnosed with pre-B ALL as disclosed herein.
[0068] It will be appreciated that any biological sample may be used in the present invention. Preferably, the biological sample is cells, tissue, or fluid isolated from bone marrow, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, whole blood, blood cells, tumours, organs, and also includes samples of in vivo cell culture constituents, including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, putatively virally infected cells, recombinant cells, and cell components. In some embodiments the biological sample is bone marrow or blood.
[0069] Other and further aspects, features, and advantages of the present invention will be apparent from the following description of the presently preferred embodiments of the invention. These embodiments are given for the purpose of disclosure.
BRIEF DESCRIPTION OF THE FIGURES
[0070] Figure 1 is a matrix representation of probe sets showing discrimination of outcomes according to an embodiment of the present invention.
[0071] Figure 2 is a matrix representation of probe sets showing discrimination of outcomes according to an embodiment of the present invention.
[0072] Figure 3 is a plot of principal component analysis results.
[0073] Figures 4A-4D are plots of Kaplan-Meier curves.
[0074] Figures 5A-5B are plots of outcome discrimination.
[0075] Figure 5C is a plot of a Kaplan-Meier curve.
[0076] Figures 6A-6D are plots of Pearson's correlations results.
[0077] Figures 7A-7B are plots of principal component analysis results.
[0078] Figures 8A-8B are plots of Kaplan-Meier curves.
[0079] Figures 9A-9F are differential expression plots.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0080] Before describing preferred embodiments in detail, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting.
[0081] All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety. However, publications mentioned herein are cited for the purpose of describing and disclosing the protocols and reagents which are reported in the publications and which might be used in connection with the invention. Nothing herein is to be construed as an admission that the
invention is not entitled to antedate such disclosure by virtue of prior invention.
[0082] The practice of the present invention will employ, unless otherwise indicated, conventional techniques of statistical analysis, molecular biology, and genetics which are within the skill of the art. Such techniques are described in the literature. See, for example, Armitage, Berry, and Matthews, "Statistical methods in medical research" Oxford, Malden MA, Blackwell Science, 2002; Stanton and Glantz "Primer of biostatistics" New York, McGraw Hill Medical Pub Div, c2002; Tabachnick and Fidell "Using multivariate statistics" Boston Mass., Allyn and Bacon, 2001; Parmigiani, Garrett, Irizarry and Zeger (Eds) "The analysis of gene expression data" New York, Springer, 2003; Good "Permutation tests: a practical guide to resampling methods for testing hypotheses" New York, Springer, 2000; Bailey & Ollis, 1986, "Biochemical Engineering Fundamentals", 2nd Ed., McGraw-Hill, Toronto; "DNA Cloning: A Practical Approach", Volumes I and II (Glover ed., 1985); Handbook of Experimental Immunology, Volumes I-IV (Weir & Blackwell, eds., 1986); Immunochemical Methods in Cell and Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Methods in Enzymology, Vols. 154 and 155 (Wu et al. eds. 1987); "Molecular Cloning: A Laboratory Manual", 2nd Ed., (ed. by Sambrook, Fritsch and Maniatis) (Cold Spring Harbor Laboratory Press: 1989); "Nucleic Acid Hybridization", (Hames & Higgins eds. 1984); "Oligonucleotide Synthesis" (Gait ed., 1984); Remington's Pharmaceutical Sciences, 17th Edition, Mack Publishing Company, Easton, Pennsylvania, USA; and "The Merck Index", 12th Edition (1996), Therapeutic Category and Biological Activity Index.
[0083] It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a nucleic acid molecule" includes a plurality of such molecules.
[0084] Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0085] In the description that follows, if there is no instruction, it will be appreciated that techniques such as PCR, molecular-biological methods, assays and immunological methods, and methods of separation and purification of nucleic acid molecules and proteins are well-known in this field and any such technique may be adopted.
[0086] Although any materials and methods similar or equivalent to those described herein can be used to practice or test the present invention, the preferred materials and methods are now described.
[0087] There is now strong evidence that global gene or nucleic acid molecule expression profiling can reveal molecular heterogeneity of similar or related hematopoietic malignancies that have been difficult to distinguish. Genes exhibiting significant differential expression between normal and malignant cells can be used in the development of clinically relevant diagnostics and prognostic indicators as well as providing clues into the basic mechanisms of cellular transformation. In fact, these profiles might be used to identify malignant cells even in the absence of any clinical manifestations. In addition, biochemical pathways in which the products of these genes act may be targeted by novel therapeutics.
[0088] Accordingly, in the broadest aspect of the present invention, there is provided a method of analysing the level of expression of a nucleic acid molecule, such as a gene, of a subject, such as a human being, in a test cohort. Biological samples are obtained from each member of the test cohort. The test cohort comprises subjects exhibiting alternative outcomes of a disease or disorder, such as a proportion of the individuals in the cohort being in remission and a proportion having relapse of a disease, such as pre-B ALL. In this context, "outcome" means any one or more of a possible range of states including being healthy or being diagnosed as having a particular disease and a possible range of prognoses.
[0089] The term "disease" as used herein is a general term which refers to any departure from health in which a subject suffers. A "disorder" refers to an abnormal functioning of a function or part of a body. The disease or disorder may be any disease or disorder associated with a different level of expression of a gene compared with that in a normal subject because altered levels of expression of a gene are known to be associated with a range of diseases and disorders, including but not limited to pre-B ALL.
[0090] Pre-B ALL is a type of leukemia. Leukemia is a cancer involving the blood forming cells. Cancer means any malignant cell growth or tumour caused by abnormal and uncontrolled cell division.
[0091] Data of gene expression levels is obtained from the test cohort. Determining the level of expression is typically conducted using a commercially available microarray comprising probes specific to a range of genes. Such a microarray will typically produce data for 33,000 genetic probes.
[0092] The data from microarrays is generated by first hybridising the test material (mRNA or cRNA) to the array platform. The degree of hybridisation to the elements on the microarray is detected by a scanner. The signals for each element on the microarray are the raw data from the assay and are subsequently processed to calculate expression levels of the genes under test. The data is formatted to capture information on a plurality of probe sets. Each probe set corresponds to a particular gene. Each probe set comprises a set of data elements, where each data element contains an expression level (as measured by the amount of the test material that binds to the probe) for each one of a plurality of biological samples. Thus, in effect, a probe set is produced for each gene in the range testable by the microarray. Each probe set reflects a measure of the gene expression level of a particular gene for each of the biological samples of the test cohort. The gene expression levels in the probe sets are aligned so that the gene expression levels of one biological sample are in the same position in each probe set.
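As a minimal sketch of the data layout described above, the probe sets can be pictured as a mapping from a probe-set identifier to a list of expression levels, with one position per biological sample. The identifiers, sample names and values below are hypothetical and serve only to illustrate the alignment; they are not taken from the specification.

```python
# Hypothetical sample order shared by every probe set in the test cohort.
samples = ["patient_1", "patient_2", "patient_3"]

# Each probe set holds one expression level per sample, in the same order,
# so position i always refers to the same biological sample.
probe_sets = {
    "probe_A": [7.2, 6.9, 3.1],
    "probe_B": [4.0, 4.2, 4.1],
}

def expression_profile(probe_sets, samples, sample_name):
    """Collect the expression level of every probe set for one sample."""
    position = samples.index(sample_name)
    return {probe: levels[position] for probe, levels in probe_sets.items()}

profile = expression_profile(probe_sets, samples, "patient_3")
```

Because the positions are aligned, the profile of a single sample is obtained by reading the same index from every probe set.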
[0093] Once the probe sets are obtained, a variance filter is applied to the probe sets to identify genes which are expressed at outcome-discriminating levels. The variance filter eliminates all probe sets that fall below a set fold-change and exceed a predetermined p-value. For example, probe sets with a fold change of less than 1.15 and a p-value of greater than 0.1 (999 permutations) are eliminated.
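The filter can be illustrated with a short sketch. It assumes positive, linear-scale expression values, two outcome groups (relapse and remission), a fold change defined as the ratio of the group means, and a two-sided permutation test on the difference in means. The text can be read in more than one way; this sketch takes the stricter reading, in which a probe set is retained only if it passes both thresholds. All function names and data are hypothetical.

```python
import random

def fold_change(group_a, group_b):
    """Ratio of the larger group mean to the smaller one (always >= 1)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return max(mean_a, mean_b) / min(mean_a, mean_b)

def permutation_p_value(group_a, group_b, n_permutations=999, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    # The observed labelling is counted as one of the permutations.
    return (extreme + 1) / (n_permutations + 1)

def variance_filter(probe_sets, is_relapse, min_fold=1.15, max_p=0.1):
    """Keep probe sets that pass both thresholds (an assumed reading)."""
    kept = {}
    for probe, levels in probe_sets.items():
        relapse = [v for v, r in zip(levels, is_relapse) if r]
        remission = [v for v, r in zip(levels, is_relapse) if not r]
        if fold_change(relapse, remission) < min_fold:
            continue
        if permutation_p_value(relapse, remission) > max_p:
            continue
        kept[probe] = levels
    return kept

is_relapse = [True] * 5 + [False] * 5
probe_sets = {
    "probe_A": [10, 11, 12, 10, 11, 5, 6, 5, 6, 5],  # differs between groups
    "probe_B": [5, 6, 5, 6, 5, 6, 5, 6, 5, 5],       # same mean in both groups
}
kept = variance_filter(probe_sets, is_relapse)
```

In this toy data only probe_A survives: its group means differ two-fold, whereas probe_B has identical group means and is removed by the fold-change threshold.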
[0094] A supervised analysis is performed on the remaining probe sets to rank the predictability of the outcomes based on the probe sets. Highly ranked probe sets will reflect a strong (highly predictive) relationship between one or more of the outcomes and the genes to which the corresponding probes bind.
[0095] A decision-tree based algorithm is preferably used for the supervised analysis. The Random Forest algorithm is particularly useful for this analysis. The Random Forest Algorithm of Breiman and Cutler is described at the URL www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and is used in, for example, Zhang H, Yu CY, Singer B, et al: Recursive partitioning for tumor classification with gene expression microarray data. Proc Natl Acad Sci U S A 98:6730-5, 2001 and Beesley AH, Cummings AJ, Freitas JR, et al: The gene expression signature of relapse in paediatric acute lymphoblastic leukaemia: implications for mechanisms of therapy failure. Br J Haematol 131:447-56, 2005.
[0096] Random Forest (RF) is a decision-tree based algorithm. RF analysis typically consists of 100,000 trees. For each tree, the intrinsic RF reiterative process randomly chooses a subset of samples and probe sets for initial analysis, and subsequently uses the remaining samples for testing back. All probe sets used for RF analysis are ranked according to their ability to discriminate between groups of interest. For each sample a classification accuracy is obtained, along with a measure of confidence.
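The resampling idea described above can be sketched in a few lines. The following is a deliberately simplified stand-in and not the Random Forest algorithm of Breiman and Cutler: each "tree" is a single-split decision stump, built on a random bootstrap of samples and a random subset of probe sets, and then tested back on the samples left out of the bootstrap. The scoring rule, the use of 200 trees, the 0/1 outcome coding and all names are assumptions made for illustration.

```python
import random

def stump_predict(value, threshold, high_label):
    """Predict high_label at or above the threshold, the other label below it."""
    return high_label if value >= threshold else 1 - high_label

def best_stump(rows, labels, probes):
    """Pick the (probe, threshold, high_label) with the best in-bag accuracy."""
    best_accuracy, best_rule = -1.0, None
    for p in probes:
        for threshold in sorted({row[p] for row in rows}):
            for high_label in (0, 1):
                hits = sum(
                    stump_predict(row[p], threshold, high_label) == y
                    for row, y in zip(rows, labels)
                )
                if hits / len(rows) > best_accuracy:
                    best_accuracy = hits / len(rows)
                    best_rule = (p, threshold, high_label)
    return best_rule

def rank_probe_sets(data, labels, n_trees=200, probes_per_tree=2, seed=0):
    """Rank probe-set indices by accumulated out-of-bag accuracy."""
    rng = random.Random(seed)
    n_samples, n_probes = len(data), len(data[0])
    scores = [0.0] * n_probes
    for _ in range(n_trees):
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        out_of_bag = [i for i in range(n_samples) if i not in in_bag]
        if not out_of_bag:
            continue  # nothing left to test this tree on
        probes = rng.sample(range(n_probes), probes_per_tree)
        p, threshold, high_label = best_stump(
            [data[i] for i in in_bag], [labels[i] for i in in_bag], probes
        )
        hits = sum(
            stump_predict(data[i][p], threshold, high_label) == labels[i]
            for i in out_of_bag
        )
        scores[p] += hits / len(out_of_bag)
    return sorted(range(n_probes), key=lambda p: scores[p], reverse=True)

# Rows are samples, columns are probe sets; 1 = relapse, 0 = remission.
data = [
    [9, 5, 4], [10, 6, 4], [11, 5, 5], [10, 6, 5],
    [2, 5, 4], [3, 6, 5], [2, 5, 4], [3, 6, 5],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
ranking = rank_probe_sets(data, labels)
```

In this toy data only the first probe set separates the two outcomes, so it accumulates by far the highest out-of-bag score and heads the ranking.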
[0097] The Random Forest algorithm can rank the probe sets so that a group is identified by, for example, selecting the top 200 and cutting off the remainder. This has the effect of reducing the number of potentially relevant probe sets from between about 1,000 and several thousand down to between about 20 and 200 (depending on the number of members in the group) to produce a group of thinned probe sets.
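The cut-off step can be sketched as follows, assuming that the supervised analysis has already produced one importance score per probe set. The scores and identifiers are hypothetical, and a cut-off of 2 is used here in place of 200 only to keep the example small.

```python
def select_top_probe_sets(importance, top_n=200):
    """Return the top_n probe-set identifiers, highest score first."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_n]

# Hypothetical importance scores from a supervised ranking.
importance = {"probe_A": 0.91, "probe_B": 0.12, "probe_C": 0.57, "probe_D": 0.03}
thinned = select_top_probe_sets(importance, top_n=2)
```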
[0098] An unsupervised analysis is conducted to determine grouping of the probe sets to confirm outcome predictability. A decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) can be used.
[0099] Various clustering methods are in use and they are designed to find groups of specimens under test that show similar patterns of gene expression. They are referred to as "unsupervised" clustering methods if the analysis is solely using the expression data from the microarray, contrasting with "supervised" clustering methods where independent information on the specimens is incorporated, e.g. belonging to a particular outcome group (relapse or remission). The RF method is a supervised learning algorithm. Principal Component Analysis (PCA) is an established technique for the unsupervised analysis of microarray data and has been extensively used as the algorithm of choice to visualize microarray data.
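Complete-linkage hierarchical clustering, one of the unsupervised techniques named above, can be sketched without any outcome labels. The sketch assumes Euclidean distance between the expression profiles of two samples and stops when two clusters remain; the profiles are hypothetical. The outcome labels play no part in the clustering and would only be compared with the clusters afterwards.

```python
import math

def complete_linkage(points, n_clusters=2):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the two clusters whose farthest members are closest together.
        pairs = [
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        x, y = min(pairs, key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so the index x is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical expression profiles (two thinned probe sets per sample).
profiles = [[7.1, 6.8], [7.3, 7.0], [2.0, 2.2], [2.1, 1.9]]
clusters = complete_linkage(profiles, n_clusters=2)
```

If the first two samples were relapse cases and the last two remission cases, the two clusters found here would coincide with the outcome groups, which is the kind of agreement the verification step looks for.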
[0100] A classifier is then formed from the thinned probe sets. The classifier is used to predict the outcome based on expression levels of the genes related to the thinned probe sets in a given sample.
[0101] The thinned probe sets can be further refined to form a refined group of probe sets prior to formation of the classifier. Again a supervised analysis is performed on the remaining probe sets to rank the predictability of the outcomes. A decision-tree based algorithm and in particular the Random Forest algorithm is used.
[0102] The outcome prediction accuracy of the refined group of probe sets is compared to the prediction accuracy of the further refined group of probe sets. Where prediction accuracy is better the further refined group of probe sets can be used. The number of genes indicated in the further refined group of probe sets can be varied and the accuracy reassessed to determine the optimum number of probe sets that most accurately predict the outcome. Where different techniques are used to measure gene expression level, such as qRT-PCR, the optimum number of probe sets that most accurately predict the outcome for a desired technique can be determined.
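Varying the number of probe sets and reassessing the accuracy amounts to a small search loop. The sketch below assumes a ranked list of probe sets and a caller-supplied function that estimates prediction accuracy for a candidate panel (for example by cross-validation). Both the function name `estimate_accuracy` and the accuracy figures are hypothetical.

```python
def best_panel_size(ranked_probe_sets, estimate_accuracy, sizes):
    """Try each panel size and keep the one with the highest estimated accuracy."""
    best_size, best_accuracy = None, -1.0
    for size in sizes:
        panel = ranked_probe_sets[:size]
        accuracy = estimate_accuracy(panel)
        # Strict '>' means ties are resolved in favour of the earlier size tried.
        if accuracy > best_accuracy:
            best_size, best_accuracy = size, accuracy
    return best_size, best_accuracy

ranked = ["probe_A", "probe_C", "probe_B", "probe_D"]

# Hypothetical accuracy estimates keyed by panel size.
toy_accuracy = {1: 0.70, 2: 0.85, 3: 0.85, 4: 0.80}
result = best_panel_size(ranked, lambda panel: toy_accuracy[len(panel)], [1, 2, 3, 4])
```

With sizes tried in ascending order, a tie is resolved in favour of the smaller panel, which suits a measurement technique such as qRT-PCR where fewer genes are easier to assay.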
[0103] The result is a relatively low number of probe sets which show a high accuracy in predicting the outcome of the disease or disorder. This refined group of probe sets indicates which genes to measure expression levels of in order to predict the outcome. These genes and their expression levels which are indicative of a particular outcome are referred to as a gene expression signature (GES). A GES may be formed for each possible outcome.
[0104] It is desirable to validate the classifier with an independent validation cohort to test the robustness of the GES(s). The expression level of each gene of the GES of subjects in the validation cohort can be determined by an appropriate measurement technique (e.g. qRT-PCR). The classifier integrates the expression levels in the GES(s) in order to predict the outcome of each subject in the validation cohort with a significant likelihood of success.
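Once the classifier has produced a predicted outcome for each subject of the validation cohort, the simplest check is the fraction of subjects whose predicted outcome matches the observed one. The sketch below uses hypothetical predictions and outcomes.

```python
def validation_accuracy(predicted, observed):
    """Fraction of validation subjects whose predicted outcome was observed."""
    if len(predicted) != len(observed):
        raise ValueError("one prediction is needed per subject")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical validation cohort of four subjects.
predicted = ["relapse", "remission", "remission", "relapse"]
observed = ["relapse", "remission", "relapse", "relapse"]
accuracy = validation_accuracy(predicted, observed)
```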
[0105] The results can be confirmed by, for example, using a multivariate Cox regression analysis (Cox, 1972, J. R. Stat. Soc. B, 34:187-220).
[0106] Survival analysis is a form of regression that models subjects' time to an event (such as death or hospitalization). Some subjects will not achieve the event of interest while under follow-up but are still included in the analysis and are said to be "censored" when their follow-up ends. Cox proportional hazards regression is a semi-parametric type of survival analysis which assumes that there is a baseline risk function over time and that modelled factors such as sex or gene expression affect this baseline risk in a proportional manner (e.g. males may be consistently 1.3 times more likely to die than females).
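In the standard notation for the Cox model, which is not used elsewhere in this specification, the proportionality assumption can be written out as below. Here h_0(t) is the baseline risk (hazard) function, x_1, ..., x_p are the modelled factors and beta_1, ..., beta_p their coefficients. Taking x_1 as an indicator for male sex, the example of a 1.3-fold risk corresponds to a coefficient of about 0.262.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right)
\qquad
\frac{h(t \mid x_1 = 1)}{h(t \mid x_1 = 0)} = e^{\beta_1} = 1.3
\;\Longrightarrow\;
\beta_1 = \ln 1.3 \approx 0.262
```

The ratio does not depend on t, which is what is meant by the factor affecting the baseline risk in a proportional manner.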
[0107] It is therefore possible to identify a robust GES that can distinguish patient outcome using a clinically expedient and useful measuring technique, such as qRT-PCR.
[0108] As mentioned above the GES can be used in a classifier which receives the gene expression levels as data and outputs a predicted outcome or a list of predicted outcomes along with an indication of the likelihood of that outcome.
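One way to produce a list of predicted outcomes with an indication of likelihood, assuming a tree-ensemble classifier, is to report the fraction of trees voting for each outcome. The specification does not state how the likelihood is derived, so the vote-fraction rule and the votes below are assumptions made for illustration.

```python
from collections import Counter

def predict_with_likelihood(tree_votes):
    """Return (outcome, vote fraction) pairs, most supported outcome first."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return [(outcome, n / total) for outcome, n in counts.most_common()]

# Hypothetical votes from ten trees for a single sample.
votes = ["relapse"] * 7 + ["remission"] * 3
ranked_outcomes = predict_with_likelihood(votes)
```

The first entry of the returned list is the predicted outcome; the remaining entries give the alternatives in decreasing order of support.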
[0109] The identified genes can also be used to perform a gene ontology functional analysis to identify patterns in the function of the genes to understand the disease or disorder at a macro genetic level.
[0110] The types of outcome that can be predicted include diagnosis of a disease or disorder and prognosis of a patient having a disease or disorder. However it is also possible to then apply the predicted outcome to determine a suitable treatment regime for a patient with a particular disease or disorder and/or preventative action to reduce the likelihood of a patient contracting a disease or disorder or advancement of a disease or disorder.
[0111] The classifier may be implemented in the form of a computer running a computer program. The analysis for obtaining the GES may be implemented in the form of a computer running a computer program.
[0112] The expression level of a nucleic acid molecule described herein means both whether the nucleic acid molecule is expressed or not, as well as the extent to which it is expressed. The expression level may be increased compared to the level in a "normal" subject. For example, the expression levels of IGJ and PLA2G6 were found by the inventors to be increased in poor outcome patients. Alternatively, the expression level of a nucleic acid molecule may be decreased compared to the level in a "normal" subject.
[0113] The level of expression of a nucleic acid molecule may be determined by any means known in the art and may be determined directly or indirectly. For example, the amount of RNA corresponding to a gene may be determined. Alternatively the amount of protein encoded by the gene may be determined.
[0114] The determination of the level of expression of one or more nucleic acid molecules can be used in the prognosis, diagnosis, or treatment of a disease or disorder. For example, determining the likelihood that an individual will regress following treatment for pre-B ALL may be used to predict whether the individual will survive the disease and may also be used to effect the most appropriate treatment for the individual.
[0115] As used herein "prognosis" means a prediction of the course and outcome of a disease or disorder or the likelihood of a subject developing a disease or disorder. For example, depending upon the level of expression of a gene of the invention, the subject can be identified as likely to develop pre-B ALL and/or classified as likely to suffer a recurrence of the disease.
[0116] As used herein "diagnosis" means the process of identifying a disease or disorder, such as by its symptoms, laboratory tests (including genotypic tests), and physical findings. The identification of the level of expression of a nucleic acid molecule of the invention can be used in the diagnosis of a disease associated with the gene.
[0117] In one preferred embodiment, there is provided a panel of genes or nucleic acid molecules which can be used in the prognosis, diagnosis, and/or treatment of pre-B ALL.
[0118] Pre-B ALL is a type of leukaemia. Leukaemia is a cancer involving the blood forming cells. Cancer means any malignant cell growth or tumour caused by abnormal and uncontrolled cell division.
[0119] Acute leukaemia is the most common malignancy in children and it is caused by the clonal proliferation of hematopoietic cells (usually white blood cells). In leukaemia, abnormal immature white cells increase greatly and invade other tissues and organs. These white cells are not able to function at their normal task of fighting disease and they impact on the function of normal cells which makes the leukemic child vulnerable to infection or haemorrhage. There are four types of childhood leukaemia: 1) acute lymphoblastic leukaemia (ALL) 75%; 2) acute myeloid leukaemia (AML) 20%; 3) mixed lineage leukaemia 2%; and 4) chronic leukaemia 3%.
[0120] Among children 85% are diagnosed with B-lineage ALL (pre-B ALL) and 15% with T-cell ALL (T-ALL).
[0121] In the context of pre-B ALL the term "diagnosis" means the process of identifying pre-B ALL by its symptoms, via laboratory tests (including immunophenotyping and genotypic tests) or through physical findings. The identification of the level of expression of a nucleic acid molecule of the invention can be used in the diagnosis of a disease associated with the gene.
[0122] In the context of pre-B ALL the terms "prognosis" or "prognostic marker" shall be taken to mean an indicator of the likelihood of progression of pre-B ALL diagnosed in an individual or the likelihood of an individual developing pre-B ALL. For example, depending upon the expression level of a gene or nucleic acid molecule of the invention, an individual might be identified as likely to develop pre-B ALL and/or classified as likely to suffer a relapse of the disease.
[0123] As used herein the term "predictive marker" shall be taken to mean an indicator of response to therapy; said response is preferably defined according to patient survival and/or relapse of disease. It is preferably used to define patients with high, low and intermediate length of survival after treatment that is the result of the inherent heterogeneity of the disease process.
[0124] In order to perform the prognostic and diagnostic methods of the present invention biological samples are obtained. In the context of this invention the terms "obtaining or providing a biological sample" or "obtaining or providing a sample from an individual", shall not be taken to include the active retrieval of a sample from an individual, (e.g., the performance of a biopsy). Said terms shall be taken to mean the obtainment of a sample previously isolated from an individual. Said samples may be isolated by any means standard in the art, including but not limited to biopsy, surgical removal, body fluids isolated by means of aspiration. Furthermore said samples may be provided by third parties including but not limited to clinicians, couriers, commercial sample providers and sample collections.
[0125] The term "biological sample" as used herein includes any biological material of an individual. Preferably, the biological sample is cells, tissue, or fluid isolated from bone marrow, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, whole blood, blood cells, tumours, organs, and also includes samples of in vivo cell culture constituents, including but not limited to conditioned medium resulting from the growth of cells in cell culture medium, putatively virally infected cells, recombinant cells, and cell components. In some embodiments the biological sample is bone marrow or blood.
[0126] The biological sample may be tested using the techniques described herein directly after isolation or alternatively further processed in order to increase the quality of the data produced. In this regard, the selective expansion of cells is useful to induce proliferation and generate a "cell line" in which the frequency of the relevant cells is log scale greater than that of the same cells in a biological sample directly isolated from a subject. The literature has also shown that, if required, the cells can be further concentrated and purified by cloning the specific cells.
[0127] Once biological samples are obtained from an individual having pre-B ALL they can be screened for differentially expressed genes or nucleic acids.
[0128] In some embodiments of the present invention an individual is screened in order to determine the expression level of a gene or nucleic acid molecule associated with pre-B ALL diagnosis or prognosis. The phrase "expression level" when used in reference to nucleic acid molecules or genes described herein shall be taken to mean the transcription and/or translation of a gene or nucleic acid molecule, as well as the genetic or the epigenetic modifications of the genomic DNA associated with the gene and/or regulatory or promoter regions thereof. Genetic modifications include SNPs, point mutations, deletions, insertions, repeat length, rearrangements and other polymorphisms. The analysis of the expression levels of protein or mRNA, and the analysis of the individual's genetic or epigenetic modification of the gene(s) or nucleic acid molecule of the invention, are herein summarized infra.
[0129] The expression level of a gene or nucleic acid molecule may be determined by the analysis of any factors associated with or indicative of the level of transcription and translation of a gene including, but not limited to, methylation analysis, loss of heterozygosity (hereinafter also referred to as LOH), RNA expression levels and protein expression levels.
[0130] The term "expression level" also encompasses the absence of expression. In some embodiments, the expression level may be increased compared to the level in an individual not having pre-B ALL. The individual not having pre-B ALL may have a disease or disorder other than pre-B ALL or may be an apparently healthy individual. The individual will have expression levels of at least one of the nucleic acid molecules of the invention within normal, typical and/or average levels. For example, the expression levels of IGJ and PLA2G6 were found by the inventors to be increased in individuals that had a poor prognosis. Alternatively, the expression level of a nucleic acid molecule may be decreased compared to the level in an individual not having pre-B ALL. For example, the inventors found that the expression levels of OAZIN, GLUL, and BICD2 were decreased in individuals that had a poor prognosis.
[0131] As used herein the term "nucleic acid molecule" means a DNA or RNA molecule. A "gene" means a length of DNA which encodes a particular protein or RNA molecule. Nucleic acid molecules and genes disclosed herein may or may not include the 5' and 3' untranslated regions of the DNA.
[0132] As used herein a "DNA" molecule includes any type of DNA, such as genomic DNA or cDNA. Similarly, "RNA" may be any class of RNA, including messenger RNA (mRNA), transfer RNA (tRNA), or ribosomal RNA (rRNA), as the sequence of RNA corresponds to that of DNA.
[0133] The nucleic acid molecules may be single-stranded or double-stranded, or antisense DNA or RNA molecules, and include functional fragment(s) of the nucleic acid molecule and antisense molecules thereto.
[0134] A "double-stranded DNA or RNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, cytosine, or uridine) in a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary structure. Thus, this term includes double-stranded DNA and RNA found inter alia in linear DNA or RNA molecules (eg restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA or RNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the non-transcribed strand of DNA or RNA, eg the strand having a sequence homologous to the mRNA.
[0135] Nucleotide sequences of preferred embodiments of the invention may include sequences that differ by one or more nucleotide substitutions, additions or deletions, such as allelic variants, and will also include sequences that differ due to the degeneracy of the genetic code, provided that the different sequence retains a function of the starting molecule, for example, is expressed in an individual having pre-B ALL at a different level to that in an individual not having pre-B ALL.
[0136] The term "fragment" of a molecule means a portion of the entire molecule. The size of the fragment is limited only in that it must have an expression level in an individual having pre-B ALL that is different when compared to the expression level in an individual not having pre-B ALL. In some embodiments, for example when used in a microarray, the fragments may be functional fragments. As used herein a "functional fragment" of a molecule is one which retains a function of the full-length molecule, for example, is expressed in an individual having pre-B ALL at a different level to that in an individual not having pre-B ALL.
[0137] It is appreciated in the art that a nucleic acid molecule can encode a polypeptide. The terms "encode" or "encoded" refer generally to a nucleic acid sequence being present in a translatable form. An "antisense" molecule is also considered to encode a polypeptide sequence, since the same informational content is present in a readily accessible form, especially when linked to a sequence which promotes expression of the sense strand.
[0138] Nucleic acid molecules disclosed herein include those involved in cell communication, cell growth and/or maintenance, transcription, and metabolism, for example, RNA, DNA, phosphate, and protein metabolism.
[0139] The expression level of a nucleic acid molecule of the preferred embodiments of the invention may be determined by any means known in the art and may be determined directly or indirectly. For example, the amount of RNA corresponding to a gene may be determined. Alternatively the amount of protein encoded by the gene may be determined.
[0140] Where the expression level of a gene is determined by detecting the amount of mRNA corresponding to the gene, this may be achieved by techniques such as reverse transcription polymerase chain reaction (RT-PCR), Northern blot analysis, and an RNase protection assay.
[0141] "Polymerase chain reaction," or "PCR," as used herein generally refers to a method for amplification of a desired nucleotide sequence in vitro. In general, the PCR method involves repeated cycles of primer extension synthesis in the presence of PCR reagents, using two oligonucleotide primers capable of hybridizing preferentially to a template nucleic acid. Typically, the primers used in the PCR method will be complementary to nucleotide sequences within the template at both ends of or flanking the nucleotide sequence to be amplified, although primers complementary to the nucleotide sequence to be amplified also may be used. See Wang, et al., in PCR Protocols, pp. 70-75 (Academic Press, 1990); Ochman, et al., in PCR Protocols, pp. 219-227; Triglia, et al., Nucl. Acids Res. 16:8186 (1988).
[0142] PCR may also be used to determine whether a specific sequence is present, by using a primer that will specifically bind to the desired sequence, where the presence of an amplification product is indicative that a specific binding complex was formed. Detection of mRNA having the subject sequence is indicative of the level of expression of the gene.
[0143] Alternatively, the amplified sample can be fractionated by electrophoresis, e.g. capillary or gel electrophoresis, transferred to a suitable support, e.g. nitrocellulose, and then probed with a fragment of the template sequence.
[0144] "Oligonucleotides", "oligonucleotide primers", or "oligonucleotide probes" are short-length, single- or double-stranded polydeoxynucleotides that are chemically synthesised by known methods (involving, for example, triester, phosphoramidite, or phosphonate chemistry), such as described by Engels, et al., Angew. Chem. Int. Ed. Engl. 28:716-734 (1989). Typically they are then purified, for example, by polyacrylamide gel electrophoresis. Oligonucleotide primers and probes of the invention are DNA molecules that are sufficiently complementary to regions of contiguous nucleic acid residues within the allergy-associated gene nucleic acid to hybridise thereto, preferably under high stringency conditions. Defining appropriate hybridisation conditions is within the skill of the art. See eg., Maniatis et al., DNA Cloning, vols. I and II. Nucleic Acid Hybridisation. However, briefly, "stringent conditions" for hybridisation or annealing of nucleic acid molecules are those that (1) employ low ionic strength and high temperature for washing, for example, 0.015M NaCl/0.0015M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50°C, or (2) employ during hybridisation a denaturing agent such as formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50mM sodium phosphate buffer at pH 6.5 with 750mM NaCl, 75mM sodium citrate at 42°C. Another example is use of 50% formamide, 5 X SSC (0.75M NaCl, 0.075M sodium citrate), 50mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5 X Denhardt's solution, sonicated salmon sperm DNA (50μg/mL), 0.1% SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2 X SSC and 0.1% SDS.
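The salt concentrations quoted above can be related to one another through the SSC strength. The following is a minimal sketch, assuming the conventional composition of 1 X SSC (0.15M NaCl, 0.015M sodium citrate), which agrees with the 5 X SSC figures given in the text; the function names are illustrative only and are not part of the disclosure.

```python
# Assumed composition of 1 X SSC, consistent with the 5 X SSC values
# stated in the text (0.75M NaCl, 0.075M sodium citrate).
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each salt in `fold` X SSC."""
    return {salt: round(conc * fold, 6) for salt, conc in SSC_1X_MOLAR.items()}


def ssc_fold_from_nacl(nacl_molar):
    """Return the SSC strength (in X) corresponding to a NaCl molarity."""
    return round(nacl_molar / SSC_1X_MOLAR["NaCl"], 6)


if __name__ == "__main__":
    print(ssc_molarity(5))            # hybridisation buffer strength
    print(ssc_molarity(0.2))          # wash buffer strength
    print(ssc_fold_from_nacl(0.015))  # the low ionic strength wash in (1)
```

On this assumption, the low ionic strength wash in (1) corresponds to 0.1 X SSC, and the 0.2 X SSC wash contains 0.03M NaCl and 0.003M sodium citrate.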
[0145] Exemplary primers and probes include oligonucleotides that are at least about 15 nucleic acid residues long and that are selected from any 15 or more contiguous residues of DNA. Preferably, oligonucleotide primers and probes used in some embodiments of the invention are at least about 20 nucleic acid residues long. The invention also contemplates oligonucleotide primers and probes that are 150 nucleic acid residues long or longer. Those of ordinary skill in the art realise that nucleic acid hybridisation conditions for achieving the hybridisation of a primer or probe of a particular length to a nucleic acid molecule of the invention can readily be determined. Such manipulations to achieve optimal hybridisation conditions for probes of varying lengths are well known in the art. See, eg., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor (1989), incorporated herein by reference.
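By way of illustration only, the selection of probes from "any 15 or more contiguous residues" can be sketched as a sliding window over a target sequence. The sketch below assumes a DNA target written 5' to 3' and returns, for each window, the reverse complement that would hybridise to it; the example sequence and the function names are hypothetical and are not sequences of the invention.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def candidate_probes(target, length=15):
    """List (start, probe) pairs for every window of `length` contiguous
    residues in `target`; each probe is complementary to its window."""
    if length < 15:
        raise ValueError("probes are at least about 15 residues long")
    target = target.upper()
    return [
        (start, reverse_complement(target[start:start + length]))
        for start in range(len(target) - length + 1)
    ]


if __name__ == "__main__":
    hypothetical_target = "ATGC" * 5  # made-up 20-residue sequence
    for start, probe in candidate_probes(hypothetical_target, 15):
        print(start, probe)
```

In practice the candidate list would be filtered further, for example for specificity and for suitable hybridisation conditions, which this sketch does not attempt.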
[0146] As used herein, the term "PCR reagents" refers to the chemicals, apart from the template nucleic acid sequence, needed to perform the PCR process. These chemicals generally consist of five classes of components: (i) an aqueous buffer, (ii) a water soluble magnesium salt, (iii) at least four deoxyribonucleotide triphosphates (dNTPs), (iv) oligonucleotide primers (normally two primers for each template sequence, the sequences defining the 5' ends of the two complementary strands of the double-stranded template sequence), and (v) a polynucleotide polymerase, preferably a DNA polymerase, more preferably a thermostable DNA polymerase, ie a DNA polymerase which can tolerate temperatures between 90°C and 100°C for a total time of at least 10 minutes without losing more than about half its activity.
[0147] The four conventional dNTPs are thymidine triphosphate (dTTP), deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), and deoxyguanosine triphosphate (dGTP). These conventional deoxyribonucleotide triphosphates may be supplemented or replaced by dNTPs containing base analogues which Watson-Crick base pair like the conventional four bases, e.g. deoxyuridine triphosphate (dUTP).
[0148] A detectable label may be included in an amplification reaction. Biotin-labelled nucleotides can be incorporated into DNA or RNA by such techniques as nick translation, chemical and enzymatic means, and the like. The biotinylated primers and probes are detected after hybridisation, using indicating means such as avidin/streptavidin, fluorescent-labelling agents, enzymes, colloidal gold conjugates, and the like. Nucleic acids may also be labelled with other fluorescent compounds, with immunodetectable fluorescent derivatives, with biotin analogues, and the like. Nucleic acids may also be labelled by means of attachment to a protein. Nucleic acids cross-linked to radioactive or fluorescent histone single-stranded binding protein may also be used. Those of ordinary skill in the art will recognise that there are other suitable methods for detecting oligonucleotide primers and probes and other suitable detectable labels that are available for use in the practice of the present invention. Moreover, fluorescent residues can be incorporated into oligonucleotides during chemical synthesis. Preferably, oligonucleotide primers and probes of the invention are labelled to render them readily detectable. Detectable labels may be any species or moiety that may be detected either visually or with the aid of an instrument.
[0149] Suitable labels include fluorochromes, eg. fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), radioactive labels, eg. 32P, 35S, 3H, as well as others. Another group of fluorescent compounds are the naphthylamines, having an amino group in the alpha or beta position. Included among such naphthylamino compounds are 1-dimethylaminonaphthyl-5-sulfonate, 1-anilino-8-naphthalene sulfonate and 2-p-toluidinyl-6-naphthalene sulfonate. Other dyes include 3-phenyl-7-isocyanatocoumarin, acridines, such as 9-isothiocyanatoacridine and acridine orange; N-(p-(2-benzoxazolyl)phenyl)maleimide; benzoxadiazoles, stilbenes, pyrenes, and the like. Most preferably, the fluorescent compounds are selected from the group consisting of VIC, carboxy fluorescein (FAM), Lightcycler® 640, and Cy5.
[0150] The label may be a two stage system, where the amplified DNA is conjugated to biotin, hapten, or the like having a high affinity binding partner, e.g. avidin, specific antibodies, etc, where the binding partner is conjugated to a detectable label. The label may be conjugated to one or both of the primers. Alternatively, the pool of nucleotides used in the amplification is labelled, so as to incorporate the label into the amplification product.
[0151] RT-PCR is a form of PCR which can amplify a known mRNA sequence using a reverse transcriptase to convert the mRNA to cDNA prior to traditional PCR. In its simplest implementation, aliquots are removed from the PCR every couple of cycles beginning at a point where product is undetectable (typically about cycle 20) and extending through the entire exponential phase. Products are then resolved electrophoretically and quantitated by densitometry, fluorescence or phosphorimaging. Alternatively, a fluorescent signal can be used to report formation of PCR product as each cycle of the amplification proceeds, coupled with an automated PCR/fluorescent detection system (Heid et al., 1996, Genome Res.; 6:986-994). Suitable detection systems for RT-PCR include SYBR Green (Molecular Beacons), Scorpions (Molecular Probes), and TaqMan® (Applied Biosystems).
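The per-cycle fluorescent readout described above can be illustrated with a short sketch. It assumes an idealised reaction in which the signal doubles at every cycle and reports the first cycle at which the signal reaches a chosen threshold; the threshold value, the starting signals and the function names are assumptions made for illustration and are not taken from the cited detection systems.

```python
def simulate_doubling(start_signal, cycles):
    """Idealised readings: the signal doubles at every cycle (assumption)."""
    return [start_signal * 2 ** cycle for cycle in range(1, cycles + 1)]


def threshold_cycle(readings, threshold):
    """Return the first cycle (1-based) whose reading reaches `threshold`,
    or None if the threshold is never reached."""
    for cycle, signal in enumerate(readings, start=1):
        if signal >= threshold:
            return cycle
    return None


if __name__ == "__main__":
    low_template = simulate_doubling(0.01, 30)
    high_template = simulate_doubling(0.1, 30)  # tenfold more starting signal
    print(threshold_cycle(low_template, 100.0))
    print(threshold_cycle(high_template, 100.0))
```

Under these assumptions a sample with more starting template crosses the threshold at an earlier cycle, which is the basis on which the amount of product can be followed as it is formed.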
[0152] In some embodiments the invention utilises a combined PCR and hybridisation probing system so as to make the most of the closed tube or homogenous assay systems such as the use of FRET probes as disclosed in US patents (Nos 6,140,054; 6,174,670), the entirety of which are also incorporated herein by reference. In one of its simplest configurations, the FRET or "fluorescent resonance energy transfer" approach employs two oligonucleotides which bind to adjacent sites on the same strand of the nucleic acid being amplified. One oligonucleotide is labelled with a donor fluorophore which absorbs light at a first wavelength and emits light in response, and the second is labelled with an acceptor fluorophore which is capable of fluorescence in response to the emitted light of the first donor (but not substantially by the light source exciting the first donor, and whose emission can be distinguished from that of the first fluorophore). In this configuration, the second or acceptor fluorophore shows a substantial increase in fluorescence when it is in close proximity to the first or donor fluorophore, such as occurs when the two oligonucleotides come in close proximity when they hybridise to adjacent sites on the nucleic acid being amplified (for example in the annealing phase of PCR) forming a fluorogenic complex. As more of the nucleic acid being amplified accumulates, so more of the fluorogenic complex can be formed and there is an increase in the fluorescence from the acceptor probe, and this can be measured. Hence the method allows detection of the amount of product as it is being formed. In another simple embodiment, and as applies to use of FRET probes in PCR based assays, one of the labelled oligonucleotides may also be a primer used for PCR. In this configuration, the labelled PCR primer is part of the DNA strand to which the second labelled oligonucleotide hybridises, as described by Neoh et al., 1999, J Clin Path; 52:766-769, von Ahsen et al., 2000, Clin Chem; 46:156-161, the entirety of which are incorporated by reference.
[0153] It will be appreciated by those of skill in the art that amplification and detection of amplification with hybridisation primers and probes can be conducted in two separate phases, for example by carrying out PCR amplification first, and then adding hybridisation probes under such conditions as to measure the amount of nucleic acid which has been amplified. However, one embodiment of the present invention utilises a combined PCR and hybridisation probing system so as to make the most of the closed tube or homogenous assay systems and is carried out on a Roche Lightcycler® or other similarly specified or appropriately configured instrument.
[0154] Such systems would also be adaptable to the detection methods described here. Those skilled in the art will appreciate that such probes can be used for allele discrimination if appropriately designed for the detection of point-mutation(s), in addition to deletion(s) and insertion(s). Alternatively or in addition, the unlabelled PCR primers may be designed for allele discrimination by methods well known to those skilled in the art (Ausubel 1989-1999).
[0155] It will also be appreciated by those skilled in the art that detection of amplification in homogenous and/or closed tubes can be carried out using numerous means in the art, for example using TaqMan® hybridisation probes in the PCR reaction and measurement of fluorescence specific for the target nucleic acids once sufficient amplification has taken place.
[0156] Those skilled in the art will be aware that other similar quantitative "real-time" and homogenous nucleic acid amplification/detection systems exist, such as those based on the TaqMan approach (US patent Nos 5,538,848 and 5,691,146), fluorescence polarisation assays (eg Gibson et al., 1997, Clin Chem., 43:1336-1341), and the Invader assay (eg Agarwal et al., 2000, Diagn Mol Pathol., 9(3):158-164; Ryan et al., 1999, Mol Diagn., 4(2):135-144). Such systems would also be adaptable for use in the invention described, enabling real-time monitoring of nucleic acid amplification.
[0157] Northern blot analysis involves fractionating RNA species on the basis of size by denaturing gel electrophoresis followed by transfer of the RNA onto a membrane by capillary, vacuum or pressure blotting (Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). The RNA may be bound to the membrane in an apparent non-covalent interaction via exposure to short wave ultraviolet light or by heating at 80°C in a vacuum oven. RNA sequences of interest are detected on the blot by hybridization to an oligonucleotide probe. Probes for Northern blot detection generally contain full or partial cDNA sequences and may be labelled by enzymatic incorporation of radiolabeled (usually 32P or 33P) nucleotides or with nucleotides conjugated to haptens such as biotin for subsequent chemiluminescent detection. After probe hybridization and washing to remove non-specific label, the hybridization signal is generally detected by exposing blots to X-ray film or phosphor storage plates, after prior incubation with chemiluminescent substrates if necessary. The resulting band identified by the probe indicates the size of the mRNA, and the intensity of the band corresponds to the relative abundance. Autoradiograph band intensities may be quantitated by densitometry, by direct measurement of hybridized radiolabeled probe via storage phosphor imaging or by scintillation counting of excised bands.
[0158] The RNase protection assay (RPA) operates on the same principle as a Northern blot as it involves hybridization of a labelled probe to a target mRNA. However in the RPA, hybridization takes place in a solution containing both a labelled antisense RNA probe and the target mRNA without prior gel fractionation or blotting (Azrolan & Breslow, 1990, J. Lipid Res., 31:1141-1146; Sambrook et al., 1989, supra). After incubation for several hours, unhybridized probe and sample RNA may be enzymatically degraded and the remaining hybrids are electrophoresed through a denaturing polyacrylamide gel and visualized by autoradiography or phosphorimaging. Alternatively, the RNase-resistant hybrids may be precipitated and bound to filters for direct quantitation by scintillation counting (Melton et al., 1984, Nucleic Acids Res., 12:7035-7056). Furthermore, by performing titration reactions with unlabeled RNA transcripts corresponding to the mRNA sense strand, absolute RNA levels can be determined.
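One way in which titration reactions could yield an absolute RNA level is by reading the sample signal off a standard curve. The sketch below assumes that each titration reaction provides a known amount of unlabeled sense transcript and a measured signal, and interpolates linearly between neighbouring standards; the numbers, units and function name are hypothetical.

```python
def interpolate_amount(standards, signal):
    """Estimate the RNA amount giving `signal` by linear interpolation.

    `standards` is a list of (known_amount, measured_signal) pairs from
    titration reactions. Raises ValueError when `signal` lies outside
    the range covered by the standards.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (amt_lo, sig_lo), (amt_hi, sig_hi) in zip(points, points[1:]):
        if sig_lo <= signal <= sig_hi:
            if sig_hi == sig_lo:
                return amt_lo
            fraction = (signal - sig_lo) / (sig_hi - sig_lo)
            return amt_lo + fraction * (amt_hi - amt_lo)
    raise ValueError("signal lies outside the range of the titration standards")


if __name__ == "__main__":
    # Hypothetical titration: amount in pg, signal in arbitrary counts.
    titration = [(1.0, 100.0), (2.0, 200.0), (4.0, 400.0)]
    print(interpolate_amount(titration, 300.0))
```

Linear interpolation is only one possible choice; a fitted curve could equally be used where the response is not linear over the range of interest.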
[0159] Another method of detecting the amount of mRNA transcribed from a gene involves using specific nucleic acid microarrays and microchip technology. A microarray is a tool for analysing gene expression and typically consists of a small membrane or glass slide onto which samples of many nucleic acid and/or protein molecules have been arranged in a regular pattern.
[0160] Microarrays are particularly useful for directly or indirectly detecting the level of expression of a nucleic acid molecule of an individual. A microarray as disclosed in embodiments of the present invention may have DNA, RNA, and/or protein applied to a solid matrix in predetermined locations and in such a way that mutations can be detected in the one or more molecule(s).
[0161] For example, oligonucleotide probes for nucleic acid molecules disclosed herein can be deposited or synthesised at predetermined locations on a glass slide or other support. Messenger RNA or cRNA isolated from a biological sample obtained from an individual can be added to the probes under conditions which allow binding between the probes and mRNA sequences if present in the biological sample. Alternatively, mRNA from an individual may be applied to the slide before adding probes for nucleic acid molecules disclosed herein. Binding between the probes and DNA can be detected by any means known in the art and specific binding between the DNA and a probe indicates the DNA is expressed in the subject.
[0162] The methodology of hybridisation of nucleic acids and microarray technology is well known in the art. For example, the specific preparation of nucleic acids for hybridisation and probes, slides, and hybridisation conditions are provided in PCT/US01/10482, herein incorporated by reference. However, briefly, suitable hybridisation conditions include those provided by Sambrook et al., infra. Ordinarily, "stringent conditions" for hybridisation or annealing of nucleic acid molecules are those that
(1) employ low ionic strength and high temperature for washing, for example, 0.015M NaCl/0.0015M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50°C, or
(2) employ during hybridisation a denaturing agent such as formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50mM sodium phosphate buffer at pH 6.5 with 750mM NaCl, 75mM sodium citrate at 42°C.
[0163] Alternatively, protein isolated from a biological sample from an individual may be applied to a plastic slide. The term "protein" as used herein refers to peptides, proteins, and polypeptides. Labelled antibodies may be applied to the protein under conditions which allow binding between an antibody and the protein. Binding between the protein and antibody indicates that the nucleic acid molecule encoding the protein is expressed in the individual.
[0164] Where the expression level of a gene is determined by detecting the amount of protein encoded by the gene, this may be achieved by ELISA, immunohistochemical staining, or flow cytometry (such as fluorescent activated cell sorting).
[0165] Enzyme-linked Immunosorbent Assays (ELISAs) combine the specificity of antibodies with the sensitivity of simple enzyme assays, by using antibodies coupled to an easily-assayed enzyme. ELISAs can provide a useful measurement of antigen or antibody concentration and can be used to detect the presence of a protein encoded by a gene of the invention and recognized by an antibody. One of the most useful of the immunoassays is the two antibody "sandwich" ELISA. This assay is used to determine the concentration of a protein in a sample and can determine the absolute amount of protein in the sample. The sandwich ELISA requires two antibodies that bind to epitopes that do not overlap on the protein. This can be accomplished with either two monoclonal antibodies that recognize discrete sites or one batch of affinity-purified polyclonal antibodies.
[0166] To utilize this assay, one antibody (the "capture" antibody) is purified and bound to a solid phase, typically attached to the bottom of a plate well. The sample is then added and protein in the sample is allowed to complex with the bound antibody. Unbound products are then removed with a wash, and a labelled second antibody (the "detection" antibody) is allowed to bind with the bound protein, thus completing the "sandwich". The assay is then quantitated by measuring the amount of labelled second antibody bound to the matrix, through the use of a colorimetric substrate.
[0167] The expression level of a nucleic acid molecule as disclosed herein may be determined in a cell or tissue, i.e. an aggregate of cells, using techniques such as immunohistochemical staining techniques. With immunohistochemical staining techniques, a cell sample is prepared, typically by dehydration and fixation, followed by reaction with labelled antibodies specific for the gene product, where the labels are usually visually detectable, such as enzymatic labels, fluorescent labels, luminescent labels, and the like. A particularly sensitive staining technique suitable for use in the present invention is described by Hsu et al., 1980, Am. J. Clin. Path., 75:734-738. Antibodies useful for immunohistochemical staining may be either monoclonal or polyclonal. Conveniently, the antibodies may be prepared against a synthetic peptide based on the protein or peptide encoded by genes or nucleic acid molecules of embodiments of the invention.
[0168] The acronym FACS (Fluorescence Activated Cell Sorting) and flow cytometry are herein used interchangeably. FACS is a powerful method used to study cells. Individual cells held in a thin stream of fluid are passed through one or more laser beams, causing light to scatter and fluorescent dyes to emit light at various frequencies. Photomultiplier tubes (PMT) convert light to electrical signals and cell data is collected. Cell sub-populations are identified and defined at high purity (~100%).
[0169] Fluorescent labelling allows investigation of cell structure and function, including the determination of whether a particular gene is being expressed by the cell. The expressed protein or mRNA is labelled with fluorescent dyes, typically using antibodies which specifically bind the protein or mRNA, and FACS collects the fluorescence signals in one to several channels corresponding to different laser excitation and fluorescence emission wavelength. Immunofluorescence, the most widely used application, involves the staining of cells with antibodies conjugated to fluorescent dyes such as fluorescein and phycoerythrin. This method is often used to label molecules on the cell surface, but antibodies can be directed at targets in cytoplasm.
[0170] In direct immunofluorescence an antibody to a molecule of the invention is directly conjugated to a fluorescent dye. Cells are stained in one step. In indirect immunofluorescence the primary antibody is not labelled. A second fluorescently-conjugated antibody is added which is specific for the first antibody.
[0171] Microfluidic technology may also be used in the analysis of proteins (Figeys et al., 1998, Anal Chem., 70:3728-3734; Figeys & Aebersold, 1998, Electrophoresis, 19:885-892). For example, microfluidics can be linked with a mass spectrometric analysis of proteins or peptides. Thus, peptides can be adsorbed onto hydrophobic membranes, desalted, and through the use of microfluidics eluted in a controlled manner to allow the direct mass spectrometric analysis of picomole amounts of peptides by electrospray ionisation mass spectrometry procedures (Lion et al., 2003, J. Chromatogr. A., 1003:11-19). Combinatorial peptidomics (Soloviev et al., 2003, J Nanobiotechnology, 1:4) may also be used with integrated microfluidic systems.
[0172] Methods of the invention may further comprise the step of validating the results obtained by any one of the previous methods by RT-PCR, which is described above.
[0173] Once the biological sample is obtained, processed and analysed by, for example, one of the techniques disclosed supra, the gene(s) and/or nucleic acid molecules associated with pre-B ALL can be identified.
[0174] Irrespective of the technique used to generate the expression level data there is a requirement to convert the observed data to numerical values. Numerical values are required in order to analyse the expression levels and compare data from individuals before identifying genes or nucleic acid molecules that are discriminatory for pre-B ALL.
[0175] The expression levels of multiple (two or more) genes in one or more lists of genes associated with pre-B ALL can be measured, and those measurements are used, either alone or with other parameters, to assign the individual into a particular risk category. For example, gene expression levels can be correlated with intrinsic disease biology and/or etiology to define intrinsically related groups.
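Purely as an illustration of how measurements of multiple genes might be combined to assign a risk category, the sketch below counts how many genes are shifted in the direction reported above for poor prognosis (IGJ and PLA2G6 increased; OAZIN, GLUL and BICD2 decreased) relative to a reference. The scoring rule, the cut-offs, the category names and the example values are assumptions made for this sketch and are not the inventors' classification method.

```python
import math

# Direction of change reported in the text for individuals with a poor prognosis.
POOR_PROGNOSIS_UP = ("IGJ", "PLA2G6")
POOR_PROGNOSIS_DOWN = ("OAZIN", "GLUL", "BICD2")


def risk_score(sample, reference):
    """Count genes whose log2(sample/reference) lies in the poor-prognosis direction."""
    score = 0
    for gene in POOR_PROGNOSIS_UP:
        if math.log2(sample[gene] / reference[gene]) > 0:
            score += 1
    for gene in POOR_PROGNOSIS_DOWN:
        if math.log2(sample[gene] / reference[gene]) < 0:
            score += 1
    return score


def risk_category(score, low_cutoff=1, high_cutoff=4):
    """Map a score to a category; both cut-offs are hypothetical."""
    if score >= high_cutoff:
        return "higher risk"
    if score <= low_cutoff:
        return "lower risk"
    return "intermediate risk"


if __name__ == "__main__":
    reference = {"IGJ": 100, "PLA2G6": 100, "OAZIN": 100, "GLUL": 100, "BICD2": 100}
    sample = {"IGJ": 400, "PLA2G6": 250, "OAZIN": 50, "GLUL": 40, "BICD2": 100}
    print(risk_category(risk_score(sample, reference)))
```

Any real classifier would need to be trained and validated on patient data, and could weight genes differently or combine the expression data with other parameters as the text contemplates.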
[0176] Gene or nucleic acid molecule expression levels can be displayed in a number of ways. A common method is to arrange a ratio matrix into a graphical dendrogram where columns indicate test samples and rows indicate genes (see, for example, Figure 1). These data are arranged so that genes with similar expression profiles are proximal to each other. The relative gene expression level for each gene is visualized as a colour. For example, low expression may appear in the blue portion of the spectrum while high expression may appear as a colour in the red portion of the spectrum. Commercially available computer software programs can display such data, including "GENESPRING" from Silicon Genetics, Inc. and "DISCOVERY" and "INFER" software from Partek, Inc.
[0177] Analysis of the expression levels can also be conducted by comparing intensities generated in qPCR or microarrays. This is best done by generating a ratio matrix of the expression intensities of genes in a test sample versus those in a control sample. For instance, the gene expression intensities from a tissue from a pre-B ALL patient can be compared with the expression intensities generated from tissue obtained from an individual not having pre-B ALL. A ratio of these expression intensities indicates the fold-change in gene expression between the test and control samples.
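By way of illustration only, the ratio calculation described above can be sketched in Python. The function name fold_change_ratios, the gene labels and the intensity values below are hypothetical and are not taken from the study; the sketch simply divides each test intensity by the matching control intensity and skips genes that lack a usable control value.

```python
from typing import Dict


def fold_change_ratios(test: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Return test/control intensity ratios for genes present in both samples."""
    ratios: Dict[str, float] = {}
    for gene, test_intensity in test.items():
        control_intensity = control.get(gene)
        # Skip genes without a usable control value (missing or non-positive).
        if control_intensity is None or control_intensity <= 0:
            continue
        ratios[gene] = test_intensity / control_intensity
    return ratios


# Illustrative intensities only (not data from the study).
test_sample = {"GENE_A": 400.0, "GENE_B": 50.0, "GENE_C": 120.0}
control_sample = {"GENE_A": 200.0, "GENE_B": 100.0}
ratios = fold_change_ratios(test_sample, control_sample)
```

A ratio above 1 indicates higher expression in the test sample and a ratio below 1 indicates lower expression; a full ratio matrix would repeat this calculation for every test sample.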
[0178] Microarray technology allows for the measurement of the steady-state mRNA level of thousands of genes simultaneously, thereby presenting a powerful tool for identifying expression levels. Two microarray technologies are currently in wide use. The first are cDNA arrays and the second are oligonucleotide arrays. Although differences exist in the construction of these chips, essentially all downstream data analysis and output are the same. The product of these analyses is typically a set of measurements of the intensity of the signal received from a labelled probe used to detect a cDNA sequence from the sample that hybridizes to a nucleic acid sequence at a known location on the microarray. Typically, the signal intensity is proportional to the quantity of cDNA, and thus mRNA, expressed in the sample cells. A large number of such techniques are available and useful. Preferred methods can be found in US Pat. No. 6,271,002 to Linsley et al.; US Pat. No. 6,218,122 to Friend et al.; US Pat. No. 6,218,114 to Peck et al.; and US Pat. No. 6,004,755 to Wang et al., the disclosure of each of which is incorporated herein by reference.
[0179] In a preferred method, biotinylated cRNA is prepared and hybridized to a microarray such as Affymetrix HG-U133A oligonucleotide microarrays (Affymetrix, Santa Clara, CA) by methods such as those described in Hoffmann et al., 2005, Mol Biotechnol., 29:31-8. Array images can then be reduced to intensity values for each probe (CEL files) using, for example, Affymetrix MAS 5.0 software. Expression measures are extracted using robust multi-array analysis (RMA), for example as described in Irizarry et al., 2003, Nucleic Acids Res., 31:e15 and Dallas et al., 2005, BMC Genomics, 6:59. To identify outcome-discriminating genes, a variance filter can be applied to eliminate all data with a fold-change <1.15 and a p-value >0.1 (999 permutations). Fold-change is measured by dividing expression levels for relapse samples by those for CCR samples, both being normalised. Supervised analysis can then be performed with the remaining informative probe sets (n=1128) using the decision-tree based Random Forest (RF). The Random Forest algorithm of Breiman & Cutler is described at the URL www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and is used in, for example, Zhang et al., 2001, Proc Natl Acad Sci USA, 98:6730-5 and Beesley et al., 2005, Br J Haematol., 131:447-56. The Random Forest algorithm can, for example, reduce the number of potentially relevant probes from between about 1,000 and several thousand to between about 20 and 200.
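The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure actually used: the text does not specify the test statistic or how fold-changes below 1 are handled, so the sketch assumes a two-sided permutation test on the difference in group means and a symmetric fold-change (the ratio is inverted when it falls below 1). All function names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def symmetric_fold_change(relapse: Sequence[float], ccr: Sequence[float]) -> float:
    """Ratio of group means, inverted if below 1 so it is always >= 1 (assumption)."""
    ratio = mean(relapse) / mean(ccr)
    return ratio if ratio >= 1.0 else 1.0 / ratio


def permutation_p_value(relapse: Sequence[float], ccr: Sequence[float],
                        n_permutations: int = 999, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means (assumption)."""
    rng = random.Random(seed)
    observed = abs(mean(relapse) - mean(ccr))
    pooled = list(relapse) + list(ccr)
    n_relapse = len(relapse)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_relapse]) - mean(pooled[n_relapse:]))
        if diff >= observed:
            hits += 1
    # The observed labelling is counted as one permutation, so p is never 0.
    return (hits + 1) / (n_permutations + 1)


def passes_filter(relapse: Sequence[float], ccr: Sequence[float],
                  min_fold: float = 1.15, max_p: float = 0.1) -> bool:
    """Eliminate a probe set only if fold-change < min_fold and p-value > max_p."""
    fold = symmetric_fold_change(relapse, ccr)
    p_value = permutation_p_value(relapse, ccr)
    return not (fold < min_fold and p_value > max_p)


# Hypothetical normalised expression values for two probe sets.
informative = passes_filter([10.0, 11.0, 12.0, 10.5, 11.5], [5.0, 5.5, 6.0, 5.2, 5.8, 6.1])
flat = passes_filter([5.0, 5.0, 5.0], [5.0, 5.0, 5.0])
```

Probe sets that pass the filter would then be handed to the supervised Random Forest analysis, which is not reproduced here.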
[0180] From this initial RF analysis, various combinations of target genes (e.g. top-ranked 200 genes) can be selected and subjected to secondary RF analyses.
[0181] An unsupervised analysis can also be conducted to determine grouping of the probes or genes to confirm outcome predictability. A decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) can be used.
[0182] A classifier can be formed from the reduced number of target genes. The classifier is used to predict the likely outcome of treatment and the like, e.g. prognosis, based on the expression levels of the genes in the reduced list of target genes.
[0183] The reduced list of target genes can be further refined prior to formation of the classifier. Again a supervised analysis is performed on the list of target genes to rank the predictability of the outcomes based on the probes or the genes to which the probes relate. A decision-tree based algorithm and in particular the Random Forest algorithm is used.
[0184] The outcome prediction accuracy of the further refined list of target genes can then be compared to the prediction accuracy of the original target gene list. The number of genes indicated in the further refined target gene list can be varied and the accuracy reassessed to determine the optimum number of probes. Where different techniques are used to measure gene expression level, such as qRT-PCR, the optimum number of probes that more accurately predict the outcome for a desired technique will need to be ascertained.
[0185] The result is a relatively low number of gene targets, which show a high accuracy in predicting the outcome of pre-B ALL. This refined group of genes indicates which gene expression levels predict the outcome. This refined list of target genes and their expression levels is referred to as a gene expression signature (GES). A GES may be formed for each possible outcome.
[0186] It is desirable to validate the classifier with an independent verification cohort to test the robustness of the GES(s). The expression level of each gene of the GES of individuals in a validation cohort can be determined by an appropriate measurement technique (e.g. qRT-PCR). The classifier can compare the expression levels of the refined list of target genes to the expression levels in the GES(s) in order to predict the outcome of each subject in the validation cohort with a significant likelihood of success.
[0187] The results can be confirmed by, for example, using a multivariate Cox regression analysis (Cox, 1972, J. R. Stat. Soc. B., 34:187-220).
[0188] In some of the embodiments of the present invention, a refined list of target genes or nucleic acid molecules identified as being associated with pre-B ALL outcome by the methods disclosed herein are selected from the group consisting of WWOX, EIF3S5, MYST2, EIF3S10, GABARAP, GLUL, DDX24, PSMF1, SNRP70, PRNP, ANXA4, JUNB, SUPT4H1, PPIF, PPIF, EIF4A1, DUSP3, CALD1, OAZIN, PON2, CPD, KIF5B, SHARP, SHARP, CHERP, HBXIP, APG5L, DPM1, DNAJB9, DNAJB9, PLAGL2, PDGFRA, TFAM, JARID2, PTPRM, DHX8, C1orf9, ZFHX1B, IGF1R, ZNF263, ACK1, NUBP1, TOPORS, FOXO3A, EIF1AY, EIF1AY, PPM1D, ABCG1, SMAD7, ZBTB11, SNAP91, MAFG, PRDM2, ACVR2, DBT, HSD11B1, MLLT10, TEP1, FOLH1, RoXaN, USP9Y, ZNF165, JARID1D, CA6, SLK, CYorf14, CHN2, CREM, AD7C-NTP, NDEL1, ATM, PCBP1, TDE2, NAPA, VAPA, SRP72, SRP72, KPNB1, SYNCRIP, PNRC1, RAB5A, TCF3, DNAJA2, ATP2A2, BICD2, PI4KII, THBS3, TOR1B, CHUK, TAF5, PAIP1, P2RX1, PLA2G6, DDX49, ATM, CDKl, CHN2, STAU, GPD2, MYST4, PCDHGA10, ARL6IP, RAB1, COL4A1, H41, YARS, WIRE, POLDIP3, RHOQ, PIGF, NIPA2, NUDT4, DKFZP564G2022, QKI, FEM1B, GART, OAZIN, IGJ, PDCD4, QKI, LHFPL2, CAMK2G, RCHY1, FTL, SEC24A, SMYD2, FLJ20214, KIAA0101, BICD2, FTL, ODZ4, CHN2, SLC18A2, ARHGDIA, YWHAE, CYorf15B, PTP4A3, RCHY1, RPL11, RHOQ, ROCK1, MYO1C, GLUL, KIAA0220, KIAA0310, GRN, TAF4B, PBEF1, NSFL1C, ARL10C, C1orf24, DNCLI1, C20orf149, FLJ10330, MKRN2, ZDHHC3, VPS4B, MGC3061, CHMP1.5, CHMP1.5, MIG12, DHX40, RABGEF1, ZC3HDC1, C21orf6, ZCCHC8, NF138, BRIX, CRKl, FLJ10300, HPSE, CHST11, C6orf61, RCP, FLJ11016, CRYZL1, FLJ21986, NSFL1C, CYP46A1, CARD14, HSPC135, WBSCR23, OR1A1, RPL15, APG3, FLJ20232, RRAGD, RRAGD, FLJ11301, FLJ11301, PC4, NRBF2, DDO, DCLRE1C, TOB2, PDCD6.
[0189] In other embodiments the genes and/or nucleic acid molecules of the invention consist essentially of SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1 and GPD2. Details of these nucleic acid molecules are listed in Table 1. A description of these genes, including the gene ID numbers, can be located at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene.
Table 1
[0190] In still other embodiments the nucleic acid molecules of the invention are OAZIN, GLUL, BICD2, IGJ, and PLA2G6.
[0191] In still other embodiments the nucleic acid molecules of the invention are OAZIN, GLUL, and IGJ.
[0192] OAZIN (ornithine decarboxylase antizyme inhibitor) regulates ornithine decarboxylase, the rate-limiting enzyme in polyamine synthesis, and enhanced expression of both OAZIN and ornithine decarboxylase has been detected in tumour tissues. GLUL (glutamine synthetase) catalyses the conversion of glutamate to glutamine, which is critical for cell proliferation, and over-expression of GLUL has been demonstrated in liver tumours. Expression of GLUL is induced by glucocorticoids, including the first-line anti-leukemic drugs dexamethasone and prednisolone, and induction is mediated by the glucocorticoid receptor. IGJ (immunoglobulin J chain) functions as a linker of IgM and IgA monomers to pentamers and dimers respectively, resulting in their assembly and in secretion of the pentameric IgM antibody.
[0193] In some embodiments the refined list of target genes or nucleic acid molecules of the invention or functional fragments thereof or antisense molecules thereto are located on a microarray. For example, the microarray may have two or more of WWOX, EIF3S5, MYST2, EIF3S10, GABARAP, GLUL, DDX24, PSMF1, SNRP70, PRNP, ANXA4, JUNB, SUPT4H1, PPIF, PPIF, EIF4A1, DUSP3, CALD1, OAZIN, PON2, CPD, KIF5B, SHARP, SHARP, CHERP, HBXIP, APG5L, DPM1, DNAJB9, DNAJB9, PLAGL2, PDGFRA, TFAM, JARID2, PTPRM, DHX8, C1orf9, ZFHX1B, IGF1R, ZNF263, ACK1, NUBP1, TOPORS, FOXO3A, EIF1AY, EIF1AY, PPM1D, ABCG1, SMAD7, ZBTB11, SNAP91, MAFG, PRDM2, ACVR2, DBT, HSD11B1, MLLT10, TEP1, FOLH1, RoXaN, USP9Y, ZNF165, JARID1D, CA6, SLK, CYorf14, CHN2, CREM, AD7C-NTP, NDEL1, ATM, PCBP1, TDE2, NAPA, VAPA, SRP72, SRP72, KPNB1, SYNCRIP, PNRC1, RAB5A, TCF3, DNAJA2, ATP2A2, BICD2, PI4KII, THBS3, TOR1B, CHUK, TAF5, PAIP1, P2RX1, PLA2G6, DDX49, ATM, COKl, CHN2, STAU, GPD2, MYST4, PCDHGA10, ARL6IP, RAB1, COL4A1, H41, YARS, WIRE, POLDIP3, RHOQ, PIGF, NIPA2, NUDT4, DKFZP564G2022, QKI, FEM1B, GART, OAZIN, IGJ, PDCD4, QKI, LHFPL2, CAMK2G, RCHY1, FTL, SEC24A, SMYD2, FLJ20214, KIAA0101, BICD2, FTL, ODZ4, CHN2, SLC18A2, ARHGDIA, YWHAE, CYorf15B, PTP4A3, RCHY1, RPL11, RHOQ, ROCK1, MYO1C, GLUL, KIAA0220, KIAA0310, GRN, TAF4B, PBEF1, NSFL1C, ARL10C, C1orf24, DNCLI1, C20orf149, FLJ10330, MKRN2, ZDHHC3, VPS4B, MGC3061, CHMP1.5, CHMP1.5, MIG12, DHX40, RABGEF1, ZC3HDC1, C21orf6, ZCCHC8, NF138, BRIX, CRKl, FLJ10300, HPSE, CHST11, C6orf61, RCP, FLJ11016, CRYZL1, FLJ21986, NSFL1C, CYP46A1, CARD14, HSPC135, WBSCR23, OR1A1, RPL15, APG3, FLJ20232, RRAGD, RRAGD, FLJ11301, FLJ11301, PC4, NRBF2, DDO, DCLRE1C, TOB2, PDCD6.
[0194] In some embodiments the microarray has three or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has four or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has five or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has six or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has seven or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has eight or more DNA or RNA molecules and/or proteins, or fragments thereof. In other embodiments the microarray has nine or more DNA or RNA molecules and/or proteins, or fragments thereof. In still other embodiments the microarray has ten or more DNA or RNA molecules and/or proteins, or fragments thereof.
[0195] A microarray as disclosed in embodiments of the invention may be part of a kit. The kit may further comprise reagents required to detect two or more mutations in a biological sample on which the microarray is used. Typically the kit would also include instructions for use.
[0196] In some embodiments, the proteins or peptides encoded by genes or nucleic acid molecules such as SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1 and GPD2 may be used as an immunogen to generate antibodies. Such antibodies, which specifically bind to the peptides, are useful as standards in assays such as radioimmunoassay, enzyme-linked immunoassay, or competitive-type receptor binding assays, radioreceptor assay, as well as in affinity purification techniques.
[0197] As disclosed herein, it will be appreciated by those skilled in the art that the determination of the expression level of one or more nucleic acid molecules disclosed herein can be used in the prognosis, diagnosis, or treatment of a disease or disorder. For example, determining the likelihood that an individual will relapse following treatment for pre-B ALL may be used to predict whether the individual will survive the disease and may also be used to effect the most appropriate treatment for the individual.
[0198] Genes identified by the present invention that show significantly up-regulated or down-regulated expression in pre-B ALL are potential therapeutic targets for pre-B ALL. Over-expressed genes may be targets for small molecules or inhibitors that decrease their expression. Methods and materials that can be used to inhibit gene expression, e.g. small drug molecules, antisense oligonucleotides, or antibodies, would be readily apparent to a person having ordinary skill in this art. On the other hand, under-expressed genes can be replaced by gene therapy or induced by drugs.
[0199] As used herein "treatment" or "treating" means any treatment of a disorder or disease in a subject by administering a medicament to the subject following the identification of the expression level of a gene, e.g. using pharmacogenomics. "Treatment" and "treating" include: (a) inhibiting the disorder or disease, i.e., arresting its development; or (b) relieving or ameliorating the symptoms of the disorder or disease, i.e., causing regression of the symptoms of the disorder or disease. The effect may be therapeutic in terms of a partial or complete cure of the disorder or disease.
[0200] By "comprising" is meant including, but not limited to, whatever follows the word "comprising". Thus, use of the term "comprising" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present. By "consisting of" is meant including, and limited to, whatever follows the phrase "consisting of". Thus, the phrase "consisting of" indicates that the listed elements are required or mandatory, and that no other elements may be present. By "consisting essentially of" is meant including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase "consisting essentially of" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending upon whether or not they affect the activity or action of the listed elements.
[0201] The invention will now be further described by way of reference only to the following non-limiting examples. It should be understood, however, that the examples following are illustrative only, and should not be taken in any way as a restriction on the generality of the invention described above. In particular, while the invention is described in detail in relation to the use of ODOF protein, it will be clearly understood that the findings herein are not limited to this protein.
EXAMPLE 1 SCREENING OF PRE-B ALL INDIVIDUALS
[0202] A study was conducted using 101 cryopreserved, Ficoll-Hypaque purified pre-treatment bone marrow (BM) or peripheral blood (PB) specimens obtained from children diagnosed with ALL at Princess Margaret Hospital for Children, Perth, Western Australia. Eligibility for the study included diagnosis of pre-B ALL, age >1 year at diagnosis and therapy on Children's Cancer Group (CCG) risk-adjusted protocols (Gaynon et al., 2000, Leukemia, 14:2223-33). Excluded from the study were patients who failed to achieve remission, died during induction therapy or achieved continuous complete remission (CCR) after receiving a BM transplant. Patients in the study cohort were diagnosed between 1984 and 2002. Standard immunophenotype and cytogenetic analyses were performed at local institutions on pre-treatment BM or PB specimens. A test cohort comprised 55 patients, 39 of whom achieved CCR with a follow-up time of 5 years or more. Their leukemia specimens were evaluated by gene expression profiling (GEP). A validation cohort was tested exclusively by real-time quantitative RT-PCR (qRT-PCR) analysis and comprised 46 patients (CCR n=40, relapse n=6). Patient characteristics of the test and validation cohorts were found to be representative of those found in other childhood ALL studies.
[0203] Biotinylated cRNA was prepared from 2 μg of total RNA and hybridised to Affymetrix HG-U133A oligonucleotide microarrays (Affymetrix, Santa Clara, CA), for example as described in Hoffmann et al., 2005, Mol Biotechnol., 29:31-8.
[0204] Array images were reduced to intensity values for each probe (CEL files) using Affymetrix MAS 5.0 software. Expression measures were extracted using robust multi-array analysis (RMA), for example as described in Irizarry et al., 2003, Nucleic Acids Res., 31:e15 and Dallas et al., 2005, BMC Genomics, 6:59.
[0205] Initial hierarchical cluster analysis using all genes revealed no segregation into groups based on either outcome or NCI risk stratification (i.e. age and WBC); however, subclustering based on known cytogenetic subgroups was observed.
[0206] To identify outcome-discriminating genes, a variance filter was initially applied to eliminate all probe sets with a fold-change <1.15 and a p-value >0.1 (999 permutations). Fold-change is measured by dividing expression levels for relapse samples by those for CCR samples, both being normalised to β-Actin (ACTB). Supervised analysis was performed with the remaining informative probe sets (n=1128) using the decision-tree based Random Forest (RF). From this initial RF analysis, various combinations of probe sets (e.g. top-ranked 200 genes) were selected and subjected to secondary RF analyses for outcome prediction.
[0207] Figure 1 shows a matrix 10 in which probe sets (and patients) have been sorted to show outcome discrimination in childhood pre-B ALL samples using hierarchical clustering of 200 probe sets. Each column 12 represents a patient labelled across the bottom with unshaded circles for good outcome (CCR) and shaded circles for poor outcome (relapse). Each row represents a probe set. The colour scale 16 shows the relative gene expression changes normalised by the standard deviation and indicates levels of expression according to the shade.
[0208] The same result was obtained using only the top 20 probe sets as shown in Figure 2. Figure 2 shows a similar matrix 30 with patients in columns 32 and probe sets in rows 34. The probe sets are identified by the respective probe label 38. The gene to which the probe binds is labelled 36. Those patients predicted to have CCR are grouped as 40 and those predicted to relapse are grouped as 42.
[0209] Two of the top 20 probes were selective for GLUL (gene ID No. 2752) and OAZIN (gene ID No. 51582), each of which was also selected by another probe among the top 20. Thus, the 20 probes were selective of the expression levels of 18 genes.
[0210] A further analysis was conducted using principal component analysis (PCA) by projecting the samples into the space of the first two principal components (PCs). Here patients also clearly separated into two distinct clusters based on outcome, as shown in Figure 3B. In Figure 3 samples are projected into the space of the first two principal components. The circles represent patients predicted to relapse and triangles represent patients predicted to have CCR.
[0211] RF analysis was applied to evaluate the outcome prediction accuracy using the top 18 genes. For the 55 pre-B ALL patients, the long-term outcome was predicted with an accuracy of 89.1%. In this analysis, all patients who achieved CCR were correctly identified, and 10 of 16 patients were correctly predicted as relapse patients (specificity=100%; sensitivity=62.5%). Notably, 60% (6/10) of patients who relapsed and were initially stratified as SR were correctly predicted as relapse patients.
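The accuracy, specificity and sensitivity quoted above follow directly from the stated counts, as the short Python sketch below illustrates. The function name classification_metrics is hypothetical, and relapse is treated as the positive class, which is an assumption consistent with the figures given.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity in percent; relapse is the positive class."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Counts stated in the text for the 18-gene analysis: all 39 CCR patients and
# 10 of the 16 relapse patients were classified correctly.
metrics = classification_metrics(tp=10, fn=6, tn=39, fp=0)
# accuracy = 49/55 (about 89.1%), sensitivity = 10/16 = 62.5%, specificity = 39/39 = 100%
```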
[0212] Kaplan-Meier survival analysis was conducted on relapse-free survival data using the log-rank test to compare outcome for patient groups according to the 18-GC RF prediction. Kaplan-Meier curves are shown for the duration of relapse-free survival in Figure 4D, which can be contrasted with Figures 4A-C showing conventional parameters, such as age (4A), WBC at diagnosis (4B) and gender (4C). Importantly, the 18-GC was the only parameter significantly associated with outcome (p<0.000001). When the relapse-free survival and initial risk stratification were compared for the patients who were incorrectly predicted as CCR, it was found that the majority of patients (4/6) had experienced a late relapse (>36 months after diagnosis).
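For readers unfamiliar with the Kaplan-Meier estimate referred to above, a minimal Python sketch follows. It computes only the product-limit estimate of relapse-free survival for a single group; the log-rank comparison between groups is not reproduced. The function name kaplan_meier and the follow-up data are hypothetical, and patients censored at a given time are assumed to be still at risk at that time.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], relapsed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, relapse-free survival estimate) at each distinct relapse time."""
    data = sorted(zip(times, relapsed))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all patients who relapse or are censored at the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                relapses += 1
            leaving += 1
            i += 1
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years; True marks a relapse, False a censored patient.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0, 5.0], [True, False, True, False, False])
```

In this toy example the estimate drops to 4/5 at the first relapse and to 4/5 x 2/3 at the second, because only three patients remain at risk by then.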
[0213] To investigate the association between risk of relapse, the multigene classifiers and known prognostic factors, Cox proportional hazard regression analysis was performed. The combined effect of the expression levels of genes included in this model was calculated by PCA, and the weight of the first component was used to compute a gene-classifier (GC) score. Thus, the GC score represents a linear combination of the genes. In the Cox proportional hazard regression analysis the GC score was included as a continuous variable.
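One way to picture a GC score of this kind is sketched below in Python. This is an assumption-laden illustration rather than the calculation actually performed: the text does not state whether expression values were standardised or how the sign of the component was fixed, so the sketch centres each gene, finds the first principal component of the covariance matrix by power iteration, and makes the largest loading positive. The function name first_component_scores and the data are hypothetical.

```python
from typing import List, Tuple


def first_component_scores(samples: List[List[float]],
                           iterations: int = 500) -> Tuple[List[float], List[float]]:
    """Return (loadings, scores) for the first principal component.

    samples[i][j] is the (log-scale) expression of gene j in patient i.
    """
    n, m = len(samples), len(samples[0])
    # Centre each gene on its mean across patients.
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in samples]
    # Sample covariance matrix of the genes.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration; the uneven start vector makes it less likely that the
    # start is orthogonal to the leading eigenvector.
    w = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        v = [sum(cov[a][b] * w[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in v) ** 0.5
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    # Fix the arbitrary sign so that the largest loading is positive.
    if max(w, key=abs) < 0:
        w = [-x for x in w]
    # The score of each patient is a linear combination of the centred genes.
    scores = [sum(r[j] * w[j] for j in range(m)) for r in centred]
    return w, scores


# Hypothetical log-expression values for three patients and two genes.
loadings, gc_scores = first_component_scores([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

The resulting per-patient score could then be entered as a continuous covariate in a Cox model, which is outside the scope of this sketch.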
[0214] Moreover, in a multivariate Cox regression analysis, the 18-GC score (see Materials and Methods) was the most significant factor contributing to outcome (p<0.000001) besides male gender (p=0.049).
[0215] To determine which genes to include in a multigene classifier (i.e. feature selection described in Simon, 2005, J Natl Cancer Inst., 97:866-7) several combinations of the top 18 genes were tested by RF analysis, resulting in a selection of a subset of five genes. These five genes alone yielded superior outcome prediction accuracy compared to the analyses performed with all 18 genes, with an overall accuracy of 94.5% (specificity=100%; sensitivity=81.3%). While three of these genes (OAZIN, GLUL, BICD2 (gene ID numbers 51582, 2752 and 23299, respectively)) showed decreased expression levels, two genes (IGJ, PLA2G6 (gene ID numbers 3512 and 8398, respectively)) showed increased expression in poor outcome patients (see Table 2).
[0216] Mean gene expression values and fold-changes for five genes measured by microarray and qRT-PCR
TABLE 2
                                    Mean expression (HG-U133A)  Fold-change (R/CCR)*
RF Rank  Probe ID      Gene Symbol  CCR      Relapse (R)        HG-U133A  qRT-PCR
2        201772_at     OAZIN        317.50   222.89             0.64      0.56
3        212592_at     IGJ          52.25    346.20             6.70      6.28
5        215001_s_at   GLUL         630.45   455.47             0.64      0.41
10       213154_s_at   BICD2        194.13   129.60             0.65      0.70
14       210647_x_at   PLA2G6       201.53   231.91             1.12      0.81
*Expression values normalised to β-actin (ACTB)
EXAMPLE 2 REAL-TIME QUANTITATIVE RT-PCR
[0217] All real-time quantitative RT-PCR (qRT-PCR) assays were carried out using primer and probe sets designed by Applied Biosystems (http://www.appliedbiosystems.com/) in accordance with the manufacturer's standard protocols as described in detail in Dallas et al., 2005, supra. To account for inter-experimental variations, five ALL cell lines (PER-278, PER-371, PER-487, PER-495, PER-607) were included in each experiment for calibration; see, for example, Kees et al., 2003, Mol Cancer Ther., 2:671-7. Significant differences in gene expression levels between outcome groups were determined using a t-test. Pearson's correlations were used for the comparison of qRT-PCR and microarray data and P-values were obtained using Fisher's z-transformation.
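The correlation test mentioned above can be sketched in Python as follows. This is an illustrative reading rather than the exact calculation used: it assumes the common form of Fisher's z-transformation, in which atanh(r) multiplied by the square root of (n - 3) is compared with a standard normal distribution under the null hypothesis of zero correlation. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: correlation = 0 using Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Perfectly linear toy data (not study data) gives r = 1.
r_toy = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# A correlation of 0.57 over 55 paired measurements (assumed n) is highly significant.
p_example = fisher_z_p_value(0.57, 55)
```

Under these assumptions a coefficient of 0.57 with 55 paired samples gives a p-value below 0.00001.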
[0218] Figure 6 shows Pearson's correlations between gene expression levels determined by qRT-PCR and HG-U133A microarray for (A) BICD2 (p < 0.000001), (B) IGJ (p < 0.000001), (C) OAZIN (p < 0.00001) and (D) GLUL (p < 0.000001). All data are shown as log2.
EXAMPLE 3 VALIDATION OF MICROARRAY RESULT WITH qRT-PCR
[0219] To confirm the results of microarray analysis for these genes, qRT-PCR was performed on all 55 samples of the test cohort. Data generated by qRT-PCR and microarray correlated well for four genes, with correlation coefficients between 0.57 (OAZIN, p<0.00001) and 0.88 (IGJ, p<0.000001) (Figure 6) and similar fold-changes in expression levels. The differential expression of PLA2G6 could not be recapitulated by qRT-PCR and most likely reflects the relatively low fold-changes observed for microarray expression levels.
[0220] Further analysis by RF and PCA using either qRT-PCR or microarray expression measures revealed slightly higher outcome prediction accuracies using three genes only (OAZIN, GLUL, IGJ) compared to four genes. Using the 3-gene classifier (3-GC), we were able to identify those patients in the test cohort who would achieve poor or good outcome. Figure 7 shows outcome discrimination in childhood pre-B ALL samples (test cohort: n=55) by principal component analysis (PCA) using 3 genes (OAZIN, GLUL, IGJ). Samples are projected into the space of the first two principal components (PC) and the expression levels were determined by (7A) microarray or (7B) qRT-PCR. Patients with good outcome are indicated by white circles and patients that experienced a relapse are indicated by black circles.
[0221] RF analysis based on values from array expression resulted in an overall prediction accuracy of 89.1% (specificity = 97.4%; sensitivity = 68.8%). In comparison, a slightly lower overall prediction accuracy of 83.6% was obtained when qRT-PCR expression measures were applied to a RF analysis (specificity = 94.9%; sensitivity = 56.3%). A multivariate Cox regression analysis that included age, WBC and gender, shown in the table below, confirmed the 3-GC score as the most significant parameter related to outcome (p<0.000001).
[0222] Cox proportional hazards regression analysis of the risk of relapse in the test cohort (n=55) in relation to known prognostic factors and the 3-gene classifier (3-GC) score is shown in Table 3.
TABLE 3
Variable      Number of patients  Hazard ratio (HR)  95% CI       P-value
Age
  <10 years   46                  1.0*
  ≥10 years   9                   3.18               0.78-13.06   0.11
WBC
  <50/nl      50                  1.0*
  ≥50/nl      5                   0.48               0.1-2.32     0.36
Gender
  Female      26                  1.0*
  Male        29                  4.15               1.26-13.64   0.019
3-GC score    55                  3.79               2.25-6.39    <0.000001
*Reference group
[0223] This was additionally confirmed by Kaplan-Meier analysis (p<0.000001). Figure 8 shows Kaplan-Meier survival curves for the test cohort: (8A) comparison of the duration of relapse-free survival for patient groups based on 3-GC with expression measures determined by microarray and (8B) with expression measures determined by qRT-PCR.
[0224] A comparison of relapse-free survival and initial risk stratification for the patients who were wrongly assigned to the CCR group confirmed our previous observation of these misclassifications mainly occurring for patients who experienced a late relapse.
[0152] The diagnostic multigene classifier for long-term outcome prediction calculates the probability of relapse (PR) for each patient. The probability was calculated using logistic regression. This probability was converted into a prediction of relapse/CCR using a standard cut-off of 50%. The effect of each gene was modelled separately and interactions were investigated by using the product of gene expression scores.
[0153] A suitable model is as follows: PR = exp(g) / (1 + exp(g)), where PR is the probability of relapse and g = -1.7643 - 0.2727 * log(GLUL) + 0.5874 * log(IGJ) + 1.4731 * log(OAZIN) + 0.6329 * log(GLUL) * log(OAZIN); where GLUL, IGJ and OAZIN each denote expression of the respective gene as determined by qRT-PCR.
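The model stated above can be evaluated with a few lines of Python, given here as a hedged sketch. The coefficients are those quoted in the text; the base of the logarithm is not stated, so the natural logarithm is assumed by default and can be swapped through the hypothetical log parameter (for example math.log2). Treating a probability of exactly 50% as a predicted relapse is also an assumption, and the function names are hypothetical.

```python
import math
from typing import Callable


def probability_of_relapse(glul: float, igj: float, oazin: float,
                           log: Callable[[float], float] = math.log) -> float:
    """Probability of relapse (PR) from qRT-PCR expression of GLUL, IGJ and OAZIN."""
    log_glul, log_igj, log_oazin = log(glul), log(igj), log(oazin)
    g = (-1.7643
         - 0.2727 * log_glul
         + 0.5874 * log_igj
         + 1.4731 * log_oazin
         + 0.6329 * log_glul * log_oazin)
    # 1 / (1 + exp(-g)) is algebraically identical to exp(g) / (1 + exp(g)).
    return 1.0 / (1.0 + math.exp(-g))


def predict_outcome(pr: float, cutoff: float = 0.5) -> str:
    """Convert a probability of relapse into a relapse/CCR call at the 50% cut-off."""
    return "relapse" if pr >= cutoff else "CCR"


# With all three expression values equal to 1 every logarithm is 0, so g is the intercept.
pr_example = probability_of_relapse(1.0, 1.0, 1.0)
call_example = predict_outcome(pr_example)
```

In this degenerate example only the intercept contributes, giving a probability of relapse of roughly 0.15 and therefore a CCR call.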
[0225] Figure 5 shows outcome discrimination in the independent validation cohort using a defined diagnostic 3-GC: (A) development of a diagnostic multigene classifier for long-term outcome prediction in the test cohort (n=55) by logistic regression. The probability of relapse was converted into a prediction of relapse/CCR using a standard cut-off of 50%. (B) Application of the defined diagnostic 3-GC to the independent validation cohort (n=46). (C) Kaplan-Meier analysis for the validation cohort showing the duration of relapse-free survival for patient groups stratified according to 3-GC.
[0226] Validation of a 3-gene classifier (3-GC) in an independent cohort and development of a defined diagnostic 3-GC. The robustness of the identified gene expression signature was assessed in a completely independent patient cohort. A defined diagnostic multigene classifier for long-term outcome prediction was developed based on the qRT-PCR expression measures of OAZIN, GLUL and IGJ using logistic regression. The model was built using only qRT-PCR data obtained for the test cohort (n=55, Figure 5A). Next the expression levels of the three genes were determined by qRT-PCR in BM specimens from 46 children diagnosed with pre-B ALL, of whom 6 patients experienced a relapse (see Table 4).
TABLE 4
Clinical features of pre-B ALL patients
                            Test cohort (n=55)                Validation cohort (n=46)
                            CCR (n=39)       Relapse (n=16)   CCR (n=40)        Relapse (n=6)
Male/Female                 18/21            11/5             20/20             3/3
Age at diag, y (range)      5.9 (1.6-14.4)   7.1 (3.0-16.3)   4.1 (1.0-15.3)    6.2 (2.3-10.2)
Standard Risk               32 (82%)         10 (63%)         28 (70%)          2 (33%)
High Risk                   7 (18%)          6 (37%)          12 (30%)          4 (67%)
% Blast (range)             97 (80-100)      98 (85-100)      98 (81-100)       99 (94-99)
Follow-up time, y (range)   7.9 (5.6-11.9)   -                14.7 (5.0-19.9)   -
Time to relapse, y (range)  -                (0.5-4.9)        -                 2.7 (1.3-3.6)
[0227] Importantly, the differential expression for all three genes in the validation cohort was confirmed.
[0228] Figure 9 shows differential expression of GLUL, IGJ and OAZIN in the test cohort (n=55) and validation cohort (n=46) as determined by qRT-PCR.
[0229] From these data the 3-GC values to predict outcome in this validation cohort were determined. The results clearly showed that outcome prediction for these patients was achieved (Figure 5B) . Patients were stratified according to the 3-GC prediction score and Kaplan-Meier curves illustrate the significant association of the defined diagnostic 3-GC with long-term outcome in this validation cohort (p<0.000001, Figure 5C). In this cohort, a significant association with outcome was also observed for WBC, but not for age and gender. These results were also confirmed in a multivariate Cox regression analysis (see table below) .
[0230] Cox proportional hazards regression analysis of the risk of relapse in the validation cohort (n=46) in relation to known prognostic factors and a 3-gene classifier (3-GC). See Table 5.
TABLE 5
Variable | Number of patients | Hazard ratio (HR) | 95% CI | P-value |
---|---|---|---|---|
Age | | | | |
<10 years | 37 | 1.0* | | |
≥10 years | 9 | 0.97 | 0.09-10.6 | 0.98 |
WBC | | | | |
<50/nl | 37 | 1.0* | | |
>50/nl | 9 | 3.53 | 0.52-24.1 | 0.20 |
Gender | | | | |
Female | 23 | 1.0* | | |
Male | 23 | 0.56 | 0.09-3.75 | 0.55 |
3-GC score | 46 | 550 | 2.84-106000 | 0.02 |
* Reference group
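The hazard ratios, confidence intervals and P-values in Table 5 can be cross-checked against one another. The sketch below assumes that the intervals are Wald intervals, symmetric on the log scale, and that the P-values come from a two-sided normal test; neither assumption is stated in this document, and the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% confidence limits (Wald interval assumed)."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


# Hazard ratios and confidence limits as printed in Table 5
rows = {
    "Age >=10 years": (0.97, 0.09, 10.6),
    "WBC >50/nl": (3.53, 0.52, 24.1),
    "3-GC score": (550.0, 2.84, 106000.0),
}
for name, (hr, lo, hi) in rows.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{name}: se={se:.2f}  z={z:.2f}  p={p:.2f}")
```

Under these assumptions the recovered P-values round to 0.98, 0.20 and 0.02, in agreement with the table. The very wide interval for the 3-GC score reflects a large standard error on the log scale, as would be expected with only six relapses in the validation cohort.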
[0231] The diagnostic multigene classifier for long-term outcome prediction calculates the probability of relapse (PR) for each patient. The probability was calculated using logistic regression. This probability was converted into a prediction of relapse/CCR using a standard cut-off of 50%. The effect of each gene was modelled separately and interactions were investigated by using the product of gene expression scores.
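The calculation described above can be sketched in Python using the coefficients stated in claim 30 of this document. The function names are hypothetical. The use of the natural logarithm, the scale of the expression measures and the treatment of a probability of exactly 50% as relapse are assumptions, since none of these is specified here.

```python
import math

# Coefficients as stated in claim 30 of this document
INTERCEPT = -1.7643
B_GLUL = -0.2727
B_IGJ = 0.5874
B_OAZIN = 1.4731
B_GLUL_OAZIN = 0.6329  # interaction term: log(GLUL) * log(OAZIN)


def relapse_probability(glul, igj, oazin, log=math.log):
    """PR = exp(g) / (1 + exp(g)). `log` defaults to the natural log (an assumption)."""
    lg, li, lo = log(glul), log(igj), log(oazin)
    g = (INTERCEPT + B_GLUL * lg + B_IGJ * li + B_OAZIN * lo
         + B_GLUL_OAZIN * lg * lo)
    return 1.0 / (1.0 + math.exp(-g))  # algebraically equal to exp(g) / (1 + exp(g))


def predict_outcome(glul, igj, oazin, cutoff=0.5):
    """Apply the 50% cut-off; counting exactly 50% as relapse is an assumption."""
    return "relapse" if relapse_probability(glul, igj, oazin) >= cutoff else "CCR"


# With all three expression measures equal to 1, every log term is zero,
# so g equals the intercept and PR is about 0.146.
print(round(relapse_probability(1.0, 1.0, 1.0), 3))
print(predict_outcome(1.0, 1.0, 1.0))
```

Because the OAZIN coefficient is positive and the GLUL coefficient is negative when the interaction term is zero, higher OAZIN pushes the prediction towards relapse and higher GLUL towards CCR, which is consistent with the increased GLUL expression reported for patients achieving CCR.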
EXAMPLE 4 GENE ONTOLOGY AND IN SILICO ANALYSIS OF PUBLISHED DATA SETS
[0232] Gene Ontology (GO) Biological Process analysis of the 200 top-ranked genes identified by RF was performed using the Affymetrix NetAffx GO Mining Tool (https://www.affymetrix.com/analysis/netaffx/go_analysis_netaffx4.affx) as published in Beesley et al., 2005, Br J Haematol., 131: 447-56.
[0233] Functional evaluation of these 200 genes established 11 major biological process categories according to the GO database (134 GO annotations; Supplemental Document, Figure S3) and revealed differential expression of genes involved in cell communication, cell growth and/or maintenance, transcription and metabolism, in particular RNA, DNA, phosphate and protein metabolism.
[0234] It is possible to identify a robust gene expression signature that distinguishes patients with good outcome from those who experienced a relapse. In the present study of childhood pre-B ALL its prediction accuracy reached 89.1%, and it is more predictive than the conventional parameters currently used for risk stratification. A discriminator based on these genes is therefore not only significantly linked to clinical outcome but also has broader application.
[0235] The identified expression signatures are of prognostic relevance irrespective of known genetic subtypes of B-lineage ALL. In addition, we could demonstrate that the majority of patients who experienced an early relapse were correctly predicted as poor outcome, irrespective of whether they were stratified as HR or SR patients based on conventional criteria. The 3-GC was able to identify 68.8% of poor outcome patients (11/16) and included 50% of SR patients that were not captured as being at risk of relapse by conventional risk stratification schemes.
[0236] Accurate SR classification is of particular importance, since these patients are likely to benefit most from an intensified front-line treatment. This is particularly relevant in light of the concept that early relapses may develop as a result of insufficient clearance of the leukemic blasts from the BM during initial chemotherapy and/or the (re)growth of leukemic cells despite chemotherapy. Thus the present invention is useful to determine an appropriate treatment regime.
[0237] Furthermore, by determining the function of genes highly predictive of a disease, appropriate drug selection can be more readily made. For example, over-representation of genes involved in nucleic acid metabolism has been observed in children diagnosed with B-lineage ALL that showed cross-resistance to four commonly used antileukemic agents, whilst a large number of genes involved in protein synthesis were associated with the treatment response to vincristine and asparaginase. By knowing the expression levels of genes that are significant for prognosis, drug selection can be better tuned to the predisposition of the patient and their likely prognosis with each type of available drug.
[0238] The polyamine biosynthesis pathway is being explored as a target for anti-neoplastic therapy with particular focus on inhibitors of ODC. The second outcome-discriminator gene identified in this study, GLUL, catalyses the conversion from glutamate to glutamine, which is critical for cell proliferation, and its overexpression has been demonstrated in liver tumors. Several studies have described that expression of GLUL is induced by glucocorticoids, including the first-line anti-leukemic drugs dexamethasone and prednisolone, and induction is mediated by the glucocorticoid receptor (Olkku et al., 2004, Bone, 34: 320-9; Harmon & Thompson, 1982, J Cell Physiol., 110: 155-60). Since patients achieving CCR in our study were found to exhibit increased expression of GLUL, it could be used as a marker for early and successful response to therapy.
Claims
1. A method of determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said method comprising: providing data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; analysing the data to find a group of one or more of the probe sets which have a predictive relationship to the outcome.
2. A method of claim 1, further comprising further analysing the group to find a refined group of one or more probe sets highly related to the outcome.
3. A method of claim 1 or claim 2, further comprising refining the membership of the refined group to include probe sets which are predictive of the outcome when the data related to the probe sets are obtained by use of two or more different techniques to measure gene expression.
4. A method of claim 3, wherein the predictive relationship is indicative of one or more outcomes of a set of possible outcomes.
5. A method of any one of claims 1 to 4 , wherein the group is discriminatory of the outcome according to each expression level of the gene associated with each corresponding probe in the group.
6. A method of any one of claims 1 to 5 , wherein analysing the data to find the group of probe sets comprises conducting supervised analysis to rank the relevance of each probe set to the outcome.
7. A method of claim 2, wherein finding the refined group comprises conducting further supervised analysis of the group.
8. A method of claim 3, wherein measurement of gene expression level is conducted using an oligonucleotide microarray.
9. A method of claim 3, wherein measurement of gene expression level is conducted by qRT-PCR.
10. A method of claim 3, wherein one measurement of gene expression level is conducted using an oligonucleotide microarray and another measurement of gene expression level is conducted by qRT-PCR.
11. A method of any one of claims 1 to 10, further comprising verifying the relationship between gene expression levels and the outcome.
12. A method of claim 11, wherein the verification is conducted by using an unsupervised clustering technique.
13. A method of claim 2, further comprising calculating a score of the accuracy in which the group is indicative of the outcome.
14. A method of claim 13, wherein finding the refined group comprises decreasing the membership of the group by removing members of the group which have a low accuracy score.
15. A method of any one of claims 1 to 14, wherein the data undergoes a preliminary variance filter prior to analysis to identify outcome-discriminating probe sets.
16. A method of any one of claims 6 or 7, wherein the supervised analysis is performed by a decision-tree based algorithm.
17. A method of claim 16, wherein the algorithm is a Random Forest algorithm.
18. A method of claim 12, wherein the unsupervised analysis is conducted by one or more of a decision tree algorithm, complete-linkage hierarchical clustering and/or principal component analysis (PCA) .
19. A method of any one of claims 1 to 18, wherein the probes related to the group of probes is used to form a classifier of the outcome.
20. A method of claim 19, wherein the classifier receives data in the form of gene expression levels and predicts an outcome .
21. A method of any one of claims 1 to 20, wherein genes to which probes bind that are related to the group of probe sets are used to diagnose the disease or disorder.
22. A method of any one of claims 1 to 20, wherein genes to which probes bind that are related to the group of probe sets are used to predict a prognosis of a patient having the disease or disorder.
23. A method of any one of claims 1 to 20, wherein genes to which probes bind that are related to the group of probe sets are used to identify biological functions related to the disease or disorder.
24. A method of claim 23, wherein the identified biological functions are used to affect the outcome in a subject suffering from the disease or disorder or to prevent the subject from developing the disease or disorder.
25. A method of any one of claims 1 to 24, wherein the predictive relationship is between genes indicated by the probes to which the group of probes sets relate and the outcome of either regression or relapse of pre-B ALL.
26. A method of claim 25, wherein the genes are selected from the group consisting of SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
27. A method of claim 25, wherein the genes are OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or a functional fragment thereof .
28. A method of claim 25, wherein the genes are OAZIN, GLUL, or IGJ, or a functional fragment thereof.
29. A method of any one of claims 1 to 28, wherein the expression level of the gene related to the group of probes indicative of the outcome is used to produce a model which is predictive of the outcome or is used to determine a course of treatment to achieve the outcome or a prophylactic course of treatment.
30. A method of claim 29, wherein the model predicts an outcome of relapse with a probability of PR = exp(g) / (1 + exp(g)), where g = -1.7643 - 0.2727 * log(GLUL) + 0.5874 * log(IGJ) + 1.4731 * log(OAZIN) + 0.6329 * log(GLUL) * log(OAZIN).
31. A method of gene expression profiling comprising: obtaining gene expression level data reflecting expression levels of a large number of genes from a pool of biological samples; analysing the data to identify a plurality of gene expression levels predictive of an outcome in relation to a disease or disorder, thereby providing a gene expression profile.
32. A method according to claim 31, further comprising further analysing the gene expression levels to find a refined group highly related to the outcome.
33. A method according to claim 31, further comprising refining the gene expression levels to exclude those which are not significantly predictive of the outcome when the gene expression levels are obtained by use of two or more different techniques.
34. An apparatus for determining a predictive relationship between data related to gene expression levels and an outcome related to a disease or disorder, said apparatus comprising: a receiver of data comprising a plurality of probe sets, each probe set comprising a plurality of elements of data related to a gene, each data element reflecting a gene expression level indicated using the probe and each data element in each probe set being from a respective biological sample; an analyser for finding a group of one or more of the probe sets which have a predictive relationship to the outcome.
35. An apparatus of claim 34, wherein the analyser is arranged to further analyse the group to find a refined group highly related to the outcome.
36. An apparatus of claim 34 or claim 35, wherein the analyser is arranged to refine the membership of the refined group to include probe sets which are predictive of the outcome when the probe sets are collected by use of two or more different techniques to measure gene expression.
37. An apparatus of any one of claims 34 to 36 which further comprises a verifier for verifying the relationship between the gene expression levels and the outcome .
38. An apparatus of any one of claims 34 to 37, wherein the analyser is configured to conduct supervised analysis to rank the relevance of each probe set to the outcome .
39. An apparatus of any one of claims 34 to 38, wherein the analyser is configured to conduct further supervised analysis of the group to find the refined group .
40. An apparatus of any one of claims 34 to 39, wherein the apparatus comprises a storage means for storing the group and the refined group.
41. A classifier for classifying data comprising a plurality of levels of expression of predetermined genes of a biological sample, said classifier comprising: an analyser for predicting an outcome according to a predictive relationship between the data and the outcome, thereby classifying the data.
42. A classifier as claimed in claim 41, wherein the predetermined genes are determined according to the method of any one of claims 1 to 33.
43. A classifier as claimed in claim 41, wherein the predetermined genes are SHARP, BICD2, NRBF2, OAZIN, DHX8, VAPA, QKI, SLC18A2, GLUL, HSD11B1, FOLH1, WWOX, PLA2G6, IGJ, OR1A1, THBS3, TEP1, GPD2, or a functional fragment thereof.
44. A classifier as claimed in claim 41, wherein the predetermined genes are OAZIN, GLUL, BICD2, IGJ, or PLA2G6, or a functional fragment thereof.
45. A classifier as claimed in claim 41, wherein the predetermined genes are OAZIN, GLUL, or IGJ, or a functional fragment thereof.
46. A computer program comprising instructions to control a computer to conduct the method of any one of claims 1 to 33.
47. A computer program comprising instructions to control a computer to operate as the apparatus of any one of claims 34 to 40.
48. A computer program comprising instructions to control a computer to operate as the classifier of any one of claims 41 to 45.
49. A computer readable storage medium for storing the computer program of any one of claims 46 to 48.
50. A method of expression profiling for pre-B ALL, comprising the steps of:
(a) providing biological samples from individuals with or without pre-B ALL;
(b) isolating nucleic acid molecules from said biological samples;
(c) measuring the expression levels of said nucleic acid molecules;
(d) performing a first supervised analysis on data obtained from step (c) , wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and
(e) performing a second supervised analysis on data obtained from step (d) , wherein said second supervised analysis will classify said individuals further.
51. A method according to claim 50, wherein the step of measuring the nucleic acid molecule expression level is conducted by hybridization of said nucleic acid molecule to a DNA microarray.
52. A method according to claim 50, wherein the step of measuring the nucleic acid molecule expression level is conducted by qRT-PCR.
53. A method according to any one of claims 50 to 52, wherein the first and second supervised analysis is performed by a decision-tree based algorithm.
54. A method according to claim 53, wherein the algorithm is a Random Forest algorithm.
55. A method according to any one of claims 50 to 54, wherein the data undergoes a preliminary variance filter prior to the first supervised analysis.
56. An isolated molecule identified by a method according to any one of claims 50 to 55, wherein said isolated nucleic acid molecule is selected from the group consisting essentially of:
(a) a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof;
(b) an isolated nucleic acid molecule which is the complement of a sequence of a);
(c) an isolated nucleic acid molecule which hybridises under stringent conditions to a nucleic acid molecule of a) or b); and/or
(d) an isolated polypeptide encoded by a nucleic acid molecule of a), b), or c), for use in the prognosis, diagnosis, and/or treatment of pre-B ALL.
57. A therapeutic or prophylactic composition for the treatment or prevention of pre-B ALL, comprising one or more of:
(a) a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof;
(b) an isolated nucleic acid molecule which is the complement of a sequence of a) ;
(c) an isolated nucleic acid molecule which hybridises under stringent conditions to a nucleic acid molecule of a) or b) ; and/or
(d) an isolated polypeptide encoded by a nucleic acid molecule of a) , b) , or c) , together with a pharmaceutically acceptable carrier.
58. A microarray for prognosis or diagnosis of pre-B ALL, comprising a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
59. A kit for prognosis or diagnosis of pre-B ALL, comprising a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof or antisense molecule thereto.
60. A method for prognosis or diagnosis of pre-B ALL, comprising the step of measuring the expression levels of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof.
61. A method of prognosis or diagnosis of pre-B ALL, comprising the steps of:
(a) providing a biological sample from an individual ;
(b) measuring the expression levels of a group of 18 genes within said biological sample, said 18 genes have gene ID numbers 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820; and
(c) performing statistical analysis on the expression levels of said genes, wherein a statistically significant value of said analysis indicates that said individual has pre-B ALL.
62. A method according to claim 61, wherein the step of measuring the expression levels of said genes is examined at the nucleic acid level or protein level.
63. A method of prognosis or diagnosis of pre-B ALL, comprising the steps of:
(a) isolating biological samples from individuals with or without pre-B ALL;
(b) isolating nucleic acid molecules from said biological samples;
(c) measuring the expression levels of said nucleic acid molecules;
(d) performing a first supervised analysis on data obtained from step (c), wherein said first supervised analysis will classify said individuals with or without pre-B ALL into distinct subgroups; and
(e) performing a second supervised analysis on data obtained from step (d) , wherein said second supervised analysis will classify said individuals further.
64. A method for predicting the likelihood of a relapse in a subject with pre-B ALL, comprising the step of measuring the expression levels of a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or a functional fragment thereof .
65. A method of selecting an agent for the prevention and/or treatment of pre-B ALL in an individual, comprising:
(a) isolating a biological sample from an individual with pre-B ALL;
(b) measuring the expression levels of a gene that has a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820;
(c) identifying a gene which has an expression level different to the expression level of the same gene in a subject not having pre-B ALL; and
(d) selecting an agent which modulates the expression of the gene of (c) or specifically binds to a polypeptide encoded by said gene.
66. A method of screening for an agent capable of modulating the expression of one or more genes having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 comprising the steps of:
(a) providing one or more of said genes or functional fragments thereof, under conditions which allow expression of the gene(s);
(b) determining the level of expression of the gene(s); and
(c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b), wherein a change in the level of expression indicates that the agent is capable of modulating the expression of one or more of said genes.
67. A method of screening for an agent capable of treating or preventing pre-B ALL, comprising the steps of:
(a) providing one or more genes having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820 or functional fragments thereof, under conditions which allow expression of the gene(s);
(b) determining the level of expression of the gene(s); and
(c) contacting the gene(s) with a test agent and determining whether the level of expression changes relative to step (b), wherein a change in the level of expression indicates that the agent is capable of treating or preventing the disease.
68. A method according to claim 66 or claim 67, wherein the level of expression in at least one of the genes is increased compared to the level of expression in step (b) .
69. A method according to claim 66 or claim 67, wherein the level of expression in at least one of the genes is decreased compared to the level of expression in step (b) .
70. A method of any one of claims 65 to 69, wherein the agent is a nucleic acid molecule which is antisense to a gene having a gene ID number selected from the group consisting of 23013, 23299, 29982, 51582, 1659, 9218, 9444, 6571, 2752, 3290, 2346, 51741, 8398, 3512, 8383, 7059, 7011 and 2820.
71. A method according to claim 65, wherein the agent which specifically binds to a polypeptide encoded by said gene is a polyclonal or monoclonal antibody or a functional fragment thereof.
72. A molecule according to claim 56, a composition according to claim 57, a microarray according to claim 58, a kit according to claim 59 or a method according to any one of claims 60 to 63, wherein the gene has a gene ID number selected from the group consisting of 51582, 2752, 23299, 3512 and 8398, or a functional fragment thereof.
73. A method according to any one of claims 50 to 55, wherein said first and second supervised analysis will identify genes with significantly different levels of expression in pre-B ALL patients as compared to normal individuals, wherein said genes are potential therapeutic targets for pre-B ALL.
74. A method of treatment for pre-B ALL, comprising the step of increasing the expression levels of a gene that has a gene ID number selected from the group consisting of 23299, 51582 and 2752 or combinations thereof .
75. A method of treatment for pre-B ALL, comprising the step of decreasing the expression levels of a gene that has a gene ID number selected from the group consisting of 8398 and 3512 or combinations thereof.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2006902939 | 2006-05-31 | ||
AU2006902939A AU2006902939A0 (en) | 2006-05-31 | Method and system for predicting a relationship between gene expression levels and an outcome | |
AU2006902940 | 2006-05-31 | ||
AU2006902940A AU2006902940A0 (en) | 2006-05-31 | Diagnostic and prognostic indicators of cancer |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007137366A1 (en) | 2007-12-06 |
Family
ID=38778026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2007/000768 WO2007137366A1 (en) | 2006-05-31 | 2007-05-31 | Diagnostic and prognostic indicators of cancer |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007137366A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004053074A2 (en) * | 2002-12-06 | 2004-06-24 | Science And Technology Corporation @ Unm | Outcome prediction and risk classification in childhood leukemia |
WO2006048264A2 (en) * | 2004-11-04 | 2006-05-11 | Roche Diagnostics Gmbh | Gene expression profiling in acute lymphoblastic leukemia (all), biphenotypic acute leukemia (bal), and acute myeloid leukemia (aml) m0 |
-
2007
- 2007-05-31 WO PCT/AU2007/000768 patent/WO2007137366A1/en active Application Filing
Non-Patent Citations (7)
Title |
---|
ANDERSSON A. ET AL.: "Molecular signatures in childhood acute leukemiae and their correlations to expression patterns in normal hematopoietic subpopulations", PROCEEDINGS OF THE NATIONAL ACADEMY OF THE UNITED STATES OF AMERICA, vol. 102, no. 52, 27 December 2005 (2005-12-27), pages 19069 - 19074, XP008091851 * |
BEESLEY A. ET AL.: "The gene expression signature of relapse in paediatric acute lymphoblastic leukaemia: implications for mechanisms of therapy failure", BRITISH JOURNAL OF HAEMATOLOGY, vol. 131, no. 4, 2005, pages 447 - 456, XP008091172 * |
DE ZEN L. ET AL.: "Computational analysis of flow-cytometry antigen expression profiles in childhood acute lymphoblastic leukemia: an MLL/AF4 identification", LEUKEMIA, vol. 17, no. 8, 2003, pages 1557 - 1565, XP008091543 * |
HOFFMANN K. ET AL.: "Translating microarray data for diagnostic testing in childhood leukaemia", BMC CANCER, vol. 6, 26 September 2006 (2006-09-26), pages 229, XP021023015 * |
KEES U.R. ET AL.: "Gene Expression Profiles in a Panel of Childhood Leukemia Cell Lines Mirror Critical Features of the Disease", MOLECULAR CANCER THERAPEUTICS, vol. 2, no. 7, July 2003 (2003-07-01), pages 671 - 677, XP008091572 * |
KUCHINSKAYA E. ET AL.: "Children and adults with acute lymphoblastic leukaemia have similar gene expression profiles", EUROPEAN JOURNAL OF HAEMATOLOGY, vol. 74, no. 6, 2005, pages 466 - 480, XP008091368 * |
MOOS P.J. ET AL.: "Identification of Gene Expression Profiles That Segregate Patients with Childhood Leukemia", CLINICAL CANCER RESEARCH, vol. 8, no. 10, October 2002 (2002-10-01), pages 3118 - 3130, XP008091334 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9670553B2 (en) | 2004-06-04 | 2017-06-06 | Biotheranostics, Inc. | Determining tumor origin |
US10538816B2 (en) | 2004-06-04 | 2020-01-21 | Biotheranostics, Inc. | Identification of tumors |
US11430544B2 (en) | 2005-06-03 | 2022-08-30 | Biotheranostics, Inc. | Identification of tumors and tissues |
WO2013002750A2 (en) * | 2011-06-29 | 2013-01-03 | Biotheranostics, Inc. | Determining tumor origin |
WO2013002750A3 (en) * | 2011-06-29 | 2013-05-10 | Biotheranostics, Inc. | Determining tumor origin |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07719013; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 07719013; Country of ref document: EP; Kind code of ref document: A1 |