WO2007056332A2 - Molecular diagnosis of autoimmune diseases - Google Patents

Molecular diagnosis of autoimmune diseases

Info

Publication number
WO2007056332A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
expression level
gene
represented
genes
Prior art date
Application number
PCT/US2006/043272
Other languages
French (fr)
Other versions
WO2007056332A3 (en)
Inventor
Thomas M. Aune
Nancy J. Olsen
Philip S. Crooke III
Cindy L. Vnencak-Jones
Sallyanne C. Fossey
Original Assignee
Vanderbilt University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vanderbilt University
Publication of WO2007056332A2
Publication of WO2007056332A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the presently disclosed subject matter generally relates to the diagnosis of autoimmune disease. More specifically, the presently disclosed subject matter relates to a method for diagnosing an autoimmune disease, such as a multiple sclerosis syndrome.
  • Autoimmune diseases are heterogeneous diseases believed to arise from immune-mediated attack against self-antigens.
  • multiple sclerosis is the most common demyelinating disease of the central nervous system and develops from destruction of myelin sheaths.
  • Both genetic and environmental factors play important roles in the onset and pathogenesis of autoimmune diseases.
  • Epidemiologic data along with genetic linkage studies clearly support the presence of a genetic contribution to susceptibility to autoimmune disease.
  • autoimmune diseases can present difficulties to the clinician. For example, there is no single definitive laboratory test for MS; it remains a clinical diagnosis. Abnormal brain magnetic resonance imaging (MRI) findings and immunological changes in cerebrospinal fluid (elevated IgG index, presence of oligoclonal bands) raise clinical suspicion, but are not disease specific. Therefore, patients who present with features highly suspicious for MS, or clinically isolated syndromes (CIS), present a diagnostic challenge. The identification of biomarkers characteristic of MS can aid in its diagnosis.
  • MRI magnetic resonance imaging
  • CIS clinically isolated syndromes
  • the identification of gene expression signatures that are common to several autoimmune diseases and/or are unique to an individual autoimmune disease would be extremely useful for the diagnosis of autoimmune diseases including, but not limited to MS.
  • the presently disclosed subject matter provides methods and products for diagnosis of autoimmune disease based in part on determinations of differential gene expression in a biological sample.
  • the presently disclosed subject matter provides methods for detecting a multiple sclerosis (MS) syndrome in a subject.
  • the methods comprise (a) obtaining a biological sample from the subject; (b) determining expression levels for one or more genes in the biological sample, wherein the one or more genes are selected from among the genes represented by SEQ ID NOs: 1-39; and (c) comparing the expression levels of each of the one or more genes determined in step (b) with a standard, wherein the comparing detects a multiple sclerosis (MS) syndrome in the subject.
  • the comparing comprises (a) establishing an average expression level for each of the one or more genes in a population, wherein the population comprises statistically significant numbers of subjects with an MS syndrome and subjects that do not have an MS syndrome; (b) assigning a first value to each gene for which the expression level in the subject is higher than the average expression level in the population and a second value to each gene for which the expression level in the subject is lower than the average expression level in the population; and (c) adding the values assigned in step (b) to arrive at a sum, wherein the sum is indicative of a presence of an MS syndrome in the subject.
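The three-step comparison described above (population average, per-gene value, sum) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values +1 and -1 stand in for the unspecified "first value" and "second value", a gene exactly at the population average contributes 0, and the gene names and expression levels are hypothetical numbers chosen for the example.

```python
from statistics import mean


def population_averages(population):
    """Average expression level per gene across a reference population.

    `population` is a list of dicts mapping gene name -> expression level.
    """
    genes = population[0].keys()
    return {g: mean(subject[g] for subject in population) for g in genes}


def sum_score(subject, averages, high_value=1, low_value=-1):
    """Assign one value per gene and add the values together.

    high_value / low_value are assumed placeholders for the "first value"
    and "second value"; ties with the average add nothing (an assumption).
    """
    total = 0
    for gene, avg in averages.items():
        level = subject[gene]
        if level > avg:
            total += high_value
        elif level < avg:
            total += low_value
    return total


# Hypothetical expression levels in arbitrary units, for illustration only.
population = [
    {"CTSS": 10.0, "TAF11": 8.0, "TP53": 6.0},
    {"CTSS": 14.0, "TAF11": 4.0, "TP53": 6.0},
]
averages = population_averages(population)
subject = {"CTSS": 15.0, "TAF11": 3.0, "TP53": 7.0}
score = sum_score(subject, averages)
```

How the resulting sum maps to the presence of an MS syndrome depends on the values chosen and on a cutoff that this passage does not specify.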
  • the comparing comprises calculating a ratio of the expression levels of two or more genes represented by SEQ ID NOs: 1-9 to thereby detect the presence of a multiple sclerosis syndrome in the subject.
  • the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof.
  • the biological sample is a cell present in whole blood or a fraction thereof isolated from the subject.
  • the biological sample comprises a peripheral blood mononuclear cell.
  • the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR).
  • RT-PCR is quantitative RT-PCR (Q-RT-PCR).
  • the determining is of the expression levels of at least three, four, or five genes represented by SEQ ID NOs: 1-9.
  • the calculating comprises calculating a ratio using an equation selected from among (a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7; (b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared; (c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and (d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by the expression
  • the presently disclosed subject matter also provides methods for diagnosing a multiple sclerosis (MS) syndrome in a subject comprising (a) providing an array comprising a plurality of nucleic acid sequences, wherein the plurality of nucleic acid sequences correspond to at least two of the gene products represented by SEQ ID NOs: 1-9; (b) providing a nucleic acid sample isolated from or generated from a biological sample from the subject; (c) hybridizing the nucleic acid sample to the array; (d) detecting nucleic acids on the array to which the nucleic acid sample hybridizes; (e) determining an expression level for each nucleic acid detected; and (f) calculating a ratio of the expression levels of the two or more genes determined in step (e) to thereby detect the presence of a multiple sclerosis syndrome in the subject.
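Ratio equations (a) through (c) recited above can be written out as small functions, which makes the grouping of terms explicit. The sketch below is an interpretation, not the applicants' code: it reads equation (b) as (E2 × E3) / E7² and equation (c) as (E2 × E5 × E6) / (1000 × E7³), where En is the expression level of the gene product represented by SEQ ID NO: n. The dictionary keys and the numeric levels are hypothetical. Equation (d) is left out because its text is incomplete in this extract.

```python
def ratio_a(e):
    """Equation (a): SEQ ID NO: 3 divided by SEQ ID NO: 7."""
    return e[3] / e[7]


def ratio_b(e):
    """Equation (b): (SEQ ID NO: 2 x SEQ ID NO: 3) / (SEQ ID NO: 7)^2."""
    return (e[2] * e[3]) / e[7] ** 2


def ratio_c(e):
    """Equation (c): (NO: 2 x NO: 5 x NO: 6) / (1000 x (NO: 7)^3).

    Reading "one thousand times ... cubed" as 1000 * E7**3 is an assumption.
    """
    return (e[2] * e[5] * e[6]) / (1000 * e[7] ** 3)


# Hypothetical expression levels keyed by SEQ ID NO, for illustration only.
levels = {2: 4.0, 3: 6.0, 5: 5.0, 6: 10.0, 7: 2.0}
one_ratio = ratio_a(levels)
two_ratio = ratio_b(levels)
three_ratio = ratio_c(levels)
```

In every equation the TAF11 level (SEQ ID NO: 7) appears only in the denominator, so a measured level of zero would need to be handled before calling these functions.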
  • MS multiple sclerosis
  • the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof.
  • the array is selected from the group consisting of a microarray chip and a membrane-based filter array.
  • the array comprises nucleic acid sequences that correspond to at least three, four, five, six, or all nine genes represented by SEQ ID NOs: 1-9.
  • the array comprises more than one identifying location for at least one of the gene products represented by SEQ ID NOs: 1-9.
  • the array further comprises at least one internal control gene.
  • the biological sample comprises a cell present in whole blood or a fraction thereof isolated from the subject.
  • the cell is a peripheral blood mononuclear cell.
  • the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR).
  • RT-PCR reverse transcription-polymerase chain reaction
  • the RT-PCR is quantitative RT-PCR (Q-RT-PCR).
  • the determining is of the expression levels of at least two, three, four, five, six, or all nine genes represented by SEQ ID NOs: 1-9.
  • the calculating comprises calculating a ratio using an equation selected from among: (a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7; (b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared; (c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and (d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by
  • the subject matter described herein providing methods for detecting a multiple sclerosis (MS) syndrome in a subject can be implemented using a computer program product comprising computer-executable instructions embodied in a computer-readable medium.
  • Exemplary computer-readable media suitable for implementing the subject matter described herein include chip memory devices, disk memory devices, programmable logic devices, application specific integrated circuits, and downloadable electrical signals.
  • a computer program product that implements the subject matter described herein can be located on a single device or computing platform or can be distributed across multiple devices and/or computing platforms.
  • kits comprising a plurality of oligonucleotide primers and instructions for employing the plurality of oligonucleotide primers to determine the expression level of at least two of the genes represented by SEQ ID NOs: 1-9.
  • the kits comprise oligonucleotide primers to determine the expression level of at least three, four, five, six, or all nine of the genes represented by SEQ ID NOs: 1-9.
  • the kits further comprise oligonucleotide primers to determine the expression level of a control gene.
  • the presently disclosed subject matter also provides methods for assigning an uncharacterized subject to one of two populations of subjects.
  • the methods comprise (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a first population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a second population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member
  • the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
  • the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
  • the presently disclosed subject matter also provides computer program products comprising computer-executable instructions embodied in a computer-readable medium for performing steps comprising (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a control population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a test population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized
  • the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
  • the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
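Steps (a) through (d) above amount to a ratio search over two populations. The sketch below shows one simple way such a search could be organized; it is not the program the applicants used. It assumes pairwise ratios only, strictly positive expression levels, and a deliberately strict criterion for "a degree sufficient" (the ranges of a ratio in the two populations must not overlap). The function names, gene names, and numbers are all hypothetical.

```python
from itertools import permutations


def pairwise_ratios(subject):
    """All ordered ratios gene_a / gene_b for one subject, in a fixed gene order."""
    genes = sorted(subject)
    return {(a, b): subject[a] / subject[b] for a, b in permutations(genes, 2)}


def separating_ratios(first_population, second_population):
    """Ratios whose highest value in the first population lies below the
    lowest value in the second population, with a midpoint cutoff for each.

    The opposite direction is covered by the inverse ratio (b, a).
    """
    first = [pairwise_ratios(s) for s in first_population]
    second = [pairwise_ratios(s) for s in second_population]
    hits = {}
    for pair in first[0]:
        highest_first = max(r[pair] for r in first)
        lowest_second = min(r[pair] for r in second)
        if highest_first < lowest_second:
            hits[pair] = (highest_first + lowest_second) / 2
    return hits


def classify(subject, pair, cutoff):
    """Assign an uncharacterized subject to the "first" or "second" population."""
    a, b = pair
    return "second" if subject[a] / subject[b] > cutoff else "first"


# Hypothetical expression levels, for illustration only.
controls = [{"CTSS": 2.0, "TAF11": 4.0}, {"CTSS": 3.0, "TAF11": 3.0}]
ms_subjects = [{"CTSS": 8.0, "TAF11": 2.0}, {"CTSS": 6.0, "TAF11": 2.0}]
hits = separating_ratios(controls, ms_subjects)
label = classify({"CTSS": 7.0, "TAF11": 2.0}, ("CTSS", "TAF11"), hits[("CTSS", "TAF11")])
```

A real data set would rarely separate this cleanly, so a practical search would more likely rank ratios by a statistical measure of separation than require non-overlapping ranges.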
  • Figures 1A-1C depict the results of analyses of the 4-ratio discriminator in the MS and Control groups.
  • Figure 1A is a plot presenting the results of individual test scores within each cohort determined using the expression equation: [CTSS × (LLGL2)² × TGM2] / [(TAF11)² × (TP53)²].
  • Figure 1B is a plot of the sensitivity (circles) and the specificity (squares) of the test as scoring threshold increases.
  • Figure 1C is a plot depicting the accuracy of test results with varying threshold.
  • Figures 2A-2C depict the results of analyses of the 4-ratio discriminator in different autoimmune and chronic diseases.
  • Figure 2A is a plot presenting the results of individual test scores within each cohort determined using the expression equation: [CTSS × (LLGL2)² × TGM2] / [(TAF11)² × (TP53)²].
  • Figure 2B is a plot of the sensitivity (circles) and the specificity (rectangles) of the test as scoring threshold increases.
  • Figure 2C is a plot depicting the accuracy of test results with varying threshold.
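The 4-ratio test score and the threshold-dependent quantities plotted in Figures 1B-1C and 2B-2C can be illustrated with a short sketch. It assumes that a score at or above the threshold is called positive for MS, which the figure descriptions do not state, and it uses made-up scores rather than data from the study.

```python
def four_ratio_score(ctss, llgl2, tgm2, taf11, tp53):
    """[CTSS x LLGL2^2 x TGM2] / [TAF11^2 x TP53^2]."""
    return (ctss * llgl2 ** 2 * tgm2) / (taf11 ** 2 * tp53 ** 2)


def threshold_metrics(ms_scores, control_scores, threshold):
    """Sensitivity, specificity, and accuracy at one scoring threshold.

    Scores >= threshold are treated as positive (an assumption).
    """
    true_pos = sum(s >= threshold for s in ms_scores)
    true_neg = sum(s < threshold for s in control_scores)
    sensitivity = true_pos / len(ms_scores)
    specificity = true_neg / len(control_scores)
    accuracy = (true_pos + true_neg) / (len(ms_scores) + len(control_scores))
    return sensitivity, specificity, accuracy


# Hypothetical test scores, for illustration only.
ms_scores = [5.0, 8.0, 1.0, 9.0]
control_scores = [0.5, 2.0, 0.1, 0.3]
example_score = four_ratio_score(ctss=2.0, llgl2=3.0, tgm2=1.0, taf11=1.0, tp53=2.0)
metrics = threshold_metrics(ms_scores, control_scores, threshold=1.5)
```

Evaluating `threshold_metrics` over a range of thresholds gives the kind of curves the figures describe: sensitivity falls and specificity rises as the threshold increases.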
  • Figures 3A and 3B are receiver operating characteristic (ROC) curves of the 1-, 2-, 3-, and 4-ratio discriminators. For both Figures 3A and 3B, the data points are as follows: circles: 1-ratio discriminator; triangles: 2-ratio discriminator; squares: 3-ratio discriminator; and diamonds: 4-ratio discriminator.
  • Figure 3A is a ROC curve of MS versus control samples. True Positive
  • Figure 3B is a ROC curve comparing MS to controls, and other disease groups (from Figure 2A). TP and FP fractions were calculated as a function of test score.
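A ROC curve of the kind described for Figures 3A and 3B is built by sweeping the test-score threshold and recording the true positive (TP) and false positive (FP) fractions at each step. The following is a minimal sketch, assuming that scores at or above the threshold count as positive; the scores are hypothetical.

```python
def roc_points(positive_scores, negative_scores):
    """(FP fraction, TP fraction) pairs as the threshold is lowered.

    Every observed score is used once as a threshold; scores >= threshold
    are called positive (an assumption about the scoring direction).
    """
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in thresholds:
        tp_fraction = sum(s >= t for s in positive_scores) / len(positive_scores)
        fp_fraction = sum(s >= t for s in negative_scores) / len(negative_scores)
        points.append((fp_fraction, tp_fraction))
    return points


# Hypothetical scores: two MS samples and two comparison samples.
curve = roc_points([3.0, 2.0], [2.5, 1.0])
```

Plotting the TP fraction against the FP fraction for each discriminator on the same axes allows the 1-, 2-, 3-, and 4-ratio tests to be compared directly.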
  • Figure 4 is a plot presenting the results of individual MS test scores within each clinical sub-type. Individual test scores within each MS clinical sub-type were determined using the expression equation: [CTSS × (LLGL2)² ×
  • Figure 5 is a bar graph presenting average test scores as a function of type of therapy.
  • the MS patient samples were segregated into all MS patients, patients receiving a β-interferon, patients not receiving β-interferon, and patients receiving Copaxone. Average scores ± standard deviation were calculated and P values were calculated using the Mann-Whitney test.
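For readers unfamiliar with the Mann-Whitney test named above, the sketch below computes the U statistic directly from its definition and a two-sided P value from the normal approximation. It is a teaching sketch only: the approximation omits tie and continuity corrections and is rough for small groups, the source does not say which software or variant was used, and the scores are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided P value from the normal approximation to U.

    No tie correction or continuity correction is applied (a simplification).
    """
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical test scores for two treatment groups, for illustration only.
group_a = [5.0, 6.0, 7.0, 8.0]
group_b = [1.0, 2.0, 3.0, 4.0]
u_stat = mann_whitney_u(group_a, group_b)
p_value = two_sided_p_normal(u_stat, len(group_a), len(group_b))
```

For groups this small an exact test would normally be preferred over the normal approximation.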
  • SEQ ID NO: 1 is a nucleic acid sequence of a cDNA corresponding to a human ARP1 actin-related protein 1 homolog A, centractin alpha (yeast) (ACTR1A) gene product (GENBANK® Accession No. NM_005736).
  • SEQ ID NO: 2 is a nucleic acid sequence of a cDNA corresponding to a human breast cancer 1, early onset (BRCA1), transcript variant BRCA1a gene product (GENBANK® Accession No. NM_007294).
  • SEQ ID NO: 3 is a nucleic acid sequence of a cDNA corresponding to a human cathepsin S (CTSS) gene product (GENBANK® Accession No. NM_004079).
  • SEQ ID NO: 4 is a nucleic acid sequence of a cDNA corresponding to a human epoxide hydrolase 2, cytoplasmic (EPHX2) gene product (GENBANK® Accession No. NM_001979).
  • SEQ ID NO: 5 is a nucleic acid sequence of a cDNA corresponding to a human lethal giant larvae homolog 2 (LLGL2) gene product (GENBANK® Accession No. NM_004524).
  • SEQ ID NO: 6 is a nucleic acid sequence of a cDNA corresponding to a human Spi-B transcription factor (Spi-1/PU.1 related; SPIB) gene product (GENBANK® Accession No. NM_003121).
  • SEQ ID NO: 7 is a nucleic acid sequence of a cDNA corresponding to a human TATA box binding protein (TBP)-associated factor 11 (TAF11) RNA polymerase II, 28 kilodalton (kDa) gene product (GENBANK® Accession No. NM_005643).
  • TBP TATA box binding protein
  • kDa kilodalton
  • SEQ ID NO: 8 is a nucleic acid sequence of a cDNA corresponding to a human transglutaminase 2 (TGM2) gene product (GENBANK® Accession No. NM_004613).
  • SEQ ID NO: 9 is a nucleic acid sequence of a cDNA corresponding to a human tumor protein p53 (TP53; Li-Fraumeni syndrome) gene product (GENBANK® Accession No. NM_000546).
  • the presently disclosed subject matter relates to methods for detecting an autoimmune disorder (e.g., a multiple sclerosis syndrome) in a subject by analyzing gene expression profiles for selected genes in biological samples isolated from the subject and comparing the gene expression profiles to standards.
  • the methods involve determining the expression levels of a set of genes expressed in biological samples (e.g., whole blood or cells isolated therefrom) from a subject suspected of having an autoimmune disease and comparing the expression levels of these genes with the levels of expression of these genes in normal subjects and subjects with confirmed autoimmune diseases.
  • Using the methods of the presently disclosed subject matter it is possible to determine whether or not a subject has an autoimmune disease (for example, a multiple sclerosis syndrome) or whether the subject does not have autoimmune disease.
  • the expression levels of many genes can be analyzed simultaneously using microarrays or membrane-based filter arrays.
  • a representative filter array is the GF211 Human "Named Genes" GENEFILTERS® Microarrays Release 1 (available from RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America), although other arrays can also be used.
  • GENEFILTERS® Microarrays Release 1 available from RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America
  • Multiple sclerosis is a demyelinating disease of the central nervous system with a presumed autoimmune etiology.
  • Quantitative real-time PCR (Q-RT-PCR) analysis was employed to identify a minimum number of genes for which transcript levels discriminated multiple sclerosis subjects from subjects with other chronic diseases and controls.
  • a computer program was employed to search quantitative transcript levels to identify optimum ratios that distinguished among the different categories.
  • MS from the other autoimmune diseases. A focus was placed on MS because it is one of the more difficult autoimmune diseases to diagnose.
  • Q-RT-PCR was employed to measure transcript levels of genes identified from microarray data that were either control genes (equivalent transcript levels in subjects with autoimmune disease and control individuals) or test genes (different transcript levels between autoimmune subjects and controls).
  • a new algorithm that would give each gene in the analysis equal weight but would also provide more accurate weight to quantitative differences in transcript levels was developed. Using this analysis, it was possible to distinguish subjects with MS from control subjects and subjects with other diseases including autoimmune disease in a retrospective analysis.
  • the term "about," when referring to a value or to an amount of mass, weight, time, volume, concentration or percentage is meant to encompass variations of in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, in some embodiments ±0.5%, in some embodiments ±0.1%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed methods or employ the disclosed compositions.
  • “significance” or “significant” relates to a statistical analysis of the probability that there is a non-random association between two or more entities.
  • a relationship is “significant” or has “significance”
  • statistical manipulations of the data can be performed to calculate a probability, expressed as a "P value".
  • Those P values that fall below a user-defined cutoff point are regarded as significant.
  • the phrase "multiple sclerosis syndrome" refers generally to any disorder that would be classified as a multiple sclerosis (MS) or a precursor thereto.
  • RRMS relapsing remitting MS
  • PPMS primary progressive MS
  • SPMS secondary progressive MS
  • pre-MS also called clinically isolated syndrome
  • CIS clinically isolated syndrome
  • nucleic acid molecules employed in accordance with the presently disclosed subject matter include any nucleic acid molecule for which expression is desired to be assessed in evaluating the presence or absence of an autoimmune disease.
  • Representative nucleic acid molecules include, but are not limited to, the isolated nucleic acid molecules of any one of SEQ ID NOs: 1-9, complementary DNA molecules, sequences having 80%, 85%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or greater than 99% identity to a nucleic acid sequence of any one of SEQ ID NOs: 1-9, sequences capable of hybridizing to any one of SEQ ID NOs: 1-9 under conditions disclosed herein, and corresponding RNA molecules.
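The percent-identity figures listed above can be made concrete with a small calculation. The sketch below is a simplification under a stated assumption: it compares two sequences that are already aligned and of equal length, position by position. The source does not specify an alignment method at this point, and real comparisons of sequences that differ in length would use an alignment program. The sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with the same base in two pre-aligned sequences.

    Assumes equal-length, gap-free sequences; case is ignored.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base sequences differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

Under this simplified measure, a sequence with one mismatch in ten aligned positions would meet the 80%, 85%, and 90% thresholds but not the 92% threshold.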
  • nucleic acid and “nucleic acid molecule” refer to any of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action.
  • Nucleic acids can comprise monomers that are naturally occurring nucleotides (such as deoxyribonucleotides and ribonucleotides), or analogs of naturally occurring nucleotides (e.g., α-enantiomeric forms of naturally occurring nucleotides), or a combination of both.
  • Modified nucleotides can have modifications in sugar moieties and/or in pyrimidine or purine base moieties.
  • Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups.
  • Sugars can also be functionalized as ethers or esters.
  • the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs.
  • modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes.
  • Nucleic acid monomers can be linked by phosphodiester bonds or analogs of phosphodiester bonds. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like.
  • nucleic acid molecule or nucleotide sequence
  • nucleic acid can also be used in place of "gene", "cDNA", or "mRNA".
  • Nucleic acids can be derived from any source, including any organism. In some embodiments, a nucleic acid is derived from a biological sample isolated from a subject.
  • subsequence refers to a sequence of nucleic acids that comprises a part of a longer nucleic acid sequence.
  • An exemplary subsequence is a probe, or a primer.
  • primer refers to a contiguous sequence comprising in one example about 8 or more deoxyribonucleotides or ribonucleotides, in another example 10-20 nucleotides, and in yet another example 20-30 nucleotides of a selected nucleic acid molecule.
  • the primers disclosed herein encompass oligonucleotides of sufficient length and appropriate sequence so as to provide initiation of polymerization on a target nucleic acid molecule.
  • elongated sequence refers to an addition of nucleotides (or other analogous molecules) incorporated into the nucleic acid.
  • a polymerase e.g., a DNA polymerase
  • the nucleotide sequence can be combined with other DNA sequences, such as promoters, promoter regions, enhancers, polyadenylation signals, intronic sequences, additional restriction enzyme sites, multiple cloning sites, and other coding segments.
  • the phrases "open reading frame” and "ORF” are given their common meaning and refer to a contiguous series of deoxyribonucleotides or ribonucleotides that encode a polypeptide or a fragment of a polypeptide.
  • the ORF will be discontinuous in the genome. Splicing produces a continuous ORF that can be translated to produce a polypeptide.
  • the complete ORF includes those nucleic acid sequences beginning with the start codon and ending with the stop codon.
  • the ORF includes those nucleic acid sequences present in the non-full-length cDNA that are included within the complete ORF of the corresponding full-length cDNA.
  • coding sequence is used interchangeably with “open reading frame” and “ORF” and refers to a nucleic acid sequence that is transcribed into RNA including, but not limited to mRNA, rRNA, tRNA, snRNA, sense RNA, or antisense RNA.
  • RNA can then be translated in vitro or in vivo to produce a protein.
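The description of a complete ORF as running from the start codon to the stop codon can be illustrated with a short search routine. This is a sketch under assumptions: it scans a DNA (cDNA) sequence on the given strand only, uses ATG as the start codon and the three stop codons of the standard genetic code, and returns the first complete ORF it finds. The function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_orf(cdna):
    """Return the first ATG-to-stop ORF in a cDNA string, or None.

    Only the given strand is scanned; the stop codon is included in the result.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Read codon by codon, in frame with this start codon.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical sequence with a short ORF: ATG AAA CCC TAG.
orf = first_orf("GGATGAAACCCTAGTT")
```

A non-full-length cDNA may lack the start or stop codon, in which case this routine returns None even though part of the ORF is present.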
  • complementary and complementary sequences refer to two nucleotide sequences that comprise antiparallel nucleotide sequences capable of pairing with one another upon formation of hydrogen bonds between base pairs.
  • complementary sequences means nucleotide sequences which are substantially complementary, as can be assessed by the same nucleotide comparison set forth herein, or is defined as being capable of hybridizing to the nucleic acid segment in question under relatively stringent conditions such as those described herein.
  • a complementary sequence is at least 80% complementary to the nucleotide sequence with which it is capable of pairing.
  • a complementary sequence is at least 85% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 90% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 95% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 98% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 99% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is 100% complementary to the nucleotide sequence with which it is capable of pairing.
  • a particular example of a complementary nucleic acid segment is an antisense oligonucleotide.
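The antiparallel pairing and the percent-complementarity thresholds described above can be sketched as follows. The code assumes DNA sequences written 5' to 3', Watson-Crick pairing only (A with T, C with G), and two sequences of equal length compared without gaps. These are simplifying assumptions, and the function names and sequences are hypothetical.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the antiparallel strand that pairs perfectly with seq."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_complementary(seq_a, seq_b):
    """Percent of positions at which seq_a can pair with seq_b when the two
    strands are aligned antiparallel (equal lengths, no gaps assumed)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    perfect_partner = reverse_complement(seq_b)
    matches = sum(a == b for a, b in zip(seq_a.upper(), perfect_partner))
    return 100.0 * matches / len(seq_a)


# Hypothetical 4-base examples.
partner = reverse_complement("ATGC")
full = percent_complementary("GCAT", "ATGC")
partial = percent_complementary("GCAA", "ATGC")
```

In this example the second comparison has one mismatched position out of four, so it falls below the 80% threshold recited above.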
  • gene refers broadly to any segment of DNA associated with a biological function.
  • a gene encompasses sequences including, but not limited to a coding sequence, a promoter region, a transcriptional regulatory sequence, a non-expressed DNA segment that is a specific recognition sequence for regulatory proteins, a non-expressed DNA segment that contributes to gene expression, a DNA segment designed to have desired parameters, or combinations thereof.
  • a gene can be obtained by a variety of methods, including isolation or cloning from a biological sample, synthesis based on known or predicted sequence information, and recombinant derivation of an existing sequence.
  • a reference gene is a gene, a cDNA, or an EST for which the nucleic acid sequence has been determined (i.e. is known).
  • a reference gene is represented by one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9.
  • a reference gene is represented by a nucleic acid sequence complementary to one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9.
  • a reference gene is represented by a nucleic acid sequence having 80% or higher identity to any one of SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence capable of hybridizing to any one of SEQ ID NOs: 1-9 under conditions disclosed herein. In some embodiments, a reference gene is represented by an RNA molecule corresponding to any one of SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence present on an array. As used herein, the terms "corresponding to" and "representing",
  • nucleic acid sequence corresponding to or representing a gene refers to a nucleic acid sequence that results from transcription, reverse transcription, or replication from a particular genetic locus, gene, or gene product (for example, an mRNA).
  • an EST, partial cDNA, or full-length cDNA corresponding to a particular reference gene is a nucleic acid sequence that one of ordinary skill in the art would recognize as being a product of either transcription or replication of that reference gene (for example, a product produced by transcription of the reference gene).
  • the EST, partial cDNA, or full-length cDNA itself is produced by in vitro manipulation to convert the mRNA into an EST or cDNA, for example by reverse transcription of an isolated RNA molecule that was transcribed from the reference gene.
  • the product of a reverse transcription is a double-stranded DNA molecule, and that a given strand of that double-stranded molecule can embody either the coding strand or the non-coding strand of the gene.
  • sequences presented in the Sequence Listing are single-stranded, however, and it is to be understood that the presently disclosed subject matter is intended to encompass the genes represented by the sequences presented in SEQ ID NOs: 1-9, including the specific sequences set forth as well as the reverse/complement of each of these sequences.
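Because the listed sequences are single-stranded while the disclosure also covers the reverse complement of each, the relationship can be illustrated with a minimal Python sketch. The function name `reverse_complement` is a hypothetical helper, and the sketch assumes plain A/C/G/T input with no ambiguity codes.

```python
# Hypothetical helper, not part of the disclosure: derive the reverse
# complement of a single-stranded DNA sequence (A/C/G/T only, assumed).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string so the result
    # reads 5' to 3' on the opposite strand.
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check when handling both strands of a listed sequence.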
  • a known gene and/or reference gene also includes, but is not limited to those genes that have been identified as being differentially expressed in autoimmune patients versus normal patients, such as but not limited to those set forth in SEQ ID NOs: 1-9.
  • a reference gene is also intended to include nucleic acid sequences that substantially hybridize to one of such genes, including but not limited to one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9.
  • a reference gene includes a nucleic acid sequence that has one or more polymorphisms such that while the particular nucleic acid sequence might diverge somewhat from one of such genes, including but not limited to one of those disclosed in SEQ ID NOs: 1-9, one of ordinary skill in the art would nonetheless recognize the particular nucleic acid sequence as corresponding to a gene represented by one of such genes, including but not limited to one of the sequences disclosed in SEQ ID NOs: 1-9.
  • the GENBANK® database has at least three accession numbers that are identified as corresponding to the human breast cancer 1, early onset (BRCA1) mRNA.
  • these sequences represent transcript variants a, a', and b, and have accession numbers NM_007294, NM_007295, and NM_007296, respectively. It is understood that the presently disclosed subject matter, which identifies NM_007294 as SEQ ID NO: 2, also encompasses the other transcript variants.
  • a reference gene is also intended to include nucleic acid sequences that substantially hybridize to a nucleic acid corresponding to a gene represented by one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9.
  • a reference gene includes a nucleic acid sequence that has one or more polymorphisms such that while the particular nucleic acid sequence might diverge somewhat from those disclosed in SEQ ID NOs: 1-9, one of ordinary skill in the art would nonetheless recognize the particular nucleic acid sequence as corresponding to a gene represented by one of the sequences disclosed in SEQ ID NOs: 1-9.
  • gene expression generally refers to the cellular processes by which a biologically active polypeptide is produced from a DNA sequence. Generally, gene expression comprises the processes of transcription and translation, along with those modifications that normally occur in the cell to modify the newly translated protein to an active form and to direct it to its proper subcellular or extracellular location.
  • gene expression level and “expression level” as used herein refer to an amount of gene-specific RNA or polypeptide that is present in a biological sample. When used in relation to an RNA molecule, the term “abundance” can be used interchangeably with the terms “gene expression level” and “expression level”.
  • control gene can be, for example, a known quantity of a nucleic acid derived from a gene for which the expression level is either known or can be accurately determined; unknown expression levels of other genes can then be compared to the known internal control.
  • an appropriate internal control could be a housekeeping gene (e.g. glucose-6-phosphate dehydrogenase or elongation factor-1), an ideal housekeeping gene being defined as a gene for which the expression level in all cell types and under all conditions is the same.
  • Use of such an internal control allows relative expression levels to be determined (e.g. relative to the expression of the housekeeping gene) both for the nucleic acids present on the solid support and also between different experiments using the same solid support. This discrete expression level can then be normalized to a value relative to the expression level of the control gene (for example, a housekeeping gene).
  • the term "normalized", and grammatical derivatives thereof, refers to a manipulation of discrete expression level data wherein the expression level of a reference gene is expressed relative to the expression level of a control gene.
  • the expression level of the control gene can be set at 1, and the expression levels of all reference genes can be expressed in units relative to the expression of the control gene.
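The normalization described above, in which the control gene is set to 1 and every other gene is expressed relative to it, can be sketched as follows. This is a minimal illustration; the function name `normalize_to_control`, the dictionary layout, and the gene labels in the usage note are assumptions, not part of the disclosure.

```python
from typing import Dict


def normalize_to_control(raw_levels: Dict[str, float],
                         control_gene: str) -> Dict[str, float]:
    """Express each raw expression level relative to the control gene.

    After normalization the control gene has the value 1.0 and all other
    genes are in units of the control gene's expression.
    """
    control_level = raw_levels[control_gene]
    if control_level <= 0:
        # A non-positive control signal cannot serve as a denominator.
        raise ValueError("control gene expression must be positive")
    return {gene: level / control_level for gene, level in raw_levels.items()}
```

For example, with hypothetical raw signals `{"control": 200.0, "geneA": 50.0, "geneB": 400.0}`, `geneA` normalizes to 0.25 and `geneB` to 2.0, which makes values from different experiments on the same solid support comparable.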
  • average expression level refers to the mean expression level, in whatever units are chosen, of a gene in a particular biological sample of a population. To determine an average expression level, a population is defined, and the expression level of the gene in that population is determined for each member of the population by analyzing the same biological sample from each member of the population. The determined expression levels are then added together, and the sum is divided by the number of members in the population.
  • average expression level is also used to refer to a calculated value that can be used to compare two populations.
  • the average expression level in a population consisting of all patients regardless of autoimmune disease status can be calculated using the method above for a population that consists of statistically significant numbers of patients with and without autoimmune disease (the latter can also be referred to as the "unaffected subpopulation").
  • if the population is made up of unequal numbers of patients with and without autoimmune disease, the calculated value for all genes differentially expressed in these two subpopulations will likely be skewed towards the expression level determined for the subpopulation having the greater number of members.
  • the average expression level in the described population can also be calculated by: (a) determining the average expression level of a gene in the autoimmune patient subpopulation; (b) determining the average expression level of the same gene in the unaffected subpopulation; (c) adding the two determined values together; and (d) dividing the sum of the two determined values by 2 to achieve a value: this value also being defined herein as an "average expression level".
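The difference between the two ways of calculating an "average expression level" can be made concrete with a short Python sketch. The function names and the example values are hypothetical; the first function pools every subject, the second follows steps (a) through (d) above.

```python
from statistics import mean
from typing import Sequence


def pooled_average(affected: Sequence[float],
                   unaffected: Sequence[float]) -> float:
    """Mean over all subjects, regardless of disease status."""
    return mean(list(affected) + list(unaffected))


def midpoint_average(affected: Sequence[float],
                     unaffected: Sequence[float]) -> float:
    """Midpoint of the two subpopulation means, per steps (a)-(d)."""
    return (mean(affected) + mean(unaffected)) / 2
```

With four affected subjects at a level of 4.0 and one unaffected subject at 2.0 (illustrative values only), the pooled mean is 3.6, skewed toward the larger subpopulation, whereas the midpoint is 3.0 and does not depend on the subpopulation sizes.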
  • a profile can be created.
  • the term “profile” refers to a repository of the expression level data that can be used to compare the expression levels of different genes among various subjects. For example, for a given subject, the term “profile” can encompass the expression levels of all genes detected in whatever units (as described herein above) are chosen.
  • a standard is prepared by determining the average expression level of a gene in a normal population, a normal population being defined as subjects that do not have autoimmune disease.
  • a standard is prepared by determining the average expression level of a gene in a population of subjects that have an autoimmune disease (for example, RA, MS, IDDM, and/or SLE).
  • a standard is prepared by determining the average expression level of a gene in the population as a whole (i.e. subjects are grouped together irrespective of autoimmune disease status).
  • a standard is prepared by determining the average expression level of a gene in a normal population, the average expression level of a gene in an autoimmune population, adding those two values, and dividing the sum by two to determine the midpoint of the average expression in these populations.
  • a profile for a "new" subject can be compared to the standard, and the profile can further comprise data indicating whether for each gene, the expression level in the new subject is higher or lower than the expression level of that gene in the standard.
  • a new subject's profile can comprise a score of "1" for each gene for which the expression in the subject is higher than in the standard, and a score of "0" for each gene for which the expression in the subject is lower than in the standard.
  • a profile can comprise an overall "score", the score being defined as the sum total of all the ones and zeroes present in the profile.
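The scoring scheme described above can be sketched in a few lines of Python. The function name `score_profile` and the gene labels are hypothetical. The text does not say how an expression level exactly equal to the standard is scored, so this sketch assumes it scores 0.

```python
from typing import Dict, Tuple


def score_profile(subject: Dict[str, float],
                  standard: Dict[str, float]) -> Tuple[Dict[str, int], int]:
    """Score a subject's profile against a standard.

    Each gene scores 1 if the subject's level is higher than the standard
    and 0 otherwise (ties scored 0 is an assumption of this sketch).
    The overall score is the sum of the per-gene scores.
    """
    per_gene = {
        gene: 1 if subject[gene] > standard_level else 0
        for gene, standard_level in standard.items()
    }
    return per_gene, sum(per_gene.values())
```

A subject with levels `{"g1": 2.0, "g2": 0.5, "g3": 1.5}` compared against a standard of 1.0 for each gene would receive per-gene scores of 1, 0, and 1 and an overall score of 2.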
  • isolated indicates that the nucleic acid molecule exists apart from its native environment and is not a product of nature.
  • An isolated DNA molecule can exist in a purified form or can exist in a non-native environment such as, for example, in a host cell transformed with a vector comprising the DNA molecule.
  • percent identity and percent identical in the context of two nucleic acid or protein sequences, refer to two or more sequences or subsequences that have in some embodiments at least 60%, in some embodiments at least 70%, in some embodiments at least 80%, in some embodiments at least 85%, in some embodiments at least 90%, in some embodiments at least 95%, in some embodiments at least 98%, and in some embodiments at least 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection.
  • the percent identity exists in some embodiments over a region of the sequences that is at least about 50 residues in length, in some embodiments over a region of at least about 100 residues, and in some embodiments the percent identity exists over at least about 150 residues. In some embodiments, the percent identity exists over the entire length of a given region, such as a coding region.
  • a nucleic acid is at least 80% identical to one of SEQ ID NOs: 1-9.
  • sequence comparison typically one sequence acts as a reference sequence to which test sequences are compared.
  • test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated.
  • sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
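As a minimal sketch of the final step, percent identity can be computed once a test sequence and a reference sequence have already been aligned. The function below is hypothetical and makes two assumptions that real alignment programs may not share: gaps are written as "-", and the denominator is the full alignment length including gapped columns.

```python
def percent_identity(aligned_test: str, aligned_ref: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumptions of this sketch: '-' marks a gap, a gap never counts as a
    match, and identity is reported over all alignment columns.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_test:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for test_base, ref_base in zip(aligned_test, aligned_ref)
        if test_base == ref_base and test_base != "-"
    )
    return 100.0 * matches / len(aligned_test)
```

Under these assumptions, a result of 80.0 or more would correspond to the "at least 80% identical" embodiment; a different denominator convention (for example, excluding gapped columns) would give different values for the same alignment.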
  • Optimal alignment of sequences for comparison can be conducted, for example, by the local homology algorithm described in Smith & Waterman (1981) Adv Appl Math 2:482-489, by the homology alignment algorithm described in Needleman & Wunsch (1970) J Mol Biol 48:443-453, by the search for similarity method described in Pearson & Lipman (1988) Proc Natl Acad Sci U S A 85:2444-2448, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the GCG Wisconsin Package, available from Accelrys, Inc., San Diego, California, United States of America), or by visual inspection. See generally, Ausubel et al. (1994) Current Protocols in Molecular Biology.
  • high scoring sequence pairs (HSPs)
  • initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them.
  • the word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached.
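The extension rule can be illustrated with a simplified, ungapped sketch for nucleotide sequences. It extends a seed in one direction only (rightward), whereas the text describes extension in both directions. The function name, the argument layout, and the default values for M, N, and X are illustrative assumptions; this is not the BLAST implementation.

```python
from typing import Tuple


def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 seed_len: int, M: int = 1, N: int = -3,
                 X: int = 5) -> Tuple[int, int]:
    """Ungapped rightward extension of a word hit.

    Returns (best_score, best_length), where best_length counts aligned
    positions from the start of the seed. M is the match reward (> 0),
    N the mismatch penalty (< 0), and X the permitted drop-off.
    """
    # Score the seed itself by direct comparison.
    score = sum(
        M if query[q_start + i] == subject[s_start + i] else N
        for i in range(seed_len)
    )
    best_score, best_length = score, seed_len
    length = seed_len
    # Stop at the end of either sequence.
    while q_start + length < len(query) and s_start + length < len(subject):
        score += M if query[q_start + length] == subject[s_start + length] else N
        length += 1
        if score > best_score:
            best_score, best_length = score, length
        # Stop when the score has dropped by X from its maximum,
        # or has fallen to zero or below.
        if best_score - score >= X or score <= 0:
            break
    return best_score, best_length
```

Larger values of X let the extension tolerate longer mismatched stretches before halting, which increases sensitivity at the cost of speed.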
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
  • the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff (1992) Proc Natl Acad Sci U S A 89:10915-10919.
  • In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See e.g., Karlin & Altschul (1993) Proc Natl Acad Sci U S A 90:5873-5877.
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is in some embodiments less than about 0.1, in some embodiments less than about 0.01, and in some embodiments less than about 0.001.
  • substantially identical in the context of two nucleotide sequences, refers to two or more sequences or subsequences that have in some embodiments at least about 80% nucleotide identity, in some embodiments at least about 85% nucleotide identity, in some embodiments at least about 90% nucleotide identity, in some embodiments at least about 95% nucleotide identity, in some embodiments at least about 98% nucleotide identity, and in some embodiments at least about 99% nucleotide identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection.
  • polymorphic sequences can be substantially identical sequences.
  • the term "polymorphic" refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. An allelic difference can be as small as one base pair. Nonetheless, one of ordinary skill in the art would recognize that the polymorphic sequences correspond to the same gene.
  • SEQ ID NO: 9 is a cDNA sequence representing a human TP53 gene product that is present in the GENBANK® database under Accession Number NM_000546.
  • the TP53 gene is characterized by polymorphisms at nucleotide positions 390, 466, 1470, 1927, 1950, 1976, 1977, 2075, 2076, 2497, and 2498. Nucleic acid sequences comprising any or all of these polymorphisms are substantially identical to SEQ ID NO: 9, and thus are intended to be encompassed within the claimed subject matter.
  • nucleic acid sequences are substantially identical in that the two molecules specifically or substantially hybridize to each other under stringent conditions.
  • two nucleic acid sequences being compared can be designated a "probe sequence” and a "target sequence".
  • a “probe sequence” is a reference nucleic acid molecule
  • a "target sequence" is a test nucleic acid molecule, often found within a heterogeneous population of nucleic acid molecules.
  • a “target sequence” is synonymous with a "test sequence”.
  • An exemplary nucleotide sequence employed for hybridization studies or assays includes probe sequences that are complementary to or mimic in some embodiments at least an about 14 to 40 nucleotide sequence of a nucleic acid molecule of the presently disclosed subject matter.
  • probes comprise 14 to 20 nucleotides, or even longer where desired, such as 30, 40, 50, 60, 100, 200, 300, or 500 nucleotides or up to the full length of any of the genes represented by SEQ ID NOs: 1-9.
  • Such fragments can be readily prepared by, for example, directly synthesizing the fragment by chemical synthesis, by application of nucleic acid amplification technology, or by introducing selected sequences into recombinant vectors for recombinant production.
  • hybridizing specifically to refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex nucleic acid mixture (e.g., total cellular DNA or RNA).
  • hybridizing substantially to refers to complementary hybridization between a probe nucleic acid molecule and a target nucleic acid molecule and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired hybridization.
  • Stringent hybridization conditions and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern blot analysis are both sequence- and environment-dependent. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes. Elsevier, New York, United States of America. Generally, highly stringent hybridization and wash conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Typically, under “stringent conditions” a probe will hybridize specifically to its target subsequence, but to no other sequences.
  • thermal melting point (Tm)
  • the Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
  • Very stringent conditions are selected to be equal to the Tm for a particular probe.
  • An example of stringent hybridization conditions for Southern or Northern Blot analysis of complementary nucleic acids having more than about 100 complementary residues is overnight hybridization in 50% formamide with 1 mg of heparin at 42°C.
  • An example of highly stringent wash conditions is 15 minutes in 0.1X SSC, 5M NaCl at 65°C.
  • An example of stringent wash conditions is 15 minutes in 0.2X SSC buffer at 65°C (see Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal.
  • An example of medium stringency wash conditions for a duplex of more than about 100 nucleotides is 15 minutes in 1X SSC at 45°C.
  • An example of low stringency wash for a duplex of more than about 100 nucleotides is 15 minutes in 4-6X SSC at 40°C.
  • stringent conditions typically involve salt concentrations of less than about 1 M Na+ ion, typically about 0.01 to 1 M Na+ ion concentration (or other salts) at pH 7.0-8.3, and the temperature is typically at least about 30°C.
  • Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide.
  • a signal-to-noise ratio 2-fold (or more) higher than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.
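The 2-fold criterion amounts to a simple ratio test, sketched below. The function name and the example signal values are hypothetical; the signals are assumed to be background-corrected intensities measured in the same hybridization assay.

```python
def is_specific_hybridization(probe_signal: float,
                              unrelated_probe_signal: float,
                              min_fold: float = 2.0) -> bool:
    """Return True if the probe signal is at least min_fold times the
    signal observed for an unrelated probe in the same assay."""
    if unrelated_probe_signal <= 0:
        # Assumption of this sketch: a positive reference signal is required.
        raise ValueError("unrelated probe signal must be positive")
    return probe_signal / unrelated_probe_signal >= min_fold
```

For instance, a probe signal of 900 against an unrelated-probe signal of 300 gives a 3-fold ratio and would be treated as specific, whereas 500 against 300 would not.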
  • a probe nucleotide sequence hybridizes in one example to a target nucleotide sequence in 7% sodium dodecyl sulfate (SDS), 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 2X SSC, 0.1% SDS at 50°C; in another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 1X SSC, 0.1% SDS at 50°C; in another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 0.5X SSC, 0.1% SDS at 50°C; in another example,
  • hybridization conditions comprise hybridization in a roller tube for at least 12 hours at 42°C.
  • Pre-made hybridization solutions are also commercially available from various suppliers.
  • a hybridization solution comprises MICROHYB™ (RESGEN™), and in some embodiments a hybridization solution comprises MICROHYB™ further comprising 5.0 μg COT-1® DNA (Invitrogen Corporation, Carlsbad, California, United States of America) and 5.0 μg poly-dA.
  • post-hybridization wash conditions comprise two washes in 2X SSC/1% SDS at 50°C for 20 minutes each followed by a third wash in 0.5X SSC/1% SDS at 55°C for 15 minutes.
  • the term "purified", when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It can be in a homogeneous state although it also can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified. The term "purified" denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is in some embodiments at least about 50% pure, in some embodiments at least about 85% pure, and in some embodiments at least about 99% pure.
II.B. Biological Samples
  • biomolecules include, but are not limited to total RNA, mRNA, and polypeptides.
  • a biological sample can comprise a cell or a group of cells. Any cell or group of cells can be used with the methods of the presently disclosed subject matter, although cell-types and organs that would be predicted to show differential gene expression in subjects with autoimmune disease versus normal subjects are best suited.
  • gene expression levels are determined where the biological sample comprises a cell isolated from a biological fluid including, but not limited to whole blood or a fraction thereof.
  • a biological sample comprises PBMCs that have been isolated from a subject.
  • the biological sample comprises one or more of the constituent cell types that make up a PBMC preparation, including but not limited to T cells, B cells, monocytes, and NK/NKT cells.
  • a representative PBMC preparation can comprise about 75% T cells, about 5% to about 10% B cells, about 5% to about 10% monocytes, and a small percentage of NK/NKT cells.
  • the biological sample comprises epithelial cells, such as cheek epithelial cells. Also encompassed within the phrase "biological sample" are biomolecules that are derived from a cell or group of cells that permit gene expression levels to be determined, e.g. nucleic acids and polypeptides.
  • the expression level of the gene can be determined using molecular biology techniques that are well known in the art. For example, if the expression level is to be determined by analyzing RNA isolated from the biological sample, techniques for determining the expression level include, but are not limited to Northern blotting, quantitative PCR (e.g., Q-RT-PCR), and the use of nucleic acid arrays and microarrays.
  • the expression level of a gene is determined by hybridizing 33P-labeled cDNA generated from total RNA isolated from a biological sample to one or more DNA sequences representing one or more genes that have been affixed to a solid support, e.g. a membrane.
  • a membrane comprises nucleic acids representing many genes (including internal controls)
  • the relative expression level of many genes can be determined.
  • the presence of internal control sequences on the membrane also allows experiment-to-experiment variations to be detected, yielding a strategy whereby the raw expression data derived from each experiment can be compared from experiment-to-experiment.
  • gene expression can be determined by analyzing protein levels in a biological sample using antibodies.
  • Representative antibody-based techniques include, but are not limited to immunoprecipitation, Western blotting, and the use of immunoaffinity columns.
  • the presently disclosed subject matter encompasses use of a sufficiently large biological sample to enable a comprehensive survey of low abundance nucleic acids in the sample.
  • the sample can optionally be concentrated prior to isolation of nucleic acids.
  • concentration have been developed that alternatively use slide supports (Kohsaka & Carson (1994)
  • SEPHADEX® matrix (Sigma, St. Louis, Missouri,
  • Methods for nucleic acid isolation can comprise simultaneous isolation of total nucleic acid, or separate and/or sequential isolation of individual nucleic acid types (e.g., genomic DNA, cDNA, organelle DNA, genomic RNA, mRNA, polyA+ RNA, rRNA, tRNA) followed by optional combination of multiple nucleic acid types into a single sample.
  • RNA isolation methods are known to one of skill in the art. See
  • Simple and semi-automated extraction methods can also be used for nucleic acid isolation, including for example, the SPLIT SECOND™ system (Boehringer Mannheim, Indianapolis, Indiana, United States of America), the TRIZOL™ Reagent system (Life Technologies, Gaithersburg, Maryland, United States of America), and the FASTPREP™ system (Bio 101, La Jolla, California, United States of America). See also Paladichuk (1999) The Scientist 13(16):20-23.
  • Nucleic acids that are used for subsequent amplification and labeling can be analytically pure as determined by spectrophotometric measurements or by visual inspection following electrophoretic resolution.
  • the nucleic acid sample can be free of contaminants such as polysaccharides, proteins, and inhibitors of enzyme reactions.
  • When an RNA sample is intended for use as probe, it can be free of nuclease contamination. Contaminants and inhibitors can be removed or substantially reduced using resins for DNA extraction (e.g., CHELEX™ 100 from BioRad Laboratories, Hercules, California, United States of America) or by standard phenol extraction and ethanol precipitation.
  • Isolated nucleic acids can optionally be fragmented by restriction enzyme digestion or shearing prior to amplification.
III.C.
  • template nucleic acid and “target nucleic acid” as used herein each refers to nucleic acids isolated from a biological sample as described herein above.
  • template nucleic acid pool and “target nucleic acid pool” each refers to an amplified sample of "template nucleic acid”.
  • a target pool comprises amplicons generated by performing an amplification reaction using the template nucleic acid.
  • a target pool is amplified using a random amplification procedure as described herein.
  • target-specific primer refers to a primer that hybridizes selectively and predictably to a target sequence, for example a sequence that shows differential expression in a patient with an autoimmune disease relative to a normal patient, in a target nucleic acid sample.
  • a target-specific primer can be selected or synthesized to be complementary to known nucleotide sequences of target nucleic acids.
  • random primer refers to a primer having an arbitrary sequence.
  • the nucleotide sequence of a random primer can be known, although such sequence is considered arbitrary in that it is not designed for complementarity to a nucleotide sequence of the target-specific probe.
  • random primer encompasses selection of an arbitrary sequence having increased probability to be efficiently utilized in an amplification reaction.
  • Random Oligonucleotide Construction Kit (ROCK; available from http://www.sru.edu/depts/artsci/bio/ROCK.htm) is a macro-based program that facilitates the generation and analysis of random oligonucleotide primers (Strain & Chmielewski (2001) BioTechniques 30:1286-1293).
  • Representative primers include, but are not limited to random hexamers and random amplified polymorphic DNA (RAPD)-type primers as described in Williams et al. (1990).
  • a random primer can also be degenerate or partially degenerate as described in Telenius et al. (1992) Genomics 13:718-725. Briefly, degeneracy can be introduced by selection of alternate oligonucleotide sequences that can encode a same amino acid sequence.
  • random primers can be prepared by shearing or digesting a portion of the template nucleic acid sample. Random primers so-constructed comprise a sample-specific set of random primers.
  • heterologous primer refers to a primer complementary to a sequence that has been introduced into the template nucleic acid pool.
  • a primer that is complementary to a linker or adaptor is a heterologous primer.
  • Representative heterologous primers can optionally include a poly(dT) primer, a poly(T) primer, or as appropriate, a poly(dA) primer or a poly(A) primer.
  • primer refers to a contiguous sequence comprising in some embodiments about 6 or more nucleotides, in some embodiments about 10-20 nucleotides (e.g. 15-mer), and in some embodiments about 20-30 nucleotides (e.g. a 22-mer). Primers used to perform the method of the presently disclosed subject matter encompass oligonucleotides of sufficient length and appropriate sequence so as to provide initiation of polymerization on a nucleic acid molecule.
III.C.1. Quantitative RT-PCR
  • the abundance of specific mRNA species present in a biological sample is assessed by quantitative RT-PCR.
  • standard molecular biological techniques are used in conjunction with specific PCR primers to quantitatively amplify those mRNA molecules corresponding to the genes of interest.
  • Methods for designing specific PCR primers and for performing quantitative amplification of nucleic acids including mRNA are well known in the art. See e.g. Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America.
  • RNA can be amplified using a technique referred to as Amplified Antisense RNA (aaRNA).
  • an oligo(dT) primer is synthesized such that the 5' end of the primer includes a T7 RNA polymerase promoter.
  • This oligonucleotide can be used to prime the poly(A)+ mRNA population to generate cDNA.
  • second strand cDNA is generated using RNA nicking and priming (Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America).
  • the resulting cDNA is treated briefly with S1 nuclease and blunt-ended with T4 DNA polymerase.
  • the cDNA is then used as a template for transcription-based amplification using the T7 RNA polymerase promoter to direct RNA synthesis.
  • Eberwine et al. adapted the aaRNA procedure for in situ random amplification of RNA followed by target-specific amplification.
  • the successful amplification of underrepresented transcripts suggests that the pool of transcripts amplified by aaRNA is representative of the initial mRNA population (Eberwine et al. (1992) Proc Natl Acad Sci U S A 89:3010-3014).
III.C.3. Global RNA Amplification
  • U.S. Patent No. 6,066,457 to Hampson et al. describes a method for substantially uniform amplification of a collection of single stranded nucleic acid molecules such as RNA. Briefly, the nucleic acid starting material is anchored and processed to produce a mixture of directional shorter random size DNA molecules suitable for amplification of the sample.
  • any one of the above-mentioned PCR techniques or related techniques can be employed to perform the step of amplifying the nucleic acid sample.
  • such methods can be optimized for amplification of a particular subset of nucleic acid (e.g., specific mRNA molecules versus total mRNA), and representative optimization criteria and related guidance can be found in the art. See Cha & Thilly (1993) PCR Methods Appl 3:S18-29; Linz et al.
  • kits comprising a plurality of oligonucleotide primers that can be used in the methods of the presently disclosed subject matter to assess gene expression levels of genes of interest.
  • the kit can comprise oligonucleotide primers designed to be used to determine the expression level of one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, or all) of the genes set forth in SEQ ID NOs: 1-9.
  • the kit can comprise instructions for using the primers, including
  • the expression level of a gene in a biological sample is determined by hybridizing total RNA isolated from the biological sample to an array containing known quantities of nucleic acid sequences corresponding to known genes.
  • the array can comprise single-stranded nucleic acids (also referred to herein as "probes" and/or "probe sets") in known amounts for specific genes, which can then be hybridized to nucleic acids isolated from the biological sample.
  • the array can be set up such that the nucleic acids are present on a solid support in such a manner as to allow the identification of those genes on the array to which the total RNA hybridizes.
  • the total RNA is hybridized to the array, and the genes to which the total RNA hybridizes are detected using standard techniques.
  • the amplified nucleic acids are labeled with a radioactive nucleotide prior to hybridization to the array, and the genes on the array to which the RNA hybridizes are detected by autoradiography or phosphorimage analysis.
  • nucleic acids isolated from a biological sample are hybridized with a set of probes without prior labeling of the nucleic acids.
  • unlabeled total RNA isolated from the biological sample can be detected by hybridization to one or more labeled probes, the labeled probes being specific for those genes found to be useful in the methods of the presently disclosed subject matter (e.g. those genes represented by SEQ ID NOs: 1-9).
  • both the nucleic acids and the one or more probes include a label, wherein the proximity of the labels following hybridization enables detection.
  • the nucleic acids or probes/probe sets can be labeled using any detectable label. It will be understood to one of skill in the art that any suitable method for labeling can be used, and no particular detectable label or technique for labeling should be construed as a limitation of the disclosed methods.
  • Direct labeling techniques include incorporation of radioisotopic (e.g. ³²P, ³³P, or ³⁵S) or fluorescent nucleotide analogues into nucleic acids by enzymatic synthesis in the presence of labeled nucleotides or labeled PCR primers.
  • a radio-isotopic label can be detected using autoradiography or phosphorimaging.
  • a fluorescent label can be detected directly using emission and absorbance spectra that are appropriate for the particular label used.
  • Any detectable fluorescent dye can be used, including but not limited to fluorescein isothiocyanate (FITC), FLUOR X™, ALEXA FLUOR® 488, OREGON GREEN® 488, 6-JOE (6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein, succinimidyl ester), ALEXA FLUOR® 532, Cy3, ALEXA FLUOR® 546, TMR (tetramethylrhodamine), ALEXA FLUOR® 568, ROX (X-rhodamine), ALEXA FLUOR® 594, TEXAS RED®, BODIPY® 630/650, and Cy5 (available from Amersham Pharmacia Biotech, Piscataway, New Jersey, United States of America, or from Molecular Probes Inc., Eugene, Oregon, United States of America).
  • Fluorescent tags also include sulfonated cyanine dyes (available from Li-Cor, Inc., Lincoln, Nebraska, United States of America) that can be detected using infrared imaging. Methods for direct labeling of a heterogeneous nucleic acid sample are known in the art and representative protocols can be found in, for example, DeRisi et al. (1996) Nat Genet 14:457-460; Sapolsky & Lipshutz (1996) Genomics 33:445-456; Schena et al. (1995) Science 270:467-470; Schena et al. (1996) Proc Natl Acad Sci U S A 93:10614-10619; Shalon et al.
  • Indirect labeling techniques can also be used in accordance with the methods of the presently disclosed subject matter, and in some cases, can facilitate detection of rare target sequences by amplifying the label during the detection step.
  • Indirect labeling involves incorporation of epitopes, including recognition sites for restriction endonucleases, into amplified nucleic acids prior to hybridization with a set of probes. Following hybridization, a protein that binds the epitope is used to detect the epitope tag.
  • a biotinylated nucleotide can be included in the amplification reactions to produce a biotin-labeled nucleic acid sample.
  • the label can be detected by binding of an avidin-conjugated fluorophore, for example streptavidin-phycoerythrin, to the biotin label.
  • the label can be detected by binding of an avidin- or streptavidin-horseradish peroxidase (HRP) conjugate, followed by colorimetric detection of an HRP enzymatic product.
  • the quality of probe or nucleic acid sample labeling can be approximated by determining the specific activity of label incorporation.
  • the specific activity of incorporation can be determined by the absorbance at 260 nm and 550 nm (for Cy3) or 650 nm (for Cy5) using published extinction coefficients (Randolph & Waggoner (1997) Nucleic Acids Res 25:2923-2929).
  • Very high label incorporation (specific activities of >1 fluorescent molecule/20 nucleotides) can result in a decreased hybridization signal compared with probes having lower label incorporation.
  • Very low specific activity (<1 fluorescent molecule/100 nucleotides)
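  • By way of non-limiting illustration, the specific activity estimate described above can be sketched as a Beer-Lambert calculation. The extinction coefficients below (150,000 M⁻¹cm⁻¹ for Cy3 at 550 nm, 250,000 M⁻¹cm⁻¹ for Cy5 at 650 nm, and an average of about 10,000 M⁻¹cm⁻¹ per nucleotide at 260 nm) are commonly quoted approximations assumed for this sketch; they should be replaced with the published values for the dye lot and nucleic acid in use. The sketch also ignores the contribution of the dye to the absorbance at 260 nm.

```python
# Assumed molar extinction coefficients (M^-1 cm^-1); verify before real use.
DYE_EPSILON = {"Cy3": 150_000.0, "Cy5": 250_000.0}
NUCLEOTIDE_EPSILON_260 = 10_000.0  # assumed average per nucleotide


def nucleotides_per_dye(a260: float, a_dye: float, dye: str, path_cm: float = 1.0) -> float:
    """Return the number of nucleotides per incorporated dye molecule.

    a260 is the absorbance at 260 nm; a_dye is the absorbance at the dye
    maximum (550 nm for Cy3, 650 nm for Cy5).
    """
    if a260 <= 0 or a_dye <= 0:
        raise ValueError("absorbance readings must be positive")
    nucleotide_molar = a260 / (NUCLEOTIDE_EPSILON_260 * path_cm)
    dye_molar = a_dye / (DYE_EPSILON[dye] * path_cm)
    return nucleotide_molar / dye_molar


def classify_incorporation(nt_per_dye: float) -> str:
    """Apply the 1/20 and 1/100 thresholds stated in the text."""
    if nt_per_dye < 20:
        return "very high"   # more than 1 dye per 20 nucleotides
    if nt_per_dye > 100:
        return "very low"    # fewer than 1 dye per 100 nucleotides
    return "intermediate"


ratio = nucleotides_per_dye(a260=0.5, a_dye=0.15, dye="Cy3")
print(round(ratio, 1), classify_incorporation(ratio))
```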
  • nucleic acids isolated from a biological sample are hybridized to a microarray, wherein the microarray comprises nucleic acids corresponding to those genes to be tested (optionally also including one or more internal control genes).
  • the genes are immobilized on a solid support, such that each position on the support identifies a particular gene, and each gene the expression level of which is to be analyzed is represented one or more times on the solid support.
  • Solid supports include, but are not limited to nitrocellulose and nylon membranes. Solid supports can also be glass or silicon-based (i.e. "gene chips").
  • a microarray comprises a nylon membrane (for example, the GF211 Human "Named Genes" GENEFILTERS® Microarrays Release 1 available from RESGEN™).
  • a microarray can be assembled using any suitable method known to one of skill in the art, and any one microarray configuration or method of construction is not considered to be a limitation of the presently disclosed subject matter. Representative microarray formats that can be used in accordance with the methods of the presently disclosed subject matter are described herein below.
  • the substrate for printing the array should be substantially rigid and amenable to DNA immobilization and detection methods (e.g., in the case of fluorescent detection, the substrate must have low background fluorescence in the region of the fluorescent dye excitation wavelengths).
  • the substrate can be nonporous or porous as determined most suitable for a particular application. Representative substrates include, but are not limited to a glass microscope slide, a glass coverslip, silicon, plastic, a polymer matrix, an agar gel, a polyacrylamide gel, and a membrane, such as a nylon, nitrocellulose or ANAPORE™ (Whatman, Maidstone, United Kingdom) membrane.
  • Porous substrates are preferred in that they permit immobilization of relatively large amounts of probe molecules and provide a three-dimensional hydrophilic environment for biomolecular interactions to occur (Dubiley et al. (1997) Nucleic Acids Res 25:2259-2265; Yershov et al. (1996) Proc Natl Acad Sci U S A 93:4913-4918).
  • a BIOCHIP ARRAYER™ dispenser (Packard Instrument Company, Meriden, Connecticut, United States of America) can effectively dispense probes onto membranes such that the spot size is consistent among spots whether one, two, or four droplets were dispensed per spot (Englert (2000) in Schena M, ed, Microarray Biochip Technology, pp. 231-246, Eaton Publishing, Natick, Massachusetts,
  • the array can also comprise a dot blot or a slot blot.
  • a microarray substrate for use in accordance with the methods of the presently disclosed subject matter can have either a two-dimensional (planar) or a three-dimensional (non-planar) configuration.
  • An exemplary three-dimensional microarray is the FLOW-THRU™ chip (Gene Logic, Inc., Gaithersburg, Maryland, United States of America), which has implemented a gel pad to create a third dimension.
  • Such a three-dimensional microarray can be constructed of any suitable substrate, including glass capillary, silicon, metal oxide filters, or porous polymers. See Yang et al. (1998) Science 282:2244-2246; Steel et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp.
  • a FLOW-THRU™ chip (Gene Logic, Inc.) comprises a uniformly porous substrate having pores or microchannels connecting upper and lower faces of the chip. Probes are immobilized on the walls of the microchannels and a hybridization solution comprising sample nucleic acids can flow through the microchannels. This configuration increases the capacity for probe and target binding by providing additional surface relative to two-dimensional arrays. See U.S. Patent No. 5,843,767.
  • V.B. Surface Chemistry
  • the particular surface chemistry employed is inherent in the microarray substrate and substrate preparation. Immobilization of nucleic acid probes post-synthesis can be accomplished by various approaches, including adsorption, entrapment, and covalent attachment. Preferably, the binding technique does not disrupt the activity of the probe.
  • a representative hetero-bifunctional cross-linker comprises gamma-maleimidobutyryloxy-succinimide (GMBS) that can bind maleimide to a primary amine of a probe.
  • Procedures for using such linkers are known to one of skill in the art and are summarized in Hermanson (1990) Bioconjugate Techniques, Academic Press, San Diego, California, United States of America.
  • a representative protocol for covalent attachment of DNA to silicon wafers is described in O'Donnell et al. (1997) Anal Chem 69:2438-2443.
  • the glass should be substantially free of debris and other deposits and have a substantially uniform coating.
  • Pretreatment of slides to remove organic compounds that can be deposited during their manufacture can be accomplished, for example, by washing in hot nitric acid. Cleaned slides can then be coated with 3-aminopropyltrimethoxysilane using vapor-phase techniques. After silane deposition, slides are washed with deionized water to remove any silane that is not attached to the glass and to catalyze unreacted methoxy groups to crosslink to neighboring silane moieties on the slide.
  • the uniformity of the coating can be assessed by known methods, for example electron spectroscopy for chemical analysis (ESCA) or ellipsometry (Ratner & Castner (1997) in Vickerman JC, ed, Surface Analysis: The Principal Techniques, John Wiley & Sons, New York, New York, United States of America; Schena et al. (1995) Science 270:467-470). See also Worley et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 65-86, Eaton Publishing, Natick, Massachusetts, United States of America.
  • For attachment of probes greater than about 300 base pairs, noncovalent binding is suitable.
  • a representative technique for noncovalent linkage involves use of sodium isothiocyanate (NaSCN) in the spotting solution, as described in EXAMPLE 9.
  • amino-silanized slides can be used since this coating improves nucleic acid binding when compared to bare glass. This method works well for spotting applications that use about 100 ng/µl (Worley et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp.
  • a microarray for the detection of gene expression levels in a biological sample can be constructed using any one of several methods available in the art including, but not limited to photolithographic and microfluidic methods, further described herein below.
  • the method of construction is flexible, such that a microarray can be tailored for a particular purpose.
  • a solid support for use in the presently disclosed subject matter comprises in some embodiments about 10 or more spots, in some embodiments about 100 or more spots, in some embodiments about 1,000 or more spots, and in some embodiments about 10,000 or more spots.
  • the volume deposited per spot is about 10 picoliters to about 10 nanoliters, and in some embodiments about 50 picoliters to about 500 picoliters.
  • the diameter of a spot is in some embodiments about 50 µm to about 1000 µm, and in some embodiments about 100 µm to about 250 µm.
  • a variation of the method, called Digital Optical Chemistry, employs mirrors to direct light synthesis in place of photolithographic masks (PCT International Patent Application Publication No. WO 99/63385). This approach is generally limited to probes of about 25 nucleotides in length or less. See also Warrington et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 119-148, Eaton Publishing, Natick, Massachusetts, United States of America.
  • Contact Printing.
  • Several procedures and tools have been developed for printing microarrays using rigid pin tools. In surface contact printing, the pin tools are dipped into a sample solution, resulting in the transfer of a small volume of fluid onto the tip of the pins. Touching the pins or pin samples onto a microarray surface leaves a spot, the diameter of which is determined by the surface energies of the pin, fluid, and microarray surface.
  • the transferred fluid comprises a volume in the nanoliter or picoliter range.
  • a replicator pin is a tool for picking up a sample from one stationary location and transporting it to a defined location on a solid support.
  • a typical configuration for a replicating head is an array of solid pins, generally in an 8 x 12 format, spaced at 9-mm centers that are compatible with 96- and 384-well plates. The pins are dipped into the wells, lifted, moved to a position over the microarray substrate, lowered to touch the solid support, whereby the sample is transferred. The process is repeated to complete transfer of all the samples. See Maier et al. (1994) J Biotechnol 35:191-203.
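  • By way of non-limiting illustration, the compatibility of an 8 x 12 head on 9-mm centers with both plate formats can be sketched as follows. The sketch assumes the conventional plate layouts (8 x 12 wells at 9-mm pitch for 96-well plates, 16 x 24 wells at 4.5-mm pitch for 384-well plates); the function names and the dip ordering are hypothetical.

```python
PIN_ROWS, PIN_COLS, PIN_PITCH_MM = 8, 12, 9.0


def wells_per_dip(plate_rows: int, plate_cols: int) -> list[list[tuple[int, int]]]:
    """Return, for each dip of the head, the (row, col) wells the pins enter."""
    if plate_rows % PIN_ROWS or plate_cols % PIN_COLS:
        raise ValueError("plate is not compatible with an 8 x 12 head")
    step = plate_rows // PIN_ROWS
    if step != plate_cols // PIN_COLS:
        raise ValueError("row and column pitch must match")
    dips = []
    for row_offset in range(step):        # shift the head by one well pitch
        for col_offset in range(step):
            dips.append([(r * step + row_offset, c * step + col_offset)
                         for r in range(PIN_ROWS) for c in range(PIN_COLS)])
    return dips


def well_pitch_mm(plate_rows: int) -> float:
    """Well spacing implied by the 9-mm pin spacing."""
    return PIN_PITCH_MM / (plate_rows // PIN_ROWS)


def well_name(row: int, col: int) -> str:
    """Convert zero-based indices to a plate label such as 'A1' or 'P24'."""
    return f"{chr(ord('A') + row)}{col + 1}"


for rows, cols in [(8, 12), (16, 24)]:
    dips = wells_per_dip(rows, cols)
    print(rows * cols, "wells:", len(dips), "dip(s), pitch", well_pitch_mm(rows), "mm")
```

  • Under these assumptions a 384-well plate is sampled in four interleaved dips, each offset by one well pitch, so that every well is visited exactly once.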
  • Solid pins for microarray printing can be purchased, for example, from
  • the CHIPMAKER™ and STEALTH™ pins from TeleChem contain a stainless steel shaft with a fine point. A narrow gap is machined into the point to serve as a reservoir for sample loading and spotting.
  • the pins have a loading volume of 0.2 µl to 0.6 µl to create spot sizes ranging from 75 µm to 360 µm in diameter.
  • quill-based tools, including printing capillaries, tweezers, and split pins
  • Quill-based arrayers withdraw a small volume of fluid into a depositing device from a microwell plate by capillary action. See Schena et al. (1995) Science 270:467-470. The diameter of the capillary typically ranges from about 10 µm to about 100 µm.
  • a robot then moves the head with quills to the desired location for dispensing. The quill carries the sample to all spotting locations, where a fraction of the sample is deposited. The forces acting on the fluid held in the quill must be overcome for the fluid to be released.
  • a variation of the pin printing process is the PIN-AND-RING™ technique developed by Genetic Microsystems Inc. of Woburn, Massachusetts, United States of America. This technique involves dipping a small ring into the sample well and removing it to capture liquid in the ring. A solid pin is then pushed through the sample in the ring, and the sample trapped on the flat end of the pin is deposited onto the surface. See Mace et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 39-64, Eaton Publishing, Natick, Massachusetts, United States of America.
  • the PIN-AND-RING™ technique is suitable for spotting onto rigid supports or soft substrates such as agar, gels, nitrocellulose, and nylon.
  • a representative instrument that employs the PIN-AND-RING™ technique is the 417™ Arrayer available from Affymetrix, Inc. of Santa Clara, California, United States of America.
  • Noncontact Ink-Jet Printing. A representative method for noncontact ink-jet printing uses a piezoelectric crystal closely apposed to the fluid reservoir.
  • One configuration places the piezoelectric crystal in contact with a glass capillary that holds the sample fluid.
  • the sample is drawn up into the reservoir and the crystal is biased with a voltage, which causes the crystal to deform, squeeze the capillary, and eject a small amount of fluid from the tip.
  • Piezoelectric pumps offer the capability of controllable, fast jetting rates and consistent volume deposition. Most piezoelectric pumps are unidirectional pumps that need to be directly connected, for example by flexible capillary tubing, to a source of sample supply or wash solution.
  • the capillary and jet orifices should be of sufficient inner diameter so that molecules are not sheared.
  • the void volume of fluid contained in the capillary typically ranges from about 100 µl to about 500 µl and generally is not recoverable. See U.S. Patent No. 5,965,352.
  • Syringe-Solenoid Printing combines a syringe pump with a microsolenoid valve to provide quantitative dispensing of nanoliter sample volumes.
  • a high-resolution syringe pump is connected to both a high-speed microsolenoid valve and a reservoir through a switching valve.
  • the system is filled with a system fluid, typically water, and the syringe is connected to the microsolenoid valve. Withdrawing the syringe causes the sample to move upward into the tip. The syringe then pressurizes the system such that opening the microsolenoid valve causes droplets to be ejected onto the surface.
  • a minimum dispense volume is on the order of 4 nl to 8 nl.
  • the positive displacement nature of the dispensing mechanism creates a substantially reliable system. See U.S. Patent Nos. 5,743,960 and 5,916,524.
  • Electronic Addressing. This method involves placing charged molecules at specific positions on a blank microarray substrate, for example a NANOCHIP™ substrate (Nanogen Inc., San Diego, California, United States of America). A nucleic acid probe is introduced to the microchip, and the negatively-charged probe moves to the selected charged position, where it is concentrated and bound. Serial application of different probes can be performed to assemble an array of probes at distinct positions. See U.S. Patent No. 6,225,059 and PCT International Patent Application Publication No.
  • Nanoelectrode Synthesis. An alternative array that can also be used in accordance with the methods of the presently disclosed subject matter provides ultrasmall structures (nanostructures) of a single or a few atomic layers synthesized on a semiconductor surface such as silicon.
  • the nanostructures can be designed to correspond precisely to the three-dimensional shape and electro-chemical properties of molecules, and thus can be used to recognize nucleic acids of a particular nucleotide sequence. See U.S. Patent No. 6,123,819.
  • VI. Hybridization
  • a nucleic acid sequence used to assay a gene expression level can comprise sequences corresponding to the open reading frame (or a portion thereof), the 5' untranslated region, and/or the 3' untranslated region. It is understood that any nucleic acid sequence that allows the expression level of a reference gene to be specifically determined can be employed with the methods and compositions of the presently disclosed subject matter.
  • an amplified and labeled nucleic acid sample is hybridized to probes or probe sets that are immobilized on a continuous solid support comprising a plurality of identifying positions.
  • hybridization at 65°C is too stringent for typical use, at least in part because the presence of fluorescent labels destabilizes the nucleic acid duplexes (Randolph & Waggoner (1997) Nucleic Acids Res 25:2923-2929).
  • hybridization can be performed in a formamide-based hybridization buffer as described in Pietu et al. (1996) Genome Res 6:492-503.
  • a microarray format can be selected for use based on its suitability for electrochemical-enhanced hybridization. Provision of an electric current to the microarray, or to one or more discrete positions on the microarray facilitates localization of a target nucleic acid sample near probes immobilized on the microarray surface. Concentration of target nucleic acid near arrayed probe accelerates hybridization of a nucleic acid of the sample to a probe. Further, electronic stringency control allows the removal of unbound and nonspecifically bound DNA after hybridization. See U.S. Patent Nos. 6,017,696 and
  • an amplified and labeled nucleic acid sample is hybridized to one or more probes in solution.
  • Representative stringent hybridization conditions for complementary nucleic acids having more than about 100 complementary residues are overnight hybridization in 50% formamide with 1 mg of heparin at 42°C.
  • An example of highly stringent wash conditions is 15 minutes in 0.1X SSC, 5M NaCl at 65°C.
  • An example of stringent wash conditions is 15 minutes in 0.2X SSC buffer at 65°C (see Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America, for a description of SSC buffer).
  • a high stringency wash can be preceded by a low stringency wash to remove background probe signal.
  • An example of medium stringency wash conditions for a duplex of more than about 100 nucleotides is 15 minutes in 1X SSC at 45°C.
  • An example of low stringency wash for a duplex of more than about 100 nucleotides is 15 minutes in 4-6X SSC at 40°C.
  • Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide.
  • stringent conditions typically involve salt concentrations of less than about 1 M Na+ ion, typically about 0.01 M to 1 M Na+ ion concentration (or other salts) at pH 7.0-8.3, and the temperature is typically at least about 30°C.
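  • By way of non-limiting illustration, the SSC strengths named above can be related to the stated Na+ range. The sketch assumes the standard composition of 1X SSC (0.15 M NaCl and 0.015 M trisodium citrate, each citrate contributing three Na+ ions); it considers the SSC component of each wash only, and the function names are hypothetical.

```python
NACL_M_AT_1X = 0.150      # assumed NaCl molarity in 1X SSC
CITRATE_M_AT_1X = 0.015   # assumed trisodium citrate molarity in 1X SSC


def sodium_molarity(ssc_fold: float) -> float:
    """Approximate Na+ molarity contributed by SSC at the given strength."""
    return ssc_fold * (NACL_M_AT_1X + 3 * CITRATE_M_AT_1X)


def within_stated_range(na_molar: float, low: float = 0.01, high: float = 1.0) -> bool:
    """Check the value against the 0.01 M to 1 M Na+ range stated in the text."""
    return low <= na_molar <= high


washes = {
    "0.1X SSC, 65 C": 0.1,
    "0.2X SSC, 65 C": 0.2,
    "1X SSC, 45 C": 1.0,
    "4X SSC, 40 C": 4.0,
    "6X SSC, 40 C": 6.0,
}
for label, fold in washes.items():
    na = sodium_molarity(fold)
    print(f"{label}: {na:.4f} M Na+, within range: {within_stated_range(na)}")
```

  • Under these assumptions, lower SSC strength corresponds to lower Na+ concentration and therefore higher stringency, and a 6X SSC wash falls above the 1 M upper bound.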
  • nucleic acid duplexes or hybrids can be captured from the solution for subsequent analysis, including detection assays.
  • For example, in a simple assay, a single probe set is hybridized to an amplified and labeled RNA sample derived from a target nucleic acid sample. Following hybridization, an antibody that recognizes DNA:RNA hybrids is used to precipitate the hybrids for subsequent analysis. The expression level of the gene is determined by detection of the label in the precipitate.
  • Alternate capture techniques can be used as will be understood to one of skill in the art, for example, purification by a metal affinity column when using probes comprising a histidine tag.
  • the hybridized sample can be hydrolyzed by alkaline treatment wherein the double-stranded hybrids are protected while non-hybridizing single-stranded template and excess probe are hydrolyzed.
  • the hybrids are then collected using any nucleic acid purification technique for further analysis.
  • probes or probe sets can be distinguished by differential labeling of probes or probe sets.
  • probes or probe sets can be spatially separated in different hybridization vessels. Representative embodiments of each approach are described herein below.
  • a probe or probe set having a unique label is prepared for each gene to be analyzed.
  • a first probe or probe set can be labeled with a first fluorescent label
  • a second probe or probe set can be labeled with a second fluorescent label.
  • Multi-labeling experiments should consider label characteristics and detection techniques to optimize detection of each label.
  • Representative first and second fluorescent labels are Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway, New Jersey, United States of America), which can be analyzed with good contrast and minimal signal leakage.
  • a unique label for each probe or probe set can further comprise a labeled microsphere to which a probe or probe set is attached.
  • a representative system is LabMAP (Luminex Corporation, Austin, Texas, United States of America). Briefly, LabMAP (Laboratory Multiple Analyte Profiling) technology involves performing molecular reactions, including hybridization reactions, on the surface of color-coded microscopic beads called microspheres.
  • an individual probe or probe set is attached to beads having a single color-code such that they can be identified throughout the assay.
  • Successful hybridization is measured using a detectable label of the amplified nucleic acid sample, wherein the detectable label can be distinguished from each color-code used to identify individual microspheres.
  • the hybridization mixture is analyzed to detect the signal of the color-code as well as the label of a sample nucleic acid bound to the microsphere. See Vignali (2000) J Immunol Methods 243:243-255; Smith et al. (1998) Clin Chem 44:2054-2056; PCT International Patent Application Publication Nos. WO 01/13120, WO 01/14589, WO 99/19515, and WO 97/14028.
  • VII. Detection
  • Methods for detecting a hybridization duplex or triplex are selected according to the label employed.
  • a radioactive label (e.g., ³²P-, ³³P-, or ³⁵S-dNTP)
  • detection can be accomplished by autoradiography or by using a phosphorimager as is known to one of skill in the art.
  • a detection method can be automated and is adapted for simultaneous detection of numerous samples.
  • Common research equipment has been developed to perform high-throughput fluorescence detection, including instruments from GSI Lumonics (Watertown, Massachusetts, United States of America), Amersham Pharmacia Biotech/Molecular Dynamics (Sunnyvale, California, United States of America), Applied Precision Inc. (Issaquah, Washington, United States of America), Genomic Solutions Inc. (Ann Arbor, Michigan, United States of America), Genetic Microsystems Inc.
  • a nucleic acid sample or probes are labeled with far infrared, near infrared, or infrared fluorescent dyes.
  • the mixture of amplified nucleic acids and probes is scanned photoelectrically with a laser diode and a sensor, wherein the laser scans with scanning light at a wavelength within the absorbance spectrum of the fluorescent label, and light is sensed at the emission wavelength of the label.
  • a protein or compound that binds the epitope can be used to detect the epitope.
  • an enzyme-linked protein can be subsequently detected by development of a colorimetric or luminescent reaction product that is measurable using a spectrophotometer or luminometer, respectively.
  • INVADER® technology (Third Wave Technologies, Madison, Wisconsin, United States of America) is used to detect target nucleic acid/probe complexes. Briefly, a nucleic acid cleavage site (such as that recognized by a variety of enzymes having 5' nuclease activity) is created on a target sequence, and the target sequence is cleaved in a site-specific manner, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. See U.S. Patent Nos. 5,846,717; 5,985,557; 5,994,069; 6,001,567; and 6,090,543.
  • target nucleic acid/probe complexes are detected using an amplifying molecule, for example a poly-dA oligonucleotide as described in Lisle et al. (2001) Biotechniques 30:1268-1272.
  • a tethered probe is employed against a target nucleic acid having a complementary nucleotide sequence.
  • a target nucleic acid having a poly-dT sequence which can be added to any nucleic acid sequence using methods known to one of skill in the art, hybridizes with an amplifying molecule comprising a poly-dA oligonucleotide.
  • Short oligo-dT₄₀ signaling moieties are labeled with any suitable label (e.g., fluorescent, chemiluminescent, radioisotopic labels).
  • the short oligo-dT₄₀ signaling moieties are subsequently hybridized along the molecule, and the label is detected.
  • Genes that were differentially expressed in patients with a multiple sclerosis syndrome compared to a control population were chosen to determine if they could be used to classify individuals with a multiple sclerosis syndrome.
  • the genes that were employed include those listed in Table 1.
  • TGM2 transglutaminase 2; SEQ ID NO: 8
  • TP53 human tumor protein p53; SEQ ID NO: 9
  • ANP32B/SSP29 GenBANK® Accession No.: NM_006401; SEQ ID NO: 10
  • TNFAIP2 GenBANK® Accession No.: NM_006291; SEQ ID NO: 11
  • SIP1 GenBANK® Accession No.: NM_003616; SEQ ID NO: 12
  • BPHL GenBANK® Accession No.: NM_004332; SEQ ID NO: 13
  • CCDC85B GenBANK® Accession No.: NM_006848; SEQ ID NO: 14
  • ASL GenBANK® Accession No.: NM_000048; SEQ ID NO: 15
  • GNB5 GenBANK® Accession No.: NM_000048; SEQ ID NO: 16
  • MAN1A1 GenBANK® Accession No.: NM_005907; SEQ ID NO: 17
  • the expression level of the genes listed in Table 1 was determined as described herein. Computer-based methods were employed for identifying which ratios of gene expression could be employed for discriminating subjects with an MS syndrome from subjects without an MS syndrome. In the analysis, the number of genes included in any single equation was limited to between 1 and 5.
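  • By way of non-limiting illustration, a search over ratio equations limited to between 1 and 5 genes can be sketched as follows. The form of each equation (a product of numerator gene levels divided by a product of denominator gene levels), the complete-separation criterion, the gene labels G1 to G3, and the expression values are all assumptions made for this sketch; they are not the equations or data of the presently disclosed subject matter.

```python
from itertools import combinations
from math import prod


def candidate_equations(genes, max_genes=5):
    """Yield (numerator, denominator) gene tuples using 1 to max_genes distinct genes."""
    for total in range(1, max_genes + 1):
        for chosen in combinations(genes, total):
            for k in range(1, total + 1):            # at least one numerator gene
                for num in combinations(chosen, k):
                    den = tuple(g for g in chosen if g not in num)
                    yield num, den


def evaluate(equation, sample):
    """Compute the ratio for one subject; an empty denominator counts as 1."""
    num, den = equation
    return prod(sample[g] for g in num) / prod(sample[g] for g in den)


def separates(equation, controls, cases):
    """True when the two groups do not overlap for this ratio."""
    c = [evaluate(equation, s) for s in controls]
    d = [evaluate(equation, s) for s in cases]
    return max(c) < min(d) or max(d) < min(c)


# Hypothetical expression levels for illustration only.
genes = ["G1", "G2", "G3"]
controls = [{"G1": 10, "G2": 10, "G3": 5}, {"G1": 20, "G2": 18, "G3": 5}]
cases = [{"G1": 30, "G2": 10, "G3": 5}, {"G1": 12, "G2": 4, "G3": 5}]

hits = [eq for eq in candidate_equations(genes) if separates(eq, controls, cases)]
for num, den in hits:
    print("*".join(num), "/", "*".join(den) or "1")
```

  • In this toy data set neither G1 nor G2 alone separates the groups, whereas the ratio G1/G2 does, which is the kind of result such a search is intended to surface.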
  • kits comprising one or more reagents for performing the presently disclosed methods.
  • the kits comprise a plurality of oligonucleotide primers and instructions for employing the plurality of oligonucleotide primers to determine the expression level of at least one, two, three, four, five, six, seven, eight, or all nine of the genes represented by SEQ ID NOs: 1-9.
  • the kits also comprise a plurality of oligonucleotide primers that can be employed for determining an expression level for one or more additional genes such as a control gene.
  • the presently disclosed subject matter provides computer program products comprising computer-executable instructions embodied in a computer-readable medium.
  • the computer program products perform steps comprising (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a control population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a test population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an un
  • the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
  • the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
  • the presently disclosed subject matter also provides methods for assigning an uncharacterized subject to one of two populations of subjects.
  • the methods comprise (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a first population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a second population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member
  • the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
  • the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
  • the first and second data sets can include expression data for any number of genes from any number of subjects.
  • the gene expression data can be data resulting from gene expression analysis techniques as disclosed herein, and can include, but is not limited to, data generated by RT-PCR (e.g., Q-RT-PCR), from Northern blots, and from analyses of gene arrays (e.g., gene chips, filter-based arrays, etc.).
  • the gene expression data is normalized, for example, to an expression level of a control gene such as a housekeeping gene.
  • Genes that can be employed as control genes include, but are not limited to, β-actin (ACTB), aldolase A, fructose-bisphosphate (ALDOA), glyceraldehyde-3-phosphate dehydrogenase (GAPD), phosphoglycerate kinase 1 (PGK1), and lactate dehydrogenase A (LDHA).
  • ACTB β-actin
  • ALDHA phosphoglycerate kinase 1
  • Other such genes are set forth in Eisenberg & Levanon (2003) Trends in Genetics 19:362-365.
  • RRMS relapsing remitting MS
  • SPMS secondary progressive MS
  • PPMS primary progressive MS
  • CIS clinically isolated syndrome
  • RNA was isolated using the Versagene PAXGENE™-compatible isolation kit according to the manufacturer's recommendations (Gentra Systems, Inc., Minneapolis,
  • cDNA equivalent to 100 ng of total RNA was used in replicate Q-RT-PCR reactions for each gene assay. Patient clinical history was blinded during processing and data collection.
  • TAQMAN® gene expression assays (Applied Biosystems, Inc., Foster City, California) and detected on an ABI7700/SDS platform (Applied Biosystems,
  • ACTR1A SEQ ID NO: 1
  • BRCA1 BRCA1
  • SPIB SEQ ID NO: 6
  • TAF11 SEQ ID NO: 7
  • TGM2 SEQ ID NO: 8
  • CT X, and 2 X calculated the linear expression value.
  • EXAMPLE 3 Statistical Analysis A computer program was designed to identify the most discriminatory combination of ratios (ranging between 1 and 5). All gene expression ratios (e.g., ACTR1A/BRCA1, TAF11/ACTR1A, etc.) in the control and MS data sets were searched to first find an optimal ratio. The search was entirely deterministic, since every possible ratio using two gene expression levels was computed. The value of a test ratio was used to separate the MS data set from the control set. For each control individual and MS individual the test ratio was computed. {C1, C2, ..., Cn} denoted the test ratios for each of the n control individuals and {MS1, MS2, ..., MSk} denoted the test ratios for each of the k MS individuals. Perfect separation of the two sets (an optimal ratio) would occur if the largest test ratio for the control individuals was less than the smallest test ratio for the MS individuals, but this optimal ratio was not identified.
  • the optimal test ratio separated the two data sets such that the second largest ratio in {C1, C2, ..., Cn} was less than the largest number of ratios in {MS1, MS2, ..., MSk}.
  • This optimal ratio was used to identify a cutoff value that produced the highest sensitivity and specificity for the two data sets. This was accomplished by plotting sensitivity and specificity curves as functions of the cutoff value and identifying the intersection of the curves. The cutoff value at this intersection was designated the optimal ratio discriminator.
  • Receiver operating characteristics (ROC) curves were used to examine performance characteristics of the tests. Briefly, the number of true positives (TP) and false positives (FP) were determined for a range of cutoff scores. The fraction of TP (TPF) was determined by dividing the TP by the total number of cases and the fraction of FP (FPF) was determined by dividing the FP by the total number of controls.
  • TPF true positives
  • FPF fraction of FP
  • a nonlinear regression curve was calculated using a function in Mathematica's Statistics`NonlinearFit` package (Wolfram Research, Inc., Champaign, Illinois, United States of America). The nonlinear regression was integrated to determine the area under the curve (AUC). The significance of observed differences (P) was determined using the Mann-Whitney test.
  • the data set comprised 29 patients with different clinical forms of MS (see Table 2) and 49 control subjects.
  • Nine genes were selected from the microarray data set for analysis (see EXAMPLE 2) and their expression profiles were examined using TAQMAN® gene expression assays. Genes were selected from the microarray data set for which expression level did not vary (control genes: LLGL2, CTSS, TGM2), and for which expression level varied significantly among control and autoimmune subjects (test genes: ACTR1A, BRCA1, EPHX2, SPIB, TAF11, and TP53).
  • MS and control group see Table 3.
  • a wide range of individual gene expression levels was observed, from the lowest value of 0.35 (BRCA1) to the highest value of 22,851 (CTSS).
  • CTSS 22,851
  • the average expression values were significantly different between the MS and control groups for 5 genes (see Table 3).
  • Linear expression values were entered into a computer program, which searched all possible 1-, 2-, 3-, and 4- ratio combinations of gene expression levels. These ratio combinations, or discriminators, generated individual scores, which were then analyzed to determine sensitivity and specificity (see
  • Scores were determined for each test equation as outlined in methods and are expressed as average score ± standard deviation. Sensitivity (Sens.) and specificity (Spec.) were determined by standard calculations as described in the methods section. P values were determined using the Mann-Whitney test.
  • the computer program identified more than one 1-, 2-, 3-, or 4-ratio discriminator that performed with equal sensitivity and specificity. For example, at the 1-component stage, several genes functioned equally well in the numerator but TAF11 was always in the denominator, consistent with its low expression value in the MS cohort. Several 4-ratio discriminators also performed equally well and these varied in two ways. First,
  • MS scores ranged from 0.6 to 69.1, while control scores ranged between 0.2 and 27.7 (see Figure 1A; MS-1).
  • One control subject received a score of 27.7, which was 5.6 times higher than the next highest control score of 4.9, and 12.6-fold higher than the average control score of 2.2.
  • the initial results were validated using only the best 4-ratio discriminator [CTSS × LLGL2² × TGM2] / [TAF11² × TP53²] by determining gene expression levels in whole blood from an independent cohort of 26 MS patients (see Figure 1A; MS-2) and performing the same analysis to produce scores.
  • the average score of the second group was 14.4 ± 13.6 and the sensitivity was 88%.
  • Combined sensitivity of the test and validation groups was 91%.
  • the performance of the 4-ratio discriminator [(CTSS × LLGL2² × TGM2) / (TAF11² × TP53²)] was also evaluated using the receiver operating characteristics (ROC) curve.
  • ROC receiver operating characteristics
  • the TPF and the FPF were compared using the MS and control samples applying the best 1-, 2-, 3-, and 4-ratio discriminators (from Table 4 and Figure 1A). This comparison yielded an area under the curve (AUC) of 0.96 for the 4-ratio discriminator (see Figure 3A). AUCs for the 3-, 2-, and 1-ratio discriminators were less than the AUC for the 4-ratio discriminator.
  • the TPF and FPF were determined using the MS samples and all other samples (combined data from Figure 1A and 2A).
  • results from a genome-wide microarray analysis were employed to design a sensitive and specific Q-RT-PCR-based assay capable of distinguishing subjects with MS from control subjects and subjects with other chronic diseases including autoimmune diseases.
  • This assay discriminates individuals with MS from controls with a specificity of 98% and an overall sensitivity of 91%. This level of specificity and sensitivity is among the highest reported in laboratory testing for MS (specificity ~95% for MRI; ~85% for OGCB).
  • this Q-RT-PCR assay can easily be adapted into a clinical molecular genetics laboratory. The search algorithm was designed with three desirable goals in mind.
  • the instant co-inventors had identified a conserved pattern of gene expression in individuals with different forms of autoimmune disease. Using these data, it was possible to design a scoring system that accurately discriminated subjects with an autoimmune disease from control subjects or from subjects undergoing an immune response after influenza vaccination. When the data set from the Q-RT-PCR assays was analyzed with different autoimmune subjects included, the discriminator still performed with a specificity of 87%, NDP of 98%, and overall accuracy of 86%. The greatest degree of overlap was identified with RA patients, but it is important to note that distinguishing between MS and RA is not traditionally a clinical dilemma.
  • TAF11 was the most under-expressed gene in the majority of MS patients and was represented in all the component ratios.
  • TP53 was moderately underexpressed on average in the MS population and its inclusion in the discriminator did not improve sensitivity. Its inclusion did increase specificity from 93% to 98%, however.
  • TAF11 encodes a small subunit of transcription factor IID that is present in all TFIID complexes and interacts with TATA-binding protein. Its function in mammalian systems is not well understood, but it is better understood in yeast. In yeast, promoters of genes have been grouped based upon their interactions with TAFs such that deletion or mutation of an individual TAF can alter transcription of a class of genes and change the transcriptional profile of a cell. Therefore, reduced TAF11 expression could have a pleiotropic effect, altering normal transcriptional regulation of multiple genes involved in the MS phenotype.
  • TP53 encodes the tumor suppressor protein, p53, which regulates cell proliferation, DNA damage/repair, and apoptosis.
  • Lymphocytes from RA and MS patients have reduced TP53 transcript levels, p53 protein levels, and defects in lymphocyte apoptosis induced by gamma radiation, a process known to be dependent upon p53. Defects in apoptosis are hypothesized to contribute to autoimmunity.
  • disclosed herein are a sensitive and highly specific Q-RT-PCR assay and a mathematical approach to evaluate the data. The assay allowed for the discrimination of patients with MS from controls and from patients with other autoimmune diseases. For MS, this approach offers a non-invasive, rapid test that provides diagnostic utility to assist in the clinical decision making process for this complex and challenging disease.
  • PBMC Peripheral blood mononuclear cells
  • RNA labeling includes three steps: priming, elongation, and probe purification.
  • priming 1-10 µg of total RNA (in a volume of less than 8.0 µl diethylpyrocarbonate (DEPC)-treated water) and 2.0 µg oligo-dT (10-20 mer mixture; 1 µg/µl) are mixed in a total volume of 10 µl (balance DEPC-treated water) in a 1.5 ml microcentrifuge tube. The tube is placed at 70°C for 10 minutes and then briefly chilled on ice.
  • DEPC diethylpyrocarbonate
  • 6.0 µl 5x First Strand Buffer (Invitrogen catalogue number Y00146), 1.0 µl 0.1 M DTT, 1.5 µl dNTP mixture (each dNTP at 20 mM), and 1.5 µl SUPERSCRIPT™ II reverse transcriptase (Invitrogen) are added to the microcentrifuge tube.
  • 10 µl ³³P-dCTP (10 mCi/ml; specific activity 3000 Ci/mmol; ICN Biomedicals Inc., Irvine, California, United States of America) is added to the microcentrifuge tube, the contents mixed thoroughly, and the tube is incubated at 37°C for 90 minutes.
  • Probe purification is accomplished by passing the elongation reaction mixture through a Bio-Spin 6 chromatography column (Bio-Rad Laboratories, Hercules, California, United States of America).
  • Hybridization of the Labeled RNA to the Membrane 5 µg of ³³P-labeled total RNA isolated from PBMCs is hybridized to GF211 GENEFILTERS® membranes (RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America; the genes present on the GF211 membrane can be found at RESGEN™'s ftp site).
  • RESGEN™ GF211 GENEFILTERS® membranes
  • Prior to hybridization, the filter is pre-treated with 0.5% SDS. The SDS solution is heated to boiling and poured over the membrane, which is then incubated in the SDS solution with gentle agitation for 5 minutes.
  • the filter is prehybridized by placing the filter in a hybridization roller tube (35 x 150 mm; DNA side facing the interior of the tube) and 5 ml MICROHYB™ solution (RESGEN™) is added to the tube. Additional blocking agents (5 µg COT-1® DNA, Invitrogen Corporation, Carlsbad, California, United States of America; 5 µg poly-dA) are added and the tube is vortexed to mix thoroughly. Bubbles between the membrane and the tube can be removed and the membrane is incubated in the prehybridization solution at 42°C for at least 2 hours.
  • RESGEN™ 5 ml MICROHYB™ solution
  • the probe is denatured by boiling, cooled, and pipetted into the roller tube containing the GENEFILTERS® membrane and prehybridization solution.
  • the now denatured probe-containing solution is mixed by vortexing. Hybridization can occur overnight, or alternatively for at least 12-18 hours, at 42°C.
  • Post-Hybridization Washes and Imaging After hybridization, the filters are washed in the roller tube. The following wash conditions can be used: first and second washes in 2x SSC/1% SDS at 50°C for 20 minutes; third wash in 0.5x SSC/1% SDS at 55°C for 15 minutes. After washing, the membrane is wrapped in plastic wrap and placed in a phosphorimaging cassette. Filters are exposed to imaging screens for 2-4 hours (short exposure) and then an additional 24 hours (long exposure), and screens are scanned using a PHOSPHORIMAGER™ apparatus (Molecular Dynamics, Piscataway, New Jersey, United States of America).
  • a nucleic acid sample can be used as a template for direct incorporation of fluorescent nucleotide analogs (e.g., Cy3-dUTP and Cy5-dUTP, available from Amersham Pharmacia Biotech of Piscataway, New Jersey, United States of America) by a polymerization reaction.
  • a 50 µl labeling reaction can contain 2 µg of template DNA, 5 µl of 10X buffer, 1.5 µl of fluorescent dUTP,
  • EXAMPLE 9 Noncovalent Binding of Nucleic Acid Probes onto Glass PCR fragments are suspended in a solution of 3 to 5M NaSCN and spotted onto amino-silanized slides using a GMS 417™ arrayer from Affymetrix of Santa Clara, California, United States of America. After spotting, the slides are heated at 80°C for 2 hours to dehydrate the spots. Prior to hybridization, the slides are washed in isopropanol for 10 minutes, followed by washing in boiling water for 5 minutes. The washing steps remove any nucleic acid that is not bound tightly to the glass and help to reduce background created by redistribution of loosely attached DNA during hybridization. Contaminants such as detergents and carbohydrates should be minimized in the spotting solution. See also Maitra & Thakur (1992) Curr Sci 62:586-588; Maitra & Thakur (1994) Indian J Biochem Biophys 31:97-99.
  • Labeled nucleic acids from the sample are prepared in a solution of 4X SSC buffer, 0.7 µg/µl tRNA, and 0.3% SDS to a total volume of 14.75 µl.
  • the hybridization mixture is denatured at 98°C for 2 minutes, cooled to 65°C, applied to the microarray, and covered with a 22-mm² cover slip.
  • the slide is placed in a waterproof hybridization chamber for hybridization in a 65°C water bath for 3 hours. Following hybridization, slides are washed in 1X SSC buffer with 0.06% SDS followed by 2 minutes in 0.06X SSC buffer.
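
Returning to the statistical analysis of EXAMPLE 3 above, the cutoff selection (the point where the sensitivity and specificity curves intersect) and the ROC fractions (TPF and FPF) can be illustrated with a minimal Python sketch. The function names, the rule that a score above the cutoff is called positive, the use of observed scores as candidate cutoffs, and all numbers are assumptions made for illustration; the source describes plotting the curves and does not give code.

```python
from typing import List, Tuple


def sens_spec(cases: List[float], controls: List[float], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score above the cutoff is called positive (assumption)."""
    tp = sum(1 for s in cases if s > cutoff)       # true positives
    tn = sum(1 for s in controls if s <= cutoff)   # true negatives
    return tp / len(cases), tn / len(controls)


def crossing_cutoff(cases: List[float], controls: List[float]) -> float:
    """Candidate cutoff at which sensitivity and specificity are closest to each other.

    Only observed scores are tried as cutoffs, which approximates the
    intersection of the two curves.
    """
    candidates = sorted(set(cases) | set(controls))

    def gap(c: float) -> float:
        sens, spec = sens_spec(cases, controls, c)
        return abs(sens - spec)

    return min(candidates, key=gap)


def roc_points(cases: List[float], controls: List[float]) -> List[Tuple[float, float]]:
    """(FPF, TPF) pairs: FPF = FP / number of controls, TPF = TP / number of cases."""
    points = []
    for c in sorted(set(cases) | set(controls)):
        sens, spec = sens_spec(cases, controls, c)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical scores, not data from the study
cases = [0.6, 5.0, 9.0, 20.0]
controls = [0.2, 1.0, 2.0, 4.9]
cutoff = crossing_cutoff(cases, controls)
print(cutoff, sens_spec(cases, controls, cutoff))
```

The sketch stops at the ROC points; fitting a curve to them and integrating it to obtain the AUC, as described above, is left out.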

Abstract

Methods for detecting a multiple sclerosis (MS) syndrome in a subject. In some embodiments, the methods include the steps of obtaining a biological sample from the subject; determining expression levels for two or more genes in the biological sample, wherein the two or more genes are selected from among the genes represented by SEQ ID NOs: 1-9; and calculating a ratio of the expression levels of the two or more genes determined in step (b) to thereby detect the presence of a multiple sclerosis syndrome in the subject. The presently disclosed subject matter also provides reagents and kits that can be used in the practice of the disclosed methods. Also provided are methods for assigning an uncharacterized subject to one of two populations of subjects and computer program products for identifying one or more ratio values that can be employed for assigning an uncharacterized subject to one of two populations of subjects.

Description

DESCRIPTION
MOLECULAR DIAGNOSIS OF AUTOIMMUNE DISEASES
RELATED APPLICATIONS
The presently disclosed subject matter claims the benefit of U.S. Provisional Patent Application Serial Nos. 60/734,369 and 60/736,131, filed November 7, 2005 and November 10, 2005, respectively. The disclosure of each of these U.S. Provisional Patent Applications is incorporated herein by reference in its entirety.
GOVERNMENT INTEREST
This work was supported by grant AI053984 from the U.S. National Institutes of Health. Thus, the U.S. government has certain rights in the presently disclosed subject matter.
TECHNICAL FIELD
The presently disclosed subject matter generally relates to the diagnosis of autoimmune disease. More specifically, the presently disclosed subject matter relates to a method for diagnosing an autoimmune disease, such as a multiple sclerosis syndrome.
BACKGROUND
Autoimmune diseases are heterogeneous diseases believed to arise from immune-mediated attack against self-antigens. For example, multiple sclerosis is the most common demyelinating disease of the central nervous system and develops from destruction of myelin sheaths. Both genetic and environmental factors play important roles in the onset and pathogenesis of autoimmune diseases. Epidemiologic data along with genetic linkage studies clearly support the presence of a genetic contribution to susceptibility to autoimmune disease.
Diagnosis of autoimmune diseases can present difficulties to the clinician. For example, there is no single definitive laboratory test for MS; it remains a clinical diagnosis. Abnormal brain magnetic resonance imaging (MRI) findings and immunological changes in cerebrospinal fluid (elevated IgG index, presence of oligoclonal bands) raise clinical suspicion, but are not disease specific. Therefore, patients who present with features highly suspicious for MS, or clinically isolated syndromes (CIS), present a diagnostic challenge. The identification of biomarkers characteristic of MS can aid in its diagnosis.
Thus, the identification of gene expression signatures that are common to several autoimmune diseases and/or are unique to an individual autoimmune disease would be extremely useful for the diagnosis of autoimmune diseases including, but not limited to MS. To address this need at least in part, the presently disclosed subject matter provides methods and products for diagnosis of autoimmune disease based in part on determinations of differential gene expression in a biological sample.
SUMMARY
This Summary lists several embodiments of the presently disclosed subject matter, and in many cases lists variations and permutations of these embodiments. This Summary is merely exemplary of the numerous and varied embodiments. Mention of one or more representative features of a given embodiment is likewise exemplary. Such an embodiment can typically exist with or without the feature(s) mentioned; likewise, those features can be applied to other embodiments of the presently disclosed subject matter, whether listed in this Summary or not. To avoid excessive repetition, this Summary does not list or suggest all possible combinations of such features.
The presently disclosed subject matter provides methods for detecting a multiple sclerosis (MS) syndrome in a subject. In some embodiments, the methods comprise (a) obtaining a biological sample from the subject; (b) determining expression levels for one or more genes in the biological sample, wherein the one or more genes are selected from among the genes represented by SEQ ID NOs: 1-39; and (c) comparing the expression levels of each of the one or more genes determined in step (b) with a standard, wherein the comparing detects a multiple sclerosis (MS) syndrome in the subject. In some embodiments of the presently disclosed methods, the comparing comprises (a) establishing an average expression level for each of the one or more genes in a population, wherein the population comprises statistically significant numbers of subjects with an MS syndrome and subjects that do not have an MS syndrome; (b) assigning a first value to each gene for which the expression level in the subject is higher than the average expression level in the population and a second value to each gene for which the expression level in the subject is lower than the average expression level in the population; and (c) adding the values assigned in step (b) to arrive at a sum, wherein the sum is indicative of a presence of an MS syndrome in the subject. In some embodiments, the comparing comprises calculating a ratio of the expression levels of two or more genes represented by SEQ ID NOs: 1-9 to thereby detect the presence of a multiple sclerosis syndrome in the subject.
In some embodiments of the presently disclosed methods, the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof. In some embodiments, the biological sample is a cell present in whole blood or a fraction thereof isolated from the subject. In some embodiments, the biological sample comprises a peripheral blood mononuclear cell.
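
The value-and-sum comparison in steps (a) through (c) above can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the first and second values are taken to be +1 and -1, a level equal to the average contributes nothing, and the gene names and numbers are hypothetical.

```python
from typing import Mapping


def sum_score(subject: Mapping[str, float],
              population_mean: Mapping[str, float],
              high_value: int = 1,
              low_value: int = -1) -> int:
    """Add high_value for each gene above the population average and low_value for each gene below it."""
    total = 0
    for gene, mean in population_mean.items():
        level = subject[gene]
        if level > mean:
            total += high_value
        elif level < mean:
            total += low_value
        # a level exactly equal to the average adds nothing (assumption)
    return total


# Hypothetical averages and subject values, for illustration only
population_mean = {"GENE_A": 10.0, "GENE_B": 5.0, "GENE_C": 2.0}
subject = {"GENE_A": 12.0, "GENE_B": 3.0, "GENE_C": 1.0}
print(sum_score(subject, population_mean))
```

How a given sum maps to the presence of an MS syndrome depends on thresholds that this passage does not state, so the sketch returns only the sum.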
In some embodiments of the presently disclosed methods, the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR). In some embodiments, the RT-PCR is quantitative RT-PCR (Q-RT-PCR). In some embodiments, the determining is of the expression levels of at least three, four, or five genes represented by SEQ ID NOs: 1-9.
In some embodiments of the presently disclosed methods, the calculating comprises calculating a ratio using an equation selected from among (a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7; (b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared; (c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and (d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by the expression level of a gene product represented by SEQ ID NO: 8 divided by the product of the expression level of a gene product represented by SEQ ID NO: 7 squared times the expression level of a gene product represented by SEQ ID NO: 9 squared.
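
As a worked illustration, equations (a) through (d) above can be written as a short Python function. The function name and the input layout (a mapping from SEQ ID NO to linear expression level) are assumptions, as is the reading of equation (c) as a division by the product of one thousand and the cubed level for SEQ ID NO: 7. The input numbers are hypothetical.

```python
from typing import Dict, Mapping


def discriminator_scores(e: Mapping[int, float]) -> Dict[str, float]:
    """Compute ratios (a)-(d); e maps a SEQ ID NO to its linear expression level."""
    return {
        # (a) SEQ ID NO: 3 / SEQ ID NO: 7
        "a": e[3] / e[7],
        # (b) SEQ ID NO: 2 x SEQ ID NO: 3 / (SEQ ID NO: 7)^2
        "b": e[2] * e[3] / e[7] ** 2,
        # (c) SEQ ID NO: 2 x 5 x 6 / (1000 x (SEQ ID NO: 7)^3), assumed grouping
        "c": e[2] * e[5] * e[6] / (1000 * e[7] ** 3),
        # (d) SEQ ID NO: 2 x (SEQ ID NO: 5)^2 x 8 / ((SEQ ID NO: 7)^2 x (SEQ ID NO: 9)^2)
        "d": e[2] * e[5] ** 2 * e[8] / (e[7] ** 2 * e[9] ** 2),
    }


# Hypothetical expression levels keyed by SEQ ID NO
levels = {2: 2.0, 3: 4.0, 5: 3.0, 6: 5.0, 7: 2.0, 8: 1.0, 9: 3.0}
scores = discriminator_scores(levels)
print(scores)
```

Because the level for SEQ ID NO: 7 appears in every denominator, a subject with low expression of that gene receives a higher score under each equation.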
The presently disclosed subject matter also provides methods for diagnosing a multiple sclerosis (MS) syndrome in a subject comprising (a) providing an array comprising a plurality of nucleic acid sequences, wherein the plurality of nucleic acid sequences correspond to at least two of the gene products represented by SEQ ID NOs: 1-9; (b) providing a nucleic acid sample isolated from or generated from a biological sample from the subject; (c) hybridizing the nucleic acid sample to the array; (d) detecting nucleic acids on the array to which the nucleic acid sample hybridizes; (e) determining an expression level for each nucleic acid detected; and (f) calculating a ratio of the expression levels of the two or more genes determined in step (e) to thereby detect the presence of a multiple sclerosis syndrome in the subject. In some embodiments, the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof. In some embodiments, the array is selected from the group consisting of a microarray chip and a membrane-based filter array. In some embodiments, the array comprises nucleic acid sequences that correspond to at least three, four, five, six, or all nine genes represented by SEQ ID NOs: 1-9. In some embodiments, the array comprises more than one identifying location for at least one of the gene products represented by SEQ ID NOs: 1-9. In some embodiments, the array further comprises at least one internal control gene. In some embodiments, the biological sample comprises a cell present in whole blood or a fraction thereof isolated from the subject. In some embodiments, the cell is a peripheral blood mononuclear cell.
In some embodiments of the presently disclosed methods, the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR). In some embodiments, the RT-PCR is quantitative RT-PCR (Q-RT-PCR). In some embodiments, the determining is of the expression levels of at least two, three, four, five, six, or all nine genes represented by SEQ ID NOs: 1-9.
In some embodiments of the presently disclosed methods, the calculating comprises calculating a ratio using an equation selected from among: (a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7; (b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared; (c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and (d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by the expression level of a gene product represented by SEQ ID NO: 8 divided by the product of the expression level of a gene product represented by SEQ ID NO: 7 squared times the expression level of a gene product represented by SEQ ID NO: 9 squared.
The methods described herein for detecting a multiple sclerosis (MS) syndrome in a subject can be implemented using a computer program product comprising computer-executable instructions embodied in a computer-readable medium. Exemplary computer-readable media suitable for implementing the subject matter described herein include chip memory devices, disk memory devices, programmable logic devices, application specific integrated circuits, and downloadable electrical signals. In addition, a computer program product that implements the subject matter described herein can be located on a single device or computing platform or can be distributed across multiple devices and/or computing platforms.
The presently disclosed subject matter also provides kits comprising a plurality of oligonucleotide primers and instructions for employing the plurality of oligonucleotide primers to determine the expression level of at least two of the genes represented by SEQ ID NOs: 1-9. In some embodiments, the kits comprise oligonucleotide primers to determine the expression level of at least three, four, five, six, or all nine of the genes represented by SEQ ID NOs: 1-9. In some embodiments, the kits further comprise oligonucleotide primers to determine the expression level of a control gene.
The presently disclosed subject matter also provides methods for assigning an uncharacterized subject to one of two populations of subjects. In some embodiments, the methods comprise (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a first population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a second population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects.
In some embodiments, the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome. In some embodiments, the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
The presently disclosed subject matter also provides computer program products comprising computer-executable instructions embodied in a computer-readable medium for performing steps comprising (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a control population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a test population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects. In some embodiments, the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
In some embodiments, the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
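Steps (a) through (d) above can be illustrated with a minimal Python sketch. It is not the program used by the inventors: the function names, the use of mean ratios, and the `min_fold` cutoff are assumptions made only for illustration, and the disclosure does not specify how "a degree sufficient" is quantified. Each sample is assumed to be a mapping from a gene label to a positive expression level.

```python
from itertools import permutations
from statistics import mean


def ratio_series(levels):
    """Return {(numerator, denominator): ratio} for every ordered gene pair."""
    return {
        (a, b): levels[a] / levels[b]
        for a, b in permutations(sorted(levels), 2)
        if levels[b] != 0
    }


def discriminating_ratios(first_pop, second_pop, min_fold=2.0):
    """List gene-pair ratios whose mean is at least min_fold higher in second_pop.

    Because both a/b and b/a are enumerated, a ratio that is lower in the
    second population appears as its inverse. min_fold is a hypothetical cutoff.
    """
    first = [ratio_series(sample) for sample in first_pop]
    second = [ratio_series(sample) for sample in second_pop]
    shared = set.intersection(*(set(r) for r in first + second))
    hits = []
    for pair in shared:
        mean_first = mean(r[pair] for r in first)
        mean_second = mean(r[pair] for r in second)
        if mean_first > 0 and mean_second / mean_first >= min_fold:
            hits.append((pair, mean_second / mean_first))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A ratio selected this way could then be evaluated on an uncharacterized subject; in practice a statistical test on the two ratio distributions would likely be preferable to a simple fold cutoff.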
It is an object of the presently disclosed subject matter to provide a method for diagnosing an autoimmune disease, such as a multiple sclerosis syndrome.
An object of the presently disclosed subject matter having been stated hereinabove, and which is achieved in whole or in part by the presently disclosed subject matter, other objects will become evident as the description proceeds when taken in connection with the accompanying examples and drawings as best described hereinbelow.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1C depict the results of analyses of the 4-ratio discriminator in the MS and Control groups.
Figure 1A is a plot presenting the results of individual test scores within each cohort determined using the expression equation: [CTSS × (LLGL2)² × TGM2] / [(TAF11)² × (TP53)²].
Figure 1B is a plot of the sensitivity (circles) and the specificity (squares) of the test as scoring threshold increases.
Figure 1C is a plot depicting the accuracy of test results with varying threshold.
Figures 2A-2C depict the results of analyses of the 4-ratio discriminator in different autoimmune and chronic diseases.
Figure 2A is a plot presenting the results of individual test scores within each cohort determined using the expression equation: [CTSS × (LLGL2)² × TGM2] / [(TAF11)² × (TP53)²].
Figure 2B is a plot of the sensitivity (circles) and the specificity (rectangles) of the test as scoring threshold increases.
Figure 2C is a plot depicting the accuracy of test results with varying threshold.
Figures 3A and 3B are receiver operating characteristic (ROC) curves of the 1-, 2-, 3-, and 4-ratio discriminators. For both Figures 3A and 3B, the data points are as follows: circles: 1-ratio discriminator; triangles: 2-ratio discriminator; squares: 3-ratio discriminator; and diamonds: 4-ratio discriminator.
Figure 3A is a ROC curve of MS versus control samples. True Positive (TP) and False Positive (FP) fractions were calculated as a function of test score (from Figure 1).
Figure 3B is a ROC curve comparing MS to controls, and other disease groups (from Figure 2A). TP and FP fractions were calculated as a function of test score.
Figure 4 is a plot presenting the results of individual MS test scores within each clinical sub-type. Individual test scores within each MS clinical sub-type were determined using the expression equation: [CTSS × (LLGL2)² × TGM2] / [(TAF11)² × (TP53)²].
Figure 5 is a bar graph presenting average test scores as a function of type of therapy. The MS patient samples were segregated into all MS patients, patients receiving a β-interferon, patients not receiving β-interferon, and patients receiving Copaxone. Average scores ± standard deviation were calculated and P values were calculated using the Mann-Whitney test.
BRIEF DESCRIPTION OF THE SEQUENCE LISTING
SEQ ID NO: 1 is a nucleic acid sequence of a cDNA corresponding to a human ARP1 actin-related protein 1 homolog A, centractin alpha (yeast) (ACTR1A) gene product (GENBANK® Accession No. NM_005736).
SEQ ID NO: 2 is a nucleic acid sequence of a cDNA corresponding to a human breast cancer 1, early onset (BRCA1), transcript variant BRCA1a gene product (GENBANK® Accession No. NM_007294).
SEQ ID NO: 3 is a nucleic acid sequence of a cDNA corresponding to a human cathepsin S (CTSS) gene product (GENBANK® Accession No. NM_004079).
SEQ ID NO: 4 is a nucleic acid sequence of a cDNA corresponding to a human epoxide hydrolase 2, cytoplasmic (EPHX2) gene product (GENBANK® Accession No. NM_001979).
SEQ ID NO: 5 is a nucleic acid sequence of a cDNA corresponding to a human lethal giant larvae homolog 2 (LLGL2) gene product (GENBANK® Accession No. NM_004524).
SEQ ID NO: 6 is a nucleic acid sequence of a cDNA corresponding to a human Spi-B transcription factor (Spi-1/PU.1 related; SPIB) gene product (GENBANK® Accession No. NM_003121).
SEQ ID NO: 7 is a nucleic acid sequence of a cDNA corresponding to a human TATA box binding protein (TBP)-associated factor 11 (TAF11) RNA polymerase II, 28 kilodalton (kDa) gene product (GENBANK® Accession No. NM_005643).
SEQ ID NO: 8 is a nucleic acid sequence of a cDNA corresponding to a human transglutaminase 2 (TGM2) gene product (GENBANK® Accession No. NM_004613).
SEQ ID NO: 9 is a nucleic acid sequence of a cDNA corresponding to a human tumor protein p53 (TP53; Li-Fraumeni syndrome) gene product (GENBANK® Accession No. NM_000546).
DETAILED DESCRIPTION
The presently disclosed subject matter relates to methods for detecting an autoimmune disorder (e.g., a multiple sclerosis syndrome) in a subject by analyzing gene expression profiles for selected genes in biological samples isolated from the subject and comparing the gene expression profiles to standards. In some embodiments, the methods involve determining the expression levels of a set of genes expressed in biological samples (e.g., whole blood or cells isolated therefrom) from a subject suspected of having an autoimmune disease and comparing the expression levels of these genes with the levels of expression of these genes in normal subjects and subjects with confirmed autoimmune diseases. Using the methods of the presently disclosed subject matter, it is possible to determine whether or not a subject has an autoimmune disease (for example, a multiple sclerosis syndrome).
In some embodiments of the presently disclosed subject matter involving determining whether or not a subject has an autoimmune disease, the expression levels of many genes can be analyzed simultaneously using microarrays or membrane-based filter arrays. A representative filter array is the GF211 Human "Named Genes" GENEFILTERS® Microarrays Release 1 (available from RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America), although other arrays can also be used. Using the GF211 array, it is possible to determine the expression levels of over 4000 genes simultaneously in a biological sample. Additionally, the presence on the GF211 filter of certain "housekeeping" genes allows for the comparison of data from experiment to experiment. This facilitates the comparison of newly obtained data to a standard (e.g., a previously generated standard).
I. General Considerations
Multiple sclerosis is a demyelinating disease of the central nervous system with a presumed autoimmune etiology. Quantitative real-time PCR (Q-RT-PCR) analysis was employed to identify a minimum number of genes for which transcript levels discriminated multiple sclerosis subjects from subjects with other chronic diseases and controls. A computer program was employed to search quantitative transcript levels to identify optimum ratios that distinguished among the different categories. A combination of a four-ratio equation using expression levels of five genes segregated the multiple sclerosis cohort (N = 55) from the control cohort (N = 49) with a sensitivity of 91% and specificity of 98%. When autoimmune and other chronic disease groups were included (N = 78), this discriminator still performed with a sensitivity of 79% and a specificity of 87%. This approach thus can be used for diagnosis not only of multiple sclerosis, but also of other clinically complex autoimmune diseases.
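How a four-ratio score and its sensitivity and specificity at a given threshold might be computed is sketched below in Python. The figure descriptions give the discriminator as a product of CTSS, LLGL2, and TGM2 terms over TAF11 and TP53 terms; the exponents used here reflect one reading of that expression and should be treated as an assumption, as should the convention that a score at or above the threshold is called positive. The function names are hypothetical.

```python
def ms_score(levels):
    """Assumed form: (CTSS * LLGL2**2 * TGM2) / (TAF11**2 * TP53**2).

    This equals the product of four ratios drawn from five genes, e.g.
    CTSS/TAF11, LLGL2/TAF11, LLGL2/TP53, and TGM2/TP53.
    """
    numerator = levels["CTSS"] * levels["LLGL2"] ** 2 * levels["TGM2"]
    denominator = levels["TAF11"] ** 2 * levels["TP53"] ** 2
    return numerator / denominator


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Treat score >= threshold as a positive call (an assumed convention)."""
    true_pos = sum(score >= threshold for score in case_scores)
    true_neg = sum(score < threshold for score in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)
```

Sweeping `threshold` over the observed range of scores and recording each (1 - specificity, sensitivity) pair would trace out a ROC curve of the kind described for Figures 3A and 3B.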
Accordingly, disclosed herein is a novel approach for analysis or scoring to try to identify genes for which transcript levels in whole blood discriminated MS from the other autoimmune diseases. A focus was placed on MS because it is one of the more difficult autoimmune diseases to diagnose. Q-RT-PCR was employed to measure transcript levels of genes identified from microarray data that were either control genes (equivalent transcript levels in subjects with autoimmune disease and control individuals) or test genes (different transcript levels between autoimmune subjects and controls). A new algorithm that would give each gene in the analysis equal weight but would also provide more accurate weight to quantitative differences in transcript levels was developed. Using this analysis, it was possible to distinguish subjects with MS from control subjects and subjects with other diseases including autoimmune disease in a retrospective analysis.
II. Definitions
All references listed in the instant disclosure, including but not limited to all patents, patent applications and publications thereof, scientific journal articles, and GENBANK® database entries (including all annotations available therein) are incorporated herein by reference in their entireties to the extent that they supplement, explain, provide a background for or teach methodology, techniques and/or compositions employed herein.
While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which the presently disclosed subject matter belongs. Although any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the presently disclosed subject matter, representative methods, devices, and materials are now described.
Following long-standing patent law convention, the terms "a", "an", and "the" refer to "one or more" when used in this application, including the claims. Thus, for example, reference to "a cell" includes a plurality of such cells, and so forth.
Unless otherwise indicated, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about". Accordingly, unless indicated to the contrary, the numerical parameters set forth in this specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by the presently disclosed subject matter.
As used herein, the term "about," when referring to a value or to an amount of mass, weight, time, volume, concentration or percentage is meant to encompass variations of in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, in some embodiments ±0.5%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed methods or employ the disclosed compositions.
As used herein, "significance" or "significant" relates to a statistical analysis of the probability that there is a non-random association between two or more entities. To determine whether or not a relationship is "significant" or has "significance", statistical manipulations of the data can be performed to calculate a probability, expressed as a "P value". Those P values that fall below a user-defined cutoff point are regarded as significant. In some embodiments, a P value less than or equal to 0.05, in some embodiments less than 0.01, in some embodiments less than 0.005, and in some embodiments less than 0.001, is regarded as significant.
As used herein, the phrase "multiple sclerosis syndrome" refers generally to any disorder that would be classified as a multiple sclerosis (MS) or a precursor thereto. These disorders include various subtypes of MS including, but not limited to relapsing remitting MS (RRMS), primary progressive MS (PPMS), secondary progressive MS (SPMS), and pre-MS (also called clinically isolated syndrome; CIS).
II.A. Nucleic Acids
The nucleic acid molecules employed in accordance with the presently disclosed subject matter include any nucleic acid molecule for which expression is desired to be assessed in evaluating the presence or absence of an autoimmune disease. Representative nucleic acid molecules include, but are not limited to, the isolated nucleic acid molecules of any one of SEQ ID NOs: 1-9, complementary DNA molecules, sequences having 80%, 85%, 90%, 92%, 94%, 95%, 96%, 98%, 99%, or greater than 99% identity to a nucleic acid sequence of any one of SEQ ID NOs: 1-9, sequences capable of hybridizing to any one of SEQ ID NOs: 1-9 under conditions disclosed herein, and corresponding RNA molecules.
As used herein, "nucleic acid" and "nucleic acid molecule" refer to any of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Nucleic acids can comprise monomers that are naturally occurring nucleotides (such as deoxyribonucleotides and ribonucleotides), or analogs of naturally occurring nucleotides (e.g., α-enantiomeric forms of naturally occurring nucleotides), or a combination of both. Modified nucleotides can have modifications in sugar moieties and/or in pyrimidine or purine base moieties. Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups. Sugars can also be functionalized as ethers or esters. Moreover, the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs. Examples of modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes. Nucleic acid monomers can be linked by phosphodiester bonds or analogs of phosphodiester bonds. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like.
Unless otherwise indicated, a particular nucleotide sequence also implicitly encompasses complementary sequences, subsequences, elongated sequences, as well as the sequence explicitly indicated. The terms "nucleic acid molecule" or "nucleotide sequence" can also be used in place of "gene", "cDNA", or "mRNA". Nucleic acids can be derived from any source, including any organism. In some embodiments, a nucleic acid is derived from a biological sample isolated from a subject.
The term "subsequence" refers to a sequence of nucleic acids that comprises a part of a longer nucleic acid sequence. An exemplary subsequence is a probe, or a primer. The term "primer" as used herein refers to a contiguous sequence comprising in one example about 8 or more deoxyribonucleotides or ribonucleotides, in another example 10-20 nucleotides, and in yet another example 20-30 nucleotides of a selected nucleic acid molecule. The primers disclosed herein encompass oligonucleotides of sufficient length and appropriate sequence so as to provide initiation of polymerization on a target nucleic acid molecule.
The term "elongated sequence" refers to an addition of nucleotides (or other analogous molecules) incorporated into the nucleic acid. For example, a polymerase (e.g., a DNA polymerase) can add sequences at the 3' terminus of the nucleic acid molecule. In addition, the nucleotide sequence can be combined with other DNA sequences, such as promoters, promoter regions, enhancers, polyadenylation signals, intronic sequences, additional restriction enzyme sites, multiple cloning sites, and other coding segments.
As used herein, the phrases "open reading frame" and "ORF" are given their common meaning and refer to a contiguous series of deoxyribonucleotides or ribonucleotides that encode a polypeptide or a fragment of a polypeptide. In an organism that splices precursor RNAs to form mRNAs, the ORF will be discontinuous in the genome. Splicing produces a continuous ORF that can be translated to produce a polypeptide. In a full-length cDNA, the complete ORF includes those nucleic acid sequences beginning with the start codon and ending with the stop codon. In a cDNA molecule that is not full-length, the ORF includes those nucleic acid sequences present in the non-full-length cDNA that are included within the complete ORF of the corresponding full-length cDNA.
As used herein, the phrase "coding sequence" is used interchangeably with "open reading frame" and "ORF" and refers to a nucleic acid sequence that is transcribed into RNA including, but not limited to mRNA, rRNA, tRNA, snRNA, sense RNA, or antisense RNA. The RNA can then be translated in vitro or in vivo to produce a protein.
The terms "complementary" and "complementary sequences", as used herein, refer to two nucleotide sequences that comprise antiparallel nucleotide sequences capable of pairing with one another upon formation of hydrogen bonds between base pairs. As used herein, the term "complementary sequences" means nucleotide sequences which are substantially complementary, as can be assessed by the same nucleotide comparison set forth herein, or are defined as being capable of hybridizing to the nucleic acid segment in question under relatively stringent conditions such as those described herein. In some embodiments, a complementary sequence is at least 80% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 85% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 90% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 95% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 98% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is at least 99% complementary to the nucleotide sequence with which it is capable of pairing. In some embodiments, a complementary sequence is 100% complementary to the nucleotide sequence with which it is capable of pairing. A particular example of a complementary nucleic acid segment is an antisense oligonucleotide.
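The percentages above can be made concrete with a short Python sketch. It is illustrative only and rests on simplifying assumptions not stated in the disclosure: both sequences are DNA written 5' to 3', they have equal length, pairing is ungapped, and only Watson-Crick pairs (A-T, G-C) count. The function names are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel Watson-Crick partner of a DNA sequence."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def percent_complementary(seq_a, seq_b):
    """Percentage of positions in seq_a that pair with antiparallel seq_b."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    # seq_a pairs perfectly with seq_b when it equals seq_b's reverse complement.
    target = reverse_complement(seq_b)
    matches = sum(x == y for x, y in zip(seq_a.upper(), target))
    return 100.0 * matches / len(seq_a)
```

Under these assumptions, a sequence compared with its own reverse complement scores 100%, and each mismatched position lowers the score proportionally.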
The term "gene" refers broadly to any segment of DNA associated with a biological function. A gene encompasses sequences including, but not limited to a coding sequence, a promoter region, a transcriptional regulatory sequence, a non-expressed DNA segment that is a specific recognition sequence for regulatory proteins, a non-expressed DNA segment that contributes to gene expression, a DNA segment designed to have desired parameters, or combinations thereof. A gene can be obtained by a variety of methods, including isolation or cloning from a biological sample, synthesis based on known or predicted sequence information, and recombinant derivation of an existing sequence.
As used herein, the terms "known gene" and "reference gene" are used interchangeably and refer to nucleic acid sequences that can be identified as corresponding to a particular expressed sequence tag (EST), partial cDNA, full-length cDNA, or gene. In some embodiments, a reference gene is a gene, a cDNA, or an EST for which the nucleic acid sequence has been determined (i.e., is known). In some embodiments, a reference gene is represented by one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence complementary to one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence having 80% or higher identity to any one of SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence capable of hybridizing to any one of SEQ ID NOs: 1-9 under conditions disclosed herein. In some embodiments, a reference gene is represented by an RNA molecule corresponding to any one of SEQ ID NOs: 1-9. In some embodiments, a reference gene is represented by a nucleic acid sequence present on an array.
As used herein, the terms "corresponding to", "representing", "represented by", and grammatical derivatives of these terms, when used in the context of a nucleic acid sequence corresponding to or representing a gene, refer to a nucleic acid sequence that results from transcription, reverse transcription, or replication from a particular genetic locus, gene, or gene product (for example, an mRNA). In other words, an EST, partial cDNA, or full-length cDNA corresponding to a particular reference gene is a nucleic acid sequence that one of ordinary skill in the art would recognize as being a product of either transcription or replication of that reference gene (for example, a product produced by transcription of the reference gene). One of ordinary skill in the art would understand that the EST, partial cDNA, or full-length cDNA itself is produced by in vitro manipulation to convert the mRNA into an EST or cDNA, for example by reverse transcription of an isolated RNA molecule that was transcribed from the reference gene. One of ordinary skill in the art will also understand that the product of a reverse transcription is a double-stranded DNA molecule, and that a given strand of that double-stranded molecule can embody either the coding strand or the non-coding strand of the gene. The sequences presented in the Sequence Listing are single-stranded, however, and it is to be understood that the presently disclosed subject matter is intended to encompass the genes represented by the sequences presented in SEQ ID NOs: 1-9, including the specific sequences set forth as well as the reverse/complement of each of these sequences.
A known gene and/or reference gene also includes, but is not limited to those genes that have been identified as being differentially expressed in autoimmune patients versus normal patients, such as but not limited to those set forth in SEQ ID NOs: 1-9. A reference gene is also intended to include nucleic acid sequences that substantially hybridize to one of such genes, including but not limited to one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9. As such, a reference gene includes a nucleic acid sequence that has one or more polymorphisms such that while the particular nucleic acid sequence might diverge somewhat from one of such genes, including but not limited to one of those disclosed in SEQ ID NOs: 1-9, one of ordinary skill in the art would nonetheless recognize the particular nucleic acid sequence as corresponding to a gene represented by one of such genes, including but not limited to one of the sequences disclosed in SEQ ID NOs: 1-9. For example, the GENBANK® database has at least three accession numbers that are identified as corresponding to the human breast cancer 1, early onset (BRCA1) mRNA. These three represent transcript variants a, a', and b, and have accession numbers NM_007294, NM_007295, and NM_007296, respectively. It is understood that the presently disclosed subject matter, which identifies NM_007294 as SEQ ID NO: 2, also encompasses the other transcript variants. In the context of the presently disclosed subject matter, a reference gene is also intended to include nucleic acid sequences that substantially hybridize to a nucleic acid corresponding to a gene represented by one of the nucleic acid sequences disclosed in SEQ ID NOs: 1-9.
As such, a reference gene includes a nucleic acid sequence that has one or more polymorphisms such that while the particular nucleic acid sequence might diverge somewhat from those disclosed in SEQ ID NOs: 1-9, one of ordinary skill in the art would nonetheless recognize the particular nucleic acid sequence as corresponding to a gene represented by one of the sequences disclosed in SEQ ID NOs: 1-9.
The term "gene expression" generally refers to the cellular processes by which a biologically active polypeptide is produced from a DNA sequence. Generally, gene expression comprises the processes of transcription and translation, along with those modifications that normally occur in the cell to modify the newly translated protein to an active form and to direct it to its proper subcellular or extracellular location. The terms "gene expression level" and "expression level" as used herein refer to an amount of gene-specific RNA or polypeptide that is present in a biological sample. When used in relation to an RNA molecule, the term "abundance" can be used interchangeably with the terms "gene expression level" and "expression level". While an expression level can be expressed in standard units such as "transcripts per cell" for RNA or "nanograms per microgram tissue" for RNA or a polypeptide, it is not necessary that expression level be defined as such. Alternatively, relative units can be employed to describe an expression level. For example, when the assay has an internal control (referred to herein as a "control gene"), which can be, for example, a known quantity of a nucleic acid derived from a gene for which the expression level is either known or can be accurately determined, unknown expression levels of other genes can be compared to the known internal control. More specifically, when the assay involves hybridizing labeled total RNA to a solid support comprising a known amount of nucleic acid derived from known genes, an appropriate internal control could be a housekeeping gene (e.g., glucose-6-phosphate dehydrogenase or elongation factor-1), an ideal housekeeping gene being defined as a gene for which the expression level in all cell types and under all conditions is the same. Use of such an internal control allows relative expression levels to be determined (e.g., relative to the expression of the housekeeping gene) both for the nucleic acids present on the solid support and also between different experiments using the same solid support. This discrete expression level can then be normalized to a value relative to the expression level of the control gene (for example, a housekeeping gene).
As used herein, the term "normalized", and grammatical derivatives thereof, refers to a manipulation of discrete expression level data wherein the expression level of a reference gene is expressed relative to the expression level of a control gene. For example, the expression level of the control gene can be set at 1, and the expression levels of all reference genes can be expressed in units relative to the expression of the control gene.
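The normalization just described amounts to dividing every measured level by the level of the control gene. A minimal Python sketch follows; the function name and the gene labels in the usage note are placeholders, and the input is assumed to be raw levels in a single arbitrary unit.

```python
def normalize(levels, control_gene):
    """Express each expression level relative to the control gene (control = 1)."""
    control = levels[control_gene]
    if control <= 0:
        raise ValueError("control gene level must be positive")
    return {gene: value / control for gene, value in levels.items()}
```

For instance, with a control gene measured at 50 units, a reference gene measured at 100 units would be reported as 2 and one measured at 25 units as 0.5.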
The term "average expression level" as used herein refers to the mean expression level, in whatever units are chosen, of a gene in a particular biological sample of a population. To determine an average expression level, a population is defined, and the expression level of the gene in that population is determined for each member of the population by analyzing the same biological sample from each member of the population. The determined expression levels are then added together, and the sum is divided by the number of members in the population. The term "average expression level" is also used to refer to a calculated value that can be used to compare two populations. For example, the average expression level in a population consisting of all patients regardless of autoimmune disease status can be calculated using the method above for a population that consists of statistically significant numbers of patients with and without autoimmune disease (the latter can also be referred to as the "unaffected subpopulation"). However, when the population is made up of unequal numbers of patients with and without autoimmune disease, the calculated value for all genes differentially expressed in these two subpopulations will likely be skewed towards the expression level determined for the subpopulation having the greater number of members. In order to remove this skewing effect, the average expression level in the described population can also be calculated by: (a) determining the average expression level of a gene in the autoimmune patient subpopulation; (b) determining the average expression level of the same gene in the unaffected subpopulation; (c) adding the two determined values together; and (d) dividing the sum of the two determined values by 2 to achieve a value: this value also being defined herein as an "average expression level". Once an expression level is determined for a gene, a profile can be created. 
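The difference between the pooled average and the skew-corrected average of steps (a) through (d) can be sketched in Python as follows. The function names are hypothetical, and each argument is assumed to be a list of expression levels for one gene, one value per subject.

```python
from statistics import mean


def pooled_average(affected, unaffected):
    """Mean over all subjects; skewed toward the larger subpopulation."""
    return mean(list(affected) + list(unaffected))


def balanced_average(affected, unaffected):
    """Steps (a)-(d): average the two subpopulation means so each counts equally."""
    return (mean(affected) + mean(unaffected)) / 2
```

With six affected subjects at a level of 4 and two unaffected subjects at a level of 1, the pooled average is 3.25, whereas the balanced average is 2.5, midway between the two subpopulation means.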
As used herein, the term "profile" refers to a repository of the expression level data that can be used to compare the expression levels of different genes among various subjects. For example, for a given subject, the term "profile" can encompass the expression levels of all genes detected in whatever units (as described herein above) are chosen.
The term "profile" is also intended to encompass manipulations of the expression level data derived from a subject. For example, once relative expression levels are determined for a given set of genes in a subject, the relative expression levels for that subject can be compared to a standard to determine if the expression levels in that subject are higher or lower than for the same genes in the standard. Standards can include any data deemed to be relevant for comparison. In some embodiments, a standard is prepared by determining the average expression level of a gene in a normal population, a normal population being defined as subjects that do not have autoimmune disease. In some embodiments, a standard is prepared by determining the average expression level of a gene in a population of subjects that have an autoimmune disease (for example, RA, MS, IDDM, and/or SLE). In a third embodiment, a standard is prepared by determining the average expression level of a gene in the population as a whole (i.e. subjects are grouped together irrespective of autoimmune disease status). In some embodiments, a standard is prepared by determining the average expression level of a gene in a normal population, the average expression level of a gene in an autoimmune population, adding those two values, and dividing the sum by two to determine the midpoint of the average expression in these populations. In this latter embodiment, a profile for a "new" subject can be compared to the standard, and the profile can further comprise data indicating whether for each gene, the expression level in the new subject is higher or lower than the expression level of that gene in the standard. For example, a new subject's profile can comprise a score of "1" for each gene for which the expression in the subject is higher than in the standard, and a score of "0" for each gene for which the expression in the subject is lower than in the standard. 
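The midpoint standard and the per-gene 1/0 comparison can be sketched in Python as shown below. The names are hypothetical, and one detail is an assumption: the text does not say how a level exactly equal to the standard is scored, so this sketch assigns it 0.

```python
def midpoint_standard(normal_avg, autoimmune_avg):
    """Per-gene midpoint between the normal and autoimmune average levels."""
    return {gene: (normal_avg[gene] + autoimmune_avg[gene]) / 2 for gene in normal_avg}


def profile_score(subject, standard):
    """Return per-gene flags (1 if above the standard, else 0) and their sum."""
    flags = {gene: 1 if subject[gene] > standard[gene] else 0 for gene in standard}
    return flags, sum(flags.values())
```

The summed flags give a single number per subject that can be compared across subjects, and any other convenient pair of values could be substituted for 1 and 0.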
In this way, a profile can comprise an overall "score", the score being defined as the sum total of all the ones and zeroes present in the profile. These scores can then be used to predict the presence or absence of autoimmune disease in the new subject. It is understood that the use of 1s and 0s is exemplary only, and any convenient value can be assigned in the practice of the methods of the presently disclosed subject matter.
The term "isolated", as used in the context of a nucleic acid molecule, indicates that the nucleic acid molecule exists apart from its native environment and is not a product of nature. An isolated DNA molecule can exist in a purified form or can exist in a non-native environment such as, for example, in a host cell transformed with a vector comprising the DNA molecule.
The phrases "percent identity" and "percent identical," in the context of two nucleic acid or protein sequences, refer to two or more sequences or subsequences that have in some embodiments at least 60%, in some embodiments at least 70%, in some embodiments at least 80%, in some embodiments at least 85%, in some embodiments at least 90%, in some embodiments at least 95%, in some embodiments at least 98%, and in some embodiments at least 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection. The percent identity exists in some embodiments over a region of the sequences that is at least about 50 residues in length, in some embodiments over a region of at least about 100 residues, and in some embodiments the percent identity exists over at least about 150 residues. In some embodiments, the percent identity exists over the entire length of a given region, such as a coding region. In some embodiments, a nucleic acid is at least 80% identical to one of SEQ ID NOs: 1-9.
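As a minimal sketch of how a percent identity figure can be computed once two sequences have already been aligned, the helper below counts identical columns. The function name `percent_identity` and the example sequences are hypothetical, and counting a gap opposite a residue as a non-identical column is an assumption, since tools differ on the denominator they use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of two equal-length aligned strings.

    '-' marks a gap. Columns where both sequences have a gap are ignored;
    a gap opposite a residue counts as a non-identical column (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

# Hypothetical 10-residue alignment with a single mismatch.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(f"{identity:.1f}% identical; at least 80%: {identity >= 80.0}")
```

The alignment itself would come from one of the algorithms named in the text; this helper only scores the result.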
For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters. Optimal alignment of sequences for comparison can be conducted, for example, by the local homology algorithm described in Smith & Waterman (1981) Adv Appl Math 2:482-489, by the homology alignment algorithm described in Needleman & Wunsch (1970) J Mol Biol 48:443-453, by the search for similarity method described in Pearson & Lipman (1988) Proc Natl Acad Sci U S A 85:2444-2448, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the GCG Wisconsin Package, available from Accelrys, Inc., San Diego, California, United States of America), or by visual inspection. See generally, Ausubel et al. (1994) Current Protocols in Molecular Biology. Wiley, New York, United States of America.
One example of an algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al. (1990) J Mol Biol 215:403-410. Software for performing BLAST analyses is publicly available through the website of the United States National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al. (1990) J Mol Biol 215:403-410).
These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M = 5, N = -4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See Henikoff & Henikoff (1992) Proc Natl Acad Sci U S A 89:10915-10919. In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. See e.g., Karlin & Altschul (1993) Proc Natl Acad Sci U S A 90:5873-5877. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is in some embodiments less than about 0.1, in some embodiments less than about 0.01, and in some embodiments less than about 0.001.
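The seed extension step of the BLAST procedure described above can be illustrated with a simplified, ungapped sketch. It is not the BLAST implementation: the function names, the example sequences, and the drop-off value X = 20 are hypothetical, while W = 11, M = 5, and N = -4 are the BLASTN defaults quoted in the text.

```python
def extend_one_direction(query, subject, q_start, s_start, step, m, n, x_drop, start_score):
    """Extend a hit from (q_start, s_start) in one direction (step is +1 or -1).

    Returns (best_score, best_length), where best_length is the number of
    residues added to reach the best cumulative score.
    """
    score = best = start_score
    best_len = 0
    length = 0
    qi, si = q_start, s_start
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += m if query[qi] == subject[si] else n
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score falls X below its maximum or reaches zero or below.
        if best - score >= x_drop or score <= 0:
            break
        qi += step
        si += step
    return best, best_len

def extend_seed(query, subject, q_pos, s_pos, word=11, m=5, n=-4, x_drop=20):
    """Ungapped extension of an exact word hit into a high scoring pair.

    Returns (score, (start, end)) with half-open coordinates on the query.
    """
    seed_score = word * m
    right_score, right_len = extend_one_direction(
        query, subject, q_pos + word, s_pos + word, +1, m, n, x_drop, seed_score)
    total_score, left_len = extend_one_direction(
        query, subject, q_pos - 1, s_pos - 1, -1, m, n, x_drop, right_score)
    return total_score, (q_pos - left_len, q_pos + word + right_len)

# Hypothetical sequences sharing an 11-residue word plus matching flanks.
seed = "ACGTACGTACG"
query = "GGTT" + seed + "CCA" + "TTTTTT"
subject = "CATT" + seed + "CCA" + "GGGGGG"
hsp_score, (start, end) = extend_seed(query, subject, 4, 4)
print(hsp_score, query[start:end])
```

The real program also evaluates neighborhood words against threshold T, searches both strands, and applies the statistical analysis described above, none of which is modeled here.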
The term "substantially identical", in the context of two nucleotide sequences, refers to two or more sequences or subsequences that have in some embodiments at least about 80% nucleotide identity, in some embodiments at least about 85% nucleotide identity, in some embodiments at least about 90% nucleotide identity, in some embodiments at least about 95% nucleotide identity, in some embodiments at least about 98% nucleotide identity, and in some embodiments at least about 99% nucleotide identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection. In one example, the substantial identity exists in nucleotide sequences of at least 50 residues, in another example in nucleotide sequences of at least about 100 residues, in another example in nucleotide sequences of at least about 150 residues, and in yet another example in nucleotide sequences comprising complete coding sequences.
In one aspect, polymorphic sequences can be substantially identical sequences. The term "polymorphic" refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. An allelic difference can be as small as one base pair. Nonetheless, one of ordinary skill in the art would recognize that the polymorphic sequences correspond to the same gene. For example, SEQ ID NO: 9 is a cDNA sequence representing a human TP53 gene product that is present in the GENBANK® database under Accession Number NM_000546. According to the description presented therein, the TP53 gene is characterized by polymorphisms at nucleotide positions 390, 466, 1470, 1927, 1950, 1976, 1977, 2075, 2076, 2497, and 2498. Nucleic acid sequences comprising any or all of these polymorphisms are substantially identical to SEQ ID NO: 9, and thus are intended to be encompassed within the claimed subject matter.
Another indication that two nucleotide sequences are substantially identical is that the two molecules specifically or substantially hybridize to each other under stringent conditions. In the context of nucleic acid hybridization, two nucleic acid sequences being compared can be designated a "probe sequence" and a "target sequence". A "probe sequence" is a reference nucleic acid molecule, and a "target sequence" is a test nucleic acid molecule, often found within a heterogeneous population of nucleic acid molecules. A "target sequence" is synonymous with a "test sequence".
An exemplary nucleotide sequence employed for hybridization studies or assays includes probe sequences that are complementary to or mimic in some embodiments at least an about 14 to 40 nucleotide sequence of a nucleic acid molecule of the presently disclosed subject matter. In one example, probes comprise 14 to 20 nucleotides, or even longer where desired, such as 30, 40, 50, 60, 100, 200, 300, or 500 nucleotides or up to the full length of any of the genes represented by SEQ ID NOs: 1-9. Such fragments can be readily prepared by, for example, directly synthesizing the fragment by chemical synthesis, by application of nucleic acid amplification technology, or by introducing selected sequences into recombinant vectors for recombinant production. The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex nucleic acid mixture (e.g., total cellular DNA or RNA).
The phrase "hybridizing substantially to" refers to complementary hybridization between a probe nucleic acid molecule and a target nucleic acid molecule and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired hybridization.
"Stringent hybridization conditions" and "stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments such as Southern and Northern blot analysis are both sequence- and environment-dependent. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes. Elsevier, New York, United States of America. Generally, highly stringent hybridization and wash conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Typically, under "stringent conditions" a probe will hybridize specifically to its target subsequence, but to no other sequences.
The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent hybridization conditions for Southern or Northern Blot analysis of complementary nucleic acids having more than about 100 complementary residues is overnight hybridization in 50% formamide with 1 mg of heparin at 42°C. An example of highly stringent wash conditions is 15 minutes in 0.1X SSC, 5M NaCl at 65°C. An example of stringent wash conditions is 15 minutes in 0.2X SSC buffer at 65°C (see Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of medium stringency wash conditions for a duplex of more than about 100 nucleotides is 15 minutes in 1X SSC at 45°C. An example of low stringency wash for a duplex of more than about 100 nucleotides is 15 minutes in 4-6X SSC at 40°C. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1 M Na+ ion, typically about 0.01 to 1 M Na+ ion concentration (or other salts) at pH 7.0-8.3, and the temperature is typically at least about 30°C. Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. In general, a signal to noise ratio of 2-fold (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.
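To relate the SSC strengths of these washes to the Na+ concentrations mentioned for short probes, the following sketch converts an SSC strength to an approximate sodium molarity. The composition of 1X SSC used here (0.15 M NaCl and 0.015 M trisodium citrate) is the commonly used recipe and is an assumption, as the text defers to Sambrook & Russell for the buffer description; the function name is hypothetical.

```python
NACL_M_PER_1X = 0.150     # mol/L NaCl in 1X SSC (assumed standard recipe)
CITRATE_M_PER_1X = 0.015  # mol/L trisodium citrate in 1X SSC (assumed)

def sodium_molarity(ssc_strength):
    """Approximate Na+ concentration (mol/L) of SSC at the given strength.

    Each trisodium citrate contributes three sodium ions.
    """
    return ssc_strength * (NACL_M_PER_1X + 3 * CITRATE_M_PER_1X)

# SSC strengths taken from the wash examples in the text.
for label, strength in [("stringent", 0.2), ("medium", 1.0), ("low", 4.0), ("low", 6.0)]:
    print(f"{label:9s} {strength:>4}X SSC -> about {sodium_molarity(strength):.3f} M Na+")
```

Lower SSC strength means less salt and therefore a more stringent wash at a given temperature.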
The following are examples of hybridization and wash conditions that can be used to clone homologous nucleotide sequences that are substantially identical to reference nucleotide sequences of the presently disclosed subject matter: a probe nucleotide sequence hybridizes in one example to a target nucleotide sequence in 7% sodium dodecyl sulfate (SDS), 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 2X SSC, 0.1% SDS at 50°C; in another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 1X SSC, 0.1% SDS at 50°C; in another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 0.5X SSC, 0.1% SDS at 50°C; in another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 0.1X SSC, 0.1% SDS at 50°C; in yet another example, a probe and target sequence hybridize in 7% SDS, 0.5M NaPO4, 1 mM EDTA at 50°C followed by washing in 0.1X SSC, 0.1% SDS at 65°C. In some embodiments, hybridization conditions comprise hybridization in a roller tube for at least 12 hours at 42°C. Pre-made hybridization solutions are also commercially available from various suppliers. In some embodiments, a hybridization solution comprises MICROHYB™ (RESGEN™), and in some embodiments a hybridization solution comprises MICROHYB™ further comprising 5.0 μg COT-1® DNA (Invitrogen Corporation, Carlsbad, California, United States of America) and 5.0 μg poly-dA. In some embodiments, post-hybridization wash conditions comprise two washes in 2X SSC/1% SDS at 50°C for 20 minutes each followed by a third wash in 0.5X SSC/1% SDS at 55°C for 15 minutes.
As used herein, the term "purified", when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It can be in a homogeneous state although it also can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein that is the predominant species present in a preparation is substantially purified. The term "purified" denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is in some embodiments at least about 50% pure, in some embodiments at least about 85% pure, and in some embodiments at least about 99% pure.
II.B. Biological Samples
The presently disclosed subject matter provides methods that can be used to detect the expression level of a gene in a biological sample. The term "biological sample" as used herein refers to a sample that comprises a biomolecule that permits the expression level of a gene to be determined. Representative biomolecules include, but are not limited to, total RNA, mRNA, and polypeptides. As such, a biological sample can comprise a cell or a group of cells. Any cell or group of cells can be used with the methods of the presently disclosed subject matter, although cell types and organs that would be predicted to show differential gene expression in subjects with autoimmune disease versus normal subjects are best suited. In some embodiments, gene expression levels are determined where the biological sample comprises a cell isolated from a biological fluid including, but not limited to, whole blood or a fraction thereof. In some embodiments, a biological sample comprises PBMCs that have been isolated from a subject. In some embodiments, the biological sample comprises one or more of the constituent cell types that make up a PBMC preparation, including but not limited to T cells, B cells, monocytes, and NK/NKT cells. A representative PBMC preparation can comprise about 75% T cells, about 5% to about 10% B cells, about 5% to about 10% monocytes, and a small percentage of NK/NKT cells. In some embodiments, the biological sample comprises epithelial cells, such as cheek epithelial cells. Also encompassed within the phrase "biological sample" are biomolecules that are derived from a cell or group of cells that permit gene expression levels to be determined, e.g. nucleic acids and polypeptides.
The expression level of the gene can be determined using molecular biology techniques that are well known in the art. For example, if the expression level is to be determined by analyzing RNA isolated from the biological sample, techniques for determining the expression level include, but are not limited to Northern blotting, quantitative PCR (e.g., Q-RT-PCR), and the use of nucleic acid arrays and microarrays.
In some embodiments, the expression level of a gene is determined by hybridizing 33P-labeled cDNA generated from total RNA isolated from a biological sample to one or more DNA sequences that represent one or more genes and that have been affixed to a solid support, e.g. a membrane. When a membrane comprises nucleic acids representing many genes (including internal controls), the relative expression level of many genes can be determined. The presence of internal control sequences on the membrane also allows experiment-to-experiment variations to be detected, yielding a strategy whereby the raw expression data derived from each experiment can be compared from experiment to experiment.
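One simple way internal controls can make raw signals comparable between experiments is to divide each gene's signal by the mean signal of the control spots on the same membrane. The sketch below illustrates that idea only; the text does not specify a normalization formula, so the approach, the function name, the spot names, and the values are all assumptions.

```python
from statistics import mean

def normalize_to_controls(raw_signals, control_genes):
    """Express each non-control signal relative to the mean control signal."""
    control_mean = mean(raw_signals[g] for g in control_genes)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return {
        gene: signal / control_mean
        for gene, signal in raw_signals.items()
        if gene not in control_genes
    }

# Hypothetical raw signals; the second membrane is uniformly twice as intense.
controls = {"ctrl1", "ctrl2"}
membrane_1 = {"ctrl1": 100, "ctrl2": 200, "geneA": 300}
membrane_2 = {"ctrl1": 200, "ctrl2": 400, "geneA": 600}

norm_1 = normalize_to_controls(membrane_1, controls)
norm_2 = normalize_to_controls(membrane_2, controls)
print(norm_1, norm_2)  # the normalized geneA values agree across membranes
```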
Alternatively, gene expression can be determined by analyzing protein levels in a biological sample using antibodies. Representative antibody-based techniques include, but are not limited to, immunoprecipitation, Western blotting, and the use of immunoaffinity columns.
III. Isolation and Analysis of Nucleic Acids
III.A. Enrichment of Nucleic Acids
The presently disclosed subject matter encompasses use of a sufficiently large biological sample to enable a comprehensive survey of low abundance nucleic acids in the sample. Thus, the sample can optionally be concentrated prior to isolation of nucleic acids. Several protocols for concentration have been developed that alternatively use slide supports (Kohsaka & Carson (1994) J Clin Lab Anal 8:452-455; Millar et al. (1995) Anal Biochem 226:325-330), filtration columns (Bej et al. (1991) Appl Environ Microbiol 57:3529-3534), or immunomagnetic beads (Albert et al. (1992) J Virol 66:5627-5630; Chiodi et al. (1992) J Clin Microbiol 30:1768-1771). Such approaches can significantly increase the sensitivity of subsequent detection methods.
As one example, SEPHADEX® matrix (Sigma, St. Louis, Missouri, United States of America) is a matrix of diatomaceous earth and glass suspended in a solution of chaotropic agents and has been used to bind nucleic acid material (Boom et al. (1990) J Clin Microbiol 28:495-503; Buffone et al. (1991) Clin Chem 37:1945-1949). After the nucleic acid is bound to the solid support material, impurities and inhibitors are removed by washing and centrifugation, and the nucleic acid is then eluted into a standard buffer. Target capture also allows the target sample to be concentrated into a minimal volume, facilitating the automation and reproducibility of subsequent analyses (Lanciotti et al. (1992) J Clin Microbiol 30:545-551).
III.B. Nucleic Acid Isolation
Methods for nucleic acid isolation can comprise simultaneous isolation of total nucleic acid, or separate and/or sequential isolation of individual nucleic acid types (e.g., genomic DNA, cDNA, organelle DNA, genomic RNA, mRNA, polyA+ RNA, rRNA, tRNA) followed by optional combination of multiple nucleic acid types into a single sample.
When total RNA or purified mRNA is selected as a biological sample, the disclosed method enables an assessment of a level of gene expression. For example, detecting a level of gene expression in a biological sample can comprise determination of the abundance of a given mRNA species in the biological sample. RNA isolation methods are known to one of skill in the art. See Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America. A representative procedure for RNA isolation from a biological sample is set forth in EXAMPLE 2.
Simple and semi-automated extraction methods can also be used for nucleic acid isolation, including for example, the SPLIT SECOND™ system (Boehringer Mannheim, Indianapolis, Indiana, United States of America), the TRIZOL™ Reagent system (Life Technologies, Gaithersburg, Maryland, United States of America), and the FASTPREP™ system (Bio 101, La Jolla, California, United States of America). See also Paladichuk (1999) The Scientist 13(16):20-23.
Nucleic acids that are used for subsequent amplification and labeling can be analytically pure as determined by spectrophotometric measurements or by visual inspection following electrophoretic resolution. The nucleic acid sample can be free of contaminants such as polysaccharides, proteins, and inhibitors of enzyme reactions. When an RNA sample is intended for use as probe, it can be free of nuclease contamination. Contaminants and inhibitors can be removed or substantially reduced using resins for DNA extraction (e.g., CHELEX™ 100 from BioRad Laboratories, Hercules, California, United States of America) or by standard phenol extraction and ethanol precipitation. Isolated nucleic acids can optionally be fragmented by restriction enzyme digestion or shearing prior to amplification.
III.C. PCR Amplification of Nucleic Acids
The terms "template nucleic acid" and "target nucleic acid" as used herein each refers to nucleic acids isolated from a biological sample as described herein above. The terms "template nucleic acid pool", "template pool", "target nucleic acid pool", and "target pool" each refers to an amplified sample of "template nucleic acid". Thus, a target pool comprises amplicons generated by performing an amplification reaction using the template nucleic acid. In some embodiments, a target pool is amplified using a random amplification procedure as described herein. The term "target-specific primer" refers to a primer that hybridizes selectively and predictably to a target sequence, for example a sequence that shows differential expression in a patient with an autoimmune disease relative to a normal patient, in a target nucleic acid sample. A target-specific primer can be selected or synthesized to be complementary to known nucleotide sequences of target nucleic acids.
The term "random primer" refers to a primer having an arbitrary sequence. The nucleotide sequence of a random primer can be known, although such sequence is considered arbitrary in that it is not designed for complementarity to a nucleotide sequence of the target-specific probe. The term "random primer" encompasses selection of an arbitrary sequence having increased probability to be efficiently utilized in an amplification reaction. For example, the Random Oligonucleotide Construction Kit (ROCK; available from http://www.sru.edu/depts/artsci/bio/ROCK.htm) is a macro-based program that facilitates the generation and analysis of random oligonucleotide primers (Strain & Chmielewski (2001) BioTechniques 30:1286-1293). Representative primers include, but are not limited to, random hexamers and rapid amplification of polymorphic DNA (RAPD)-type primers as described in Williams et al. 1990.
A random primer can also be degenerate or partially degenerate as described in Telenius et al. (1992) Genomics 13:718-725. Briefly, degeneracy can be introduced by selection of alternate oligonucleotide sequences that can encode a same amino acid sequence.
In some embodiments, random primers can be prepared by shearing or digesting a portion of the template nucleic acid sample. Random primers so-constructed comprise a sample-specific set of random primers.
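The notions of random hexamers and of degenerate primers can be illustrated with a short sketch. It is illustrative only and unrelated to the ROCK program: the function names are hypothetical, the IUPAC ambiguity codes are the standard ones, and the example primer `ATGGAYAAR` (covering the codons for Met-Asp-Lys) is an invented example.

```python
import random
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def random_primers(count, length=6, seed=None):
    """Generate `count` arbitrary primers (random hexamers by default)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(count)]

def expand_degenerate(primer):
    """List every concrete oligonucleotide encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]

hexamers = random_primers(5, seed=1)
variants = expand_degenerate("ATGGAYAAR")  # all oligos encoding Met-Asp-Lys
print(hexamers)
print(variants)
```

Each additional ambiguous position multiplies the number of oligonucleotides in the pool, which is why fully degenerate primers are usually kept short.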
The term "heterologous primer" refers to a primer complementary to a sequence that has been introduced into the template nucleic acid pool. For example, a primer that is complementary to a linker or adaptor is a heterologous primer. Representative heterologous primers can optionally include a poly(dT) primer, a poly(T) primer, or as appropriate, a poly(dA) primer or a poly(A) primer.
The term "primer" as used herein refers to a contiguous sequence comprising in some embodiments about 6 or more nucleotides, in some embodiments about 10-20 nucleotides (e.g. a 15-mer), and in some embodiments about 20-30 nucleotides (e.g. a 22-mer). Primers used to perform the method of the presently disclosed subject matter encompass oligonucleotides of sufficient length and appropriate sequence so as to provide initiation of polymerization on a nucleic acid molecule.
III.C.1. Quantitative RT-PCR
In some embodiments of the presently disclosed subject matter, the abundance of specific mRNA species present in a biological sample (for example, mRNA extracted from peripheral blood mononuclear cells) is assessed by quantitative RT-PCR. In this embodiment, standard molecular biological techniques are used in conjunction with specific PCR primers to quantitatively amplify those mRNA molecules corresponding to the genes of interest. Methods for designing specific PCR primers and for performing quantitative amplification of nucleic acids including mRNA are well known in the art. See e.g. Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America.
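The text does not prescribe how quantitative RT-PCR data are converted to relative abundance. One widely used convention is the comparative threshold cycle (2^-ΔΔCt) calculation, sketched below as an assumption rather than as the disclosed method. It presumes roughly 100% amplification efficiency and a stably expressed reference gene; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target mRNA in a sample relative to a control.

    Uses the comparative Ct calculation: 2 ** -(dCt_sample - dCt_control),
    where each dCt is the target Ct minus the reference gene Ct.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, relative to the reference gene.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)
```

A fold change above 1 indicates higher relative abundance in the sample than in the control, and a value below 1 indicates lower abundance.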
III.C.2. Amplified Antisense RNA (aaRNA)
Several procedures have been developed specifically for random amplification of RNA, including but not limited to Amplified Antisense RNA (aaRNA) and Global RNA Amplification, also described further herein below. A population of RNA can be amplified using a technique referred to as Amplified Antisense RNA (aaRNA). See Van Gelder et al. (1990) Proc Natl Acad Sci U S A 87:1663-1667; Wang et al. (2000) Nat Biotechnol 18:457-459. Briefly, an oligo(dT) primer is synthesized such that the 5' end of the primer includes a T7 RNA polymerase promoter. This oligonucleotide can be used to prime the poly(A)+ mRNA population to generate cDNA. Following first strand cDNA synthesis, second strand cDNA is generated using RNA nicking and priming (Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America). The resulting cDNA is treated briefly with S1 nuclease and blunt-ended with T4 DNA polymerase. The cDNA is then used as a template for transcription-based amplification using the T7 RNA polymerase promoter to direct RNA synthesis.
Eberwine et al. adapted the aaRNA procedure for in situ random amplification of RNA followed by target-specific amplification. The successful amplification of underrepresented transcripts suggests that the pool of transcripts amplified by aaRNA is representative of the initial mRNA population (Eberwine et al. (1992) Proc Natl Acad Sci U S A 89:3010-3014).
III.C.3. Global RNA Amplification
U.S. Patent No. 6,066,457 to Hampson et al. describes a method for substantially uniform amplification of a collection of single stranded nucleic acid molecules such as RNA. Briefly, the nucleic acid starting material is anchored and processed to produce a mixture of directional shorter random size DNA molecules suitable for amplification of the sample.
In accordance with the methods of the presently disclosed subject matter, any one of the above-mentioned PCR techniques or related techniques can be employed to perform the step of amplifying the nucleic acid sample. In addition, such methods can be optimized for amplification of a particular subset of nucleic acid (e.g., specific mRNA molecules versus total mRNA), and representative optimization criteria and related guidance can be found in the art. See Cha & Thilly (1993) PCR Methods Appl 3:S18-29; Linz et al. (1990) J Clin Chem Clin Biochem 28:5-13; Robertson & Walsh-Weller (1998) Methods Mol Biol 98:121-154; Roux (1995) PCR Methods Appl 4:S185-194; Williams (1989) Biotechniques 7:762-769; McPherson et al. (1995) PCR 2: A Practical Approach, IRL Press, New York, New York, United States of America.
III.C.4. Kits for Gene Expression Analysis
The presently disclosed subject matter also provides for kits comprising a plurality of oligonucleotide primers that can be used in the methods of the presently disclosed subject matter to assess gene expression levels of genes of interest. In non-limiting embodiments, the kit can comprise oligonucleotide primers designed to be used to determine the expression level of one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, or all) of the genes set forth in SEQ ID NOs: 1-9. Additionally, the kit can comprise instructions for using the primers, including but not limited to information regarding proper reaction conditions and the sizes of the expected amplified fragments.
IV. Nucleic Acid Labeling
In some embodiments, the expression level of a gene in a biological sample is determined by hybridizing total RNA isolated from the biological sample to an array containing known quantities of nucleic acid sequences corresponding to known genes. For example, the array can comprise single-stranded nucleic acids (also referred to herein as "probes" and/or "probe sets") in known amounts for specific genes, which can then be hybridized to nucleic acids isolated from the biological sample. The array can be set up such that the nucleic acids are present on a solid support in such a manner as to allow the identification of those genes on the array to which the total RNA hybridizes.
In this embodiment, the total RNA is hybridized to the array, and the genes to which the total RNA hybridizes are detected using standard techniques. In some embodiments of the presently disclosed subject matter, the amplified nucleic acids are labeled with a radioactive nucleotide prior to hybridization to the array, and the genes on the array to which the RNA hybridizes are detected by autoradiography or phosphorimage analysis.
Alternatively, nucleic acids isolated from a biological sample are hybridized with a set of probes without prior labeling of the nucleic acids. For example, unlabeled total RNA isolated from the biological sample can be detected by hybridization to one or more labeled probes, the labeled probes being specific for those genes found to be useful in the methods of the presently disclosed subject matter (e.g. those genes represented by SEQ ID NOs: 1-9). In some embodiments, both the nucleic acids and the one or more probes include a label, wherein the proximity of the labels following hybridization enables detection. An exemplary procedure using nucleic acids labeled with chromophores and fluorophores to generate detectable photonic structures is described in U.S. Patent No. 6,162,603.
The nucleic acids or probes/probe sets can be labeled using any detectable label. It will be understood to one of skill in the art that any suitable method for labeling can be used, and no particular detectable label or technique for labeling should be construed as a limitation of the disclosed methods. Direct labeling techniques include incorporation of radioisotopic (e.g., 32P, 33P, or 35S) or fluorescent nucleotide analogues into nucleic acids by enzymatic synthesis in the presence of labeled nucleotides or labeled PCR primers. A radio-isotopic label can be detected using autoradiography or phosphorimaging. A fluorescent label can be detected directly using emission and absorbance spectra that are appropriate for the particular label used.
Any detectable fluorescent dye can be used, including but not limited to fluorescein isothiocyanate (FITC), FLUOR X™, ALEXA FLUOR® 488, OREGON GREEN® 488, 6-JOE (6-carboxy-4',5'-dichloro-2',7'-dimethoxyfluorescein, succinimidyl ester), ALEXA FLUOR® 532, Cy3, ALEXA FLUOR® 546, TMR (tetramethylrhodamine), ALEXA FLUOR® 568, ROX (X-rhodamine), ALEXA FLUOR® 594, TEXAS RED®, BODIPY® 630/650, and Cy5 (available from Amersham Pharmacia Biotech, Piscataway, New Jersey, United States of America, or from Molecular Probes Inc., Eugene, Oregon, United States of America). Fluorescent tags also include sulfonated cyanine dyes (available from Li-Cor, Inc., Lincoln, Nebraska, United States of America) that can be detected using infrared imaging. Methods for direct labeling of a heterogeneous nucleic acid sample are known in the art and representative protocols can be found in, for example, DeRisi et al. (1996) Nat Genet 14:457-460; Sapolsky & Lipshutz (1996) Genomics 33:445-456; Schena et al. (1995) Science 270:467-470; Schena et al. (1996) Proc Natl Acad Sci U S A 93:10614-10619; Shalon et al. (1996) Genome Res 6:639-645; Shoemaker et al. (1996) Nat Genet 14:450-456. A representative procedure is set forth herein as EXAMPLE 8.
Indirect labeling techniques can also be used in accordance with the methods of the presently disclosed subject matter, and in some cases, can facilitate detection of rare target sequences by amplifying the label during the detection step. Indirect labeling involves incorporation of epitopes, including recognition sites for restriction endonucleases, into amplified nucleic acids prior to hybridization with a set of probes. Following hybridization, a protein that binds the epitope is used to detect the epitope tag.
In some embodiments, a biotinylated nucleotide can be included in the amplification reactions to produce a biotin-labeled nucleic acid sample. Following hybridization of the biotin-labeled sample with probes as described herein, the label can be detected by binding of an avidin-conjugated fluorophore, for example streptavidin-phycoerythrin, to the biotin label. Alternatively, the label can be detected by binding of an avidin- or streptavidin-horseradish peroxidase (HRP) conjugate, followed by colorimetric detection of an HRP enzymatic product.
The quality of probe or nucleic acid sample labeling can be approximated by determining the specific activity of label incorporation. For example, in the case of a fluorescent label, the specific activity of incorporation can be determined by the absorbance at 260 nm and 550 nm (for Cy3) or 650 nm (for Cy5) using published extinction coefficients (Randolph & Waggoner (1997) Nucleic Acids Res 25:2923-2929). Very high label incorporation (specific activities of >1 fluorescent molecule/20 nucleotides) can result in a decreased hybridization signal compared with probes having lower label incorporation. Very low specific activity (<1 fluorescent molecule/100 nucleotides) can give unacceptably low hybridization signals. See Worley et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 65-86, Eaton Publishing, Natick, Massachusetts, United States of America. Thus, it will be understood to one of skill in the art that labeling methods can be optimized for performance in various hybridization assays, and that optimal labeling can be unique to each label type.
V. Microarrays
In some embodiments of the presently disclosed subject matter, nucleic acids isolated from a biological sample are hybridized to a microarray, wherein the microarray comprises nucleic acids corresponding to those genes to be tested (optionally also including one or more internal control genes). The genes are immobilized on a solid support, such that each position on the support identifies a particular gene, and each gene the expression level of which is to be analyzed is represented one or more times on the solid support. Solid supports include, but are not limited to nitrocellulose and nylon membranes. Solid supports can also be glass or silicon-based (i.e. "gene chips"). Any solid support can be used in the methods of the presently disclosed subject matter, so long as the support provides a substrate for the localization of a known amount of a nucleic acid in a specific position that can be identified subsequent to the hybridization and detection steps. In some embodiments, a microarray comprises a nylon membrane (for example, the GF211 Human "Named Genes" GENEFILTERS® Microarrays Release 1 available from RESGEN™). A microarray can be assembled using any suitable method known to one of skill in the art, and any one microarray configuration or method of construction is not considered to be a limitation of the presently disclosed subject matter. Representative microarray formats that can be used in accordance with the methods of the presently disclosed subject matter are described herein below.
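As a rough illustration of the layout described above, the following Python sketch assumes a hypothetical array in which each position (row, column) identifies one gene and some genes are spotted more than once; the signal for a gene is taken as the mean over its replicate spots. The positions, the intensity values, and the function name `mean_signal_per_gene` are illustrative assumptions and are not part of the disclosed methods.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical layout: each (row, column) position identifies one gene.
# Gene symbols come from Table 1 herein; the positions are invented.
LAYOUT = {
    (0, 0): "CTSS",
    (0, 1): "TP53",
    (1, 0): "CTSS",  # replicate spot for the same gene
    (1, 1): "TGM2",
}


def mean_signal_per_gene(layout, intensities):
    """Average the measured intensity over all spots assigned to each gene."""
    by_gene = defaultdict(list)
    for position, gene in layout.items():
        by_gene[gene].append(intensities[position])
    return {gene: mean(values) for gene, values in by_gene.items()}


# Made-up intensities in arbitrary units, keyed by spot position.
example_intensities = {(0, 0): 120.0, (0, 1): 40.0, (1, 0): 100.0, (1, 1): 75.0}
signals = mean_signal_per_gene(LAYOUT, example_intensities)
```

Keeping the position-to-gene map separate from the measured intensities mirrors the requirement that each position on the support identify a particular gene independently of any one experiment.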
V.A. Array Substrate and Configuration
The substrate for printing the array should be substantially rigid and amenable to DNA immobilization and detection methods (e.g., in the case of fluorescent detection, the substrate must have low background fluorescence in the region of the fluorescent dye excitation wavelengths). The substrate can be nonporous or porous as determined most suitable for a particular application. Representative substrates include, but are not limited to a glass microscope slide, a glass coverslip, silicon, plastic, a polymer matrix, an agar gel, a polyacrylamide gel, and a membrane, such as a nylon, nitrocellulose or ANAPORE™ (Whatman, Maidstone, United Kingdom) membrane.
Porous substrates (membranes and polymer matrices) are preferred in that they permit immobilization of relatively large amounts of probe molecules and provide a three-dimensional hydrophilic environment for biomolecular interactions to occur (Dubiley et al. (1997) Nucleic Acids Res 25:2259-2265; Yershov et al. (1996) Proc Natl Acad Sci U S A 93:4913-4918). A BIOCHIP ARRAYER™ dispenser (Packard Instrument Company, Meriden, Connecticut, United States of America) can effectively dispense probes onto membranes such that the spot size is consistent among spots whether one, two, or four droplets were dispensed per spot (Englert (2000) in Schena M, ed, Microarray Biochip Technology, pp. 231-246, Eaton Publishing, Natick, Massachusetts, United States of America). The array can also comprise a dot blot or a slot blot.
A microarray substrate for use in accordance with the methods of the presently disclosed subject matter can have either a two-dimensional (planar) or a three-dimensional (non-planar) configuration. An exemplary three-dimensional microarray is the FLOW-THRU™ chip (Gene Logic, Inc., Gaithersburg, Maryland, United States of America), which has implemented a gel pad to create a third dimension. Such a three-dimensional microarray can be constructed of any suitable substrate, including glass capillary, silicon, metal oxide filters, or porous polymers. See Yang et al. (1998) Science 282:2244-2246; Steel et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 87-118, Eaton Publishing, Natick, Massachusetts, United States of America. Briefly, a FLOW-THRU™ chip (Gene Logic, Inc.) comprises a uniformly porous substrate having pores or microchannels connecting upper and lower faces of the chip. Probes are immobilized on the walls of the microchannels and a hybridization solution comprising sample nucleic acids can flow through the microchannels. This configuration increases the capacity for probe and target binding by providing additional surface relative to two-dimensional arrays. See U.S. Patent No. 5,843,767.
V.B. Surface Chemistry
The particular surface chemistry employed is inherent in the microarray substrate and substrate preparation. Immobilization of nucleic acid probes post-synthesis can be accomplished by various approaches, including adsorption, entrapment, and covalent attachment. Preferably, the binding technique does not disrupt the activity of the probe.
For substantially permanent immobilization, covalent attachment is preferred. Since few organic functional groups react with an activated silica surface, an intermediate layer is advisable for substantially permanent probe immobilization. Functionalized organosilanes can be used as such an intermediate layer on glass and silicon substrates (Liu & Hlady (1996) Colloids and Surfaces B: Biointerfaces 8:25-37; Shriver-Lake (1998) in Cass T & Ligler FS, eds, Immobilized Biomolecules in Analysis, pp. 1-14, Oxford Press, Oxford, United Kingdom). A hetero-bifunctional cross-linker requires that the probe have a different chemistry than the surface, and is preferred to avoid linking reactive groups of the same type. A representative hetero-bifunctional cross-linker comprises gamma-maleimidobutyryloxy-succinimide (GMBS) that can bind maleimide to a primary amine of a probe. Procedures for using such linkers are known to one of skill in the art and are summarized in Hermanson (1990) Bioconjugate Techniques, Academic Press, San Diego, California, United States of America. A representative protocol for covalent attachment of DNA to silicon wafers is described in O'Donnell et al. (1997) Anal Chem 69:2438-2443.
When using a glass substrate, the glass should be substantially free of debris and other deposits and have a substantially uniform coating. Pretreatment of slides to remove organic compounds that can be deposited during their manufacture can be accomplished, for example, by washing in hot nitric acid. Cleaned slides can then be coated with 3-aminopropyltrimethoxysilane using vapor-phase techniques. After silane deposition, slides are washed with deionized water to remove any silane that is not attached to the glass and to catalyze unreacted methoxy groups to crosslink to neighboring silane moieties on the slide.
The uniformity of the coating can be assessed by known methods, for example electron spectroscopy for chemical analysis (ESCA) or ellipsometry (Ratner & Castner (1997) in Vickerman JC, ed, Surface Analysis: The Principal Techniques, John Wiley & Sons, New York, New York, United States of America; Schena et al. (1995) Science 270:467-470). See also Worley et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 65-86, Eaton Publishing, Natick, Massachusetts, United States of America.
For attachment of probes greater than about 300 base pairs, noncovalent binding is suitable. A representative technique for noncovalent linkage involves use of sodium isothiocyanate (NaSCN) in the spotting solution, as described in EXAMPLE 9. When using this method, amino-silanized slides can be used since this coating improves nucleic acid binding when compared to bare glass. This method works well for spotting applications that use about 100 ng/μl (Worley et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 65-86, Eaton Publishing, Natick, Massachusetts, United States of America).
In the case of nitrocellulose or nylon membranes, the chemistry of nucleic acid binding to these membranes has been well characterized (Southern (1975) J Mol Biol 98:503-517; Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America). One such nylon filter array is the GF211 Human "Named Genes" GENEFILTERS® Microarrays Release 1 (available from RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America), although other arrays can also be used.
V.C. Arraying Techniques
A microarray for the detection of gene expression levels in a biological sample can be constructed using any one of several methods available in the art including, but not limited to photolithographic and microfluidic methods, further described herein below. In some embodiments, the method of construction is flexible, such that a microarray can be tailored for a particular purpose.
As is standard in the art, a technique for making a microarray should create consistent and reproducible spots. Each spot can be uniform, and appropriately spaced away from other spots within the configuration. A solid support for use in the presently disclosed subject matter comprises in some embodiments about 10 or more spots, in some embodiments about 100 or more spots, in some embodiments about 1,000 or more spots, and in some embodiments about 10,000 or more spots. In some embodiments, the volume deposited per spot is about 10 picoliters to about 10 nanoliters, and in some embodiments about 50 picoliters to about 500 picoliters. The diameter of a spot is in some embodiments about 50 μm to about 1000 μm, and in some embodiments about 100 μm to about 250 μm.
Light-Directed Synthesis. This technique was developed by Fodor et al. (Fodor et al. (1991) Science 251:767-773; Fodor et al. (1993) Nature 364:555-556; U.S. Patent No. 5,445,934), and commercialized by Affymetrix, Inc. of Santa Clara, California, United States of America. Briefly, the technique uses precision photolithographic masks to define the positions at which single, specific nucleotides are added to growing single-stranded nucleic acid chains. Through a stepwise series of defined nucleotide additions and light-directed chemical linking steps, high-density arrays of defined oligonucleotides are synthesized on a solid substrate. A variation of the method, called Digital Optical Chemistry, employs mirrors to direct light synthesis in place of photolithographic masks (PCT International Patent Application Publication No. WO 99/63385). This approach is generally limited to probes of about 25 nucleotides in length or less. See also Warrington et al. (2000) in Schena M, ed, Microarray Biochip Technology, pp. 119-148, Eaton Publishing, Natick, Massachusetts, United States of America.
Contact Printing. Several procedures and tools have been developed for printing microarrays using rigid pin tools. In surface contact printing, the pin tools are dipped into a sample solution, resulting in the transfer of a small volume of fluid onto the tip of the pins. Touching the pins or pin samples onto a microarray surface leaves a spot, the diameter of which is determined by the surface energies of the pin, fluid, and microarray surface. Typically, the transferred fluid comprises a volume in the nanoliter or picoliter range.
One common contact printing technique uses a solid pin replicator. A replicator pin is a tool for picking up a sample from one stationary location and transporting it to a defined location on a solid support. A typical configuration for a replicating head is an array of solid pins, generally in an 8 x 12 format, spaced at 9-mm centers that are compatible with 96- and 384-well plates. The pins are dipped into the wells, lifted, moved to a position over the microarray substrate, and lowered to touch the solid support, whereby the sample is transferred. The process is repeated to complete transfer of all the samples. See Maier et al. (1994) J Biotechnol 35:191-203. A recent modification of solid pins involves the use of solid pin tips having concave bottoms, which print more efficiently than flat pins in some circumstances. See Rose (2000) in Schena M ed, Microarray Biochip Technology, pp. 19-38, Eaton Publishing, Natick, Massachusetts, United States of America. Solid pins for microarray printing can be purchased, for example, from TeleChem International, Inc. of Sunnyvale, California in a wide range of tip dimensions. The CHIPMAKER™ and STEALTH™ pins from TeleChem contain a stainless steel shaft with a fine point. A narrow gap is machined into the point to serve as a reservoir for sample loading and spotting. The pins have a loading volume of 0.2 μl to 0.6 μl to create spot sizes ranging from 75 μm to 360 μm in diameter.
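To put the quoted loading volumes in perspective, the arithmetic below estimates how many spots a single pin loading could supply. It is only an upper-bound sketch: it assumes that the per-spot deposition volumes given earlier in this section (about 50 picoliters to about 500 picoliters) apply to these pins, and that no sample is lost to evaporation, pre-printing, or residue left in the pin. The function name is hypothetical.

```python
def max_spots_per_load(load_volume_ul, spot_volume_pl):
    """Upper bound on spots per loading, ignoring all sample losses."""
    # 1 microliter = 1,000,000 picoliters; rounding to whole picoliters
    # keeps the division in integer arithmetic.
    load_volume_pl = round(load_volume_ul * 1_000_000)
    return load_volume_pl // spot_volume_pl


# Smallest loading with the largest spots, and largest loading with the
# smallest spots, using the ranges stated in the text.
fewest = max_spots_per_load(0.2, 500)
most = max_spots_per_load(0.6, 50)
```

Under these assumptions one loading corresponds to somewhere between 400 and 12,000 spots; actual yields would be lower once losses are taken into account.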
To permit the printing of multiple arrays with a single sample loading, quill-based tools, including printing capillaries, tweezers, and split pins, have been developed. These printing tools hold larger sample volumes than solid pins and therefore allow the printing of multiple arrays following a single sample loading. Quill-based arrayers withdraw a small volume of fluid into a depositing device from a microwell plate by capillary action. See Schena et al. (1995) Science 270:467-470. The diameter of the capillary typically ranges from about 10 μm to about 100 μm. A robot then moves the head with quills to the desired location for dispensing. The quill carries the sample to all spotting locations, where a fraction of the sample is deposited. The forces acting on the fluid held in the quill must be overcome for the fluid to be released. Accelerating and then decelerating by impacting the quill on a microarray substrate accomplishes fluid release. When the tip of the quill hits the solid support, the meniscus is extended beyond the tip and transferred onto the substrate. Carrying a large volume of sample fluid minimizes spotting variability between arrays. Because tapping on the surface is required for fluid transfer, a relatively rigid support, for example a glass slide, is appropriate for this method of sample delivery.
A variation of the pin printing process is the PIN-AND-RING™ technique developed by Genetic Microsystems Inc. of Woburn, Massachusetts, United States of America. This technique involves dipping a small ring into the sample well and removing it to capture liquid in the ring. A solid pin is then pushed through the sample in the ring, and the sample trapped on the flat end of the pin is deposited onto the surface. See Mace et al. (2000) in Schena M ed, Microarray Biochip Technology, pp. 39-64, Eaton Publishing, Natick, Massachusetts, United States of America. The PIN-AND-RING™ technique is suitable for spotting onto rigid supports or soft substrates such as agar, gels, nitrocellulose, and nylon. A representative instrument that employs the PIN-AND-RING™ technique is the 417™ Arrayer available from Affymetrix, Inc. of Santa Clara, California, United States of America.
Additional procedural considerations relevant to contact printing methods, including array layout options, print area, print head configurations, sample loading, preprinting, microarray surface properties, sample solution properties, pin velocity, pin washing, printing time, reproducibility, and printing throughput are known in the art, and are summarized in Rose (2000) in Schena M ed, Microarray Biochip Technology, pp. 19-38, Eaton Publishing, Natick, Massachusetts, United States of America.
Noncontact Ink-Jet Printing. A representative method for noncontact ink-jet printing uses a piezoelectric crystal closely apposed to the fluid reservoir. One configuration places the piezoelectric crystal in contact with a glass capillary that holds the sample fluid. The sample is drawn up into the reservoir and the crystal is biased with a voltage, which causes the crystal to deform, squeeze the capillary, and eject a small amount of fluid from the tip. Piezoelectric pumps offer the capability of controllable, fast jetting rates and consistent volume deposition. Most piezoelectric pumps are unidirectional pumps that need to be directly connected, for example by flexible capillary tubing, to a source of sample supply or wash solution. The capillary and jet orifices should be of sufficient inner diameter so that molecules are not sheared. The void volume of fluid contained in the capillary typically ranges from about 100 μl to about 500 μl and generally is not recoverable. See U.S. Patent No. 5,965,352.
Devices that provide thermal pressure, sonic pressure, or oscillatory pressure on a liquid stream or surface can also be used for ink-jet printing. See Theriault et al. (1999) in Schena M, ed, DNA Microarrays: A Practical Approach, pp. 101-120, Oxford University Press Inc., New York, New York, United States of America.
Syringe-Solenoid Printing. Syringe-solenoid technology combines a syringe pump with a microsolenoid valve to provide quantitative dispensing of nanoliter sample volumes. A high-resolution syringe pump is connected to both a high-speed microsolenoid valve and a reservoir through a switching valve. For printing microarrays, the system is filled with a system fluid, typically water, and the syringe is connected to the microsolenoid valve. Withdrawing the syringe causes the sample to move upward into the tip. The syringe then pressurizes the system such that opening the microsolenoid valve causes droplets to be ejected onto the surface. With this configuration, a minimum dispense volume is on the order of 4 nl to 8 nl. The positive displacement nature of the dispensing mechanism creates a substantially reliable system. See U.S. Patent Nos. 5,743,960 and 5,916,524.
Electronic Addressing. This method involves placing charged molecules at specific positions on a blank microarray substrate, for example a NANOCHIP™ substrate (Nanogen Inc., San Diego, California, United States of America). A nucleic acid probe is introduced to the microchip, and the negatively-charged probe moves to the selected charged position, where it is concentrated and bound. Serial application of different probes can be performed to assemble an array of probes at distinct positions. See U.S. Patent No. 6,225,059 and PCT International Patent Application Publication No. WO 01/23082.
Nanoelectrode Synthesis. An alternative array that can also be used in accordance with the methods of the presently disclosed subject matter provides ultra small structures (nanostructures) of a single or a few atomic layers synthesized on a semiconductor surface such as silicon. The nanostructures can be designed to correspond precisely to the three-dimensional shape and electro-chemical properties of molecules, and thus can be used to recognize nucleic acids of a particular nucleotide sequence. See U.S. Patent No. 6,123,819.
VI. Hybridization
VI.A. General Considerations
It is understood that in order to determine a gene expression level by hybridization, a full-length cDNA need not be employed. To determine the expression level of a gene represented by one of SEQ ID NOs: 1-9, any representative fragment or subsequence of the sequences set forth in SEQ ID NOs: 1-9 can be employed in conjunction with the hybridization conditions disclosed herein. As a result, a nucleic acid sequence used to assay a gene expression level can comprise sequences corresponding to the open reading frame (or a portion thereof), the 5' untranslated region, and/or the 3' untranslated region. It is understood that any nucleic acid sequence that allows the expression level of a reference gene to be specifically determined can be employed with the methods and compositions of the presently disclosed subject matter.
VI.B. Hybridization on a Solid Support
In some embodiments of the presently disclosed subject matter, an amplified and labeled nucleic acid sample is hybridized to probes or probe sets that are immobilized on a continuous solid support comprising a plurality of identifying positions.
Representative hybridization conditions are set forth herein. For some high-density glass-based microarray experiments, hybridization at 65°C is too stringent for typical use, at least in part because the presence of fluorescent labels destabilizes the nucleic acid duplexes (Randolph & Waggoner (1997) Nucleic Acids Res 25:2923-2929). Alternatively, hybridization can be performed in a formamide-based hybridization buffer as described in Pietu et al. (1996) Genome Res 6:492-503.
A microarray format can be selected for use based on its suitability for electrochemical-enhanced hybridization. Provision of an electric current to the microarray, or to one or more discrete positions on the microarray, facilitates localization of a target nucleic acid sample near probes immobilized on the microarray surface. Concentration of target nucleic acid near arrayed probe accelerates hybridization of a nucleic acid of the sample to a probe. Further, electronic stringency control allows the removal of unbound and nonspecifically bound DNA after hybridization. See U.S. Patent Nos. 6,017,696 and 6,245,508.
VI.C. Hybridization in Solution
In some embodiments of the presently disclosed subject matter, an amplified and labeled nucleic acid sample is hybridized to one or more probes in solution. Representative stringent hybridization conditions for complementary nucleic acids having more than about 100 complementary residues are overnight hybridization in 50% formamide with 1 mg of heparin at 42°C. An example of highly stringent wash conditions is 15 minutes in 0.1X SSC, 5M NaCl at 65°C. An example of stringent wash conditions is 15 minutes in 0.2X SSC buffer at 65°C (see Sambrook & Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, United States of America, for a description of SSC buffer). A high stringency wash can be preceded by a low stringency wash to remove background probe signal. An example of medium stringency wash conditions for a duplex of more than about 100 nucleotides is 15 minutes in 1X SSC at 45°C. An example of low stringency wash conditions for a duplex of more than about 100 nucleotides is 15 minutes in 4-6X SSC at 40°C. Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide.
For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1 M Na+ ion, typically about 0.01 M to 1 M Na+ ion concentration (or other salts) at pH 7.0-8.3, and the temperature is typically at least about 30°C.
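The wash conditions above can be compared on a common scale by converting SSC strength to sodium ion concentration. The sketch below assumes the standard SSC formulation (1X SSC = 0.15 M NaCl plus 0.015 M trisodium citrate, or about 0.195 M Na+), which is taken from general laboratory practice rather than from this disclosure. Only the washes specified by SSC strength alone are listed, and the low stringency wash is entered at the 4X end of its 4-6X range. All names are illustrative.

```python
# Assumed standard formulation: 1X SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NACL_M_PER_X = 0.15
CITRATE_M_PER_X = 0.015
NA_PER_CITRATE = 3  # trisodium citrate contributes three Na+ per formula unit


def sodium_molar(ssc_x):
    """Approximate Na+ concentration (mol/L) of an SSC buffer of strength ssc_x."""
    return ssc_x * (NACL_M_PER_X + NA_PER_CITRATE * CITRATE_M_PER_X)


# (label, SSC strength, temperature in degrees C), as stated in the text above.
WASHES = [
    ("stringent", 0.2, 65),
    ("medium stringency", 1.0, 45),
    ("low stringency", 4.0, 40),
]


def in_short_probe_salt_range(na_molar):
    """Check the 0.01 M to 1 M Na+ window given for short probes."""
    return 0.01 <= na_molar <= 1.0


# One row per wash: label, approximate Na+ molarity, temperature.
report = [(label, round(sodium_molar(x), 3), temp) for label, x, temp in WASHES]
```

Listing the washes this way makes the trend explicit: stringency rises as the sodium concentration falls and the temperature rises.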
Optionally, nucleic acid duplexes or hybrids can be captured from the solution for subsequent analysis, including detection assays. For example, in a simple assay, a single probe set is hybridized to an amplified and labeled RNA sample derived from a target nucleic acid sample. Following hybridization, an antibody that recognizes DNA:RNA hybrids is used to precipitate the hybrids for subsequent analysis. The expression level of the gene is determined by detection of the label in the precipitate.
Alternate capture techniques can be used as will be understood to one of skill in the art, for example, purification by a metal affinity column when using probes comprising a histidine tag. As another example, the hybridized sample can be hydrolyzed by alkaline treatment wherein the double-stranded hybrids are protected while non-hybridizing single-stranded template and excess probe are hydrolyzed. The hybrids are then collected using any nucleic acid purification technique for further analysis.
To determine the expression levels of multiple genes simultaneously, probes or probe sets can be distinguished by differential labeling of probes or probe sets. Alternatively, probes or probe sets can be spatially separated in different hybridization vessels. Representative embodiments of each approach are described herein below.
In some embodiments, a probe or probe set having a unique label is prepared for each gene to be analyzed. For example, a first probe or probe set can be labeled with a first fluorescent label, and a second probe or probe set can be labeled with a second fluorescent label. Multi-labeling experiments should consider label characteristics and detection techniques to optimize detection of each label. Representative first and second fluorescent labels are Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway, New Jersey, United States of America), which can be analyzed with good contrast and minimal signal leakage.
A unique label for each probe or probe set can further comprise a labeled microsphere to which a probe or probe set is attached. A representative system is LabMAP (Luminex Corporation, Austin, Texas, United States of America). Briefly, LabMAP (Laboratory Multiple Analyte Profiling) technology involves performing molecular reactions, including hybridization reactions, on the surface of color-coded microscopic beads called microspheres. When used in accordance with the methods of the presently disclosed subject matter, an individual probe or probe set is attached to beads having a single color-code such that they can be identified throughout the assay. Successful hybridization is measured using a detectable label of the amplified nucleic acid sample, wherein the detectable label can be distinguished from each color-code used to identify individual microspheres. Following hybridization of the amplified, labeled nucleic acid sample with a set of microspheres comprising probe sets, the hybridization mixture is analyzed to detect the signal of the color-code as well as the label of a sample nucleic acid bound to the microsphere. See Vignali (2000) J Immunol Methods 243:243-255; Smith et al. (1998) Clin Chem 44:2054-2056; PCT International Patent Application Publication Nos. WO 01/13120, WO 01/14589, WO 99/19515, and WO 97/14028.
VII. Detection
Methods for detecting a hybridization duplex or triplex are selected according to the label employed.
In the case of a radioactive label (e.g., 32P-, 33P-, or 35S-dNTP) detection can be accomplished by autoradiography or by using a phosphorimager as is known to one of skill in the art. In some embodiments, a detection method can be automated and is adapted for simultaneous detection of numerous samples. Common research equipment has been developed to perform high-throughput fluorescence detection, including instruments from GSI Lumonics (Watertown, Massachusetts, United States of America), Amersham Pharmacia Biotech/Molecular Dynamics (Sunnyvale, California, United States of America), Applied Precision Inc. (Issaquah, Washington, United States of America), Genomic Solutions Inc. (Ann Arbor, Michigan, United States of America), Genetic Microsystems Inc. (Woburn, Massachusetts, United States of America), Axon (Foster City, California, United States of America), Hewlett Packard (Palo Alto, California, United States of America), and Virtek (Woburn, Massachusetts, United States of America). Most of the commercial systems use some form of scanning technology with photomultiplier tube detection. Criteria for consideration when analyzing fluorescent samples are summarized by Alexay et al. (1996) in Menzel ER, ed, Fluorescence Detection IV. Proc SPIE 2705:63-72.
In some embodiments, a nucleic acid sample or probes are labeled with far infrared, near infrared, or infrared fluorescent dyes. Following hybridization, the mixture of amplified nucleic acids and probes is scanned photoelectrically with a laser diode and a sensor, wherein the laser scans with scanning light at a wavelength within the absorbance spectrum of the fluorescent label, and light is sensed at the emission wavelength of the label. See U.S. Patent Nos. 6,086,737; 5,571,388; 5,346,603; 5,534,125; 5,360,523; 5,230,781; 5,207,880; and 4,729,947. An ODYSSEY™ infrared imaging system (Li-Cor, Inc., Lincoln, Nebraska, United States of America) can be used for data collection and analysis.
If an epitope label has been used, a protein or compound that binds the epitope can be used to detect the epitope. For example, an enzyme-linked protein can be subsequently detected by development of a colorimetric or luminescent reaction product that is measurable using a spectrophotometer or luminometer, respectively.
In some embodiments, INVADER® technology (Third Wave Technologies, Madison, Wisconsin, United States of America) is used to detect target nucleic acid/probe complexes. Briefly, a nucleic acid cleavage site (such as that recognized by a variety of enzymes having 5' nuclease activity) is created on a target sequence, and the target sequence is cleaved in a site-specific manner, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. See U.S. Patent Nos. 5,846,717; 5,985,557; 5,994,069; 6,001,567; and 6,090,543.
In some embodiments, target nucleic acid/probe complexes are detected using an amplifying molecule, for example a poly-dA oligonucleotide as described in Lisle et al. (2001) Biotechniques 30:1268-1272. Briefly, a tethered probe is employed against a target nucleic acid having a complementary nucleotide sequence. A target nucleic acid having a poly-dT sequence, which can be added to any nucleic acid sequence using methods known to one of skill in the art, hybridizes with an amplifying molecule comprising a poly-dA oligonucleotide. Short oligo-dT40 signaling moieties are labeled with any suitable label (e.g., fluorescent, chemiluminescent, radioisotopic labels). The short oligo-dT40 signaling moieties are subsequently hybridized along the molecule, and the label is detected.
Surface plasmon resonance spectroscopy can also be used to detect hybridization duplexes formed between a randomly amplified nucleic acid and a probe as disclosed herein. See e.g., Heaton et al. (2001) Proc Natl Acad Sci U S A 98:3701-3704; Nelson et al. (2001) Anal Chem 73:1-7; Guedon et al. (2000) Anal Chem 72:6003-6009.
VIII. Gene Expression Equations Employed for Diagnosis of MS Syndromes
VIII.A. General Description of the Equations
Genes that were differentially expressed in patients with a multiple sclerosis syndrome compared to a control population were chosen to determine whether they could be used to classify individuals with a multiple sclerosis syndrome. The genes that were employed include those listed in Table 1.
Table 1
Genes Used in the Equations
Gene Symbol | Gene Name | SEQ ID NO:
ACTR1A | ARP1 actin-related protein 1 homolog A, centractin alpha (yeast) | 1
BRCA1 | breast cancer 1, early onset, transcript variant BRCA1a | 2
CTSS | cathepsin S | 3
EPHX2 | Epoxide hydrolase 2, cytoplasmic | 4
LLGL2 | Lethal giant larvae homolog 2 | 5
SPIB | Spi-B transcription factor (Spi-1/PU.1 related) | 6
TAF11 | TAF11 RNA polymerase II, TATA box binding protein-associated factor, 28 kilodalton | 7
TGM2 | transglutaminase 2 | 8
TP53 | human tumor protein p53 | 9
Additional genes and gene products that can be employed in the methods of the presently disclosed subject matter include, but are not limited to, the following: ANP32B/SSP29 (GENBANK® Accession No.: NM_006401; SEQ ID NO: 10); TNFAIP2 (GENBANK® Accession No.: NM_006291; SEQ ID NO: 11); SIP1 (GENBANK® Accession No.: NM_003616; SEQ ID NO: 12); BPHL (GENBANK® Accession No.: NM_004332; SEQ ID NO: 13); CCDC85B (GENBANK® Accession No.: NM_006848; SEQ ID NO: 14); ASL (GENBANK® Accession No.: NM_000048; SEQ ID NO: 15); GNB5 (GENBANK® Accession No.: NM_000048; SEQ ID NO: 16); MAN1A1 (GENBANK® Accession No.: NM_005907; SEQ ID NO: 17); XDH (GENBANK® Accession No.: NM_000379; SEQ ID NO: 18); TMBIM4 (GENBANK® Accession No.: NM_016056; SEQ ID NO: 19); BMP8B (GENBANK® Accession No.: NM_001720; SEQ ID NO: 20); CYB5B (GENBANK® Accession No.: NM_030579; SEQ ID NO: 21); ORC1L (GENBANK® Accession No.: NM_004153; SEQ ID NO: 22); CDH1 (GENBANK® Accession No.: NM_004360; SEQ ID NO: 24); RIOK3 (GENBANK® Accession No.: NM_003831; SEQ ID NO: 25); STOM (GENBANK® Accession No.: NM_004099; SEQ ID NO: 26); CDKN1B (GENBANK® Accession No.: NM_004064; SEQ ID NO: 27); CASP6 (GENBANK® Accession No.: NM_001226; SEQ ID NO: 28); TXK (GENBANK® Accession No.: NM_003328; SEQ ID NO: 29); MYO1C (GENBANK® Accession No.: NM_033375; SEQ ID NO: 30); LIF (GENBANK® Accession No.: NM_002309; SEQ ID NO: 31); DNAJA1 (GENBANK® Accession No.: NM_001539; SEQ ID NO: 32); GUCY1B3 (GENBANK® Accession No.: NM_000857; SEQ ID NO: 33); AP3S2 (GENBANK® Accession No.: NM_005829; SEQ ID NO: 34); RTN4 (GENBANK® Accession No.: NM_007008; SEQ ID NO: 35); SC65 (GENBANK® Accession No.: NM_006455; SEQ ID NO: 36); UBE2G2 (GENBANK® Accession No.: NM_003343; SEQ ID NO: 37); SLC16A4 (GENBANK® Accession No.: NM_004696; SEQ ID NO: 38); and MMP17 (GENBANK® Accession No.: NM_016155; SEQ ID NO: 39).
It is understood that any of these genes and gene products, alone or in combinations, can be employed in the practice of the presently disclosed subject matter. Thus, the presently disclosed methods can consider the differential expression of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 of these genes.
VIII.B. Use of the Equations to Predict the Presence of an MS Syndrome
The expression level of the genes listed in Table 1 was determined as described herein. Computer-based methods were employed to identify which ratios of gene expression could be used to discriminate subjects with an MS syndrome from subjects without an MS syndrome. The number of genes included in any single equation was limited to between 1 and 5.
The computational method generated several different equations as set forth in EXAMPLE 4 below. Summarily, the following 1-, 2-, 3-, and 4-ratio equations were found to accurately discriminate subjects with an MS syndrome from subjects that did not have an MS syndrome:
a. $\dfrac{\mathit{CTSS}}{\mathit{TAF11}}$;
b. $\dfrac{\mathit{BRCA1} \times \mathit{CTSS}}{\mathit{TAF11}^{2}}$;
c. $\dfrac{\mathit{CTSS} \times \mathit{LLGL2} \times \mathit{SPIB}}{\mathit{TAF11}^{3} \times 1000}$; and
d. $\dfrac{\mathit{CTSS} \times \mathit{LLGL2}^{2} \times \mathit{TGM2}}{\mathit{TAF11}^{2} \times \mathit{TP53}^{2}}$,
wherein each term in the equations is a gene expression level as defined herein.
IX. Kits
The presently disclosed subject matter also provides kits comprising one or more reagents for performing the presently disclosed methods. In some embodiments, the kits comprise a plurality of oligonucleotide primers and instructions for employing the plurality of oligonucleotide primers to determine the expression level of at least one, two, three, four, five, six, seven, eight, or all nine of the genes represented by SEQ ID NOs: 1-9. In some embodiments, the kits also comprise a plurality of oligonucleotide primers that can be employed for determining an expression level for one or more additional genes such as a control gene.
X. Computer Program Products and Computer-implemented Methods
The presently disclosed subject matter provides computer program products comprising computer-executable instructions embodied in a computer-readable medium. In some embodiments, the computer program products perform steps comprising (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a control population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a test population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects.
In some embodiments, the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome. In some embodiments, the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
The presently disclosed subject matter also provides methods for assigning an uncharacterized subject to one of two populations of subjects. In some embodiments, the methods comprise (a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a first population of subjects; (b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a second population of subjects; (c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and (d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects.
In some embodiments, the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome. In some embodiments, the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
The first and second data sets can include expression data for any number of genes from any number of subjects. For example, the gene expression data can be data resulting from gene expression analysis techniques as disclosed herein, and can include, but are not limited to, data generated by RT-PCR (e.g., Q-RT-PCR), from Northern blots, and from analyses of gene arrays (e.g., gene chips, filter-based arrays, etc.). In some embodiments, the gene expression data is normalized, for example, to an expression level of a control gene such as a housekeeping gene. Genes that can be employed as control genes are known in the art and include, but are not limited to, β-actin (ACTB), aldolase A, fructose-bisphosphate (ALDOA), glyceraldehyde-3-phosphate dehydrogenase (GAPD), phosphoglycerate kinase 1 (PGK1), and lactate dehydrogenase A (LDHA). Other such genes are set forth in Eisenberg & Levanon (2003) Trends in Genetics 19:362-365.
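By way of illustration only, normalization to a control gene can be sketched in Python as follows. The function name, the dictionary layout, and the numerical values are hypothetical and are not taken from the data disclosed herein.

```python
def normalize_to_control(expression, control_gene):
    """Divide every expression level by the level of the chosen control gene.

    `expression` maps gene symbols to linear expression values for one subject.
    The control gene itself is dropped from the returned mapping.
    """
    reference = expression[control_gene]
    if reference <= 0:
        raise ValueError("control gene expression must be positive")
    return {
        gene: level / reference
        for gene, level in expression.items()
        if gene != control_gene
    }


# Hypothetical linear expression values for a single subject (illustration only).
sample = {"ACTB": 2000.0, "TAF11": 40.0, "CTSS": 4000.0}
normalized = normalize_to_control(sample, "ACTB")
```

In this sketch each value becomes a unitless ratio relative to the control gene, so samples with different amounts of input cDNA can be compared.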
EXAMPLES
The following Examples provide illustrative embodiments. In light of the present disclosure and the general level of skill in the art, those of skill will appreciate that the following Examples are intended to be exemplary only and that numerous changes, modifications, and alterations can be employed without departing from the scope of the presently disclosed subject matter.
EXAMPLE 1 Subjects
A total of 179 subjects were analyzed in this study (see Table 2). The control group was age/gender matched, and ascertained for absence of diagnosed autoimmune disease or symptoms by interview. To examine an unbiased cohort of patients, the single criterion of diagnosis of a disease using established methods by a specialist in the field was employed for inclusion in the study. The MS patients were further classified into relapsing remitting (RRMS), primary progressive (PPMS), secondary progressive (SPMS), and pre-MS (clinically isolated syndrome, CIS) disease sub-types. An initial group of 29 patients with MS (A in Table 2) and a second independent group of 26 patients with MS (B in Table 2) were analyzed. The Vanderbilt University Institutional Review Board (Nashville, Tennessee, United States of America) approved this protocol and each patient provided written consent.
Table 2
Clinical Characteristics of Patients Analyzed
Study Group | No. of Subjects | Age Range (Yr) | Female:Male Ratio
A | | |
Controls | 46 | 22-58 | 3.2:1
MS-1 | 29 | 26-53 | 3.6:1
- RRMS | 13 | |
- SPMS | 14 | |
- PPMS | 1 | |
- Pre-MS (CIS) | 1 | |
B | | |
MS-2 | 26 | 29-48 | 2.7:1
- RRMS | 12 | |
- SPMS | 12 | |
- PPMS | 1 | |
- Pre-MS (CIS) | 1 | |
RA | 35 | 42-69 | 2.8:1
SLE | 21 | 33-57 | 4.2:1
Other Diseases | 22 | 22-69 | 2.6:1
Cumulative | 179 | 22-69 | 3.0:1
RRMS = relapsing remitting MS; SPMS = secondary progressive MS; PPMS = primary progressive MS; CIS = clinically isolated syndrome; Other = optic neuritis (N = 4), inflammatory arthritis (N = 3), type 1 diabetes (N = 5), type 2 diabetes (N = 5), cardiovascular disease (N = 5).
EXAMPLE 2 Sample Isolation, Reverse Transcription, and Expression Level Analysis
Peripheral blood was collected into PAXGENE™ tubes (Qiagen Inc., Valencia, California, United States of America). Total RNA was isolated using the Versagene PAXGENE™-compatible isolation kit according to the manufacturer's recommendations (Gentra Systems, Inc., Minneapolis, Minnesota). Two μg of RNA was reverse-transcribed using the SUPERSCRIPT™ III first-strand cDNA synthesis kit (Invitrogen Corporation, Carlsbad, California, United States of America). For each sample, cDNA equivalent to 100 ng of total RNA was used in replicate Q-RT-PCR reactions for each gene assay. Patient clinical history was blinded during processing and data collection.
The relative expression levels of 9 genes were determined using TAQMAN® gene expression assays (Applied Biosystems, Inc., Foster City, California) and detected on an ABI7700/SDS platform (Applied Biosystems, Inc). The 9 genes assayed were as follows: ACTR1A (SEQ ID NO: 1); BRCA1 (SEQ ID NO: 2); CTSS (SEQ ID NO: 3); EPHX2 (SEQ ID NO: 4); LLGL2 (SEQ ID NO: 5); SPIB (SEQ ID NO: 6); TAF11 (SEQ ID NO: 7); TGM2 (SEQ ID NO: 8); and TP53 (SEQ ID NO: 9). Relative expression levels were determined from the observed CT data. Each CT was subtracted from 30, such that (30 - CT) = X, and the linear expression value was calculated as 2^X.
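The conversion from CT to a linear expression value described above can be sketched in Python as follows. This is a minimal illustration, not the software actually used; the function name and the example CT values are hypothetical.

```python
def linear_expression(ct, reference_cycle=30.0):
    """Convert a threshold cycle (CT) into a relative linear expression value.

    X = reference_cycle - CT, and the linear expression value is 2**X.
    A lower CT (earlier detection) therefore gives a higher expression value.
    """
    return 2.0 ** (reference_cycle - ct)


# Hypothetical CT values, chosen only to illustrate the arithmetic.
example_ct = {"TAF11": 25.0, "CTSS": 18.0}
example_linear = {gene: linear_expression(ct) for gene, ct in example_ct.items()}
```

With these illustrative inputs, a CT of 25 gives X = 5 and a linear value of 32, while a CT of 18 gives X = 12 and a linear value of 4096.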
EXAMPLE 3 Statistical Analysis
A computer program was designed to identify the most discriminatory combination of ratios (ranging between 1 and 5). All gene expression ratios (e.g., ACTR1A/BRCA1, TAF11/ACTR1A, etc.) in the control and MS data sets were searched to first find an optimal ratio. The search was entirely deterministic, since every possible ratio using two gene expression levels was computed. The value of a test ratio was used to separate the MS data set from the control set. For each control individual and MS individual the test ratio was computed. {C1, C2, ..., Cn} denoted the test ratios for each of the n control individuals and {MS1, MS2, ..., MSk} denoted the test ratios for each of the k MS individuals. Perfect separation of the two sets (an optimal ratio) would occur if the largest test ratio for the control individuals was less than the smallest test ratio for the MS individuals, but this optimal ratio was not identified.
However, test ratios for which the majority of the MS individuals scored greater than the second largest control individual were identified. Therefore, the optimal test ratio was taken to be the one that separated the two data sets such that the second largest ratio in {C1, C2, ..., Cn} was less than the largest number of ratios in {MS1, MS2, ..., MSk}. This optimal ratio was used to identify a cutoff value that produced the highest sensitivity and specificity for the two data sets. This was accomplished by plotting sensitivity and specificity curves as functions of the cutoff value and identifying the intersection of the curves. The cutoff value at this intersection was designated the optimal ratio discriminator.
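The single-ratio stage of the search and the cutoff selection described above can be sketched in Python as follows. This is a hedged illustration only: the program itself is not reproduced herein, the function names and data are hypothetical, and the intersection of the sensitivity and specificity curves is approximated by the candidate cutoff at which the two values are closest.

```python
from itertools import permutations


def best_single_ratio(controls, cases, genes):
    """Score every ordered gene pair (numerator, denominator) exhaustively.

    A pair is scored by the number of cases whose ratio exceeds the
    second-largest control ratio; the first pair with the top score is kept.
    """
    best = None
    for num, den in permutations(genes, 2):
        control_ratios = sorted(s[num] / s[den] for s in controls)
        reference = control_ratios[-2]  # second-largest control ratio
        separated = sum(1 for s in cases if s[num] / s[den] > reference)
        if best is None or separated > best[0]:
            best = (separated, num, den)
    return best


def crossover_cutoff(control_scores, case_scores):
    """Return the observed score at which sensitivity and specificity are closest."""
    candidates = sorted(set(control_scores) | set(case_scores))

    def gap(cutoff):
        sens = sum(x > cutoff for x in case_scores) / len(case_scores)
        spec = sum(x <= cutoff for x in control_scores) / len(control_scores)
        return abs(sens - spec)

    return min(candidates, key=gap)


# Hypothetical linear expression values (illustration only).
genes = ["CTSS", "TAF11", "TP53"]
controls = [
    {"CTSS": 100.0, "TAF11": 40.0, "TP53": 50.0},
    {"CTSS": 120.0, "TAF11": 38.0, "TP53": 60.0},
    {"CTSS": 90.0, "TAF11": 45.0, "TP53": 55.0},
]
cases = [
    {"CTSS": 110.0, "TAF11": 12.0, "TP53": 50.0},
    {"CTSS": 100.0, "TAF11": 15.0, "TP53": 45.0},
    {"CTSS": 95.0, "TAF11": 10.0, "TP53": 60.0},
]

separated, numerator, denominator = best_single_ratio(controls, cases, genes)
cutoff = crossover_cutoff(
    [s[numerator] / s[denominator] for s in controls],
    [s[numerator] / s[denominator] for s in cases],
)
```

Multi-ratio discriminators would extend the same idea by multiplying several such ratios before scoring; that extension is omitted here for brevity.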
Sensitivity and specificity of the discriminators were determined using standard calculations. Overall test accuracy was calculated as follows: let TP = true positives; FP = false positives; TN = true negatives; FN = false negatives; then:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
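As a brief illustration, the accuracy formula above, together with the standard definitions of sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), can be computed as in the following Python sketch. The function name and the counts are hypothetical and do not correspond to any cohort described herein.

```python
def performance_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and overall accuracy from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of cases scored positive
        "specificity": tn / (tn + fp),  # fraction of controls scored negative
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = performance_metrics(tp=45, fp=5, tn=95, fn=5)
```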
Receiver operating characteristics (ROC) curves were used to examine performance characteristics of the tests. Briefly, the number of true positives (TP) and false positives (FP) were determined for a range of cutoff scores. The fraction of TP (TPF) was determined by dividing the TP by the total number of cases and the fraction of FP (FPF) was determined by dividing the FP by the total number of controls. A nonlinear regression curve was calculated using a function in Mathematica's Statistics`NonlinearFit` package (Wolfram Research, Inc., Champaign, Illinois, United States of America). The nonlinear regression was integrated to determine the area under the curve (AUC). The significance of observed differences (P) was determined using the Mann-Whitney test.
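The TPF and FPF calculation can be sketched in Python as follows. Note that this sketch substitutes simple trapezoidal integration of the empirical ROC points for the nonlinear regression fit described above, so its AUC values would not necessarily match those obtained with the fitted curve. The function names and scores are hypothetical.

```python
def roc_points(control_scores, case_scores):
    """Return (FPF, TPF) pairs for every observed cutoff, starting at (0, 0)."""
    cutoffs = sorted(set(control_scores) | set(case_scores), reverse=True)
    points = [(0.0, 0.0)]
    for cutoff in cutoffs:
        tpf = sum(x >= cutoff for x in case_scores) / len(case_scores)
        fpf = sum(x >= cutoff for x in control_scores) / len(control_scores)
        points.append((fpf, tpf))
    return points  # the lowest cutoff always yields the point (1, 1)


def trapezoid_auc(points):
    """Integrate TPF over FPF with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores, for illustration only.
control_scores = [1.0, 2.0, 3.0, 6.0]
case_scores = [4.0, 5.0, 7.0, 8.0]
auc = trapezoid_auc(roc_points(control_scores, case_scores))
```

For untied scores, the trapezoidal AUC equals the fraction of case-control pairs in which the case scores higher, which is the quantity underlying the Mann-Whitney statistic.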
EXAMPLE 4 Initial Discriminant Analysis
Initial analysis focused on identifying a pattern of gene expression that discriminated MS and control subjects. The data set comprised 29 patients with different clinical forms of MS (see Table 2) and 49 control subjects. Nine genes were selected from the microarray data set for analysis (see EXAMPLE 2), and their expression profiles were examined using TAQMAN® gene expression assays. Genes were selected from the microarray data set for which expression level did not vary (control genes: LLGL2, CTSS, TGM2), and for which expression level varied significantly among control and autoimmune subjects (test genes: ACTR1A, BRCA1, EPHX2, SPIB, TAF11, and TP53).
Initially, two approaches were employed to determine relative gene expression levels. The first was by plasmid standard curve analysis. The second was to determine the relative expression level of a given gene from the CT. Because consistent patterns of normalized gene expression were observed with each method of quantitation (plasmid standard curve analysis versus linear), linear expression values were used in all subsequent analyses because this approach removed one variable from the overall analysis.
The average expression values were determined for each gene in the MS and control groups (see Table 3). A wide range of individual gene expression levels was observed, from the lowest value of 0.35 (BRCA1) to the highest value of 22,851 (CTSS). The average expression values were significantly different between the MS and control groups for 5 genes (see Table 3).
Table 3
Average Relative Linear Expression Values
Gene Symbol | Control | MS | MS/C | P value
ACTR1A | 127 ± 48* | 71 ± 31 | 0.56 |
BRCA1 | 7 ± 4 | 5 ± 4 | 0.69 | 0.03
CTSS | 3882 ± 3102 | 1310 ± 3935 | 1.12 | NS
EPHX2 | 24 ± 11 | 12 ± 7 | 0.51 | 2e-7
LLGL2 | 91 ± 41 | 78 ± 75 | 0.85 | NS
SPIB | 126 ± 63 | 96 ± 52 | 0.76 | 0.02
TAF11 | 38 ± 18 | 14 ± 11 | 0.38 | 5e-10
TGM2 | 36 ± 45 | 31 ± 41 | 0.85 | NS
TP53 | 610 ± 250 | 487 ± 416 | 0.80 | NS
* Results are average linear expression values ± standard deviation from a total of 46 control subjects and 29 MS subjects. P values were determined by the Mann-Whitney test. NS = not significant.
Linear expression values were entered into a computer program, which searched all possible 1-, 2-, 3-, and 4-ratio combinations of gene expression levels. These ratio combinations, or discriminators, generated individual scores, which were then analyzed to determine sensitivity and specificity (see Table 4). The sensitivity and specificity of discriminators increased with the number of ratios. For example, a one-ratio combination correctly identified 86% of the patients with MS with a specificity of 91%; the best two-ratio combination identified 83% of the MS patients with a specificity of 93%. The best discriminator was derived from a 4-ratio combination, [CTSS × LLGL2² × TGM2] / [TAF11² × TP53²]. With this combination, the average MS score was 17.5 ± 15.0 and the average control score was 2.2 ± 4.1 (P < 0.0001). Discriminators with 5 ratios produced identical specificity and sensitivity to the best 4-ratio discriminator.
Table 4
Discriminator Performance with Increasing Components
Ratios | Test Equation | Average Score, Control | Average Score, MS | Sens. | Spec. | P Value
1-ratio | CTSS / TAF11 | 19 ± 11 | 35 ± 10 | 86% | 91% | <0.0001
2-ratio | (BRCA1 × CTSS) / TAF11² | 20 ± 17 | 131 ± 107 | 83% | 93% | <0.0001
3-ratio | (CTSS × LLGL2 × SPIB) / (TAF11³ × 1000) | 1.2 ± 2.8 | 19.9 ± 23.5 | 93% | 93% | 0.0002
4-ratio | (CTSS × LLGL2² × TGM2) / (TAF11² × TP53²) | 2.2 ± 4.1 | 17.5 ± 15.0 | 93% | 98% | <0.0001
Scores were determined for each test equation as outlined in methods and are expressed as average score ± standard deviation. Sensitivity (Sens.) and specificity (Spec.) were determined by standard calculations as described in the methods section. P values were determined using the Mann-Whitney test.
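For illustration, the four test equations of Table 4 can be written as Python functions of a subject's linear expression values. This is a minimal sketch; the function names and the expression values are hypothetical and are not measured data.

```python
def one_ratio(e):
    """CTSS / TAF11"""
    return e["CTSS"] / e["TAF11"]


def two_ratio(e):
    """(BRCA1 x CTSS) / TAF11^2"""
    return (e["BRCA1"] * e["CTSS"]) / e["TAF11"] ** 2


def three_ratio(e):
    """(CTSS x LLGL2 x SPIB) / (TAF11^3 x 1000)"""
    return (e["CTSS"] * e["LLGL2"] * e["SPIB"]) / (e["TAF11"] ** 3 * 1000)


def four_ratio(e):
    """(CTSS x LLGL2^2 x TGM2) / (TAF11^2 x TP53^2)"""
    return (e["CTSS"] * e["LLGL2"] ** 2 * e["TGM2"]) / (
        e["TAF11"] ** 2 * e["TP53"] ** 2
    )


# Hypothetical linear expression values for one subject (illustration only).
subject = {
    "BRCA1": 8.0, "CTSS": 4000.0, "LLGL2": 100.0, "SPIB": 125.0,
    "TAF11": 40.0, "TGM2": 36.0, "TP53": 600.0,
}
scores = {
    "1-ratio": one_ratio(subject),
    "2-ratio": two_ratio(subject),
    "3-ratio": three_ratio(subject),
    "4-ratio": four_ratio(subject),
}
```

Because every equation is a product of gene-to-gene ratios, a uniform scaling of all expression values in a sample (for example, from a different cDNA input amount) cancels out of each score.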
At each stage of the analysis, the computer program identified more than one 1-, 2-, 3-, or 4-ratio discriminator that performed with equal sensitivity and specificity. For example, at the 1-component stage, several genes functioned equally well in the numerator but TAF11 was always in the denominator, consistent with its low expression value in the MS cohort. Several 4-ratio discriminators also performed equally well and these varied in two ways. First, several genes in the numerator yielded equivalent sensitivity and specificity. Second, although the two genes in the denominator were always TAF11 and TP53, ratios were identified that used (TAF11³ × TP53) and ratios that used (TAF11² × TP53²).
MS scores ranged from 0.6 to 69.1, while control scores ranged between 0.2 and 27.7 (see Figure 1A; MS-1). One control subject received a score of 27.7, which was 5.6 times higher than the next highest control score of 4.9, and 12.6 fold higher than the average control score of 2.2. The initial results were validated using only the best 4-ratio discriminator [CTSS × LLGL2² × TGM2] / [TAF11² × TP53²] by determining gene expression levels in whole blood from an independent cohort of 26 MS patients (see Figure 1A; MS-2) and performing the same analysis to produce scores. The average score of the second group was 14.4 ± 13.6 and the sensitivity was 88%. Combined sensitivity of the test and validation groups (MS-1 and MS-2) was 91%.
Different parameters were evaluated by varying the threshold or cutoff score. A cutoff or threshold of 5.2 classified 48 out of 49 controls as non-MS (98% specificity) and correctly identified 50 of 55 MS patients as disease-positive (91% sensitivity). The highest overall accuracy (96%; see Figure 1C) was reached with a cutoff or threshold of 5.2.
EXAMPLE 5
Analysis of the 4-Ratio Test Equation in Different Autoimmune Diseases
The 4-ratio discriminator, [CTSS × LLGL2² × TGM2] / [TAF11² × TP53²], was applied to individuals with different autoimmune and other chronic diseases. The same parameters of performance were evaluated. The scores of the different disease groups showed a greater degree of overlap with each other (see Figure 2A). Using a designated cutoff of 5.2, 31% of RA patients (11 of 35), 24% of SLE patients (5 of 21), and 9% of patients with other diseases (2 of 23) scored positive in the test. All patients scoring positive in the test had a known autoimmune disease. Therefore, the overall specificity was reduced to 85% using a cutoff of 5.2 (see Figure 2B). The overall accuracy was 85% when all subjects were included in the analysis (see Figure 2C).
The performance of the 4-ratio discriminator [(CTSS × LLGL2² × TGM2) / (TAF11² × TP53²)] was also evaluated using the receiver operating characteristics (ROC) curve. To determine ROCs, the TPF and the FPF were compared using the MS and control samples applying the best 1-, 2-, 3-, and 4-ratio discriminators (from Table 4 and Figure 1A). This comparison yielded an area under the curve (AUC) of 0.96 for the 4-ratio discriminator (see Figure 3A). AUCs for the 3-, 2-, and 1-ratio discriminators were less than the AUC for the 4-ratio discriminator. Next, the TPF and FPF were determined using the MS samples and all other samples (combined data from Figures 1A and 2A). This comparison produced an AUC of 0.89 for the 4-ratio discriminator (see Figure 3B). AUCs for the 3-, 2-, and 1-ratio discriminators were calculated and were determined to be less than that observed for the 4-ratio discriminator, as were overall sensitivities and specificities (see Table 4). Thus, AUC, sensitivity, specificity, and overall accuracy decreased when all samples, controls and those with other diseases, including other autoimmune diseases, were included in the analysis.
The MS cohort comprised individuals with different MS sub-types (see Table 2). However, no scoring pattern could be identified that correlated with a sub-type (see Figure 4). At a threshold of 5.2, two patients with clinically definite MS received negative scores. One patient with SPMS received a score of 4.2, and one RRMS patient scored 0.6, one of the lowest scores observed in any group.
EXAMPLE 6
Effects of Different Treatments on Scores
Since this was a retrospective analysis, all MS patients were under a clinician's care. Common therapies for MS include beta-interferons, Copaxone, methotrexate, and prednisone. Therefore, whether scores varied among MS patients receiving different therapies was investigated. Average score ± standard deviation for the control group was compared to all MS patients, and the MS patients were separated into those receiving a beta-interferon (N = 19), MS patients not on a beta-interferon (N = 33) and patients receiving Copaxone (N = 9). As shown in Figure 5, average scores in the different treatment groups ranged from 16.6 to 25.9. These differences were not statistically significant (P > 0.05). However, the difference between each treatment group and the control group was statistically significant. These results clearly demonstrate that test scores are independent of a specific therapy and therefore support the notion that scores are dependent upon the presence or absence of MS or another autoimmune disease.
Discussion of EXAMPLES 1-6
In the present Examples, the results from a genome-wide microarray analysis were employed to design a sensitive and specific Q-RT-PCR-based assay capable of distinguishing subjects with MS from control subjects and subjects with other chronic diseases including autoimmune diseases. This assay discriminates individuals with MS from controls with a specificity of 98% and an overall sensitivity of 91%. This level of specificity and sensitivity is among the highest reported in laboratory testing for MS (specificity ~95% for MRI; ~85% for OGCB). Unlike microarray analysis, this Q-RT-PCR assay can easily be adapted into a clinical molecular genetics laboratory.
The search algorithm was designed with three desirable goals in mind. First, it was reasoned that if expression ratios of two genes rather than expression levels of a single gene were searched, this would control for any sample-to-sample variations such as differences in cDNA amounts or integrity. Second, by multiplying component ratios rather than using another mathematical function, each gene would receive equal weight in the analysis. Third, it was desirable that the search be capable of being performed with a personal computer. The parameters were set so that the algorithm would identify optimum one-, two-, three-, four-, and five-ratio discriminators based upon sensitivity and specificity. At each stage, more than one ratio was identified that yielded equivalent sensitivities, specificities, and P values. Only one ratio was reported at each stage. Several four-ratio discriminators yield identical sensitivity, specificity, and P value, but these vary only slightly, by having a different control gene in the numerator or by the function; for example, TAF11³ × TP53 in the denominator works as well as TAF11² × TP53².
Five-ratio discriminators were also searched, but these ratios did not improve sensitivity or specificity. In fact, the performance of the 3-, 2-, or 1-ratio discriminators was quite acceptable even when compared to the 4-ratio discriminator. A larger sample size would be necessary to clearly establish, for example, whether 93% sensitivity or 98% specificity (4-ratio) is statistically different from 86% sensitivity or 91% specificity (1-ratio), respectively. In addition, analysis of additional patient cohorts or prospective rather than retrospective analyses might demonstrate that a 4-ratio discriminator is or is not superior to a 3-ratio discriminator. Nevertheless, using this approach, it is possible to distinguish subjects with MS from control subjects with a high degree of accuracy based upon expression levels of at most five genes in whole blood.
In previous microarray analyses, the instant co-inventors had identified a conserved pattern of gene expression in individuals with different forms of autoimmune disease. Using these data, it was possible to design a scoring system that accurately discriminated subjects with an autoimmune disease from control subjects or from subjects undergoing an immune response after influenza vaccination. When the data set from the Q-RT-PCR assays was analyzed with different autoimmune subjects included, the discriminator still performed with a specificity of 87%, NDP of 98%, and overall accuracy of 86%. The greatest degree of overlap was identified with RA patients, but it is important to note that distinguishing between MS and RA is not traditionally a clinical dilemma.
In the area of molecular diagnosis, much attention has been directed towards the development of optimal Q-RT-PCR assays and appropriate endogenous or control gene selection, but no standardized method exists to evaluate the data. Disclosed herein are a group of genes that were either under-expressed (test genes) or did not vary between control and autoimmune subjects (control genes) based upon microarray results. Standard control genes, such as GAPDH or ACTB, were not employed because they showed some degree of statistically significant difference in microarray datasets between the subject groups. Rather, a computer algorithm was developed to randomly and exhaustively search all possible combinations of gene expression ratios and to evaluate a series of optimal 1-, 2-, 3- and 4-ratio discriminators using expression values of 2-5 genes. In the analysis, TAF11 was the most under-expressed gene in the majority of MS patients and was represented in all the component ratios. TP53 was moderately underexpressed on average in the MS population and its inclusion in the discriminator did not improve sensitivity. Its inclusion did increase specificity from 93% to 98%, however.
TAF11 encodes a small subunit of transcription factor IID that is present in all TFIID complexes and interacts with TATA-binding protein. Its function in mammalian systems is not well understood, but it is better understood in yeast. In yeast, promoters of genes have been grouped based upon their interactions with TAFs such that deletion or mutation of an individual TAF can alter transcription of a class of genes and change the transcriptional profile of a cell. Therefore, reduced TAF11 expression could have a pleiotropic effect altering normal transcriptional regulation of multiple genes involved in the MS phenotype.
TP53 encodes the tumor suppressor protein, p53, which regulates cell proliferation, DNA damage/repair, and apoptosis. Lymphocytes from RA and MS patients have reduced TP53 transcript levels, p53 protein levels, and defects in lymphocyte apoptosis induced by gamma radiation, a process known to be dependent upon p53. Defects in apoptosis are hypothesized to contribute to autoimmunity.
These alterations in gene expression might result from the disease process, might reflect a family or genetic trait, or other factors might lead to these differences in gene expression. Disclosed herein is the discovery that MS patients demonstrated similar scoring patterns independent of treatment status or current therapy regimen. Scoring patterns that segregated the different subtypes were not identified. However, the pre-MS (CIS) patients did receive positive scores. If assay-positive pre-MS patients develop MS over time, these results could contribute to earlier clinical intervention. An early-onset differential pattern of expression could support a contributory role for these identified genes in disease pathogenesis.
In summary, disclosed herein are a sensitive and highly specific Q-RT-PCR assay and a mathematical approach to evaluate the data. The assay allowed for the discrimination of patients with MS from controls and other autoimmune diseases. For MS, this approach offers a non-invasive, rapid test that provides diagnostic utility to assist in the clinical decision making process for this complex and challenging disease.
EXAMPLE 7 Sample Preparation
Peripheral blood mononuclear cells (PBMC) are isolated from heparinized blood drawn from a subject by centrifugation on a Ficoll-Hypaque (Sigma-Aldrich, St. Louis, Missouri, United States of America) gradient. Leukocyte distribution in PBMC is determined by flow cytometry. Total RNA is isolated with TRI REAGENT® according to the manufacturer's protocol (Molecular Research Center, Cincinnati, Ohio, United States of America).
RNA Labeling. RNA labeling includes three steps: priming, elongation, and probe purification. For priming, 1-10 μg of total RNA (in a volume of less than 8.0 μl diethylpyrocarbonate (DEPC)-treated water) and 2.0 μg oligo-dT (10-20 mer mixture; 1 μg/μl) are mixed in a total volume of 10 μl (balance DEPC-treated water) in a 1.5 ml microcentrifuge tube. The tube is placed at 70°C for 10 minutes and then briefly chilled on ice. For elongation, 6.0 μl 5x First Strand Buffer (Invitrogen catalogue number Y00146), 1.0 μl 0.1 M DTT, 1.5 μl dNTP mixture (each dNTP at 20 mM), and 1.5 μl SUPERSCRIPT™ II reverse transcriptase (Invitrogen) are added to the microcentrifuge tube. 10 μl ³³P-dCTP (10 mCi/ml; specific activity 3000 Ci/mmol; ICN Biomedicals Inc., Irvine, California, United States of America) is added to the microcentrifuge tube, the contents are mixed thoroughly, and the tube is incubated at 37°C for 90 minutes. Probe purification is accomplished by passing the elongation reaction mixture through a Bio-Spin 6 chromatography column (Bio-Rad Laboratories, Hercules, California, United States of America).
Hybridization of the Labeled RNA to the Membrane. 5 μg of 33P-labeled total RNA isolated from PBMCs is hybridized to GF211 GENEFILTERS® membranes (RESGEN™, a division of Invitrogen Corporation, Carlsbad, California, United States of America; the genes present on the GF211 membrane can be found at RESGEN™'s ftp site). Prior to hybridization, the filter is pre-treated with 0.5% SDS. The SDS solution is heated to boiling and poured over the membrane, which is then incubated in the SDS solution with gentle agitation for 5 minutes.
After pre-treatment, the filter is prehybridized by placing the filter in a hybridization roller tube (35 x 150 mm; DNA side facing the interior of the tube), and 5 ml MICROHYB™ solution (RESGEN™) is added to the tube. Additional blocking agents (5 μg COT-1® DNA, Invitrogen Corporation, Carlsbad, California, United States of America; 5 μg poly-dA) are added and the tube is vortexed to mix thoroughly. Bubbles between the membrane and the tube can be removed, and the membrane is incubated in the prehybridization solution at 42°C for at least 2 hours. For hybridization, the probe is denatured by boiling, cooled, and pipetted into the roller tube containing the GENEFILTERS® membrane and prehybridization solution. The now denatured probe-containing solution is mixed by vortexing. Hybridization can occur overnight, or alternatively for at least 12-18 hours, at 42°C.
Post-Hybridization Washes and Imaging. After hybridization, the filters are washed in the roller tube. The following wash conditions can be used: first and second washes are in 2x SSC/1% SDS at 50°C for 20 minutes; third wash is in 0.5x SSC/1% SDS at 55°C for 15 minutes. After washing, the membrane is wrapped in plastic wrap and placed in a phosphorimaging cassette. Filters are exposed to imaging screens for 2-4 hours (short exposure) and then an additional 24 hours (long exposure), and screens are scanned using a PHOSPHORIMAGER™ apparatus (Molecular Dynamics, Piscataway, New Jersey, United States of America). Data are normalized to yield an average intensity of 1.0 for each clone (4329 clones total) represented on the microarray. Reproducibility of the method can be established by performing replicate hybridizations to separate microarrays. Linear regression analysis can be employed to demonstrate that separate hybridizations yield R² values that are acceptably high. Different exposure lengths of identical filters also can produce high R² values.
EXAMPLE 8
Fluorescent Labeling of Nucleic Acids
A nucleic acid sample can be used as a template for direct incorporation of fluorescent nucleotide analogs (e.g., Cy3-dUTP and Cy5-dUTP, available from Amersham Pharmacia Biotech of Piscataway, New Jersey, United States of America) by a polymerization reaction. In brief, a 50 μl labeling reaction can contain 2 μg of template DNA, 5 μl of 10X buffer, 1.5 μl of fluorescent dUTP, 0.5 μl each of dATP, dCTP, and dGTP, 1 μl of hexamers and decamers (i.e., primers, whether random or derived from a gene of interest), and 2 μl of Klenow (E. coli DNA polymerase 3' to 5' exo- from New England Biolabs of Beverly, Massachusetts, United States of America).
EXAMPLE 9
Noncovalent Binding of Nucleic Acid Probes onto Glass
PCR fragments are suspended in a solution of 3 to 5 M NaSCN and spotted onto amino-silanized slides using a GMS 417™ arrayer from Affymetrix of Santa Clara, California, United States of America. After spotting, the slides are heated at 80°C for 2 hours to dehydrate the spots. Prior to hybridization, the slides are washed in isopropanol for 10 minutes, followed by washing in boiling water for 5 minutes. The washing steps remove any nucleic acid that is not bound tightly to the glass and help to reduce background created by redistribution of loosely attached DNA during hybridization. Contaminants such as detergents and carbohydrates should be minimized in the spotting solution. See also Maitra & Thakur (1992) Curr Sci 62:586-588; Maitra & Thakur (1994) Indian J Biochem Biophys 31:97-99.
EXAMPLE 10
Hybridization to a Microarray Comprising Gene-specific Probes
Labeled nucleic acids from the sample are prepared in a solution of 4X SSC buffer, 0.7 μg/μl tRNA, and 0.3% SDS to a total volume of 14.75 μl. The hybridization mixture is denatured at 98°C for 2 minutes, cooled to 65°C, applied to the microarray, and covered with a 22-mm² cover slip. The slide is placed in a waterproof hybridization chamber for hybridization in a 65°C water bath for 3 hours. Following hybridization, slides are washed in 1X SSC buffer with 0.06% SDS followed by 2 minutes in 0.06X SSC buffer.
It will be understood that various details of the presently disclosed subject matter may be changed without departing from the scope of the presently disclosed subject matter. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation.

Claims

CLAIMS What is claimed is:
1. A method for detecting a multiple sclerosis (MS) syndrome in a subject, the method comprising:
(a) obtaining a biological sample from the subject;
(b) determining expression levels for one or more genes in the biological sample, wherein the two or more genes are selected from among the genes represented by SEQ ID NOs: 1-39; and
(c) comparing the expression levels of each of the one or more genes determined in step (b) with a standard, wherein the comparing detects a multiple sclerosis (MS) syndrome in the subject.
2. The method of claim 1, wherein the comparing comprises:
(a) establishing an average expression level for each of the one or more genes in a population, wherein the population comprises statistically significant numbers of subjects with an MS syndrome and subjects that do not have an MS syndrome;
(b) assigning a first value to each gene for which the expression level in the subject is higher than the average expression level in the population and a second value to each gene for which the expression level in the subject is lower than the average expression level in the population; and
(c) adding the values assigned in step (b) to arrive at a sum, wherein the sum is indicative of a presence of an MS syndrome in the subject.
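The scoring recited in steps (a) through (c) can be sketched as follows. This is a minimal illustration only: the choice of +1 and -1 as the "first value" and "second value", the gene labels, and the function name `score_subject` are all assumptions and are not specified by the claim.

```python
def score_subject(subject_levels, population_means, high_value=1, low_value=-1):
    """Sum per-gene values: high_value if the subject's expression level is
    above the population average, low_value if it is below.

    high_value and low_value stand in for the claim's "first value" and
    "second value"; +1 and -1 are illustrative choices only.
    """
    total = 0
    for gene, level in subject_levels.items():
        mean = population_means[gene]
        if level > mean:
            total += high_value
        elif level < mean:
            total += low_value
        # A level exactly equal to the average contributes nothing.
    return total


# Hypothetical gene labels and expression levels, for illustration only.
population_means = {"gene_A": 1.0, "gene_B": 2.0, "gene_C": 0.5}
subject = {"gene_A": 1.4, "gene_B": 1.1, "gene_C": 0.9}

score = score_subject(subject, population_means)
print(score)
```

How a given sum maps to a positive or negative call depends on a cutoff established from the reference population, which this sketch does not attempt to model.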
3. The method of claim 1, wherein the comparing comprises calculating a ratio of the expression levels of the two or more genes represented by SEQ ID NOs: 1-9 to thereby detect the presence of a multiple sclerosis syndrome in the subject.
4. The method of claim 1, wherein the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof.
5. The method of claim 1, wherein the biological sample is a cell present in whole blood or a fraction thereof isolated from the subject.
6. The method of claim 5, wherein the biological sample comprises a peripheral blood mononuclear cell.
7. The method of claim 1, wherein the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR).
8. The method of claim 7, wherein the RT-PCR is quantitative RT-PCR (Q-RT-PCR).
9. The method of claim 1 , wherein the determining is of the expression levels of at least three genes represented by SEQ ID NOs: 1-9.
10. The method of claim 9, wherein the determining is of the expression levels of at least four genes represented by SEQ ID NOs: 1-9.
11. The method of claim 10, wherein the determining is of the expression levels of at least five genes represented by SEQ ID NOs: 1-9.
12. The method of claim 1, wherein the calculating comprises calculating a ratio using an equation selected from among:
(a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7;
(b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared;
(c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and
(d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by the expression level of a gene product represented by SEQ ID NO: 8 divided by the product of the expression level of a gene product represented by SEQ ID NO: 7 squared times the expression level of a gene product represented by SEQ ID NO: 9 squared.
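For illustration, the four ratios (a) through (d) can be written out as a short function. The sketch reads "SEQ ID NO: 7 squared" and similar phrases as powers of the corresponding expression level, which is one plausible reading of the claim language; the function name `claim_ratios` and the input values are hypothetical.

```python
def claim_ratios(e):
    """Compute ratios (a)-(d), where e[n] is the expression level of the
    gene product represented by SEQ ID NO: n.

    Exponents follow one reading of the claim wording ("squared", "cubed"
    applied to the expression level named immediately before).
    """
    return {
        "a": e[3] / e[7],
        "b": e[2] * e[3] / e[7] ** 2,
        "c": e[2] * e[5] * e[6] / (1000 * e[7] ** 3),
        "d": e[2] * e[5] ** 2 * e[8] / (e[7] ** 2 * e[9] ** 2),
    }


# Made-up expression levels keyed by SEQ ID NO, for illustration only.
levels = {2: 4.0, 3: 6.0, 5: 2.0, 6: 5.0, 7: 2.0, 8: 3.0, 9: 1.0}
ratios = claim_ratios(levels)
for name, value in ratios.items():
    print(name, value)
```

Because every ratio has the SEQ ID NO: 7 expression level in the denominator, a zero or missing measurement for that gene product would need to be handled before calling such a function.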
13. A method of diagnosing a multiple sclerosis (MS) syndrome in a subject, the method comprising:
(a) providing an array comprising a plurality of nucleic acid sequences, wherein the plurality of nucleic acid sequences correspond to at least two of the gene products represented by SEQ ID NOs: 1-9;
(b) providing a nucleic acid sample isolated from or generated from a biological sample from the subject;
(c) hybridizing the nucleic acid sample to the array;
(d) detecting nucleic acids on the array to which the nucleic acid sample hybridizes;
(e) determining an expression level for each nucleic acid detected; and
(f) calculating a ratio of the expression levels of the two or more genes determined in step (e) to thereby detect the presence of a multiple sclerosis syndrome in the subject.
14. The method of claim 13, wherein the multiple sclerosis syndrome is selected from among relapsing remitting multiple sclerosis (RRMS); secondary progressive multiple sclerosis (SPMS); primary progressive multiple sclerosis (PPMS); clinically isolated syndrome (CIS); and combinations thereof.
15. The method of claim 13, wherein the array is selected from the group consisting of a microarray chip and a membrane-based filter array.
16. The method of claim 13, wherein the array comprises nucleic acid sequences that correspond to at least three genes represented by SEQ ID NOs: 1-9.
17. The method of claim 16, wherein the array comprises nucleic acid sequences that correspond to at least four genes represented by SEQ ID NOs: 1-9.
18. The method of claim 17, wherein the array comprises nucleic acid sequences that correspond to at least five genes represented by SEQ ID NOs: 1-9.
19. The method of claim 18, wherein the array comprises nucleic acid sequences that correspond to at least six genes represented by SEQ ID NOs: 1-9.
20. The method of claim 19, wherein the array comprises nucleic acid sequences that correspond to the nine genes represented by SEQ ID NOs: 1-9.
21. The method of claim 13, wherein the array comprises more than one identifying location for at least one of the gene products represented by SEQ ID NOs: 1-9.
22. The method of claim 13, wherein the array further comprises at least one internal control gene.
23. The method of claim 13, wherein the biological sample comprises a cell present in whole blood or a fraction thereof isolated from the subject.
24. The method of claim 23, wherein the cell is a peripheral blood mononuclear cell.
25. The method of claim 13, wherein the determining comprises a technique selected from the group consisting of a Northern blot, hybridization to a nucleic acid microarray, and a reverse transcription-polymerase chain reaction (RT-PCR).
26. The method of claim 25, wherein the RT-PCR is quantitative RT-PCR (Q-RT-PCR).
27. The method of claim 13, wherein the determining is of the expression levels of at least two genes represented by SEQ ID NOs: 1-9.
28. The method of claim 27, wherein the determining is of the expression levels of at least three genes represented by SEQ ID NOs: 1-9.
29. The method of claim 28, wherein the determining is of the expression levels of at least four genes represented by SEQ ID NOs: 1-9.
30. The method of claim 29, wherein the determining is of the expression levels of at least five genes represented by SEQ ID NOs: 1-9.
31. The method of claim 30, wherein the determining is of the expression levels of at least six genes represented by SEQ ID NOs: 1-9.
32. The method of claim 31, wherein the determining is of the expression levels of nine of the genes represented by SEQ ID NOs: 1-9.
33. The method of claim 13, wherein the calculating comprises calculating a ratio using an equation selected from among:
(a) the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7;
(b) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 3 divided by the expression level of a gene product represented by SEQ ID NO: 7 squared;
(c) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 multiplied by the expression level of a gene product represented by SEQ ID NO: 6 divided by one thousand times the expression level of a gene product represented by SEQ ID NO: 7 cubed; and
(d) the expression level of a gene product represented by SEQ ID NO: 2 multiplied by the expression level of a gene product represented by SEQ ID NO: 5 squared multiplied by the expression level of a gene product represented by SEQ ID NO: 8 divided by the product of the expression level of a gene product represented by SEQ ID NO: 7 squared times the expression level of a gene product represented by SEQ ID NO: 9 squared.
34. A kit comprising a plurality of oligonucleotide primers and instructions for employing the plurality of oligonucleotide primers to determine the expression level of at least two of the genes represented by SEQ ID NOs: 1-9.
35. The kit of claim 34, comprising oligonucleotide primers to determine the expression level of at least three of the genes represented by SEQ ID NOs: 1-9.
36. The kit of claim 35, comprising oligonucleotide primers to determine the expression level of at least four of the genes represented by SEQ ID NOs: 1-9.
37. The kit of claim 36, comprising oligonucleotide primers to determine the expression level of at least five of the genes represented by SEQ ID NOs: 1-9.
38. The kit of claim 37, comprising oligonucleotide primers to determine the expression level of at least six of the genes represented by SEQ ID NOs: 1-9.
39. The kit of claim 38, comprising oligonucleotide primers to determine the expression level of all nine of the genes represented by SEQ ID NOs: 1-9.
40. The kit of claim 34, further comprising oligonucleotide primers to determine the expression level of a control gene.
41. A method for assigning an uncharacterized subject to one of two populations of subjects, the method comprising:
(a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a first population of subjects;
(b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a second population of subjects;
(c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and
(d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects.
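Steps (a) through (d) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the disclosed algorithm: it considers only pairwise ratios, and it treats a ratio as sufficiently different when its values in the two populations do not overlap. The function names, gene labels, and data are hypothetical, and all expression levels are assumed to be positive.

```python
from itertools import permutations


def pairwise_ratios(levels):
    """Return levels[a] / levels[b] for every ordered pair of genes in one subject."""
    return {(a, b): levels[a] / levels[b] for a, b in permutations(sorted(levels), 2)}


def discriminating_ratios(first_pop, second_pop):
    """Find gene pairs whose ratio is lower in every first-population subject
    than in any second-population subject, and return a midpoint cutoff for each.

    Only one direction is tested because ordered pairs are used: if a/b is
    higher in the first population, then b/a is lower (levels are positive).
    """
    first = [pairwise_ratios(s) for s in first_pop]
    second = [pairwise_ratios(s) for s in second_pop]
    cutoffs = {}
    for pair in first[0]:
        highest_first = max(r[pair] for r in first)
        lowest_second = min(r[pair] for r in second)
        if highest_first < lowest_second:
            cutoffs[pair] = (highest_first + lowest_second) / 2
    return cutoffs


def assign(subject, pair, cutoff):
    """Assign an uncharacterized subject using a single ratio and its cutoff."""
    a, b = pair
    return "second" if subject[a] / subject[b] > cutoff else "first"


# Made-up data: two subjects per population, two hypothetical genes.
controls = [{"g1": 1.0, "g2": 2.0}, {"g1": 1.2, "g2": 2.0}]
patients = [{"g1": 3.0, "g2": 1.0}, {"g1": 4.0, "g2": 2.0}]

cutoffs = discriminating_ratios(controls, patients)
label = assign({"g1": 2.5, "g2": 1.0}, ("g1", "g2"), cutoffs[("g1", "g2")])
print(cutoffs, label)
```

A non-overlap criterion is rarely satisfied in realistic data sets, so a practical implementation would more likely rank ratios by a statistical measure of separation; that choice is left open here.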
42. The method of claim 41, wherein the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
43. The method of claim 42, wherein the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
44. A computer program product comprising computer-executable instructions embodied in a computer-readable medium for performing steps comprising:
(a) acquiring a first input data set comprising a plurality of first gene expression levels, each of the plurality of first gene expression levels corresponding to an expression level of a gene product in a control population of subjects;
(b) acquiring a second input data set comprising a plurality of second gene expression levels, each of the plurality of second gene expression levels corresponding to an expression level of a gene product in a test population of subjects;
(c) calculating a first deterministic series of ratios between and among various combinations of the first gene expression levels and a second deterministic series of ratios between and among various combinations of the second gene expression levels; and
(d) identifying one or more ratio values that differ in the first deterministic series of ratios from one or more related ratio values in the second deterministic series of ratios to a degree sufficient that the one or more ratios can be used to predict whether an uncharacterized subject would be appropriately characterized as being a member of the first population of subjects or the second population of subjects.
45. The computer program product of claim 44, wherein the first population is a population of subjects that do not have a multiple sclerosis syndrome and the second population is a population of subjects that do have a multiple sclerosis syndrome.
46. The computer program product of claim 45, wherein the plurality of first gene expression levels and the plurality of second gene expression levels correspond to one or more of the genes represented by SEQ ID NOs: 1-9.
PCT/US2006/043272 2005-11-07 2006-11-07 Molecular diagnosis of autoimmune diseases WO2007056332A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US73436905P 2005-11-07 2005-11-07
US60/734,369 2005-11-07
US73613105P 2005-11-10 2005-11-10
US60/736,131 2005-11-10

Publications (2)

Publication Number Publication Date
WO2007056332A2 true WO2007056332A2 (en) 2007-05-18
WO2007056332A3 WO2007056332A3 (en) 2009-05-14

Family

ID=38023913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/043272 WO2007056332A2 (en) 2005-11-07 2006-11-07 Molecular diagnosis of autoimmune diseases

Country Status (1)

Country Link
WO (1) WO2007056332A2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010024405A1 (en) * 2008-08-28 2010-03-04 独立行政法人理化学研究所 Ifn type-1 production inhibitor and method for searching for same
EP2164991A1 (en) * 2007-05-25 2010-03-24 DioGenix Inc. Methods, systems, and kits for evaluating multiple sclerosis
EP2756103A4 (en) * 2011-09-12 2015-06-03 Univ Vanderbilt Characterizing multiple sclerosis
US9267945B2 (en) 2008-11-12 2016-02-23 Yeda Research And Development Co. Ltd. Diagnosis of multiple sclerosis
EP3443125A4 (en) * 2016-04-15 2020-06-24 Octave Bioscience, Inc. Methods for assessment of multiple sclerosis activity
US11053550B2 (en) 2014-10-14 2021-07-06 The University Of North Carolina At Chapel Hill Gene-expression based subtyping of pancreatic ductal adenocarcinoma
WO2022240597A1 (en) * 2021-05-12 2022-11-17 Genentech, Inc. Logic-based typing of multiple sclerosis subjects based on coded data
US12000003B2 (en) 2020-04-01 2024-06-04 The University Of North Carolina At Chapel Hill Platform and sample type independent single sample classifier for treatment decision making in pancreatic ductal adenocarcinoma cancer

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020068707A1 (en) * 1996-12-05 2002-06-06 Tetsuyoshi Ishiwata Iga nephropathy-related genes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ACHIRON ET AL. AMERICAN NEUROLOGICAL ASSOCIATON 2004, pages 410 - 417 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2164991A1 (en) * 2007-05-25 2010-03-24 DioGenix Inc. Methods, systems, and kits for evaluating multiple sclerosis
EP2164991A4 (en) * 2007-05-25 2011-02-23 Diogenix Inc Methods, systems, and kits for evaluating multiple sclerosis
WO2010024405A1 (en) * 2008-08-28 2010-03-04 独立行政法人理化学研究所 Ifn type-1 production inhibitor and method for searching for same
US9267945B2 (en) 2008-11-12 2016-02-23 Yeda Research And Development Co. Ltd. Diagnosis of multiple sclerosis
EP2756103A4 (en) * 2011-09-12 2015-06-03 Univ Vanderbilt Characterizing multiple sclerosis
US11053550B2 (en) 2014-10-14 2021-07-06 The University Of North Carolina At Chapel Hill Gene-expression based subtyping of pancreatic ductal adenocarcinoma
EP3443125A4 (en) * 2016-04-15 2020-06-24 Octave Bioscience, Inc. Methods for assessment of multiple sclerosis activity
US11773446B2 (en) 2016-04-15 2023-10-03 Octave Bioscience, Inc. Methods for assessment of multiple sclerosis activity
US12000003B2 (en) 2020-04-01 2024-06-04 The University Of North Carolina At Chapel Hill Platform and sample type independent single sample classifier for treatment decision making in pancreatic ductal adenocarcinoma cancer
WO2022240597A1 (en) * 2021-05-12 2022-11-17 Genentech, Inc. Logic-based typing of multiple sclerosis subjects based on coded data

Also Published As

Publication number Publication date
WO2007056332A3 (en) 2009-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06837015

Country of ref document: EP

Kind code of ref document: A2